[CWB] cwb-scan-corpus
Simon Meier-Vieracker
simon.meier-vieracker at tu-dresden.de
Fri Nov 1 12:51:09 CET 2019
Hi Stefan,
I get the following error:
Error: non-integer offset in key '?pos+0!=/\$\./'.
I think I have v3.4.13; at least, the README file mentions v3.4.13. How can I check which version I actually have installed?
Best, Simon
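A README in a source tree does not necessarily describe the binaries the shell actually runs, so a quick check along these lines may help. This is a sketch that assumes a bash-compatible shell and that the `cwb-config` utility installed with CWB 3.x is on the PATH. The function name `cwb_version` is hypothetical.

```shell
# Hypothetical helper: print the installed CWB version, or a notice if cwb-config is missing
cwb_version() {
  if command -v cwb-config >/dev/null 2>&1; then
    cwb-config --version
  else
    echo "cwb-config not found on PATH"
  fi
}

# List every cwb-scan-corpus on the PATH (bash builtin); the first entry is the one that runs
type -a cwb-scan-corpus 2>/dev/null || echo "cwb-scan-corpus not found on PATH"

cwb_version
```

If the reported version is older than v3.4.11, or `type -a` lists an older copy first, that would be consistent with the "non-integer offset" error above.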
> On 01.11.2019 at 11:01, Stefan Evert <stefanML at collocations.de> wrote:
>
> Hi Simon,
>
> First, you should probably enclose the POS constraints in single quotes so that your shell doesn't get confused by them. I also find the -f option more convenient than redirecting the output of cwb-scan-corpus (and I often save to a .gz file because I'm chronically short on disk space).
>
>> On 1 Nov 2019, at 09:14, Simon Meier-Vieracker <simon.meier-vieracker at tu-dresden.de> wrote:
>>
>> I managed to filter out >all< POS tags starting with '$' like this:
>>
>> cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 ?pos+0=/[^\$].+/ ?pos+1=/[^\$].+/ ?pos+2=/[^\$].+/ > trigrams.txt
>>
>> But this is still not exactly what I want, because I only want to filter out '$.'
>
> Make sure you have a sufficiently recent version of CWB installed (v3.4.11 or newer should suffice) and use negated constraints:
>
> cwb-scan-corpus -f trigrams.txt CORPUS lemma+0 lemma+1 lemma+2 '?pos+0!=/\$\./' '?pos+1!=/\$\./' '?pos+2!=/\$\./'
>
> Best,
> Stefan
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
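The difference between the two approaches can be illustrated with grep on a handful of tags. This is only an analogy, not how cwb-scan-corpus is implemented: it assumes that, as in CQP, a regular expression in a constraint must match the whole annotation value (emulated here with `grep -x`), and the tag list is a made-up sample from the STTS tagset.

```shell
# Made-up sample of STTS part-of-speech tags, one per line
tags='NN
$.
$,
VVFIN
$(
ADJA'

# Like '?pos+0=/[^\$].+/': keep tags that fully match "not '$', then 1+ chars",
# which removes every tag starting with '$' ($. $, and $( alike)
positive=$(printf '%s\n' "$tags" | grep -Ex '[^$].+')

# Like '?pos+0!=/\$\./': drop only tags that fully match '$.', keep $, and $(
negated=$(printf '%s\n' "$tags" | grep -vEx '\$\.')

printf '%s\n' "$positive"
printf '%s\n' "---"
printf '%s\n' "$negated"
```

The single quotes around each pattern serve the same purpose as the quotes recommended for the cwb-scan-corpus arguments: they keep the shell from interpreting `$`, `?` and the brackets.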
----------
Dr. Simon Meier-Vieracker
Technische Universität Dresden
Institut für Germanistik
Acting Professor of Applied Linguistics
01062 Dresden
simon.meier-vieracker at tu-dresden.de