[CWB] cwb-scan-corpus

Simon Meier-Vieracker simon.meier-vieracker at tu-dresden.de
Fri Nov 1 12:51:09 CET 2019


Hi Stefan,

I get the following error:

Error: non-integer offset in key '?pos+0!=/\$\./'.

I think I have v3.4.13 installed; at least the README file mentions v3.4.13. How can I check which version is actually installed?

Best, Simon
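
One possible way to check the installed version, as a minimal sketch: it assumes the `cwb-config` helper script that ships with CWB is on the PATH and that its `--version` flag is available in the installed release. The function name `cwb_version` is made up for illustration. Printing the path of `cwb-scan-corpus` as well can reveal whether an older binary elsewhere on the PATH is being picked up instead of the one the README belongs to.

```shell
#!/bin/sh
# Hypothetical helper: print the CWB version reported by cwb-config,
# or fail with a message if the script is not on PATH.
cwb_version() {
    if command -v cwb-config >/dev/null 2>&1; then
        cwb-config --version
    else
        echo "cwb-config not found on PATH" >&2
        return 1
    fi
}

# Show which cwb-scan-corpus binary the shell would actually run.
command -v cwb-scan-corpus || echo "cwb-scan-corpus not found on PATH"

cwb_version || echo "could not determine CWB version"
```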


> On 01.11.2019 at 11:01, Stefan Evert <stefanML at collocations.de> wrote:
> 
> Hi Simon,
> 
> First, you should probably enclose the POS constraints in single quotes so that your shell doesn't get confused by them. I also find the -f option more convenient than redirecting the output of cwb-scan-corpus (and I often save to a .gz file because I'm chronically short on disk space).
> 
>> On 1 Nov 2019, at 09:14, Simon Meier-Vieracker <simon.meier-vieracker at tu-dresden.de> wrote:
>> 
>> I succeed in filtering out >all< pos-tags starting with '$' like this:
>> 
>> cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 ?pos+0=/[^\$].+/ ?pos+1=/[^\$].+/ ?pos+2=/[^\$].+/ > trigrams.txt
>> 
>> But this is still not exactly what I want, because I only want to filter out '$.'
> 
> Make sure you have a sufficiently recent version of CWB installed (v3.4.11 or newer should suffice) and use negated constraints:
> 
> 	cwb-scan-corpus -f trigrams.txt CORPUS lemma+0 lemma+1 lemma+2 '?pos+0!=/\$\./' '?pos+1!=/\$\./' '?pos+2!=/\$\./'
> 
> Best,
> Stefan
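
The difference between the two filters discussed in this thread can be illustrated outside CWB. The following is only a sketch: it uses `grep -x` as a stand-in for whole-string regex matching (an assumption about how the constraints are applied), and the tag list is made up for illustration rather than taken from a corpus.

```shell
#!/bin/sh
# Illustrative STTS-style tags, one per line (not real corpus data).
tags='NN
$.
$,
$(
VVFIN'

# Positive pattern from the first attempt: the first character must not be '$'.
# -x forces a whole-line match, imitating an anchored regex.
keep_positive=$(printf '%s\n' "$tags" | grep -xE '[^$].+')

# Negated pattern: drop only the lines that are exactly '$.'.
keep_negated=$(printf '%s\n' "$tags" | grep -vxE '\$\.')

printf 'positive:\n%s\n\nnegated:\n%s\n' "$keep_positive" "$keep_negated"
```

With this sample, the positive pattern keeps only NN and VVFIN, while the negated pattern keeps everything except '$.', so '$,' and '$(' survive. Note also that `[^$].+` requires at least two characters, so it would reject any one-character tag as well.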

----------

Dr. Simon Meier-Vieracker

Technische Universität Dresden
Institut für Germanistik
Vertretung der Professur für Angewandte Linguistik
01062 Dresden

simon.meier-vieracker at tu-dresden.de


