[CWB] cwb-scan-corpus
Stefan Evert
stefanML at collocations.de
Fri Nov 1 11:01:45 CET 2019
Hi Simon,
first: you should probably enclose the POS constraints in single quotes, so your shell doesn't get confused by them. I also find the -o option (output file) more convenient than redirecting the output of cwb-scan-corpus (and I often save to a .gz file because I'm chronically short on disk space).
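To illustrate the quoting point, here is a small sketch of what a POSIX-style shell (sh or bash assumed; zsh would instead abort with "no matches found") does to an unquoted constraint. It only uses printf, so it does not need CWB or a corpus; the variable names are just for illustration.

```shell
# Unquoted: the shell strips the backslashes before the program sees the argument
# (and would also try to expand ? as a filename wildcard).
unquoted=$(printf '%s' ?pos+0!=/\$\./)

# Single-quoted: the argument is passed through exactly as typed.
quoted=$(printf '%s' '?pos+0!=/\$\./')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

In the unquoted case the regular expression arrives as $. rather than \$\. and so no longer describes the literal tag '$.'.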
> On 1 Nov 2019, at 09:14, Simon Meier-Vieracker <simon.meier-vieracker at tu-dresden.de> wrote:
>
> I succeed in filtering out >all< pos-tags starting with '$' like this:
>
> cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 ?pos+0=/[^\$].+/ ?pos+1=/[^\$].+/ ?pos+2=/[^\$].+/ > trigrams.txt
>
> But still this is not exactly what I want, because I only want to filter out '$.'
Make sure you have a sufficiently recent version of CWB installed (v3.4.11 or newer should suffice) and use negated constraints:
cwb-scan-corpus -o trigrams.txt CORPUS lemma+0 lemma+1 lemma+2 '?pos+0!=/\$\./' '?pos+1!=/\$\./' '?pos+2!=/\$\./'
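The difference between the two kinds of constraint can be checked without a corpus. The following is only a rough sketch: it uses grep -E as a stand-in for CWB's regex matching (with -x to mimic matching against the whole tag), and the tag list is a made-up sample of STTS-style tags.

```shell
# Hypothetical sample of POS tags, one per line.
tags='NN
ADJA
$.
$,
$('

# Original constraint: keep only tags whose first character is not "$".
kept_by_original=$(printf '%s\n' "$tags" | grep -Ex '[^$].+')

# Negated constraint: drop only the tag "$." and keep everything else.
kept_by_negation=$(printf '%s\n' "$tags" | grep -Evx '\$\.')

printf 'kept by original pattern:\n%s\n' "$kept_by_original"
printf 'kept by negated pattern:\n%s\n' "$kept_by_negation"
```

The first pattern keeps only NN and ADJA, while the negated one also keeps '$,' and '$(' and removes nothing but '$.'.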
Best,
Stefan