[CWB] cwb-scan-corpus
Simon Meier-Vieracker
simon.meier-vieracker at tu-dresden.de
Fri Nov 1 09:14:47 CET 2019
Ah, sorry, I did it all wrong. With the command described in my previous email, I queried for trigrams matching the condition of three punctuation marks…
But I want it the other way round.
I succeeded in filtering out >all< POS tags starting with '$' like this:
cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 ?pos+0=/[^\$].+/ ?pos+1=/[^\$].+/ ?pos+2=/[^\$].+/ > trigrams.txt
But this is still not exactly what I want, because I only want to filter out '$.'.
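Note: the remaining problem is to accept every tag except the exact string '$.'. The following is a minimal Python sketch of one lookahead-free pattern that does this, checked with Python's `re` module only. The sample tag list and the helper name `keeps` are illustrative assumptions. Whether cwb-scan-corpus accepts the same pattern unchanged, and whether it matches a constraint against the whole tag, is also an assumption that should be checked against the CWB documentation.

```python
import re

# Lookahead-free pattern: matches every non-empty tag except the exact string "$.".
# Alternatives, in order:
#   [^$].*     tag does not start with "$"
#   \$         tag is just "$"
#   \$[^.].*   tag starts with "$" followed by something other than "."
#   \$\..+     tag starts with "$." but is longer than two characters
NOT_SENT_END = re.compile(r"[^$].*|\$|\$[^.].*|\$\..+")


def keeps(tag: str) -> bool:
    """Return True if the whole tag matches, i.e. the tag is not exactly '$.'."""
    return NOT_SENT_END.fullmatch(tag) is not None


# Hypothetical sample of STTS-style tags, for illustration only.
tags = ["NN", "ADJA", "$,", "$(", "$."]
kept = [t for t in tags if keeps(t)]
print(kept)
```

In this sketch only '$.' is rejected, while the other '$' tags ('$,' and '$(') are kept, which is the behaviour asked for above.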
Best, Simon
On 01.11.2019 at 08:57, Meier-Vieracker, Simon <simon.meier-vieracker at tu-dresden.de> wrote:
Hi,
I am trying to access frequency information (trigrams) with cwb-scan-corpus.
It works fine with this command:
cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 > trigrams.txt
However, I would like to filter out sentence-ending punctuation, which is tagged with '$.'
I tried something like
cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 ?pos+0=/\$\./ ?pos+1=/\$\./ ?pos+2=/\$\./ > trigrams.txt
but then I get no results. I do have to escape special characters like '$', I guess? What am I doing wrong?
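Note on the escaping question: in a POSIX shell, an unquoted backslash is removed before the program sees the argument, so the regex that reaches cwb-scan-corpus may differ from what was typed. The sketch below uses Python's `shlex` module, which approximates POSIX word splitting, to show the argument the program would receive with and without single quotes. The shortened command lines are illustrative, and whether this is the actual cause of the empty result is an assumption, not something verified against CWB.

```python
import shlex

# The constraint as typed without quotes, and the same constraint in single quotes.
unquoted = r"cwb-scan-corpus CORPUS lemma+0 ?pos+0=/\$\./"
quoted = r"cwb-scan-corpus CORPUS lemma+0 '?pos+0=/\$\./'"

# shlex.split follows POSIX rules: an unquoted backslash escapes the next
# character and is dropped; inside single quotes everything is literal.
unquoted_arg = shlex.split(unquoted)[-1]
quoted_arg = shlex.split(quoted)[-1]

print(unquoted_arg)  # ?pos+0=/$./
print(quoted_arg)    # ?pos+0=/\$\./
```

Without quotes the pattern arrives as `$.`, that is, an end-of-string anchor followed by one more character, which would not be expected to match any tag. Single quotes keep the backslashes intact.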
Thanks in advance!
Simon
----------
Dr. Simon Meier-Vieracker
Technische Universität Dresden
Institut für Germanistik
Vertretung der Professur für Angewandte Linguistik
01062 Dresden
simon.meier-vieracker at tu-dresden.de