[CWB] cwb-scan-corpus

Simon Meier-Vieracker simon.meier-vieracker at tu-dresden.de
Fri Nov 1 09:14:47 CET 2019


Ah, sorry, I did it all wrong: with the command described in my previous email I queried for trigrams in which all three tokens are punctuation marks.
But I want it the other way round.

I succeeded in filtering out >all< pos-tags starting with '$' like this:

cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 ?pos+0=/[^\$].+/ ?pos+1=/[^\$].+/ ?pos+2=/[^\$].+/ > trigrams.txt

But this is still not exactly what I want, because I only want to filter out '$.'
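
To make the difference between the two filters concrete, here is a small Python sketch that compares the pattern used above with a negative-lookahead variant that rejects only '$.'. It rests on two assumptions that are not confirmed in this thread: that the regex has to match the whole attribute value (emulated here with re.fullmatch), and that the regex engine of the installed CWB version accepts lookahead syntax. The tag list is just an illustrative STTS-style sample.

```python
import re

# Pattern from the command above: rejects every tag whose first character is '$'.
EXCLUDE_ALL_DOLLAR = r"[^$].+"

# Hypothetical alternative: reject only the exact tag '$.' via negative lookahead.
# Assumption: the target regex engine supports lookahead.
EXCLUDE_ONLY_FINAL = r"(?!\$\.$).+"

# Illustrative sample of STTS-style tags, not taken from any real corpus.
TAGS = ["NN", "ART", "VVFIN", "$,", "$(", "$."]


def kept(pattern, tags):
    """Return the tags that the pattern accepts as a whole-string match."""
    return [t for t in tags if re.fullmatch(pattern, t)]


print(kept(EXCLUDE_ALL_DOLLAR, TAGS))  # all '$' tags are dropped
print(kept(EXCLUDE_ONLY_FINAL, TAGS))  # only '$.' is dropped
```

If the lookahead variant is accepted by the tool, it would go into each of the three ?pos+N=/.../ constraints, with each argument wrapped in single quotes so the shell passes it through unchanged.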

Best, Simon

On 01.11.2019 at 08:57, Meier-Vieracker, Simon <simon.meier-vieracker at tu-dresden.de> wrote:

Hi,

I am trying to get frequency information (trigrams) with cwb-scan-corpus.

It works fine with this command:

cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 > trigrams.txt

However, I would like to filter out sentence-final punctuation, which is tagged with '$.'
I tried something like

cwb-scan-corpus CORPUS lemma+0 lemma+1 lemma+2 ?pos+0=/\$\./ ?pos+1=/\$\./ ?pos+2=/\$\./ > trigrams.txt

but then I get no results. I do have to escape special characters like '$', I guess? What am I doing wrong?
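
On the escaping question, one possible factor (not confirmed in this thread) is the shell itself: in an unquoted argument a POSIX shell removes the backslashes before the program ever sees them. The sketch below uses Python's shlex module, which follows the same backslash rules but does no globbing or variable expansion, to show what the constraint looks like after shell processing, with and without single quotes. The helper regex_of is a hypothetical name introduced only for this illustration.

```python
import re
import shlex

# The constraint as typed on the command line, unquoted and single-quoted.
unquoted = shlex.split(r"?pos+0=/\$\./")[0]
quoted = shlex.split(r"'?pos+0=/\$\./'")[0]


def regex_of(arg):
    """Extract the text between the first and the last slash of a constraint."""
    return arg[arg.index("/") + 1 : arg.rindex("/")]


print(unquoted)  # backslashes stripped by shell-style processing
print(quoted)    # backslashes preserved inside single quotes

# Without the backslashes, '$' is an end-of-string anchor followed by
# "any character", which can never match the tag '$.'.
print(re.fullmatch(regex_of(unquoted), "$."))
print(re.fullmatch(regex_of(quoted), "$."))
```

Under this reading, wrapping each ?pos+N=/.../ argument in single quotes would at least make sure the intended regex reaches the program.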

Thanks in advance!
Simon



----------

Dr. Simon Meier-Vieracker

Technische Universität Dresden
Institut für Germanistik
Vertretung der Professur für Angewandte Linguistik
01062 Dresden

simon.meier-vieracker at tu-dresden.de
