[CWB] xml regions in cwb-lexdecode

Sat May 23 12:59:31 CEST 2020

Hi Stefan,

thanks, but I think you misunderstood me. I don’t want the frequency of the lemma "SPD", but a frequency table of all lemmas in all texts (in my case: speeches) where the XML attribute "fraction" has the value "SPD".

To make it clear, the xml structure of my corpus (Plenarprotokolle Bundestag) is as follows:

> <corpus>
> <session nr="43">
> <speech speaker="Timon Gremmels" fraction="SPD">
> <p>
> …
> </p>
> </speech>
> <speech speaker="Renate Künast" fraction="Gruene">
> <p>
> …
> </p>
> </speech>
> </session>
> </corpus>
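
To illustrate the result I am after, here is a minimal sketch in Python that works outside CWB. It assumes a vertical-text export with one token per line and the lemma in the second tab-separated column; the function name `lemma_freq_by_fraction`, the column layout, and the sample tokens are hypothetical and only serve the illustration.

```python
import re
from collections import Counter

# Matches an opening <speech ...> tag and captures its attribute string.
SPEECH_OPEN = re.compile(r"<speech\b([^>]*)>")
# Matches name="value" pairs inside the tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def lemma_freq_by_fraction(lines, fraction, lemma_col=1):
    """Count the lemmas of all tokens inside <speech> regions whose
    "fraction" attribute equals the given value.

    Assumption: lines starting with "<" are structural tags, every other
    non-empty line is a token with tab-separated annotations.
    """
    counts = Counter()
    inside = False
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        match = SPEECH_OPEN.match(line)
        if match:
            attrs = dict(ATTR.findall(match.group(1)))
            inside = attrs.get("fraction") == fraction
        elif line.startswith("</speech>"):
            inside = False
        elif line.startswith("<"):
            continue  # other structural tags such as <p> or <session>
        elif inside:
            cols = line.split("\t")
            if len(cols) > lemma_col:
                counts[cols[lemma_col]] += 1
    return counts


# Hypothetical sample data: word form and lemma, separated by a tab.
sample = """<corpus>
<session nr="43">
<speech speaker="Timon Gremmels" fraction="SPD">
<p>
Wir\twir
wollen\twollen
wir\twir
</p>
</speech>
<speech speaker="Renate Künast" fraction="Gruene">
<p>
Wir\twir
</p>
</speech>
</session>
</corpus>""".splitlines()

for lemma, n in lemma_freq_by_fraction(sample, "SPD").most_common():
    print(n, lemma, sep="\t")
```

For the sample, only the three tokens of the SPD speech are counted, so "wir" gets a frequency of 2 and "wollen" a frequency of 1; the token in the "Gruene" speech is ignored.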

Best,
Simon

> On 23 May 2020, at 12:15, Stefan Evert <stefanML at collocations.de> wrote:
> 
> Hi Simon,
> 
> not with cwb-lexdecode, because that accesses the built-in frequency list for the entire corpus.
> 
> What you want is
> 
> 	cwb-scan-corpus -o freqlist.txt CORPUS lemma+0 '?lemma+0=/SPD/'
> 
> Best,
> Stefan
> 
>> On 23 May 2020, at 10:11, Simon Meier-Vieracker <simon.meier-vieracker at tu-dresden.de> wrote:
>> 
>> am I right that it is NOT possible to restrict cwb-lexdecode to certain xml regions as defined by xml attributes?
>> 
>> My corpus contains xml tags like
>> 
>>> <speech speaker="Timon Gremmels" fraction="SPD">
>> 
>> And I would like to generate a frequency list of all parts of the corpus tagged with speech_fraction="SPD"
>> 
>> It is possible to do this in CWB with this query:
>> 
>>> [] :: match.speech_fraction="SPD";
>>> count by lemma;
>> 