[CWB] xml regions in cwb-lexdecode
Simon Meier-Vieracker
simon.meier-vieracker at tu-dresden.de
Sat May 23 12:59:31 CEST 2020
Hi Stefan,
thanks, but I think you misunderstood me. I don’t want the frequency of the lemma "SPD", but a frequency table of all lemmas in all texts (in my case: speeches) where the XML attribute "fraction" has the value "SPD".
To clarify, the XML structure of my corpus (Plenarprotokolle Bundestag) is as follows:
> <corpus>
> <session nr="43">
> <speech speaker="Timon Gremmels" fraction="SPD">
> <p>
> …
> </p>
> </speech>
> <speech speaker="Renate Künast" fraction="Gruene">
> <p>
> …
> </p>
> </speech>
> </session>
> </corpus>
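To pin down the intended result independently of any CWB tool, here is a minimal Python sketch that computes such a table directly from verticalized text. It is not a CWB feature. It rests on several assumptions: the input is one token per line with tab-separated columns (word, lemma), structural tags sit on their own lines as in the structure above, and the function name lemma_freqs_by_fraction, the column index and the sample tokens are invented for illustration.

```python
import re
from collections import Counter

# Opening <speech ...> tag; the group captures the raw attribute string.
SPEECH_OPEN = re.compile(r'<speech\b([^>]*)>')
# Single attribute of the form name="value".
ATTR = re.compile(r'(\w+)="([^"]*)"')


def lemma_freqs_by_fraction(vrt_lines, fraction, lemma_col=1):
    """Count lemmas of all tokens inside <speech> regions whose
    'fraction' attribute equals `fraction`.

    Assumption: tab-separated token lines with the lemma in column
    `lemma_col` (0-based). Lines starting with '<' are treated as tags,
    so a literal '<' token would be skipped by this sketch.
    """
    counts = Counter()
    inside = False  # True while we are in a matching <speech> region
    for line in vrt_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        opening = SPEECH_OPEN.match(line)
        if opening:
            attrs = dict(ATTR.findall(opening.group(1)))
            inside = attrs.get("fraction") == fraction
        elif line.startswith("</speech"):
            inside = False
        elif line.startswith("<"):
            continue  # other structural tags (<p>, <session>, ...)
        elif inside:
            cols = line.split("\t")
            if len(cols) > lemma_col:
                counts[cols[lemma_col]] += 1
    return counts


# Invented sample data in the assumed format (word<TAB>lemma).
sample = """<corpus>
<session nr="43">
<speech speaker="A" fraction="SPD">
<p>
Wir\twir
fordern\tfordern
Wir\twir
</p>
</speech>
<speech speaker="B" fraction="Gruene">
<p>
Wir\twir
</p>
</speech>
</session>
</corpus>""".splitlines()

for lemma, freq in lemma_freqs_by_fraction(sample, "SPD").most_common():
    print(f"{freq}\t{lemma}")
```

On the invented sample this prints "2	wir" followed by "1	fordern": only tokens inside the speech with fraction="SPD" are counted, and the token in the "Gruene" speech is ignored.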
Best,
Simon
> Am 23.05.2020 um 12:15 schrieb Stefan Evert <stefanML at collocations.de>:
>
> Hi Simon,
>
> not with cwb-lexdecode, because that accesses the built-in frequency list for the entire corpus.
>
> What you want is
>
> cwb-scan-corpus -o freqlist.txt CORPUS lemma+0 '?lemma+0=/SPD/'
>
> Best,
> Stefan
>
>> On 23 May 2020, at 10:11, Simon Meier-Vieracker <simon.meier-vieracker at tu-dresden.de> wrote:
>>
>> am I right that it is NOT possible to restrict cwb-lexdecode to certain XML regions, as defined by XML attributes?
>>
>> My corpus contains xml tags like
>>
>>> <speech speaker="Timon Gremmels" fraction="SPD">
>>
>> And I would like to generate a frequency list of all parts of the corpus tagged with speech_fraction="SPD"
>>
>> It is possible to do this in CWB with this query:
>>
>>> [] :: match.speech_fraction="SPD";
>>> count by lemma;
>>
>