[CWB] Two questions
Stefan Evert
stefanML at collocations.de
Tue Dec 10 15:43:11 CET 2019
> On 10 Dec 2019, at 04:20, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
>
> If not, it's trickier, but still possible via workarounds. (And, in future, by actual features! as ever I have more ideas than time to implement)
PS: If you indexed the corpus yourself, you can also use the CWB command-line tools to get the desired frequency lists. E.g. to get a frequency list of lemma/pos combinations:
cwb-scan-corpus -o lemma_pos.txt CORPUS lemma+0 pos+0
Then load lemma_pos.txt into your tool of choice and sort either by POS tag or by lemma, then by frequency. If you just want a lemma frequency list for a particular POS, you can do e.g.
cwb-scan-corpus -o lemma_ADJ.txt CORPUS lemma+0 '?pos+0=/JJ.*/'
(assuming the Penn tagset).
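If your "tool of choice" happens to be Python, the sorting step for lemma_pos.txt could look roughly like the sketch below. It assumes each output line of cwb-scan-corpus has the form frequency<TAB>lemma<TAB>pos; check your own file before relying on that. The function name and the sample data are made up for illustration.

```python
def sort_scan_output(text):
    """Parse lines assumed to be frequency<TAB>lemma<TAB>pos and return
    (freq, lemma, pos) tuples sorted by POS tag, then by descending
    frequency, then by lemma."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        freq, lemma, pos = line.split("\t")
        rows.append((int(freq), lemma, pos))
    rows.sort(key=lambda r: (r[2], -r[0], r[1]))
    return rows


# Invented sample data, not taken from a real corpus.
sample = "12\tgood\tJJ\n7\thouse\tNN\n30\tbe\tVB\n3\tgreen\tJJ\n"

for freq, lemma, pos in sort_scan_output(sample):
    print(f"{pos}\t{lemma}\t{freq}")
```

For a real file, pass open("lemma_pos.txt", encoding="utf-8").read() instead of the sample string (adjust the encoding to match your corpus).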
Best,
Stefan