[CWB] Two questions

Stefan Evert stefanML at collocations.de
Tue Dec 10 15:43:11 CET 2019



> On 10 Dec 2019, at 04:20, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
> 
> If not, it's trickier, but still possible via workarounds. (And, in future, by actual features! as ever I have more ideas than time to implement) 

PS: If you indexed the corpus yourself, you can also use the CWB command-line tools to get the desired frequency lists.  E.g. to get a frequency list of lemma/POS combinations:

	cwb-scan-corpus -o lemma_pos.txt CORPUS lemma+0 pos+0

Then load lemma_pos.txt into your tool of choice and sort either by POS tag or by lemma, then by frequency.  If you just want a lemma frequency list for a particular POS, you can do e.g.

	cwb-scan-corpus -o lemma_ADJ.txt CORPUS lemma+0 '?pos+0=/JJ.*/'

(assuming the Penn tagset).
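For the sorting step, here is a minimal sketch using standard Unix `sort`. It assumes cwb-scan-corpus writes one tab-separated line per type, with the frequency in the first column followed by the lemma and POS values in the order given on the command line. The helper names `sort_by_pos` and `sort_by_lemma` are hypothetical and are not part of CWB.

```shell
#!/bin/sh
# Sketch only. Assumed input format (one line per type):
#   frequency TAB lemma TAB pos
TAB=$(printf '\t')

# Hypothetical helper: group by POS tag (3rd column), most frequent first within each tag.
sort_by_pos() {
  LC_ALL=C sort -t "$TAB" -k3,3 -k1,1nr "$@"
}

# Hypothetical helper: group by lemma (2nd column), most frequent first within each lemma.
sort_by_lemma() {
  LC_ALL=C sort -t "$TAB" -k2,2 -k1,1nr "$@"
}
```

Usage would be e.g. `sort_by_pos lemma_pos.txt > by_pos.txt`. For a single-POS list such as lemma_ADJ.txt, sorting on the first column alone (`sort -t "$TAB" -k1,1nr lemma_ADJ.txt | head -20`) should give the most frequent adjectives. `LC_ALL=C` is set so that tag and lemma ordering is byte-wise and does not depend on the user's locale.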

Best,
Stefan

