[CWB] Two questions

Graham Ranger graham.ranger at univ-avignon.fr
Mon Dec 9 14:43:35 CET 2019


Hello to all,
Three questions, one of which is not strictly CQPweb-related, but I hope 
you might be able to help...
1) I've been using TreeTagger to tag French texts, but it performs 
fairly unsatisfactorily, with a strong tendency to treat capitalised 
words (including the first words of sentences) as proper nouns... Of 
course, I can switch to lowercase everywhere, but this creates a whole 
load of other problems. I'd be very interested to hear if anyone has 
found good methods for reliable POS and lemma tagging of French, 
preferably generating a TreeTagger-type output, since CQPweb 
anticipates this format.
2) Is there a simple way in CQPweb of generating lemma / POS 
frequencies? For example, all the verbs / adjectives, etc. in a corpus, 
with totals grouped together by lemma (i.e. not "is", "be", "are", 
etc. as different entries but just "be")? I haven't found a way as yet, 
but I'm sure there must be something.
3) A last question: I'd welcome pointers on encoding information about 
speakers. I'd like to be able to include information on speaker sex, 
social category, age, etc. in a corpus of fiction, with a view to 
studying how an author's style correlates with these categories in the 
representation of direct speech. Would this best be done as speaker 
attributes?
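
To make question 2 concrete, here is a minimal sketch, outside CQPweb, of 
the kind of grouping meant. It assumes TreeTagger-style input with one 
token per line as word<TAB>POS<TAB>lemma; the function name 
lemma_frequencies, the pos_prefix parameter and the sample lines are 
purely illustrative, not part of any existing tool.

```python
from collections import Counter


def lemma_frequencies(lines, pos_prefix=None):
    """Count lemmas in TreeTagger-style lines (word<TAB>POS<TAB>lemma).

    If pos_prefix is given, only tokens whose POS tag starts with that
    prefix are counted (e.g. "V" for verbs in the English tagset).
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and XML-style structural tags such as <s>.
        if not line or line.startswith("<"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a word/POS/lemma line
        word, pos, lemma = parts
        if pos_prefix is not None and not pos.startswith(pos_prefix):
            continue
        # TreeTagger writes <unknown> when it has no lemma for a token;
        # falling back to the lowercased word form is an assumption here.
        if lemma == "<unknown>":
            lemma = word.lower()
        counts[lemma] += 1
    return counts


# Illustrative sample, hand-written in the assumed format.
sample = [
    "<s>",
    "The\tDT\tthe",
    "cats\tNNS\tcat",
    "are\tVBP\tbe",
    "here\tRB\there",
    "and\tCC\tand",
    "it\tPP\tit",
    "is\tVBZ\tbe",
    "said\tVVD\tsay",
    "</s>",
]

verbs = lemma_frequencies(sample, pos_prefix="V")
for lemma, freq in verbs.most_common():
    print(lemma, freq)
```

With this sample, "are" and "is" are both counted under the single entry 
"be", which is the grouping asked about.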
Thank you in advance for any answers or suggestions.
Best,
Graham.

