[CWB] Two questions
Graham Ranger
graham.ranger at univ-avignon.fr
Mon Dec 9 14:43:35 CET 2019
Hello to all,
Three questions, one of which is not strictly CQPweb-related, but I hope
you might be able to help...
1) I've been using TreeTagger to tag French texts, but it performs
rather poorly, with a strong tendency to treat any capitalised word
(including the first word of a sentence) as a proper noun... Of course,
I could convert everything to lowercase, but this creates a whole set
of other problems. I'd be very interested to hear whether anyone has
found a good method for reliable POS tagging and lemmatisation of
French, preferably one that produces TreeTagger-style output, since
CQPweb expects this format.
2) Is there a simple way in CQPweb to generate lemma / POS
frequencies? For example, all the verbs, adjectives, etc. in a corpus,
with totals grouped by lemma (i.e. not "is", "be", "are", etc. as
separate entries but just "be"). I haven't found a way yet, but I'm
sure there must be one.
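To make the kind of list I mean concrete, here is a minimal sketch that works on TreeTagger-style vertical output (one token per line: word<TAB>POS<TAB>lemma) rather than inside CQPweb. The function name, the sample lines and the use of French TreeTagger tags such as VER:pres and ADJ are illustrative assumptions; lines that are not three tab-separated fields (e.g. XML tags such as <s>) are simply skipped.

```python
from collections import Counter

def lemma_frequencies(lines, pos_prefix):
    """Count lemmas of tokens whose POS tag starts with pos_prefix.

    Hypothetical helper: assumes TreeTagger-style lines word<TAB>POS<TAB>lemma.
    Lines without exactly three tab-separated fields (e.g. <s>, </s>) are ignored.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # structural markup or malformed line
        _word, pos, lemma = fields
        if pos.startswith(pos_prefix):
            # Different word forms of the same lemma are added to one total
            counts[lemma] += 1
    return counts

sample = [
    "<s>",
    "est\tVER:pres\têtre",
    "sont\tVER:pres\têtre",
    "être\tVER:infi\têtre",
    "grand\tADJ\tgrand",
    "</s>",
]
print(lemma_frequencies(sample, "VER"))  # all verb forms grouped under "être"
```

Matching on a tag prefix ("VER") rather than a full tag is what folds tense and mood subtags into one verb total.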
3) My last question concerns pointers for encoding information about
speakers. I'd like to include information on speaker sex, social
category, age, etc. in a corpus of fiction, with a view to studying
an author's stylistic correlations in the representation of direct
speech. Would this best be done with speaker attributes?
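For concreteness, this is roughly the markup I have in mind: each stretch of direct speech wrapped in an XML region whose attributes describe the speaker, inside the usual vertical file. The element name "sp" and the attribute names are hypothetical; my assumption is that such a region would then be encoded in CWB as a structural attribute with annotations (something like cwb-encode -S sp:0+sex+age+social), but I'd welcome correction on that.

```python
from xml.sax.saxutils import quoteattr

def speech_region(tokens, **speaker):
    """Wrap vertical-format token lines in an <sp> region with speaker attributes.

    Hypothetical helper: "sp" and the keyword names are illustrative only.
    quoteattr() adds the surrounding quotes and escapes &, <, > and quotes.
    """
    attrs = " ".join(f"{key}={quoteattr(str(value))}" for key, value in speaker.items())
    return [f"<sp {attrs}>", *tokens, "</sp>"]

lines = speech_region(["Oui\tADV\toui"], sex="F", age="30-40", social="ouvrière")
print("\n".join(lines))
```

Keeping speaker properties on the region, rather than repeating them on every token, means each value is stored once per utterance.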
Thank you in advance for any answers or suggestions.
Best,
Graham.