[CWB] Two questions
Serge Sharoff
S.Sharoff at leeds.ac.uk
Mon Dec 9 15:33:44 CET 2019
Graham,
Regarding your TreeTagger question: I've switched to UDPipe recently, as I need
compatible multilingual processing. Its output is in the CoNLL-U format, which
gives you the same information as TreeTagger if you keep only three columns
(word form, POS tag and lemma, reordered to TreeTagger's word/POS/lemma
layout); it can also output syntactic dependencies. The only drawback is the
need to restore the document boundaries after processing. In my experience, its
French model is reasonably accurate on general newspaper texts, but its
performance on specialised texts might suffer.
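
As a rough illustration of the column reduction, here is a minimal Python
sketch, not the exact script I use. It assumes standard CoNLL-U input (ten
tab-separated columns, a blank line between sentences) and that document
boundaries, if they survived tagging, appear as "# newdoc id = ..." comment
lines as the CoNLL-U specification allows; whether your tagger run emits them
depends on how the input was fed in. The function name and the <text>/<s>
wrapper tags are my own choices for illustration.

```python
def conllu_to_vertical(lines):
    """Yield TreeTagger-style lines (word, POS, lemma) from CoNLL-U lines.

    Assumption: documents are marked by "# newdoc" comments; if there are
    none, no <text> elements are produced and only <s> tags are emitted.
    """
    doc_open = False
    sent_open = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# newdoc"):
            # Close whatever is still open before starting a new document.
            if sent_open:
                yield "</s>"
                sent_open = False
            if doc_open:
                yield "</text>"
            # "# newdoc id = X" carries an optional identifier after "=".
            doc_id = line.partition("=")[2].strip()
            yield f'<text id="{doc_id}">' if doc_id else "<text>"
            doc_open = True
        elif line.startswith("#"):
            continue  # other comments (sent_id, text, ...) are dropped
        elif not line.strip():
            # A blank line ends the current sentence.
            if sent_open:
                yield "</s>"
                sent_open = False
        else:
            cols = line.split("\t")
            if len(cols) < 4:
                continue  # not a token line
            # Skip multiword-token ranges (e.g. "1-2" for French "du") and
            # empty nodes ("1.1"); the syntactic words "de" + "le" are kept.
            if "-" in cols[0] or "." in cols[0]:
                continue
            if not sent_open:
                yield "<s>"
                sent_open = True
            form, lemma, upos = cols[1], cols[2], cols[3]
            yield f"{form}\t{upos}\t{lemma}"
    if sent_open:
        yield "</s>"
    if doc_open:
        yield "</text>"
```

You would call it on an open file, e.g. iterating over
conllu_to_vertical(open("corpus.conllu", encoding="utf-8")) and writing each
yielded line to the output. Skipping the multiword-token range lines means
contractions appear as their component words rather than the surface form;
keep the range line instead if you prefer surface tokens.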
Best,
Serge
On Mon, 2019-12-09 at 14:43 +0100, Graham Ranger wrote:
> Hello to all,
> Three questions, one of which is not strictly cqpweb related, but I hope
> you might be able to help...
> 1) I've been using treetagger to tag French texts, but it performs
> fairly unsatisfactorily, with a strong tendency to decide that
> capitalisation (including the first words of sentences) means proper
> nouns... Of course, I can switch to lowercase everywhere, but this
> creates a whole load of alternative problems. I'd be very interested to
> hear if anyone has found good methods for reliable POS and lemma tagging
> of French, preferably generating a treetagger-type output, since cqpweb
> anticipates this format.
> 2) Is there a simple way in cqpweb of generating lemma / POS
> frequencies? For example, all the verbs / adjectives, etc. in a corpus,
> with totals grouped together by lemmata (i.e. not "is", "be", "are",
> etc. as different entries but just "be")? I haven't found a way as yet,
> but I'm sure there must be something.
> 3) A last question concerns pointers for encoding indications regarding
> speakers. I'd like to be able to include information on speaker sex,
> social category, age, etc. in a corpus of fiction, with a view to
> studying the stylistic correlations of an author in direct speech
> representation. Would this best be done as speaker attributes?
> Thank you in advance for any answers or suggestions.
> Best,
> Graham.