[CWB] CWB and CoNLL format
Maarten Janssen
maartenpt at gmail.com
Wed Mar 3 12:14:35 CET 2021
Hi Stefan,
That looks nice. I just have a few clarification questions (I will also try it out myself, but that tends to take more time). They have to do with the fact that I am working on several UD-related tools and corpora in TEITOK, which will therefore end up as CoNLL-U corpora in CWB:
- I understand it treats empty lines as sentence boundaries, but does it also create doc and s structural attributes? And what names does it use for the p-attributes (positional attributes) of the columns? (TEITOK uses the names the standard describes: form, lemma, upos, xpos, feats, head, deprel, deps, and misc.)
- TEITOK also uses <s> (since it comes from TEI), but UD uses sent. What was the motivation behind choosing <s>? (I am trying to find out from the UD community whether <s> would be acceptable.)
- Is there also a CoNLL-U export, and if so, does that require anything special in the compiled corpus?
Maarten