[CWB] encoder script for AnCora or DEFT Spanish treebanks?
John Hale
jthale at uga.edu
Thu Jun 20 14:13:23 CEST 2019
Hi, before reinventing the wheel I wanted to ask the CWB list whether anyone has already created an encoder script for the XML annotations used in the CLiC group’s Spanish corpora <http://clic.ub.edu/corpus>? This annotation system is also used in the DEFT Spanish treebank <https://catalog.ldc.upenn.edu/LDC2018T01> and is documented fairly exhaustively in this English-language publication:
Soriano, B., O. Borrega, M. Taulé and M.A. Martí (2008) Guidelines,
3LB-WP-02-03, Universitat de Barcelona.
(http://clic.ub.edu/corpus/webfm_send/17)
It’s straightforward enough to extract the word (“wd”) attributes and morphology as positional attributes,
but my ambition is to encode the syntactic annotations as s-attributes as well, along the lines suggested in the CWB manual <http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial/node7.html>.
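To make the question concrete, here is a rough Python sketch of the kind of conversion I have in mind, not a tested encoder. It walks the XML tree and writes verticalized text: one tab-separated line per token, with each non-terminal node emitted as an XML region that could later be declared as an s-attribute. Only “wd” is taken from the description above; the “lem” and “pos” attribute names, the made-up sample sentence, the function name to_vrt, and the replacement of dots in tag names by underscores are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def to_vrt(elem, out):
    """Append verticalized-text lines for elem and its descendants to out."""
    wd = elem.get("wd")
    if wd is not None:
        # Token node: word form plus two further positional attributes.
        # "lem" and "pos" are assumed attribute names; "_" marks a missing value.
        out.append("\t".join([wd, elem.get("lem", "_"), elem.get("pos", "_")]))
        return
    if len(elem) == 0:
        # No word form and no children (e.g. an elided element): skip it,
        # so that no empty region is written.
        return
    # Non-terminal node: emit it as a region around its children.
    # Assumption: dots in tag names are replaced to keep attribute names simple.
    tag = elem.tag.replace(".", "_")
    out.append(f"<{tag}>")
    for child in elem:
        to_vrt(child, out)
    out.append(f"</{tag}>")


# Made-up sample in the general style of the annotation, for illustration only.
sample = """<sentence>
  <sn func="suj">
    <spec><d wd="El" lem="el" pos="da0ms0"/></spec>
    <grup.nom><n wd="gato" lem="gato" pos="ncms000"/></grup.nom>
  </sn>
  <grup.verb><v wd="duerme" lem="dormir" pos="vmip3s0"/></grup.verb>
</sentence>"""

lines = []
to_vrt(ET.fromstring(sample), lines)
vrt = "\n".join(lines)
print(vrt)
```

The resulting text would then go to cwb-encode, with the extra columns declared via -P and each region name via -S. As far as I understand the manual, regions that nest inside regions of the same type (a noun phrase inside a noun phrase, say) need a nesting depth in their -S declaration, and node attributes such as the syntactic function would have to be written into the start tags to survive as annotations; the sketch drops them.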
I would be grateful for any tips you might have,
-john