Is there an easy way to transform the metadata format of the WaCky corpora so that they can
be used with the CQPweb interface? We are trying to install a few of these corpora, but I am
having problems with some of the metadata headers.<br /><br />When I try to index (encode) the corpus, I get the
following errors:<br /><br />Malformed tag <source="10178"/>, inserted
literally (file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line
#633867).<br />Malformed tag <error="0.0185185185185185"/>, inserted literally
(file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633868).<br
/>Malformed tag <source="10183"/>, inserted literally (file
/B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633929).<br /><br
/>This apparently has to do with the year, source and error tags, which are not well-formed
XML: the value is attached directly to the tag name.<br /><br /><sentence><br /><year="0"/><br
/><source="1403"/><br /><error="0.00869565217391304"/><br
/><s><br />Sie PPER Sie|sie<br
/>dürfen VMFIN dürfen<br /><br />I can do a few
transformations using Perl, but I'm wondering whether there is something that could make this
easier and faster.<br /><br />___________________<br
/> andrés
chandía
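For illustration, the kind of transformation asked about could be sketched as follows, in Python rather than Perl. This is only a minimal sketch under stated assumptions: it assumes each malformed tag sits alone on its own line, exactly as in the excerpt above, and it rewrites `<source="1403"/>` as `<source value="1403"/>`, which is well-formed XML. The attribute name `value`, the helper names `fix_line` and `fix_file`, and the default UTF-8 encoding are all hypothetical choices, not part of the corpus format or of any CWB tool. Whether the encoder needs further changes after this is not covered here.

```python
import re

# A whole line such as <source="10178"/>: the tag name is fused with a bare
# quoted value, so there is no attribute name.
MALFORMED = re.compile(r'<(\w+)="([^"]*)"\s*/>')


def fix_line(line, attr="value"):
    """Rewrite <name="v"/> as <name value="v"/>; return any other line unchanged.

    The attribute name `attr` is a hypothetical choice made for this sketch.
    """
    body = line.rstrip("\r\n")
    match = MALFORMED.fullmatch(body)
    if match is None:
        return line  # token lines, <s>, <sentence>, etc. pass through untouched
    name, val = match.groups()
    # Re-attach the original line ending (if any) after the rewritten tag.
    return f'<{name} {attr}="{val}"/>' + line[len(body):]


def fix_file(src_path, dst_path, encoding="utf-8"):
    """Stream src_path to dst_path line by line, fixing malformed tags.

    The encoding default is an assumption; adjust it to the corpus file.
    """
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(fix_line(line))
```

Because the file is processed one line at a time, memory use stays constant even for a large corpus file. A call such as `fix_file("sdewac-v3.tagged", "sdewac-v3.fixed")` (the output name is hypothetical) would write a corrected copy and leave the original untouched.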