<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Webdings;
        panose-1:5 3 1 2 1 5 9 6 7 3;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Verdana","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">&gt;&gt;</span> This obviously has to do with the labels year, source and error, which don't have the necessary closing<span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">No, it&#8217;s because they are not XML, only pseudo-XML: no attribute name is given. In XML a value cannot be attached directly to the element name with an =; it
 has to be assigned to a separately named attribute. <o:p></o:p></span></p>
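<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">As a rough illustration of the point about attribute names, here is a minimal Python sketch that rewrites a line such as &lt;source=&quot;1403&quot;/&gt; as &lt;source value=&quot;1403&quot;/&gt;. The function name fix_pseudo_tag and the attribute name &quot;value&quot; are my own placeholders, not anything defined by the corpus or by CWB, and the pattern only covers the three labels mentioned in the error report. It makes the tags well-formed; it does not guarantee that cwb-encode will accept them.<o:p></o:p></span></p>
<pre>
```python
import re

# One whole line consisting of an opening angle bracket, one of the three
# labels from the error report, an equals sign, a quoted value and a
# self-closing slash. Group 3 keeps the original line ending.
PSEUDO_TAG = re.compile(r'^<(year|source|error)="([^"]*)"/>(\s*)$')


def fix_pseudo_tag(line):
    """Insert an explicit attribute name into a pseudo-XML metadata line.

    Any line that does not match the pattern (tokens, ordinary tags) is
    returned unchanged. The attribute name "value" is a placeholder.
    """
    return PSEUDO_TAG.sub(r'<\1 value="\2"/>\3', line)
```
</pre>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Applied line by line while streaming the file, something like this would avoid loading the whole corpus into memory.<o:p></o:p></span></p>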
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">The fact that the encoding process gets more than 600K lines into the corpus before hitting this error suggests that most of the corpus may be free of it.
 If so, the earlier texts should give you an example of what this is <i>supposed</i> to look like.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Note that even if this is corrected, it does not necessarily mean it will work as you wish in CQPweb, as CWB in general and CQPweb in particular do not have
 unrestricted XML support.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">best<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> cwb-bounces@sslmit.unibo.it [mailto:cwb-bounces@sslmit.unibo.it]
<b>On Behalf Of </b>Andres Chandia<br>
<b>Sent:</b> 27 January 2014 18:23<br>
<b>To:</b> Open source development of the Corpus WorkBench<br>
<b>Subject:</b> [CWB] WACKy corpora and cwb<o:p></o:p></span></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">Is there an easy way to transform the metadata format of the WaCky corpora so that they can be used with the CQPweb interface? We are trying to install a few of these corpora, but I have problems with some of the headings.<br>
<br>
When I try to index (encode) I get the following errors:<br>
<br>
Malformed tag &lt;source=&quot;10178&quot;/&gt;, inserted literally (file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633867).<br>
Malformed tag &lt;error=&quot;0.0185185185185185&quot;/&gt;, inserted literally (file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633868).<br>
Malformed tag &lt;source=&quot;10183&quot;/&gt;, inserted literally (file /B_NFS_P/resources/corpora/written/data/de/sdewac/sdewac-v3.tagged, line #633929).<br>
<br>
This obviously has to do with the labels year, source and error, which don't have the necessary closing.<br>
<br>
&lt;sentence&gt;<br>
&lt;year&gt;=&quot;0&quot;/&gt;<br>
&lt;source=&quot;1403&quot;/&gt;<br>
&lt;error=&quot;0.00869565217391304&quot;/&gt;<br>
&lt;s&gt;<br>
Sie&nbsp;&nbsp;&nbsp; PPER&nbsp;&nbsp;&nbsp; Sie|sie<br>
dürfen&nbsp;&nbsp;&nbsp; VMFIN&nbsp;&nbsp;&nbsp; dürfen<br>
<br>
I can do a few transformations using Perl, but I'm wondering whether there is something that could make this easier and faster.<br>
<br>
___________________<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;andrés chandía<br>
<a href="http://www.chandia.net" target="_blank">chandia.net</a> <a href="https://twitter.com/andreschandia" target="_blank">twitter.com/andreschandia</a><br>
administrator of<br>
<a href="http://parles.upf.edu">parles.upf.edu</a><br>
<a href="http://psicoaching.net">psicoaching.net</a><br>
<a href="http://koyaktumapuche.net">mapuche koyaktu</a><br>
<a href="http://corporacionkoyaktu.net">ong mapuche koyaktu</a><br>
<span style="font-size:10.0pt;color:#4F6228">Do not print unnecessarily. Protect the environment!</span><o:p></o:p></p>
</div>
</body>
</html>