<div dir="ltr"><div><div><div><div>OK guys, thank you for the info. I will tell them not to use the script anymore.<br><br></div>Maybe I will write a script later, when I have a little time to sit down and read up on Perl, but for now I need to encode the corpus.<br></div></div><span id="result_box" class="" lang="en"><span class=""><br>What they</span> <span class="">told me</span> <span class="">they were doing</span> <span class="">so far:<br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- They take some texts and translations<br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Put them in Déjàvu to align them<br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Use a Perl script to separate them into two different texts (original and translation)<br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Use TreeTagger to tag them<br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Join all the texts together into a single file (cat original* > original.txt)<br><br></span></span>What I have:<br>- About 10 .txt files in 2 languages (original and translation)<span id="result_box" class="" lang="en"><span class=""><br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">I'm at the step just before joining the originals and translations.<br><br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">What I did:<br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Join the texts. 
<a href="http://i.imgur.com/KN9Z7cY.png">http://i.imgur.com/KN9Z7cY.png</a><br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Look at the tags in them: &lt;s&gt; &lt;text&gt; &lt;unknown&gt; <a href="http://i.imgur.com/pTisyew.png">http://i.imgur.com/pTisyew.png</a> (I had to change &lt;unknown&gt; to unknown)<br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Fill in the form like this: <a href="http://i.imgur.com/lmDvLTy.png">http://i.imgur.com/lmDvLTy.png</a><br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">- Get the error: <a href="http://i.imgur.com/RoY8sRu.png">http://i.imgur.com/RoY8sRu.png</a><br></span></span></div><div><span id="result_box" class="" lang="en"><span class=""><br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">Is this the best way? Are we doing something wrong?<br><br></span></span></div><div><span id="result_box" class="" lang="en"><span class="">Thank you all.<br></span></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-12-17 8:25 GMT+01:00 Stefan Evert <span dir="ltr"><<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
> On 16 Dec 2015, at 21:34, Daniel Renau <<a href="mailto:alphak87@gmail.com">alphak87@gmail.com</a>> wrote:<br>
><br>
> Now my questions are...<br>
> 1- Is it better to modify the script so that it calls the encoder with "-c utf8"?<br>
<br>
</span>Rather than running the script from the command line, write a small Perl script that uses the CWB::Encoder module. The command-line script you're running does basically the same thing: it sets some parameters from command-line flags and fixes the others at default values that cannot be changed.<br>
<br>
With your own Perl script, you can then use the ->charset() method to encode a UTF-8 corpus. If you know a little Perl, it would also be easy to change the command-line script so that it accepts a new flag for setting the charset.<br>
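<br>
A minimal sketch of what such a script might look like, assuming the CWB/Perl package is installed. The corpus ID, paths, input file name and attribute lists are placeholders chosen for illustration, not values confirmed in this thread; the attribute lists assume TreeTagger output with word, POS and lemma columns, and &lt;text&gt; and &lt;s&gt; regions without XML attributes.<br>
<pre><code>#!/usr/bin/perl
use strict;
use warnings;
use CWB::Encoder;

# NOTE: every name and path below is a hypothetical placeholder.
my $enc = CWB::Encoder->new("ORIGINAL");      # CWB corpus ID
$enc->longname("Original texts");             # descriptive name for the registry
$enc->charset("utf8");                        # declare the corpus as UTF-8
$enc->registry("/path/to/registry");          # registry directory
$enc->dir("/path/to/data/original");          # data directory for this corpus
$enc->p_attributes(qw(word pos lemma));       # one p-attribute per input column
$enc->s_attributes(qw(text s));               # XML regions to keep as s-attributes
$enc->overwrite(1);                           # allow replacing an existing corpus with this ID
$enc->encode("original.txt");                 # encode and index the joined, tagged file
</code></pre>
The translation would be encoded the same way, with its own corpus ID and data directory.<br>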
<div class="HOEnZb"><div class="h5"><br>
Best,<br>
Stefan<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature">Un saludo, Dani.</div>
</div>