Content of the tiny corpus:<br /><br />
<pre>We want to index parallel corpora.
Queremos indexar corpus paralelos.

This is a test of parallel corpora indexing in CQP.
Esta es una prueba de indexaci&oacute;n de corpus paralelos en CQP.

I write down here a longer sentence to test this method,
Escribo aqu&iacute; una frase m&aacute;s larga para probar este m&eacute;todo,

it seems to me that this would work.
me parece que esto funcionar&iacute;a.</pre>
<br /><br /><br />On Thu, February 20, 2014 18:24, Josep M. Fontana wrote:<br /> <!--
begin sanitized html -->
<div style="background-color: #FFFFFF; color: #000000; " class="bodyclass">
<div class="moz-cite-prefix">Andr&eacute;s, I can't see that it works. Unless it is so tiny, so very tiny, that you haven't included the words I tried to search for. I entered &quot;the man&quot; or simply &quot;the&quot; and it finds nothing.<br />
<br />
JM<br />
</div>
<blockquote type="cite"
cite="mid:31df2556d4a240a29db8105781e955b9.squirrel@mail.chandia.net">Dear Ray Wu,<br />
<br />
I did it the way you suggest; it is easy and clear. Here is my test parallel corpus: <a title="This external link will open in a new window" target="_blank" href="http://parles.upf.edu/llocs/cqp/tr_en_es/" moz-do-not-send="true">parallel tiny test corpus (English-Spanish)</a> <br />
User and password: guest<br />
<br />
<br />
<br />
On Sat, February 15, 2014 10:05, Ray Wu wrote:<br />
<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">Hi         all,<br
/>         Andrew is right. We made no modification to the code and simply         used the   
     translation-visualisation feature. It can be achieved like this:<br />
<div>
<p align="left" class="MsoNormal" style="margin-bottom:12.0pt;text-align:left; mso-pagination:widow-orphan"><span lang="EN-US" style="color:blue">Step 1: </span><span lang="EN-US">Prepare a CQPweb-compatible corpus file &ldquo;test.txt&rdquo; (in UTF-8 format):<br />
&lt;text id=&quot;test&quot;&gt;<br />
&lt;s trans=&quot;The original language&quot;&gt;<br />
The<br />
translated<br />
text<br />
.<br />
&lt;/s&gt;<br />
&lt;/text&gt;<br />
<br />
<span style="color:blue">Step 2: </span>When installing a new corpus, configure it by entering the following under &ldquo;S-attributes (XML elements) -&gt; Use custom setup&rdquo;:<br />
<span style="color:red">0+trans</span><br />
(NB: Specify &ldquo;P-attributes&rdquo; as necessary if your corpus is different from mine.)<br />
<br />
<span style="color:blue">Step 3: </span>When everything is done, go to &ldquo;Manage visualisations -&gt; Free translation -&gt; Select XML element/attribute to get the translation from&rdquo; and choose &ldquo;s_trans&rdquo; to provide whole-sentence translation.</span></p>
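A minimal sketch of how a file like the one in Step 1 might be generated, assuming the sentence pairs are already available as plain strings with tokens separated by spaces. The function name to_vertical, the variable names, and the escaping of the attribute value are illustrative assumptions, not part of CQPweb or CWB; only the text element, the s element and its trans attribute come from the steps above. How CWB treats escaped characters should be checked against your own installation.

```python
from xml.sax.saxutils import escape


def to_vertical(text_id, pairs):
    """Render sentence pairs as one-token-per-line text.

    Each pair is (indexed, displayed): the first sentence is written as
    tokens, the second is stored in the trans attribute of the s element.
    """
    def attr(value):
        # Assumption: escape &, <, > and double quotes so the value
        # cannot break out of the surrounding "..." quotes.
        return escape(value, {'"': "&quot;"})

    lines = [f'<text id="{attr(text_id)}">']
    for indexed, displayed in pairs:
        lines.append(f'<s trans="{attr(displayed)}">')
        # Assumption: tokens are already separated by single spaces.
        lines.extend(indexed.split())
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


pairs = [
    ("We want to index parallel corpora .", "Queremos indexar corpus paralelos ."),
]
vertical = to_vertical("test", pairs)
print(vertical, end="")
```

The printed text could be saved as test.txt and installed as described in Step 2. Which language is tokenised and which goes into trans is a choice for the corpus builder; the sketch simply takes the first member of each pair as the indexed side.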
Although it works, it certainly lacks some features provided by cwb-align; for instance, it doesn't support the alignment of more than two languages. We are still looking for ways to address this issue.<br />
<br />
Best,<br />
Ray<br />
</div>
<br />         At 2014-02-14 04:41:09,&quot;Hardie, Andrew&quot; wrote:<br />        
<blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex;          
BORDER-LEFT: #ccc 1px solid"> It looks to           me like they are using the
translation-visualisation           feature. This is really designed for interlinear field
data,           where you would have the original           language as the word p-attribute,
the morpheme gloss as the           primary annotation p-attribute,           and the free
translation as an annotated s-attribute. However,           I built it in such a way that     
     you can turn on translations without glossing. I think that's           what they've
done, putting one           corpus into the XML of the other. No reason why others          
shouldn't be able to use the same           trick.
<div><br />           </div>
<div>Worth noting once again that I never actually finished             work on the advanced  
          visualisations.</div>
<div><br />           </div>
<div>Best</div>
<div><br />           </div>
<div>Andrew.</div>
<br /><br /><br />&quot;Josep M. Fontana&quot; &lt;<a title="This external link will open in a new window" target="_blank" href="mailto:josepm.fontana@upf.edu" class="moz-txt-link-abbreviated">josepm.fontana@upf.edu</a>&gt; wrote:<br /><br />
<br /><font size="2"><span style="font-size:10pt;">
<div class="PlainText"><br />                 &gt;&gt;&gt; Is it possible right now to use the
CQPweb                 interface to exploit parallel corpora?<br />                
&gt;&gt;&gt; The question is: is the future here                 already?<br />               
 &gt; No.<br />                 &gt;<br />                 &gt; This is still planned, but I
have not had time                 to do it yet.<br />                 <br />                
OK, so this means that the people who did this had to do quite a bit of hacking:<br />
<br />
<a target="_blank" href="http://124.193.83.252/cqp/" title="This external link will open in a new window" moz-do-not-send="true">http://124.193.83.252/cqp/</a><br />
<br />
If you look, there are a few parallel corpora at the end. Access is now restricted, but I was able to get in earlier and it really seemed to work well.<br />
<br />
JM<br />
<br />
<br />
<br />                 &gt;<br />                 &gt; best<br />                 &gt;<br />  
              &gt; Andrew.<br />                 &gt;<br />                 &gt; -----Original
Message-----<br />                 &gt;<br />                <br />From: <a target="_blank"
href="mailto:cwb-bounces@sslmit.unibo.it" title="This external link will open in a new window"
moz-do-not-send="true">cwb-bounces@sslmit.unibo.it</a>                 [<a target="_blank"
href="mailto:cwb-bounces@sslmit.unibo.it" title="This external link will open in a new window"
moz-do-not-send="true">mailto:cwb-bounces@sslmit.unibo.it</a>]                 On Behalf Of   
             Josep M. Fontana<br />                 &gt; Sent: 13 February 2014 17:11<br />   
             &gt; To: <a target="_blank" href="mailto:cwb@sslmit.unibo.it" title="This
external link will open in a new window" moz-do-not-send="true">cwb@sslmit.unibo.it</a><br /> 
               &gt; Subject: Re: [CWB] A                 question about the aligning using
cwb-encoding<br />                 &gt;<br />                 &gt; I just found this old      
          thread on alignment and this reminded me of something                 that I had
wanted to ask for a while. Is                 it possible right now to use the CQPweb
interface to                 exploit parallel corpora? We have                 parallel
corpora from translations between different                 languages (so the alignment is
already                 done) but these are using a very problematic and                
proprietary interface. We would like to move                 all of our corpora to the best
web interface there is,                 CQPweb, of course :-)<br />                 &gt;<br />
                &gt; I found a paper written by Andrew<br />                 &gt; (<a
target="_blank" href="http://www.lancaster.ac.uk/people/hardiea/cqpweb-paper.pdf" title="This
external link will open in a new window"
moz-do-not-send="true">http://www.lancaster.ac.uk/people/hardiea/cqpweb-paper.pdf</a>)        
        where he talks about using CQPweb with parallel corpora                 but as
something he was planning for                 the future: &quot;Other planned extensions
remain to be<br />                 &gt; implemented: support for                 concordancing
across parallel corpora;&quot;.<br />                 &gt;<br />                 &gt; The
question is: is the                 future here already?<br />                 &gt;<br />     
           &gt; JM<br />                 &gt;&gt;&gt; Some first sentences were               
 aligned as right pairs.<br />                 &gt;&gt;&gt; But the others were not.<br />    
            &gt;&gt;&gt; It                 seems to be related with statistical aligning
process.<br />                 &gt;&gt; You're absolutely right. cwb-align isn't a            
    particularly sophisticated sentence aligner, so it's                 likely to get some   
             cases wrong. You may be seeing particularly bad                 performance if
you're using the default                 parameter settings, which are intended for related   
             languages and are based on sentence length                 (in characters),
character n-gram counts and identical                 words.<br />                 &gt;&gt;<br
/>                 &gt;&gt;                 For Korean-English alignment, the best solution
might be                 to get a good bilingual word list and                 use that as the
only feature (dropping even sentence                 length).<br />                
&gt;&gt;<br />                 &gt;&gt;&gt; Actually I made two corpora so, that every        
pair sentence should have the same sentence id like &lt;s id=&quot;100&quot;&gt; or &lt;s id=&quot;10000&quot;&gt;, in order to avoid the failure of statistical alignment.<br />
&gt;&gt;&gt; I am working with 60000 sentences. And I manually aligned all sentences and put the information into the xml tag &quot;s_id&quot;.<br />
&gt;&gt;&gt;<br />
&gt;&gt;&gt; My question is how I can make use of the manually created xml tag &quot;s_id&quot;?<br />
&gt;&gt; If these are only 1:1 alignments, you can use a trick to smuggle them past cwb-align:<br />
&gt;&gt;<br />
&gt;&gt; cwb-align -V s_id -o alignment.txt CORPUS1 CORPUS2 s -C:1<br />
&gt;&gt;<br />
&gt;&gt; With &quot;-V s_id&quot;, the manually aligned sentence pairs are taken as a pre-alignment, and the statistical aligner is only run within each pair of pre-aligned regions. Since each of those contains just a single sentence pair, it cannot further break up the bead, so the original pre-alignment is passed through. Feature specs shouldn't matter here, so you might as well just specify -C:1 to avoid unnecessary overhead. You can then proceed to cwb-align-encode the generated file alignment.txt as usual.<br />
&gt;&gt;<br />
&gt;&gt; If you have more complex alignments (n:1 or 1:n, 2:2, ...), you could add new XML regions, e.g.<br />
&gt;&gt;<br />
&gt;&gt; &lt;bead id=&quot;100&quot;&gt; ... &lt;/bead&gt;<br />
&gt;&gt;<br />
&gt;&gt; and use -V bead_id for the pre-alignment in cwb-align.<br />
&gt;&gt;<br />
&gt;&gt;<br />
&gt;&gt; If you have a recent version of the CWB/Perl interface, the best strategy is to use the cwb-align-import tool. You'll have to provide a separate alignment file that lists the sentence IDs in source and target corpus for each alignment bead. Complex alignments require no special treatment with this tool. See &quot;perldoc cwb-align-import&quot; for usage and format details.<br />
&gt;&gt;<br />
&gt;&gt;<br />
&gt;&gt; Best,<br />
&gt;&gt; Stefan<br />
&gt;&gt; _______________________________________________<br />
&gt;&gt; CWB mailing list<br />
&gt;&gt; <a target="_blank" href="mailto:CWB@sslmit.unibo.it" title="This external link will open in a new window" moz-do-not-send="true">CWB@sslmit.unibo.it</a><br />
&gt;&gt; <a target="_blank" href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb" title="This external link will open in a new window" moz-do-not-send="true">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a><br />
</div>
</span></font>
</blockquote>
</div>
<br /><br /><br />
id="netease_mail_footer"></span></span></s></s>        <br />       <br />       <br />      
_______________________<br />                   andr&eacute;s       chand&iacute;a<br />      
<a title="This external link will open in a new window" target="_blank"
href="http://www.chandia.net" moz-do-not-send="true"><img border="0"
src="../images/sec_remove_eng.png" alt="chandia.net" moz-do-not-send="true" /></a><a
title="This external link will open in a new window" target="_blank"
href="https://twitter.com/andreschandia" moz-do-not-send="true"><img alt=""
src="../images/sec_remove_eng.png" moz-do-not-send="true" /></a><br />       administrador
de<br />       <a title="This external link will open in a new window" target="_blank"
href="http://parles.upf.edu" moz-do-not-send="true">parles.upf.edu</a><br />       <a
title="This external link will open in a new window" target="_blank"
href="http://psicoaching.net" moz-do-not-send="true">psicoaching.net</a><br />       <a
title="This external link will open in a new window" target="_blank"
href="http://koyaktumapuche.net" moz-do-not-send="true">mapuche         koyaktu</a><br />     
 <a title="This external link will open in a new window" target="_blank"
href="http://corporacionkoyaktu.net" moz-do-not-send="true">ong         mapuche koyaktu</a><br
/>       <span style="font-size:         18pt; color: rgb(79, 98, 40); font-family:
Webdings;">P </span><span style="font-size: 10pt;         color: rgb(79, 98, 40);">No imprima
innecesariamente. &iexcl;Cuide el         medio ambiente!</span>       <br />       <fieldset
class="mimeAttachmentHeader"></fieldset>       <br />
</blockquote>     <br />   </div>
<!-- end sanitized html --> <br /><br /><br />_______________________<br
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;andr&eacute;s
chand&iacute;a<br /><a target="_blank" href="http://www.chandia.net"><img border="0"
alt="chandia.net" src="http://www.chandia.net/sites/default/files/images/chandia.netd.png"
/></a><a target="_blank" href="https://twitter.com/andreschandia"><img
src="http://www.upf.edu/universitat/_img/ico_tw.png" alt="" /></a><br />administrador de<br
/><a href="http://parles.upf.edu">parles.upf.edu</a><br /><a
href="http://psicoaching.net">psicoaching.net</a><br /><a
href="http://koyaktumapuche.net">mapuche koyaktu</a><br /><a
href="http://corporacionkoyaktu.net">ong mapuche koyaktu</a><br /><span style="font-size:
18pt; color: rgb(79, 98, 40); font-family: Webdings;">P </span><span style="font-size: 10pt;
color: rgb(79, 98, 40);">No imprima innecesariamente. &iexcl;Cuide el medio ambiente!</span>