<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Hi, <br>

      I don't know whether this helps, but I use positional attributes
      to encode word alignment, and then transform the output to reflect
      the alignment. Essentially, you can put anything into these
      positional attributes, including ranges, whether continuous or not.
      The challenge then simply shifts to transforming the output. <br>
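      <br>
      Here is a minimal Python sketch of the idea. It assumes a hypothetical convention, not necessarily the one ParaVoz uses. Each source token gets an extra positional-attribute column listing the 0-based positions of the target tokens it aligns to. These positions are compressed into ranges such as "3-5,7-8", and "_" marks an unaligned token. The alignment in the usage lines is made up for illustration.<br>

```python
"""Sketch: store word alignment in an extra positional-attribute column.

Hypothetical convention (an assumption, not ParaVoz's actual format):
each source token carries the 0-based target positions it aligns to,
compressed into ranges like "3-5,7-8"; "_" marks an unaligned token.
"""


def compress(positions):
    """Turn a collection of ints into a range string such as '3-5,7-8'."""
    if not positions:
        return "_"
    ordered = sorted(set(positions))
    parts = []
    start = prev = ordered[0]
    for p in ordered[1:]:
        if p == prev + 1:  # still inside the current run
            prev = p
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = p
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def expand(value):
    """Inverse of compress(): '3-5,7-8' -> [3, 4, 5, 7, 8]."""
    if value == "_":
        return []
    result = []
    for part in value.split(","):
        lo, _, hi = part.partition("-")  # hi is '' for a single position
        result.extend(range(int(lo), int(hi or lo) + 1))
    return result


def to_vrt(tokens, alignment):
    """Emit one 'word<TAB>align' line per token (verticalized input)."""
    return "\n".join(
        f"{tok}\t{compress(alignment.get(i, []))}" for i, tok in enumerate(tokens)
    )


if __name__ == "__main__":
    source = ["the", "interbank", "market", "is", "restarted", "."]
    # Made-up alignment into "el mercado interbancario vuelva a poner se en marcha ."
    alignment = {0: [0], 1: [2], 2: [1], 4: [3, 4, 5, 7, 8], 5: [9]}
    print(to_vrt(source, alignment))  # 'restarted' gets the column value 3-5,7-8
```

      Storing ranges keeps the column short even for long non-contiguous renderings. The transformation step then only needs expand() to recover the individual positions.<br>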

      <br>

      My solution is to have CWB output the results, including the word
      alignment stored in the positional attributes, as XML, and to
      transform that using XSLT. Have a look here: <a class="moz-txt-link-freetext" href="http://www.parasolcorpus.org/KrakowMW/">http://www.parasolcorpus.org/KrakowMW/</a><br>

      <br>

      The interface is open source
      (<a class="moz-txt-link-freetext" href="https://bitbucket.org/rvwfels/paravoz2">https://bitbucket.org/rvwfels/paravoz2</a>), but we have just
      found a bug that hasn't been fixed yet, so write to me for details
      if you want to try it out (essentially, you need to follow a
      certain naming convention when encoding the corpus). <br>

      <br>

      Best!<br>

      Ruprecht<br>

      <br>

      <br>

      On 23.06.2015 at 18:24, Jose Manuel Martinez Martinez wrote:<br>

    </div>

    <blockquote cite="mid:55898822.40408@gmail.com" type="cite">

      <meta http-equiv="content-type" content="text/html;

        charset=windows-1252">

      <font size="+1">Dear all,<br>

        <br>

        I've managed to import the alignment of two corpora at sentence
        level, and I wouldn't mind documenting the process for the
        encoding tutorial.<br>

        <br>

        However, I came across an error when trying to align
        structural attributes </font><font size="+1"><font size="+1">in
          a different corpus</font>.<br>

        <br>

        &gt; sh add_difficulties_align_test.sh <br>

        Generating keys for grid regions:<br>

          - TDC-AD-TEST ..... ok<br>

          - TDC-TT-TEST ..... ok<br>

        Processing .Error: alignment bead #4 is non-contiguous in

        TDC-TT-TEST<br>

            (keys: ep1_tr10_dif_3 ep1_tr10_dif_4)<br>

        <br>

        Attached is a test data set that reproduces the issue. My
        question is: is there a way to overcome this error?<br>

        <br>

        This alignment is basically a kind of "word alignment";
        however, I am not aligning all words, only those words in the
        source text that are contained within a structural attribute,
        and I align them only with the structural attribute(s) containing
        the translation. Sometimes, depending on the source-text unit,
        the translation is non-contiguous. See the example below,
        especially </font><font size="+1"><font size="+1">difficulty

          id="ep1_tr10_dif_3" in the source text and its translation </font></font><font

        size="+1"><font size="+1"><font size="+1">(difficulty

            id="ep1_tr10_dif_3"</font> </font>and </font><font

        size="+1"><font size="+1">difficulty id="ep1_tr10_dif_4"</font>).<br>

        <br>

        #-- source<br>

        <br>

        the<br>

        &lt;difficulty id="ep1_tr10_dif_2" type="unspec"&gt;<br>

        interbank<br>

        market<br>

        &lt;/difficulty&gt;<br>

        is<br>

        &lt;difficulty id="ep1_tr10_dif_3" type="unspec"&gt;<br>

        restarted<br>

        &lt;/difficulty&gt;<br>

        .<br>

        <br>

        #-- translation<br>

        <br>

        el<br>

        &lt;difficulty id="ep1_tr10_dif_2" type="unspec"&gt;<br>

        mercado<br>

        interbancario<br>

        &lt;/difficulty&gt;<br>

        &lt;difficulty id="ep1_tr10_dif_3" type="unspec"&gt;<br>

        vuelva<br>

        a<br>

        poner<br>

        &lt;/difficulty&gt;<br>

        se<br>

        &lt;difficulty id="ep1_tr10_dif_4" type="unspec"&gt;<br>

        en<br>

        marcha<br>

        &lt;/difficulty&gt;<br>

        .<br>

        <br>

        #-- alignment<br>

        <br>

        ep1_tr10_dif_2    ep1_tr10_dif_2<br>

        ep1_tr10_dif_3    ep1_tr10_dif_3 ep1_tr10_dif_4<br>
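        <br>
        The following Python sketch makes the error concrete by illustrating the check it appears to perform. One assumption is inferred from the error message and not taken from the CWB source: each side of an alignment bead is stored as a single (start, end) token range, so the regions named by the keys must be directly adjacent. The token spans were counted by hand from the translation above (0-based: el=0, mercado=1, ...), and the function name bead_range is hypothetical.<br>

```python
"""Sketch: why a 1:n bead such as
    ep1_tr10_dif_3  ->  ep1_tr10_dif_3 ep1_tr10_dif_4
is rejected. Assumption: each side of a bead must be ONE (start, end)
token range, so the regions named by its keys may not leave a gap.
This mirrors the error message; it is not CWB's implementation.
"""


def bead_range(keys, regions):
    """Return the single (start, end) range covered by `keys`, or None
    if the regions leave a gap (i.e. the bead is non-contiguous).

    `regions` maps an id value to its (start, end) token span, ends inclusive.
    """
    spans = sorted(regions[k] for k in keys)
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start != prev_end + 1:  # at least one token in between
            return None
    return spans[0][0], spans[-1][1]


# Token spans in the Spanish example (0-based: el=0, mercado=1, ...).
TARGET = {
    "ep1_tr10_dif_2": (1, 2),  # mercado interbancario
    "ep1_tr10_dif_3": (3, 5),  # vuelva a poner
    "ep1_tr10_dif_4": (7, 8),  # en marcha; token 6 ("se") lies in between
}

if __name__ == "__main__":
    print(bead_range(["ep1_tr10_dif_2"], TARGET))                    # (1, 2)
    print(bead_range(["ep1_tr10_dif_3", "ep1_tr10_dif_4"], TARGET))  # None
```

        Under this assumption, the second bead fails only because of the single token "se" between the two regions.<br>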

        <br>

        I also tried to wrap each word in an XML element, like this:<br>

        <br>

        &lt;token id="ep1_tr10_t_2"&gt;<br>

        mercado<br>

        &lt;/token&gt;<br>

        &lt;token id="ep1_tr10_t_3"&gt;<br>

        interbancario<br>

        &lt;/token&gt;<br>

        &lt;token id="ep1_tr10_t_4"&gt;<br>

        vuelva<br>

        &lt;/token&gt;<br>

        &lt;token id="ep1_tr10_t_5"&gt;<br>

        a<br>

        &lt;/token&gt;<br>

        &lt;token id="ep1_tr10_t_6"&gt;<br>

        poner<br>

        &lt;/token&gt;<br>

        &lt;token id="ep1_tr10_t_54"&gt;<br>

        se<br>

        &lt;/token&gt;<br>

        &lt;token id="ep1_tr10_t_7"&gt;<br>

        en<br>

        &lt;/token&gt;<br>

        &lt;token id="ep1_tr10_t_8"&gt;<br>

        marcha<br>

        &lt;/token&gt;<br>

        <br>

        So it is the tokens involved in the alignment that have to be
        contiguous, not the structural elements. In the example given,
        this is trivial (just one extra token), but I have other cases
        where the elements are much farther apart, and I don't want to
        include all the tokens in between.<br>
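        <br>
        The following Python sketch shows the cost of forcing such a bead into a single range. A covering span from the first to the last aligned token necessarily includes every token in the gaps. The helper gap_tokens is a hypothetical illustration, and it uses the tokens of the translation above.<br>

```python
"""Sketch: list the unaligned tokens that a single covering range
would pull into a bead. Illustration only, not part of CWB."""


def gap_tokens(tokens, aligned_positions):
    """Return (position, token) pairs inside the covering span of
    `aligned_positions` that are not themselves aligned."""
    aligned = set(aligned_positions)
    lo, hi = min(aligned), max(aligned)
    return [(i, tokens[i]) for i in range(lo, hi + 1) if i not in aligned]


TARGET_TOKENS = ["el", "mercado", "interbancario", "vuelva", "a",
                 "poner", "se", "en", "marcha", "."]

if __name__ == "__main__":
    # Tokens of ep1_tr10_dif_3 (vuelva a poner) and ep1_tr10_dif_4 (en marcha).
    print(gap_tokens(TARGET_TOKENS, [3, 4, 5, 7, 8]))  # [(6, 'se')]
```

        Here only "se" is pulled in, but for regions that are far apart the list grows with every intervening token.<br>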

        <br>

        Although my case is a bit special, I don't think it is an
        uncommon scenario; see Amoia et al. 2011 <a

          moz-do-not-send="true" class="moz-txt-link-freetext"

          href="http://www.aclweb.org/anthology/W11-4302">http://www.aclweb.org/anthology/W11-4302</a>.<br>

        <br>

        Any comments or hints would be much appreciated.<br>

        <br>

        Cheers,<br>

        <br>

        jmm<br>

      </font> <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

CWB mailing list

<a class="moz-txt-link-abbreviated" href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a>

<a class="moz-txt-link-freetext" href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>