<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <font size="+1">Dear all,<br>

      <br>

      I've managed to import the alignment of two corpora at sentence

      level. I would be happy to document the process for the

      encoding tutorial.<br>

      <br>

      However, I came across an error when trying to align

      structural attributes </font><font size="+1"><font size="+1">in a

        different corpus</font>.<br>

      <br>

      &gt; sh add_difficulties_align_test.sh <br>

      Generating keys for grid regions:<br>

        - TDC-AD-TEST ..... ok<br>

        - TDC-TT-TEST ..... ok<br>

      Processing .Error: alignment bead #4 is non-contiguous in

      TDC-TT-TEST<br>

          (keys: ep1_tr10_dif_3 ep1_tr10_dif_4)<br>

      <br>

      I have attached a test data set that reproduces the issue. My

      question is: is there a way to overcome this error?<br>

      <br>

      This alignment is essentially a kind of "word alignment"; however,

      I am not aligning all words, only those words in the source

      text that are contained within a structural attribute, and I align them

      only with the structural attribute(s) containing the translation.

      Sometimes, depending on the source text unit, the translation is a

      non-contiguous rendering. See the example below, especially </font><font

      size="+1"><font size="+1">difficulty id="ep1_tr10_dif_3" in the

        source text and its translation </font></font><font size="+1"><font

        size="+1"><font size="+1">(difficulty id="ep1_tr10_dif_3"</font>

      </font>and </font><font size="+1"><font size="+1">difficulty

        id="ep1_tr10_dif_4"</font>).<br>

      <br>

      #-- source<br>

      <br>

      the<br>

      &lt;difficulty id="ep1_tr10_dif_2" type="unspec"&gt;<br>

      interbank<br>

      market<br>

      &lt;/difficulty&gt;<br>

      is<br>

      &lt;difficulty id="ep1_tr10_dif_3" type="unspec"&gt;<br>

      restarted<br>

      &lt;/difficulty&gt;<br>

      .<br>

      <br>

      #-- translation<br>

      <br>

      el<br>

      &lt;difficulty id="ep1_tr10_dif_2" type="unspec"&gt;<br>

      mercado<br>

      interbancario<br>

      &lt;/difficulty&gt;<br>

      &lt;difficulty id="ep1_tr10_dif_3" type="unspec"&gt;<br>

      vuelva<br>

      a<br>

      poner<br>

      &lt;/difficulty&gt;<br>

      se<br>

      &lt;difficulty id="ep1_tr10_dif_4" type="unspec"&gt;<br>

      en<br>

      marcha<br>

      &lt;/difficulty&gt;<br>

      .<br>

      <br>

      #-- alignment<br>

      <br>

      ep1_tr10_dif_2    ep1_tr10_dif_2<br>

      ep1_tr10_dif_3    ep1_tr10_dif_3 ep1_tr10_dif_4<br>

      <br>

      I also tried to wrap each word in an XML element like:<br>

      <br>

      &lt;token id="ep1_tr10_t_2"&gt;<br>

      mercado<br>

      &lt;/token&gt;<br>

      &lt;token id="ep1_tr10_t_3"&gt;<br>

      interbancario<br>

      &lt;/token&gt;<br>

      &lt;token id="ep1_tr10_t_4"&gt;<br>

      vuelva<br>

      &lt;/token&gt;<br>

      &lt;token id="ep1_tr10_t_5"&gt;<br>

      a<br>

      &lt;/token&gt;<br>

      &lt;token id="ep1_tr10_t_6"&gt;<br>

      poner<br>

      &lt;/token&gt;<br>

      &lt;token id="ep1_tr10_t_54"&gt;<br>

      se<br>

      &lt;/token&gt;<br>

      &lt;token id="ep1_tr10_t_7"&gt;<br>

      en<br>

      &lt;/token&gt;<br>

      &lt;token id="ep1_tr10_t_8"&gt;<br>

      marcha<br>

      &lt;/token&gt;<br>

      <br>

      So it is the tokens involved in the alignment that have to be

      contiguous (not the structural elements). In the example given, this

      is trivial (one token more or less...), but I have other cases where

      elements appear much farther apart and I don't want to include all the

      tokens in between.<br>

      <br>
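      To make the constraint concrete, here is a minimal Python sketch of

      the check as I understand it. It is only my own model of the

      behaviour, not the actual code of the import tool: the function name

      is_contiguous and the (first, last) token positions are assumptions,

      with positions counted by hand from the translation above (starting

      at 0 for "el").<br>

      <br>

<pre>
```python
# Hypothetical model: each region key maps to (first, last) token positions.
# Positions are counted by hand from the Spanish example in this message.
target_regions = {
    "ep1_tr10_dif_2": (1, 2),  # mercado interbancario
    "ep1_tr10_dif_3": (3, 5),  # vuelva a poner
    "ep1_tr10_dif_4": (7, 8),  # en marcha ("se" at position 6 is outside)
}


def is_contiguous(keys, regions):
    """Return True if each region starts right after the previous one ends."""
    spans = sorted(regions[key] for key in keys)
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start != prev_end + 1:
            return False
    return True


# Target side of the two alignment beads listed above.
beads = [
    ["ep1_tr10_dif_2"],
    ["ep1_tr10_dif_3", "ep1_tr10_dif_4"],
]
results = [is_contiguous(bead, target_regions) for bead in beads]
print(results)  # [True, False]
```
</pre>

      Under this assumption, the second bead fails only because "se" sits

      between the two regions, which would match the error message above.<br>

      <br>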

      Although my case is a bit special, I don't think this is an

      infrequent scenario; see Amoia et al. (2011),

      <a class="moz-txt-link-freetext" href="http://www.aclweb.org/anthology/W11-4302">http://www.aclweb.org/anthology/W11-4302</a>.<br>

      <br>

      Any comments or hints will be much appreciated.<br>

      <br>

      Cheers,<br>

      <br>

      jmm<br>

    </font>

  </body>

</html>