[CWB] Error cwb-align-import
Stefan Evert
stefanML at collocations.de
Wed Jun 24 09:00:28 CEST 2015
> I've managed to import the alignment of two corpora at sentence level. I wouldn't mind documenting the process somehow for the encoding tutorial.
Thanks, that would be really neat.
> You can find attached a test data set to reproduce the issue. My question is, is there a way to overcome this error?
Unfortunately not, because ...
> This alignment is basically some kind of "word alignment". Sometimes, depending on the source text unit, the translation is a non-contiguous rendering.
… CWB's alignment attributes are designed for sentence-level alignment (e.g. in a translation memory) and thus ...
> So the tokens involved in the alignment have to be contiguous (not the structural elements). In the example given, this is trivial (one token more or less...), but I have other cases where elements appear much farther apart and I don't want to include all the tokens in between.
… alignment beads can only link a contiguous range of tokens in the source corpus to another contiguous range in the target corpus. That's already a big improvement over early versions of CWB, which didn't even allow gaps between different beads or crossing alignments.
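To make the constraint concrete, here is a minimal Python sketch (not CWB code). It assumes a bead can be modelled as four corpus positions (source start, source end, target start, target end); the function names are made up for illustration. A bead can only be built when both sides are gap-free runs of tokens:

```python
from typing import Iterable, Optional, Tuple

# Hypothetical model of one alignment bead:
# (source_start, source_end, target_start, target_end), all inclusive.
Bead = Tuple[int, int, int, int]


def contiguous_range(positions: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Return (first, last) if the positions form a gap-free run, else None."""
    ordered = sorted(set(positions))
    if not ordered:
        return None
    first, last = ordered[0], ordered[-1]
    # A gap-free run of n distinct positions spans exactly n corpus positions.
    if last - first + 1 != len(ordered):
        return None
    return first, last


def make_bead(source_positions: Iterable[int],
              target_positions: Iterable[int]) -> Optional[Bead]:
    """Build a bead, or return None if either side is non-contiguous."""
    src = contiguous_range(source_positions)
    tgt = contiguous_range(target_positions)
    if src is None or tgt is None:
        return None
    return (src[0], src[1], tgt[0], tgt[1])
```

With this model, `make_bead([10, 11, 12], [20, 21])` gives one bead, while `make_bead([10, 11, 12], [20, 22])` is rejected because target position 21 would have to be included to close the gap.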
> Although my case is a bit special, I don't think this is an infrequent scenario; see Amoia et al. 2011, http://www.aclweb.org/anthology/W11-4302.
Certainly, but these are applications that (alignment in) CWB hasn't been designed for.
CWB4, when it eventually arrives, will allow for much more flexible types of alignment.
Best,
Stefan