[CWB] Parallel corpus alignment question
Graham Ranger -- UAPV
graham.ranger at univ-avignon.fr
Thu May 22 13:38:21 CEST 2025
Hello to all,
I'm currently trying to set up a parallel corpus including a source text
and four different translations.
The method I use to set up a parallel corpus is this (copied and adapted
from the cqp / cwb manuals):
To set up parallel corpora:
1) Get them installed on cqpweb with the different xml tags declared, etc.
2) Use cwb-align to generate an alignment file suffixed .align, e.g.
cwb-align -r /var/cqpweb/registry/ -o test.align TEST_EN TEST_FR s
This indicates the registry directory explicitly with the -r option.
3) Modify the registry files using nano to indicate the other aligned
corpus. This means modifying /var/cqpweb/registry/"my_corpus" and
appending ALIGNED "other_corpus".
4) Use cwb-align-encode to point to the alignment file. This needs to be
done as admin, i.e. with su, using the -d and -r options to point to the
data and registry directories.
The second command does the same thing backwards, i.e. reads the
alignments the other way round, with the -R switch.
cwb-align-encode -d /var/cqpweb/index/test_en/ -r /var/cqpweb/registry/ test.align
cwb-align-encode -d /var/cqpweb/index/test_fr/ -r /var/cqpweb/registry/ -R test.align
5) Test it out in cqpweb.
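Steps 2-4 can be strung together for one source/target pair roughly as follows. This is a minimal sketch, not a tested CQPweb procedure: the helper names `align_pair` and `declare_aligned`, the `<source>-<target>.align` file naming and the `DRY_RUN` switch are hypothetical additions, and it assumes that registry files are named after the lowercase corpus ID (test_en) while the CWB tools take the uppercase name (TEST_EN), and that each corpus's data directory is /var/cqpweb/index/<id>. Like step 4, it would need to run as admin.

```shell
#!/bin/sh
# Hypothetical helper (not part of CWB): steps 2-4 for one source/target pair.
# Assumptions: registry files are named after the lowercase corpus ID
# (test_en), the CWB tools take the uppercase name (TEST_EN), each corpus's
# data directory is $INDEX/<id>, and "s" is the grid attribute as in step 2.

REGISTRY=${REGISTRY:-/var/cqpweb/registry}
INDEX=${INDEX:-/var/cqpweb/index}
DRY_RUN=${DRY_RUN:-0}   # 1 = print each step instead of running it

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Step 3: append "ALIGNED <other>" to registry file <id>, only once.
declare_aligned() {
  reg="$REGISTRY/$1"
  line="ALIGNED $2"
  grep -qxF "$line" "$reg" && return 0
  # Make sure the new line is not glued onto an unterminated last line.
  [ -n "$(tail -c 1 "$reg")" ] && echo >> "$reg"
  echo "$line" >> "$reg"
}

align_pair() {
  src=$1 tgt=$2
  for id in "$src" "$tgt"; do
    [ -f "$REGISTRY/$id" ] || { echo "no registry file $REGISTRY/$id" >&2; return 1; }
  done
  SRC=$(echo "$src" | tr '[:lower:]' '[:upper:]')
  TGT=$(echo "$tgt" | tr '[:lower:]' '[:upper:]')
  file="$src-$tgt.align"   # one alignment file per pair, e.g. test_en-test_fr.align

  run cwb-align -r "$REGISTRY" -o "$file" "$SRC" "$TGT" s || return 1        # step 2
  run declare_aligned "$src" "$tgt" || return 1                              # step 3, both ways
  run declare_aligned "$tgt" "$src" || return 1
  run cwb-align-encode -d "$INDEX/$src" -r "$REGISTRY" "$file" || return 1   # step 4
  run cwb-align-encode -d "$INDEX/$tgt" -r "$REGISTRY" -R "$file"           # step 4, reverse
}

if [ $# -eq 2 ]; then align_pair "$1" "$2"; fi
```

For example, `DRY_RUN=1 sh align_pair.sh test_en test_fr` only prints the five steps (nothing is written), and `sh align_pair.sh test_en test_fr` as admin runs them. Registry edits are skipped if the ALIGNED line is already present, so re-running the script does not duplicate declarations.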
Now, my question is: can I set up a parallel corpus in such a way that a
search in the source will display all the aligned translations
simultaneously?
If so, is it just a question of following this how-to for each
source-target pair, and then declaring multiple alignments in cqpweb or
do I align all the text from the CLI?
I hope the question is clear and thank you in advance for any guidance.
Best,
Graham.