[CWB] Parallel corpus alignment question

Graham Ranger -- UAPV graham.ranger at univ-avignon.fr
Thu May 22 14:03:49 CEST 2025


And a follow-up question... could somebody tell me what the admin 
password is for cqpwebinabox? (I'm trying to do this on a VM with 
cqpwebinabox, before putting it on a public server.)
Thanks again!
Graham.


On 22/05/2025 at 13:38, Graham Ranger -- UAPV wrote:
> Hello to all,
> I'm currently trying to set up a parallel corpus including a source 
> text and four different translations.
> The method I use to set up a parallel corpus is this (copied and 
> adapted from the cqp / cwb manuals):
>
> To set up parallel corpora:
>
> 1) Get them installed on cqpweb with the different xml tags declared, etc.
> 2) Use cwb-align to generate an alignment file with the suffix .align, e.g.
> cwb-align -r /var/cqpweb/registry/ -o test.align TEST_EN TEST_FR s
> The -r option indicates the registry directory explicitly.
> 3) Modify the registry files using nano to indicate the other aligned
> corpus. This means modifying /var/cqpweb/registry/"my_corpus" and
> appending ALIGNED "other_corpus".
> 4) Use cwb-align-encode to encode the alignment file. This needs to be
> done as admin, i.e. with su, using the -d and -r options to point to the
> data and registry directories:
> cwb-align-encode -d /var/cqpweb/index/test_en/ -r /var/cqpweb/registry/ test.align
> cwb-align-encode -d /var/cqpweb/index/test_fr/ -r /var/cqpweb/registry/ -R test.align
> The second command does the same thing backwards, i.e. it reads the
> alignment the other way round, with the -R switch.
> 5) Test it out in cqpweb.
>
> Now, my question is: can I set up a parallel corpus in such a way that 
> a search in the source will display all the aligned translations 
> simultaneously?
> If so, is it just a question of following this how-to for each
> source-target pair and then declaring multiple alignments in cqpweb,
> or do I align all the texts from the CLI?
> I hope the question is clear and thank you in advance for any guidance.
> Best,
> Graham.
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
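Following the how-to once per source-target pair can be scripted. Below is a minimal, untested sketch in POSIX shell that repeats steps 2 to 4 for each translation, assuming the paths, the s alignment attribute and the cwb-align / cwb-align-encode options shown in the quoted message. The extra corpus names (e.g. TEST_DE), the per-pair file names (e.g. test_en-test_fr.align), the helper function names and the DRY_RUN switch are all hypothetical. By default it only prints the commands so they can be checked first; DRY_RUN=0 actually runs them (as admin, per step 4).

```shell
#!/bin/sh
# Hypothetical sketch: repeat steps 2-4 of the how-to for one source corpus
# and several translations, one source-target pair at a time.
# Paths and the "s" alignment attribute are copied from the example above.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.

REGISTRY=${REGISTRY:-/var/cqpweb/registry}
INDEX=${INDEX:-/var/cqpweb/index}
DRY_RUN=${DRY_RUN:-1}

# Print a command, or run it when DRY_RUN is not 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 3: append "ALIGNED <other>" to a registry file.
# $1 = corpus id whose registry file is edited, $2 = id of the aligned corpus.
declare_alignment() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo 'ALIGNED $2' >> $REGISTRY/$1"
    else
        printf 'ALIGNED %s\n' "$2" >> "$REGISTRY/$1"
    fi
}

# In the example above, registry files and data directories use lowercase ids.
lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Steps 2-4 for a single pair, e.g. align_pair TEST_EN TEST_FR.
align_pair() {
    src=$1
    tgt=$2
    s=$(lower "$src")
    t=$(lower "$tgt")
    file="$s-$t.align"   # hypothetical naming: one alignment file per pair
    run cwb-align -r "$REGISTRY/" -o "$file" "$src" "$tgt" s || return 1
    declare_alignment "$s" "$t" || return 1
    declare_alignment "$t" "$s" || return 1
    run cwb-align-encode -d "$INDEX/$s/" -r "$REGISTRY/" "$file" || return 1
    run cwb-align-encode -d "$INDEX/$t/" -r "$REGISTRY/" -R "$file"
}

# Align the source corpus with every translation listed after it.
align_all() {
    source_corpus=$1
    shift
    for target in "$@"; do
        align_pair "$source_corpus" "$target" || return 1
    done
}

# When called with arguments, e.g. "sh align_pairs.sh TEST_EN TEST_FR TEST_DE".
if [ "$#" -gt 0 ]; then
    align_all "$@"
fi
```

For example, saved as align_pairs.sh (hypothetical name), `sh align_pairs.sh TEST_EN TEST_FR TEST_DE` prints ten commands, five per pair. Running it twice with DRY_RUN=0 would append duplicate ALIGNED lines, so the registry files are worth checking afterwards.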