[CWB] Bilingual corpus alignment
Hardie, Andrew
a.hardie at lancaster.ac.uk
Tue Oct 4 14:13:46 CEST 2022
The name of an alignment attribute is the name of the corpus it “points at”.
so, when working in TEST-EN,
show +test-chn;
turns on display of the parallel text.
See the CQP manual, chapter 5.
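For illustration, a minimal sketch of what such a CQP session might look like, using the corpus names from this thread. The query word is only an example, and the `show cd;` step assumes your CQP version lists alignment attributes in its attribute overview.

```
# activate the source corpus
TEST-EN;
# list the attributes of the active corpus; the alignment attribute
# should be listed under the name of the target corpus (test-chn)
show cd;
# turn on display of the aligned TEST-CHN region for each hit
show +test-chn;
# run a query in the source corpus (example word only)
"Taiwan";
# turn the parallel display off again
show -test-chn;
```

Note that the argument to show is the lowercase name of the target corpus, not the name of the alignment input file.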
best
Andrew.
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Austin Yang
Sent: 04 October 2022 01:47
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] Bilingual corpus alignment
Hey Andrew and all of the community,
Thanks for the reply! Your reply is greatly appreciated!
This is my first time working with a bilingual corpus, so forgive me for my ignorance in advance.
I'm still a bit confused about what the alignment attribute is. The alignment command was 'sudo cwb-align-import -r /var/CQPweb/registry -p test.algn'
Output: Use of uninitialized value $12_keys in split at /usr/local/bin/cwb-align-import line 119, <$fn> line3. Alignment TEST-EN => TEST-CHN has been created with 7 non-empty beads.
I tried 'show + test.algn', but it doesn't seem to work, and the registry file doesn't give much information in this regard.
Does that mean the alignment failed? Or did I fail to set a designated alignment attribute?
Another, somewhat out-of-scope question: assuming everything works in CQP, is it possible to upload and present the bilingual text in CQPweb (say someone queries 'Taiwan'; it should show an English segment containing 'Taiwan' and the Chinese segment on the next line)?
Once again, any help is desperately needed and deeply appreciated!
Best,
Austin Yang (楊承洋)
MS in Cognitive Neuroscience, NCU
BS in Psychology, CYCU
On Mon, Oct 3, 2022 at 5:30 PM Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
You need to
show +your_alignment_attribute
in CQP
best
Andrew.
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Austin Yang
Sent: 03 October 2022 02:44
To: cwb at sslmit.unibo.it
Subject: [CWB] Bilingual corpus alignment
Dear all,
Recently I've encountered a problem using CWB's alignment encoding function.
"Problem" might not be the accurate word, but I used a different alignment tool, converted its output into CWB's standard format, and ran the regedit and encode procedure. This created an .alx file among the source-language index files. The tutorial says "This procedure only creates an a-attribute in HOLMES-EN (source corpus), linking it to HOLMES-DE (target corpus).", but that's all I can find. I don't know how to use CQP/CWB to present the sentence alignment (i.e. I imagine that querying "Sherlock" in the source corpus would present both the English and the German sentence containing "Sherlock"). The attachment shows the command and output. I'm not even sure whether the alignment was successful. Any help or information that sheds some light on this situation will be greatly appreciated!
Best,
Austin Yang (楊承洋)
MS in Cognitive Neuroscience, NCU
BS in Psychology, CYCU
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb