[CWB] Bilingual corpus alignment

Austin Yang austin.yang.2014 at gmail.com
Tue Oct 11 10:22:26 CEST 2022


Hello Andrew and all,
We really appreciate your help! It definitely sped up our progress at a
very desperate time.
We have successfully displayed bilingual context for a CQP query in the
terminal.
However, when we tried to replicate the results in CQPweb following the
instructions in chapter 8 of the CQPweb manual, some parts were still
unclear to us.
As I understand it, the first step is to install the two corpora of the
bilingual pair as separate corpora in CQPweb (we only adjusted the P flag).
(Does the handle of the corpus matter?)
Then we should create an a-attribute (e.g. test-chn in the first email) in
CWB.
Lastly, we scan the registry for newly added alignments, and the
a-attribute (here, test-chn) should appear.
However, the a-attribute never appears, even though we declared the
a-attribute and ran the alignment before installing the two corpora.
I'm sure we are missing some crucial steps in this process.
I can't confidently answer these two questions:
"First, does a corpus by the name of that attribute exist in CQPweb?
Second, is the alignment already registered within CQPweb?"
The attribute is listed in the registry file, but I'm not sure whether it
exists in CQPweb (the same goes for the alignment).
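For the CWB side of this question, the registry can be checked directly. In a CWB registry file an alignment attribute is declared on a line such as ALIGNED test-chn, and the encoded alignment data is stored as test-chn.alx in the source corpus's data directory (the HOME path given in the same registry file), which matches the .alx file observed here. The function below is a minimal sketch of that check under those assumptions: the name check_alignment is hypothetical, it assumes the .alx format and a single HOME line, and it cannot tell whether CQPweb itself has registered the alignment.

```shell
# Hypothetical helper, not part of CWB or CQPweb:
#   check_alignment REGISTRY_FILE TARGET_ID
# Returns 0 if REGISTRY_FILE declares "ALIGNED TARGET_ID" and TARGET_ID.alx
# exists in the data directory named by HOME; returns 1 if the ALIGNED line
# is missing and 2 if the .alx file is missing.
check_alignment() {
    reg=$1
    target=$2

    # The a-attribute is named after the (lowercase) ID of the target corpus.
    if ! grep -Eq "^ALIGNED[[:space:]]+${target}[[:space:]]*\$" "$reg"; then
        echo "no 'ALIGNED ${target}' line in ${reg}"
        return 1
    fi

    # HOME gives the corpus data directory; the value may or may not be quoted.
    home=$(sed -n 's/^HOME[[:space:]]\{1,\}"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$reg" | head -n 1)

    # Assumption: the alignment was encoded in the .alx format.
    if [ ! -f "${home}/${target}.alx" ]; then
        echo "ALIGNED ${target} is declared, but ${home}/${target}.alx is missing"
        return 2
    fi

    echo "alignment ${target} found: ${home}/${target}.alx"
    return 0
}
```

For the corpora in this thread the call would be something like check_alignment /var/CQPweb/registry/test-en test-chn, assuming the registry file for TEST-EN is named test-en.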
Again, we cannot thank you enough for your help and desperately hope to
hear from you soon!


Best,
Austin Yang (楊承洋)
MS in Cognitive Neuroscience, NCU
BS in Psychology, CYCU


On Tue, Oct 4, 2022 at 8:18 PM Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:

> PS, I forgot to answer the CQPweb query – the answer is Yes. The “how” is in
> chapter 8 of the CQPweb manual, esp. sec. 8.5 ff.
>
> best
>
> Andrew.
>
> *From:* cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> *On Behalf Of* Hardie, Andrew
> *Sent:* 04 October 2022 13:14
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
> *Subject:* Re: [CWB] Bilingual corpus alignment
>
> The name of an alignment attribute is the name of the corpus it “points
> at”.
>
> so, when working in TEST-EN,
>
>           show +test-chn;
>
> turns on display of the parallel text.
>
> CQP manual, chapter 5.
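Because the attribute name is simply the target corpus's ID in lowercase, the CQP commands for a parallel concordance can be written out mechanically. The function below is a hypothetical sketch (cqp_parallel_commands is not a CWB tool) that only prints commands and does not run CQP: activate the source corpus, switch on display of the aligned corpus, run the query as a named query and display it with cat.

```shell
# Hypothetical helper: cqp_parallel_commands SOURCE_CORPUS TARGET_CORPUS QUERY
# Prints CQP commands that activate SOURCE_CORPUS, turn on display of the
# alignment attribute pointing at TARGET_CORPUS, and show the hits of QUERY.
cqp_parallel_commands() {
    source_corpus=$1
    # The a-attribute is named after the target corpus, in lowercase.
    att=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    query=$3

    printf '%s;\n' "$source_corpus"
    printf 'show +%s;\n' "$att"
    # A named query followed by cat displays the hits explicitly.
    printf 'Hits = %s;\n' "$query"
    printf 'cat Hits;\n'
}
```

For the 'Taiwan' example from this thread, cqp_parallel_commands TEST-EN TEST-CHN '"Taiwan"' prints the four commands; the output could then be run in CQP, for instance saved to a file and passed to cqp in batch mode with the registry set to /var/CQPweb/registry.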
>
> best
>
> Andrew.
>
> *From:* cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> *On Behalf Of* Austin Yang
> *Sent:* 04 October 2022 01:47
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
> *Subject:* Re: [CWB] Bilingual corpus alignment
>
> Hey Andrew and all of the community,
>
> Thanks for the reply; it is greatly appreciated!
>
> This is my first time working with a bilingual corpus, so please forgive my
> ignorance in advance.
>
> I'm still a bit confused about what the alignment attribute is. The alignment
> command I ran was 'sudo cwb-align-import -r '/var/CQPweb/registry' -p test.algn'
>
> Output: Use of uninitialized value $12_keys in split at
> /usr/local/bin/cwb-align-import line 119, <$fn> line3. Alignment TEST-EN =>
> TEST-CHN has been created with 7 non-empty beads.
>
> I tried 'show + test.algn', but it doesn't seem to work, and the
> registry file doesn't seem to give much information in this regard.
>
> Does this mean the alignment failed? Or did I not set a designated alignment
> attribute?
>
> Another, somewhat out-of-scope question: assuming everything works out in
> CQP, is it possible to upload and display the bilingual part in CQPweb? (For
> example, if someone queries 'Taiwan', it should show an English segment
> containing 'Taiwan' and the Chinese segment on the next line.)
>
> Once again, any help is desperately needed and deeply appreciated!
>
> Best,
>
> Austin Yang (楊承洋)
>
> MS in Cognitive Neuroscience, NCU
>
> BS in Psychology, CYCU
>
> On Mon, Oct 3, 2022 at 5:30 PM Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
> You need to
>
> show +your_alignment_attribute
>
> in CQP.
>
> best
>
> Andrew.
>
> *From:* cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> *On Behalf Of* Austin Yang
> *Sent:* 03 October 2022 02:44
> *To:* cwb at sslmit.unibo.it
> *Subject:* [CWB] Bilingual corpus alignment
>
> Dear all,
>
> Recently I've encountered a problem using CWB's alignment encoding
> function.
>
> "Problem" might not be the right word, but here is what I did: I used a
> different alignment tool, converted its output into CWB's standard format,
> and ran the regedit and encode procedure. This created an .alx file among
> the source-language corpus's index files. The tutorial says "This procedure
> only creates an a-attribute in HOLMES-EN (source corpus), linking it to
> HOLMES-DE (target corpus).", but that's all I can find. I don't know how to
> use CQP/CWB to display the sentence alignment (i.e. I imagine that querying
> "Sherlock" in the source corpus would show both the English sentence
> containing "Sherlock" and the aligned German sentence). The attachment
> shows the command and output. I'm not even sure whether the alignment
> succeeded. Any help or information that sheds some light on this
> situation will be greatly appreciated!
>
> Best,
>
> Austin Yang (楊承洋)
>
> MS in Cognitive Neuroscience, NCU
>
> BS in Psychology, CYCU
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>

