[CWB] export corpus
Hardie, Andrew
a.hardie at lancaster.ac.uk
Mon Apr 24 12:52:19 CEST 2023
>> For some reason the export totally chokes when I try it from the CQPWeb interface but it takes just seconds via command line
That's the issue I'm investigating. I made some fairly extensive changes to exporting for 3.3, so it's not a surprise to me that odd bugs persist in 3.2.43.
I hope that once I zap the error in the 3.2 branch, you will be able to update the code without risking destabilising your system.
Andrew
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Josep M. Fontana
Sent: Monday, April 24, 2023 11:50 AM
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] export corpus
Thank you Stephanie and Andrew,
This worked perfectly. Stephanie, you were right: the corpus did have some minimal XML annotation for the s-attributes that we were also interested in retrieving.
For some reason the export totally chokes when I try it from the CQPWeb interface but it takes just seconds via command line.
We are not running the latest version of CQPWeb, and we do need to update our CWB-related software, but things are rather delicate on our server right now and even minimal changes that affect dependencies elsewhere tend to break things. So we are being a bit cautious until we can put some order in our installation.
JM
>> On 23 Apr 2023, at 19:23, Josep M. Fontana <josepm.fontana at upf.edu> wrote:
>>
>> The corpus for which we need to export the tokens plus the POS and lemma tags more urgently, though, is not in an XML format. It is actually the corpus Cristina developed in her thesis; the first corpus we ever installed using CWB. This was before we started using the CQPWeb interface.
> The fragments of this corpus which I saw many years ago did have some (minimal) XML annotation:
>
>> STRUCTURE doc
>> STRUCTURE doc_author # [annotations]
>> STRUCTURE doc_century # [annotations]
>> STRUCTURE doc_id # [annotations]
>> STRUCTURE doc_collection # [annotations]
> (but perhaps this was only in the subcorpora she had created for our joint work back then).
>
> If your version is similar, you'd want to export it with something like
>
> cwb-decode -C CORPUS -P word -P pos [...] -S doc+id+century+collection+author
>
> Best,
> Steph
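
For anyone scripting this export, here is a minimal sketch in Python that assembles the cwb-decode call suggested above. It assumes that the s-attributes are declared one per -S flag under the names shown in the STRUCTURE listing, and that the lemma annotation is stored in a p-attribute called "lemma"; neither is confirmed in this thread. The helper names build_decode_command and run_export, and the optional registry argument, are hypothetical and only there for illustration.

```python
import shlex
import subprocess


def build_decode_command(corpus, p_attrs, s_attrs, registry=None):
    """Return the argument list for a compact (-C) cwb-decode export."""
    cmd = ["cwb-decode", "-C"]
    if registry is not None:
        # -r points cwb-decode at a non-default registry directory
        cmd += ["-r", registry]
    cmd.append(corpus)
    for name in p_attrs:
        cmd += ["-P", name]  # positional (token-level) attributes
    for name in s_attrs:
        cmd += ["-S", name]  # structural (XML-like) attributes
    return cmd


def run_export(cmd, out_path):
    """Run the command and write its output to out_path (needs CWB installed)."""
    with open(out_path, "w", encoding="utf-8") as fh:
        subprocess.run(cmd, stdout=fh, check=True)


# Attribute names are assumptions: "lemma" is a guess, and the s-attribute
# names are copied from the STRUCTURE lines quoted in this thread.
cmd = build_decode_command(
    "CORPUS",
    ["word", "pos", "lemma"],
    ["doc", "doc_id", "doc_century", "doc_collection", "doc_author"],
)
print(shlex.join(cmd))
```

The snippet only prints the command line, so it can be checked by eye first; calling run_export(cmd, "export.txt") would then perform the actual export on a machine where the corpus is installed.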
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb