[CWB] export corpus
Josep M. Fontana
josepm.fontana at upf.edu
Sun Apr 23 19:23:38 CEST 2023
Thanks, Stephanie. I realize I may have caused some confusion by mixing
different issues in my first message.
The corpus I mentioned in connection with our export problems was our
installation of the XML BNC. The corpus whose tokens, POS tags and lemmas
we need to export more urgently, though, is not in XML format. It is the
corpus Cristina developed for her thesis, the first corpus we ever
installed with CWB, before we started using the CQPweb interface.
So we did not use XML tags in that corpus. I have looked for .vrt files
but found none for it; if such files only exist for corpora with XML
tags, that is perhaps not surprising.
Is there another way to export the text with its tags that does not
involve extracting the information from a .vrt file?
JM
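Andrew's pointer to cwb-decode also bears on the question above: cwb-decode rebuilds the text from the indexed corpus data itself, so it should work whether or not the corpus was encoded from XML and whether or not any .vrt files were kept. The Python sketch below assembles a cwb-decode call and streams its output straight into a file, so the decoded text is never held in memory. It is a hypothetical illustration, not a tested recipe: the corpus ID `MYCORPUS`, the attribute names `word`, `pos` and `lemma`, and the registry path are placeholders (`cwb-describe-corpus` lists a corpus's actual attributes), and only the cwb-decode options `-C`, `-r`, `-s`, `-e`, `-P` and `-S` are used.

```python
"""Hypothetical sketch: export a CWB corpus to one-token-per-line text
with cwb-decode, streaming the output directly to disk."""
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def build_decode_command(
    corpus: str,
    p_attrs: Sequence[str] = ("word", "pos", "lemma"),  # assumed attribute names
    s_attrs: Sequence[str] = (),
    registry: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for cwb-decode.

    -C asks for compact (one token per line) output, -P / -S declare the
    positional and structural attributes to print, and -s / -e restrict
    the output to a range of corpus positions.
    """
    cmd = ["cwb-decode", "-C"]
    if registry is not None:
        cmd += ["-r", registry]
    if start is not None:
        cmd += ["-s", str(start)]
    if end is not None:
        cmd += ["-e", str(end)]
    cmd.append(corpus)  # options go before the corpus ID, declarations after
    for att in p_attrs:
        cmd += ["-P", att]
    for att in s_attrs:
        cmd += ["-S", att]
    return cmd


def export_corpus(cmd: Sequence[str], out_path: Path) -> None:
    """Run the command with stdout redirected to a file, so the output is
    written as it is produced instead of being collected in Python."""
    with out_path.open("wb") as out:
        subprocess.run(list(cmd), stdout=out, check=True)


# Example (hypothetical corpus ID and registry path):
# cmd = build_decode_command("MYCORPUS", registry="/path/to/registry")
# export_corpus(cmd, Path("mycorpus.txt"))
```

The `start`/`end` parameters are optional; for a very large corpus they could be used to export it in several slices of corpus positions if a single output file turns out to be unwieldy.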
> If you do it on the command-line rather than via CQPweb, make sure you have CWB v3.5 and read Sec. 8 of the Corpus Encoding Manual carefully to see how you can reconstruct nested XML tags and attribute-value pairs in the start tags (if they have been split up by cwb-encode).
>
> Best,
> Stephanie
>
>> On 23 Apr 2023, at 01:26, Josep M. Fontana <josepm.fontana at upf.edu> wrote:
>>
>> Thanks. We'll try that.
>>
>> JM
>>
>> On 22/4/23 23:48, Hardie, Andrew wrote:
>>> With cwb-decode.
>>>
>>> best
>>>
>>> Andrew
>>>
>>> From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Andrés Chandía
>>> Sent: Thursday, April 20, 2023 6:23 PM
>>> To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
>>> Subject: [CWB] export corpus
>>>
>>> How do I export a big corpus without overloading the machine's resources?
>>> I found nothing about this in the manuals...
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb