[CWB] export corpus
Josep M. Fontana
josepm.fontana at upf.edu
Mon Apr 24 12:49:38 CEST 2023
Thank you, Stephanie and Andrew.
This worked perfectly. Stephanie, you were right: the corpus did have
some minimal XML annotation for the s-attributes that we were also
interested in retrieving, so your suggestion was exactly what we needed.
For some reason, the export completely chokes when I try it from the CQPWeb
interface, but it takes just seconds from the command line.
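A minimal sketch of that command-line export wrapped in a reusable POSIX shell function. It assumes, without having checked the registry, that the p-attributes are named word, pos and lemma (the tags mentioned in the quoted message below), and it reuses the -S specification from Stephanie's suggested cwb-decode command verbatim. The function name export_corpus and the DRY_RUN switch are hypothetical illustrations, not part of CWB.

```sh
#!/bin/sh
# Sketch: export a CWB corpus to a text file with cwb-decode.
# ASSUMPTIONS (hypothetical, adjust to your registry entry):
#   - p-attributes are named word, pos and lemma
#   - the -S spec is copied from the suggestion on the list
#   - export_corpus and DRY_RUN are illustrative names, not CWB features
export_corpus() {
  corpus=$1    # corpus ID as declared in the CWB registry
  outfile=$2   # file the decoded text is written to

  # Build the argument list: -C selects cwb-decode's compact output mode,
  # each -P adds one positional attribute to every token line.
  set -- cwb-decode -C "$corpus" -P word -P pos -P lemma \
    -S doc+id+century+collection+author

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only show the command line that would be run.
    printf '%s > %s\n' "$*" "$outfile"
  else
    "$@" > "$outfile"
  fi
}
```

For example, `DRY_RUN=1 export_corpus MYCORPUS export.txt` only prints the command, while `export_corpus MYCORPUS export.txt` writes the decoded corpus to export.txt (MYCORPUS is a placeholder ID).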
We are not running the latest version of CQPWeb, and we do need to update
the CWB-related software, but things are rather delicate on our server
right now, and even minimal changes that affect dependencies elsewhere
tend to break things. So we are being a bit cautious until we can bring
some order to our installation.
JM
>> On 23 Apr 2023, at 19:23, Josep M. Fontana <josepm.fontana at upf.edu> wrote:
>>
>> The corpus for which we need to export the tokens plus the POS and lemma tags more urgently, though, is not in an XML format. It is actually the corpus Cristina developed for her thesis, the first corpus we ever installed using CWB. This was before we started using the CQPWeb interface.
> The fragments of this corpus which I saw many years ago did have some (minimal) XML annotation:
>
>> STRUCTURE doc
>> STRUCTURE doc_author # [annotations]
>> STRUCTURE doc_century # [annotations]
>> STRUCTURE doc_id # [annotations]
>> STRUCTURE doc_collection # [annotations]
> (but perhaps this was only in the subcorpora she had created for our joint work back then).
>
> If your version is similar, you'd want to export it with something like
>
> cwb-decode -C CORPUS -P word -P pos [...] -S doc+id+century+collection+author
>
> Best,
> Steph
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb