[CWB] export corpus
Stephanie Evert
stefanML at collocations.de
Sun Apr 23 22:12:43 CEST 2023
> On 23 Apr 2023, at 19:23, Josep M. Fontana <josepm.fontana at upf.edu> wrote:
>
> The corpus for which we need to export the tokens plus the POS and lemma tags more urgently, though, is not in an XML format. It is actually the corpus Cristina developed in her thesis; the first corpus we ever installed using CWB. This was before we started using the CQPWeb interface.
The fragments of this corpus which I saw many years ago did have some (minimal) XML annotation:
> STRUCTURE doc
> STRUCTURE doc_author # [annotations]
> STRUCTURE doc_century # [annotations]
> STRUCTURE doc_id # [annotations]
> STRUCTURE doc_collection # [annotations]
(but perhaps this was only in the subcorpora she had created for our joint work back then).
If your version is similar, you'd want to export it with something like:
cwb-decode -C CORPUS -P word -P pos [...] -S doc+id+century+collection+author
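
Once exported, the result could be post-processed with a few lines of Python. This is only a minimal sketch, assuming the export is vertical text with one token per line, tab-separated columns in the order the -P options were given, and structural boundaries on separate XML-like tag lines. The function name read_vertical, the column names and the sample data are made up for illustration, so check them against your actual output first.

```python
import io

def read_vertical(lines, columns=("word", "pos", "lemma")):
    """Yield one dict per token from vertical (one-token-per-line) text.

    Assumption: columns are tab-separated and appear in the same order
    as the -P options passed to the export command.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Lines that look like XML tags (<doc ...>, </doc>) mark structure,
        # not tokens; a real token line contains tabs between its columns.
        if line.startswith("<") and line.endswith(">") and "\t" not in line:
            continue
        yield dict(zip(columns, line.split("\t")))

# Hypothetical sample in the assumed format (not real corpus data).
sample = '<doc id="d1" century="13">\nla\tDET\tel\ncasa\tNOUN\tcasa\n</doc>\n'
tokens = list(read_vertical(io.StringIO(sample)))
```

For a real export you would pass an open file handle instead of the io.StringIO sample.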
Best,
Steph