[CWB] File format of encoded cwb corpora
Serge Heiden
slh at ens-lyon.fr
Fri Jul 13 16:29:15 CEST 2012
As a starting point for the various index files of CQP, I would recommend:
IMS Corpus Workbench "CQP Corpus Administrator’s Manual",
Oliver Christ, Universität Stuttgart, Institut für maschinelle Sprachverarbeitung, 1994
(p. 14 for a partial overview of index architecture)
A copy of which is here:
http://txm.sourceforge.net/doc/cwb/technical-manual.pdf
--slh
On 13/07/2012 16:08, Hardie, Andrew wrote:
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Stefan Evert
>
>>>>> There's no formal specification of the precise file format
> Arguably there should be, however, especially if we need to change it and thus have to deal with format versioning. Moreover, having obtained (and read) a copy of the "Managing Gigabytes" book, I personally don't think the book alone adequately documents the technical details of the binary format: for a full understanding of how CWB does it, the book has to be read alongside the indexing code.
>
> Yet another thing for the TODO list!
>
> Andrew.
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
--
Dr. Serge Heiden, slh at ens-lyon.fr, http://textometrie.ens-lyon.fr
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883