[CWB] CQPweb index files

Stephanie Evert stefanML at collocations.de
Mon Mar 11 19:43:51 CET 2024


Hi Simon,

you need to keep both corpora: <corpus> is the actual corpus that you've pre-indexed with CWB, and <corpus>__freq is a database with per-text frequency information. CQPweb stores this data as a CWB corpus because, when CQPweb was developed, MySQL was so horribly slow at aggregating large frequency tables that we got much better performance by abusing CWB as a sort-of relational database.

Best,
Steph

> On 11 Mar 2024, at 17:08, Simon Meier-Vieracker <simon.meier-vieracker at tu-dresden.de> wrote:
> 
> Hi,
> 
> just to be sure about the index files on CQPweb:
> 
> Our usual workflow is to import the corpus into CWB with cwb-encode and then use "install a corpus you have already indexed in CWB". As I understand it, CQPweb then creates a new folder "corpus__freq" containing the index files that CQPweb needs.
> 
> Since we are running out of disk space on our server: Do we still need the normal CWB index files after having the corpus installed in CQPweb? 
> 
> Thanks in advance
> Simon


