<br><br><div class="gmail_quote">2012/10/11 Stefan Evert <span dir="ltr">&lt;<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Eva,<br>
<br>
Can you tell us exactly what operating system and version you're using, and how you have obtained and installed CWB? If you're using a pre-compiled binary, please tell us which version you've downloaded.<br>
<br></blockquote><div> </div><div>I'm using a server running Debian GNU/Linux 6.0.<br>We downloaded the beta version (cwb-3.4.1).<br>I have created several corpora in several languages and never had this problem before.<br> <br> </div>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
We also need to know exactly which commands you entered to index and compress the corpus, plus the output from each of these commands. Perhaps this will allow us to make a guess at the source of the error.<br>
<br></blockquote><div><br>The command I use is:<br> cat $SOURCEFILE | /usr/local/cwb-3.4.1/bin/cwb-encode -c utf8 -d $DATADIR -R $REGDIR/$CORPUSNAME -xsB -P lema -P pos -V s -S doc:0+type+title -S not:0+text<br><br>This is the output (after correcting the errors you mentioned):<br>
<br>=== Makeall: processing corpus LATIN ===<br>Registry directory: /B_NFS_P/resources/corpora/written/registry/<br>ATTRIBUTE word<br> - lexicon OK<br> - frequencies OK<br> - token stream OK (COMPRESSED)<br> - index OK (COMPRESSED)<br>
ATTRIBUTE lema<br> - lexicon OK<br> - frequencies OK<br> - token stream OK (COMPRESSED)<br> - index OK (COMPRESSED)<br>ATTRIBUTE pos<br> - lexicon OK<br> - frequencies OK<br> - token stream OK (COMPRESSED)<br>
- index OK (COMPRESSED)<br>========================================<br>COMPRESSING TOKEN STREAM of LATIN.word<br>- writing code descriptor block to /B_NFS_P/resources/corpora/written/data/latin/word.hcd<br>- writing compressed item sequence to /B_NFS_P/resources/corpora/written/data/latin/word.huf<br>
- writing sync (every 128 tokens) to /B_NFS_P/resources/corpora/written/data/latin/word.huf.syn<br>VALIDATING LATIN.word<br>- reading code descriptor block from /B_NFS_P/resources/corpora/written/data/latin/word.hcd<br>- reading compressed item sequence from /B_NFS_P/resources/corpora/written/data/latin/word.huf<br>
- reading sync (mod 128) from /B_NFS_P/resources/corpora/written/data/latin/word.huf.syn<br>!! You can delete the file &lt;/B_NFS_P/resources/corpora/written/data/latin/word.corpus&gt; now.<br>COMPRESSING TOKEN STREAM of LATIN.lema<br>
Error: Huffman codes too long (33 bits, current maximum is 31 bits).<br> Please contact the CWB development team for assistance.<br>COMPRESSING INDEX of LATIN.word<br>- writing compressed index to /B_NFS_P/resources/corpora/written/data/latin/word.crc<br>
- writing compressed index offsets to /B_NFS_P/resources/corpora/written/data/latin/word.crx<br>VALIDATING LATIN.word<br>- reading compressed index from /B_NFS_P/resources/corpora/written/data/latin/word.crc<br>- reading compressed index offsets from /B_NFS_P/resources/corpora/written/data/latin/word.crx<br>
!! You can delete the file &lt;/B_NFS_P/resources/corpora/written/data/latin/word.corpus.rev&gt; now.<br>!! You can delete the file &lt;/B_NFS_P/resources/corpora/written/data/latin/word.corpus.rdx&gt; now.<br>COMPRESSING INDEX of LATIN.lema<br>
- writing compressed index to /B_NFS_P/resources/corpora/written/data/latin/lema.crc<br>- writing compressed index offsets to /B_NFS_P/resources/corpora/written/data/latin/lema.crx<br>CL: index is out of range: (aborting) token frequency == 0<br>
<br>=== Makeall: processing corpus LATIN ===<br>Registry directory: /B_NFS_P/resources/corpora/written/registry/<br>ATTRIBUTE word<br> - lexicon OK<br> - frequencies OK<br> - token stream OK (COMPRESSED)<br> - index OK (COMPRESSED)<br>
ATTRIBUTE lema<br> - lexicon OK<br> - frequencies OK<br> - token stream OK (COMPRESSED)<br> - index OK (COMPRESSED)<br>ATTRIBUTE pos<br> - lexicon OK<br> - frequencies OK<br> - token stream OK (COMPRESSED)<br>
- index OK (COMPRESSED)<br>========================================<br><br>Thanks <br><br>Eva<br><br></div></div>