Hi Stefan,<br>I had mixed up the versions on the server and on my computer.<br>

I was using an old script that called cwb-makeall, cwb-huffcode and cwb-compress-rdx (plus a final cwb-makeall, which I think was there to check that everything was OK).<br>

I changed to cwb-make and now it works. So the error must have been related to old files as you pointed out.<br><br>Many thanks for your help<br><br>Eva Bofias<br><br><div class="gmail_quote">2012/10/12 Stefan Evert <span dir="ltr">&lt;<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>&gt;</span><br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><br>

On 11 Oct 2012, at 17:43, BOFÍAS ALBERCH, EVA wrote:<br>

<br>

&gt; I&#39;m using a server: Debian GNU/Linux 6.0<br>

&gt; We downloaded the beta version (cwb-3.4.1 )<br>

<br>

</div>In a previous mail you stated that you&#39;re using CWB 3.0.2 -- is it possible that you&#39;ve mixed up two different versions? However, file formats should be fully compatible between 3.0.x and 3.4.x, so this is unlikely to be the cause of your problems.<br>


<div class="im"><br></div></blockquote><div><br><br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="im">

&gt; We also need to know exactly which commands you entered to index and compress the corpus, plus the output from each of these commands.  Perhaps this will allow us to make a guess at the source of the error.<br>

&gt;<br>

&gt; the command I use is:<br>

&gt;  cat $SOURCEFILE | /usr/local/cwb-3.4.1/bin/cwb-encode -c utf8 -d $DATADIR -R $REGDIR/$CORPUSNAME -xsB -P lema -P pos -V s  -S doc:0+type+title -S not:0+text<br>

<br>

</div>That can&#39;t be all you&#39;re doing.<br>

<br>

For one thing, you need to define the shell variables SOURCEFILE, DATADIR, etc. for this command to do anything sensible.<br>

<br>

More importantly, this command only runs cwb-encode, which is the first step of the indexing process.  You still need to run cwb-makeall (to build the actual index structures) and cwb-huffcode and cwb-compress-rdx (to compress the index files, which is where your error occurs).<br>
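<br>
A rough sketch of the whole sequence, not a reconstruction of your actual script: the cwb-encode options are copied from your command (with -f used in place of the cat pipe), while the function names, the argument order and the DRY_RUN switch are made up for illustration.<br>

```shell
#!/bin/sh
# Sketch only: paths, corpus name and helper names are placeholders.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# index_corpus SOURCEFILE DATADIR REGDIR corpusname
index_corpus() {
    src=$1
    datadir=$2
    regdir=$3
    corpus=$4
    # The registry file is lowercase; the corpus ID passed to the tools is uppercase.
    upper=$(printf '%s' "$corpus" | tr '[:lower:]' '[:upper:]')

    # Step 1: encode the input into the data directory.
    run cwb-encode -c utf8 -d "$datadir" -f "$src" -R "$regdir/$corpus" \
        -xsB -P lema -P pos -V s -S doc:0+type+title -S not:0+text || return 1

    # Step 2: build the index structures (-V validates them).
    run cwb-makeall -r "$regdir" -V "$upper" || return 1

    # Step 3: compress token streams and indexes of all p-attributes.
    run cwb-huffcode -r "$regdir" -A "$upper" || return 1
    run cwb-compress-rdx -r "$regdir" -A "$upper" || return 1
}
```

Calling it with DRY_RUN=1 first, e.g. DRY_RUN=1 index_corpus "$SOURCEFILE" "$DATADIR" "$REGDIR" latin, prints the four commands so you can check them before anything is written.<br>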


<br>

The output you sent us (as shown below) stems from these programs, so you must be running those additional commands in some way!<br>

<br>

There are two strange things about the output:<br>

<br>

1) You seem to run cwb-makeall twice, once before compressing and once after.  There&#39;s no need to run cwb-makeall a second time -- why do you do that?<br>

<br>

2) The output from the first cwb-makeall run indicates that the index structures have already been created _and_ compressed (it just says &quot;OK&quot; rather than &quot;creating ...&quot;).  Those might be stale, damaged files from a previous encoding run.   Did you forget to clean the data directory /B_NFS_P/resources/corpora/written/data/latin/ before re-running cwb-encode?  It&#39;s quite possible that your error is due  to damaged index files still lying around ...<br>
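<br>
Cleaning the data directory before re-encoding could look something like the sketch below. It assumes that the directory holds nothing but the files of this one corpus; clean_data_dir is a hypothetical helper name, not part of CWB.<br>

```shell
#!/bin/sh
# Sketch only: removes every regular file directly inside the given data
# directory, on the assumption that the directory belongs to a single corpus.

# clean_data_dir DATADIR
clean_data_dir() {
    datadir=$1
    if [ ! -d "$datadir" ]; then
        echo "not a directory: $datadir" >&2
        return 1
    fi
    for f in "$datadir"/*; do
        # Skip subdirectories (and the unexpanded pattern if the directory is empty).
        if [ -f "$f" ]; then
            rm -f -- "$f"
        fi
    done
    return 0
}
```

You would call it right before cwb-encode, e.g. clean_data_dir "$DATADIR", so that no index or compressed files from an earlier run are left behind.<br>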


<br>

By the way, this is a good reason why you should use cwb-make from the CWB/Perl modules rather than calling cwb-makeall etc. directly.  cwb-make would recognise that the index files are out of date and automatically delete and rebuild them.<br>
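<br>
With cwb-make, the steps after cwb-encode shrink to a single call. The sketch below assumes the CWB/Perl scripts are installed and on your PATH; rebuild_corpus and the DRY_RUN switch are made-up names for illustration.<br>

```shell
#!/bin/sh
# Sketch only: wraps a single cwb-make call; registry path and corpus name
# are placeholders. Set DRY_RUN=1 to print the command instead of running it.

# rebuild_corpus REGDIR corpusname
rebuild_corpus() {
    regdir=$1
    corpus=$2
    # cwb-make expects the corpus ID in uppercase.
    upper=$(printf '%s' "$corpus" | tr '[:lower:]' '[:upper:]')

    # -r names the registry directory, -V asks for validation of the result.
    set -- cwb-make -r "$regdir" -V "$upper"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For your corpus this would be rebuild_corpus "$REGDIR" latin, run once after cwb-encode has finished.<br>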


<br>

Best,<br>

Stefan<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

&gt;<br>

&gt; This is the output (after correcting the errors you mentioned):<br>

&gt;<br>

&gt; === Makeall: processing corpus LATIN ===<br>

&gt; Registry directory: /B_NFS_P/resources/corpora/written/registry/<br>

&gt; ATTRIBUTE word<br>

&gt;  - lexicon      OK<br>

&gt;  - frequencies  OK<br>

&gt;  - token stream OK (COMPRESSED)<br>

&gt;  - index        OK (COMPRESSED)<br>

&gt; ATTRIBUTE lema<br>

&gt;  - lexicon      OK<br>

&gt;  - frequencies  OK<br>

&gt;  - token stream OK (COMPRESSED)<br>

&gt;  - index        OK (COMPRESSED)<br>

&gt; ATTRIBUTE pos<br>

&gt;  - lexicon      OK<br>

&gt;  - frequencies  OK<br>

&gt;  - token stream OK (COMPRESSED)<br>

&gt;  - index        OK (COMPRESSED)<br>

&gt; ========================================<br>

&gt; COMPRESSING TOKEN STREAM of LATIN.word<br>

&gt; - writing code descriptor block to /B_NFS_P/resources/corpora/written/data/latin/word.hcd<br>

&gt; - writing compressed item sequence to /B_NFS_P/resources/corpora/written/data/latin/word.huf<br>

&gt; - writing sync (every 128 tokens) to /B_NFS_P/resources/corpora/written/data/latin/word.huf.syn<br>

&gt; VALIDATING LATIN.word<br>

&gt; - reading code descriptor block from /B_NFS_P/resources/corpora/written/data/latin/word.hcd<br>

&gt; - reading compressed item sequence from /B_NFS_P/resources/corpora/written/data/latin/word.huf<br>

&gt; - reading sync (mod 128) from /B_NFS_P/resources/corpora/written/data/latin/word.huf.syn<br>

&gt; !! You can delete the file &lt;/B_NFS_P/resources/corpora/written/data/latin/word.corpus&gt; now.<br>

&gt; COMPRESSING TOKEN STREAM of LATIN.lema<br>

&gt; Error: Huffman codes too long (33 bits, current maximum is 31 bits).<br>

&gt;        Please contact the CWB development team for assistance.<br>

&gt; COMPRESSING INDEX of LATIN.word<br>

&gt; - writing compressed index to /B_NFS_P/resources/corpora/written/data/latin/word.crc<br>

&gt; - writing compressed index offsets to /B_NFS_P/resources/corpora/written/data/latin/word.crx<br>

&gt; VALIDATING LATIN.word<br>

&gt; - reading compressed index from /B_NFS_P/resources/corpora/written/data/latin/word.crc<br>

&gt; - reading compressed index offsets from /B_NFS_P/resources/corpora/written/data/latin/word.crx<br>

&gt; !! You can delete the file &lt;/B_NFS_P/resources/corpora/written/data/latin/word.corpus.rev&gt; now.<br>

&gt; !! You can delete the file &lt;/B_NFS_P/resources/corpora/written/data/latin/word.corpus.rdx&gt; now.<br>

&gt; COMPRESSING INDEX of LATIN.lema<br>

&gt; - writing compressed index to /B_NFS_P/resources/corpora/written/data/latin/lema.crc<br>

&gt; - writing compressed index offsets to /B_NFS_P/resources/corpora/written/data/latin/lema.crx<br>

&gt; CL: index is out of range: (aborting) token frequency == 0<br>

&gt;<br>

&gt; === Makeall: processing corpus LATIN ===<br>

&gt; Registry directory: /B_NFS_P/resources/corpora/written/registry/<br>

&gt; ATTRIBUTE word<br>

&gt;  - lexicon      OK<br>

&gt;  - frequencies  OK<br>

&gt;  - token stream OK (COMPRESSED)<br>

&gt;  - index        OK (COMPRESSED)<br>

&gt; ATTRIBUTE lema<br>

&gt;  - lexicon      OK<br>

&gt;  - frequencies  OK<br>

&gt;  - token stream OK (COMPRESSED)<br>

&gt;  - index        OK (COMPRESSED)<br>

&gt; ATTRIBUTE pos<br>

&gt;  - lexicon      OK<br>

&gt;  - frequencies  OK<br>

&gt;  - token stream OK (COMPRESSED)<br>

&gt;  - index        OK (COMPRESSED)<br>

&gt; ========================================<br>

&gt;<br>

&gt; Thanks<br>

&gt;<br>

&gt; Eva<br>

&gt;<br>

<br>

</div></div></blockquote></div><br>