On Wed, Jul 24, 2013 at 2:43 AM, Stefan Evert <span dir="ltr">&lt;<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>&gt;</span> wrote:<div><br></div><div>Dear Stefan,</div><div><br></div>


<div>Thanks so much for your help. The following seems to have fixed the problem:</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


If you have &quot;cwb-make&quot; from the CWB/Perl modules, you can simply trash the &quot;.crc&quot; and &quot;.crx&quot; files (which contain the actual lookup index that appears to be damaged) and rebuild them with<br>


        cwb-make [...] PERS-DIVER-USENET</blockquote><div><br></div><div>More testing will be needed to be sure, of course.</div><div><br></div><div>Best wishes,</div><div>Scott</div><div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im"><br>

On 24 Jul 2013, at 04:30, Scott Sadowsky &lt;<a href="mailto:ssadowsky@gmail.com">ssadowsky@gmail.com</a>&gt; wrote:<br>

<br>

&gt; Something very strange is going on. I&#39;ve replaced my index for this corpus with a third backup copy, and the following happened:<br>

&gt;<br>

&gt; PERS-DIVER-USENET&gt; &quot;jai&quot;<br>

&gt; 0 matches.<br>

&gt; PERS-DIVER-USENET&gt; &quot;.+ai&quot;<br>

&gt; Segmentation fault (core dumped)<br>

&gt; This time the search for &quot;jai&quot;, which previously caused a segfault, ran without crashing. But it returned 0 hits instead of the 1 reported by the command cwb-lexdecode -f -p &#39;.ai&#39; PERS-DIVER-USENET, so something isn&#39;t adding up here.<br>


<br>

</div>If this is indeed a buffer overflow or something similar triggered by a faulty index file, somewhat erratic behaviour is not surprising.<br>

<div class="im"><br>

&gt; I suspect the next step is to rebuild the index from scratch, but that involves decompressing a ZIP file with 1.2 million files inside it, which I&#39;d rather avoid if at all possible.<br>

<br>

</div><br>

<br>

Of course, make sure you have a backup copy of the corpus beforehand.<br>

<br>

You should also be able to rebuild the index files manually with &quot;cwb-makeall&quot; and &quot;cwb-compress-rdx&quot;, but those tools sometimes get confused about which files need to be rebuilt in which order.<br>

<br>

<br>

If you need to try re-encoding from scratch, an easier solution is<br>

<br>

        cwb-decode -Cx PERS-DIVER-USENET -ALL | cwb-encode -x [...] &lt;appropriate declarations&gt;<br>

<br>

Note that the attribute declarations in the cwb-encode command will be different from the ones you used for the original encoding, because attributes on XML regions are not decoded in proper XML notation.<br>

<br>

<br>

Hope that one of these steps helps!<br>

<span class="HOEnZb"><font color="#888888">Stefan<br>

<br>

</font></span></blockquote></div><br><br></div>
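<div>The thread suggests both trashing the &quot;.crc&quot; and &quot;.crx&quot; files and keeping a backup beforehand. A minimal sketch of one way to combine the two is to move those files into a backup directory rather than deleting them. The function name and the directory arguments below are hypothetical, and the sketch assumes the index files sit directly in the corpus data directory; it is not part of CWB.</div>

```python
import shutil
from pathlib import Path

# File extensions named in the thread as holding the lookup index.
INDEX_SUFFIXES = {".crc", ".crx"}


def set_aside_index_files(data_dir, backup_dir):
    """Move every *.crc / *.crx file from data_dir into backup_dir.

    Returns the sorted list of file names that were moved.
    Both directory arguments are hypothetical placeholders.
    """
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for path in sorted(data_dir.iterdir()):
        # Only touch regular files with one of the index extensions.
        if path.is_file() and path.suffix in INDEX_SUFFIXES:
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved
```

<div>After the files have been set aside, the index would be rebuilt with the cwb-make command quoted in the message, using whatever options apply to the local installation. If the rebuild does not help, the moved files can simply be copied back.</div>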