Something very strange is going on. I've replaced my index for this corpus with a third backup copy, and the following happened:<div><br></div><div><div>PERS-DIVER-USENET> "jai"</div><div>0 matches. </div>
<div>PERS-DIVER-USENET> ".+ai"</div><div>Segmentation fault (core dumped) </div><div><br></div><div>Here the search for "jai", which previously caused a segfault, ran without crashing, so all seemed good. However, it returned 0 matches instead of the 1 occurrence of "jai" reported by the command <span style="background-color:rgb(255,255,255);color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.727272033691406px">cwb-lexdecode -f -p '.ai' PERS-DIVER-USENET. So something isn't adding up here.</span></div>
<div><span style="background-color:rgb(255,255,255);color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div><div><span style="background-color:rgb(255,255,255);color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.727272033691406px">Immediately after this, I ran the following:</span></div>
<div> </div><div>$ cqp -eC</div><div>Welcome to CQP -- the Colourful Query Processor .</div><div>[30: NBUS 31: NBUS 32: NBUS 33: NBUS 34: NBUS 35: NBUS 36: NBUS 37: NBUS ]</div><div>[40: NBUS 41: NBUS 42: NBUS 43: NBUS 44: NBUS 45: NBUS 46: NBUS 47: NBUS ]</div>
<div>[no corpus]> PERS-DIVER-USENET </div><div>PERS-DIVER-USENET> "jai"</div><div>Segmentation fault (core dumped) </div><div>$ </div><div><br></div><div>Now the same search for "jai" that worked a minute ago segfaults.</div>
<div><br></div><div>I suspect the next step is to rebuild the index from scratch, but that involves decompressing a ZIP file with 1.2 million files inside it, which I'd rather avoid if at all possible.</div><div><br>
</div>
<div>Cheers,</div><div>Scott</div><div><br></div><div><br></div><br><div class="gmail_quote">On Tue, Jul 23, 2013 at 11:11 AM, Stefan Evert <span dir="ltr"><<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks.<br>
<br>
Now, could you please try the following queries in CQP on this corpus?<br>
<br>
[word = "Tai"]<br>
[word = "dai"]<br>
[word = "tai"]<br>
etc.<br>
<br>
If some of them crash, but others don't, your index is probably damaged. Otherwise, we'll have to dig deeper. How did you compile and install CWB?<br>
<br>
Cheers,<br>
Stefan<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
On 23 Jul 2013, at 14:57, Scott Sadowsky <<a href="mailto:ssadowsky@gmail.com">ssadowsky@gmail.com</a>> wrote:<br>
<br>
> That worked just fine. Here's the output:<br>
><br>
> $ cwb-lexdecode -f -p '.ai' PERS-DIVER-USENET<br>
> 165 Tai<br>
> 57 dai<br>
> 357 tai<br>
> 7 Mai<br>
> 3 Kai<br>
> 357 vai<br>
> 23 Vai<br>
> 6 rai<br>
> 81 cai<br>
> 13 sai<br>
> 4 Dai<br>
> 1 uai<br>
> 32 hai<br>
> 1 Jai<br>
> 23 mai<br>
> 7 fai<br>
> 2 lai<br>
> 2 Wai<br>
> 2 wai<br>
> 6 nai<br>
> 2 pai<br>
> 5 Sai<br>
> 4 bai<br>
> 4 Cai<br>
> 5 kai<br>
> 4 gai<br>
> 1 Bai<br>
> 2 Rai<br>
> 1 Lai<br>
> 1 Fai<br>
> 1 jai<br>
> $<br>
><br>
<br>
</div></div></blockquote></div><br><br></div>