[CWB] CWB: problems with indexing a corpus

Mikhail Mikhailov (TAU) mikhail.mikhailov at tuni.fi
Mon Feb 13 22:24:10 CET 2023


Hi again,

> That means the start tags of these XML elements contain attribute-value pairs, which you're ignoring – cwb-encode simply warns you about this fact.
I re-ran the command with the attributes declared: ....  -xsBC9 -c utf8 -P pos -P lemma -S text:code+title -S p:id -S s:id
The programme issued the same warnings.

> Are you sure there isn't any error message?

No message, it terminates silently.

> A first step would be to re-run cwb-encode with the -v option added (at the start, not after the attribute flags). This should print how many tokens have been read and encoded from the vrt file.

I ran the same command with the -v option.
It now prints this at the end:
Total size: 31747 tokens (0.0M)

Best,
Mikhail


________________________________
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> on behalf of Stephanie Evert <stefanML at collocations.de>
Sent: Monday, February 13, 2023 22:39
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CWB: problems with indexing a corpus

> I am trying to process a vrt file with cwb-encode.
> The file has pos tagging and I used the examples from CWB-manuals as a template.
>
> I run the command
> cwb-encode -f /path_to_file.vrt -d /path/datafiles -R /path/registry/corpus_name -9 -c utf8 -P pos -P lemma -S text -S p -S s
>
> and I am getting the following warnings:
> > Annotations of s-attribute <text> not stored (file /xxx.vrt, line #1, warning issued only once).
> > Annotations of s-attribute <p> not stored (file /xx.vrt, line #3, warning issued only once).
> > Annotations of s-attribute <s> not stored (file /xx.vrt, line #4, warning issued only once).

That means the start tags of these XML elements contain attribute-value pairs, which you're ignoring – cwb-encode simply warns you about this fact.
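
To see which attribute-value pairs are involved, a small script can list the attributes found in the start tags of the vrt file. The following is only a rough sketch, not part of CWB: the function names are made up for illustration, the sample values are invented, and the suggested declaration format (element:depth+attr1+attr2, with depth 0 for non-nested regions) is written from memory of the cwb-encode documentation and should be checked against the usage message of your CWB version.

```python
import re

# A start tag on a line of its own, e.g. <text code="A1" title="Sample">
START_TAG = re.compile(
    r'^<([A-Za-z_][\w.-]*)'                            # element name
    r'((?:\s+[\w.:-]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)'  # attribute-value pairs
    r'\s*/?>$'
)
ATTR_NAME = re.compile(r'([\w.:-]+)\s*=\s*(?:"[^"]*"|\'[^\']*\')')


def collect_tag_attributes(lines):
    """Map each element name to the attribute names seen in its start tags."""
    found = {}
    for line in lines:
        match = START_TAG.match(line.strip())
        if match is None:
            continue  # token line, end tag, comment or blank line
        names = found.setdefault(match.group(1), [])
        for attr in ATTR_NAME.findall(match.group(2)):
            if attr not in names:
                names.append(attr)
    return found


def suggest_flags(found):
    """Build candidate -S declarations (assumed format, see note above)."""
    flags = []
    for element, attrs in found.items():
        if attrs:
            spec = element + ":0" + "".join("+" + a for a in attrs)
        else:
            spec = element
        flags.append("-S " + spec)
    return flags


if __name__ == "__main__":
    # Invented sample in vrt layout: one token per line, tab-separated columns.
    sample = [
        '<text code="A1" title="Sample">',
        '<p id="1">',
        '<s id="1">',
        'The\tDT\tthe',
        '</s>',
        '</p>',
        '</text>',
    ]
    print(" ".join(suggest_flags(collect_tag_attributes(sample))))
```

For a real file, pass an open file handle instead of the sample list, e.g. collect_tag_attributes(open("/path_to_file.vrt", encoding="utf-8")).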

> And the programme terminates without producing any result.

That sounds like an error, though, and completely unrelated to the warnings.  After successful completion of the command, your data directory /path/datafiles should be populated with index files.

Are you sure there isn't any error message?

A first step would be to re-run cwb-encode with the -v option added (at the start, not after the attribute flags). This should print how many tokens have been read and encoded from the vrt file.
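
For comparison with the number reported by -v, the token lines in the vrt file can be counted independently. This is a minimal sketch under the assumption that every non-blank line not starting with "<" is a token and every line starting with "<" is an XML tag or comment; a file containing a literal "<" token would be miscounted. The function name is hypothetical.

```python
def count_vrt_tokens(lines):
    """Count lines that look like tokens: non-blank and not starting with '<'."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("<"):
            count += 1
    return count


if __name__ == "__main__":
    # Invented sample in vrt layout; replace with an open file handle
    # for a real corpus file.
    sample = [
        '<text code="A1">',
        '<s id="1">',
        'The\tDT\tthe',
        'cat\tNN\tcat',
        '',
        '</s>',
        '</text>',
    ]
    print(count_vrt_tokens(sample))
```

If this count is far larger than the total printed by cwb-encode, the encoder probably stopped partway through the input.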

Best,
Stephanie