[CWB] CWB: problems with indexing a corpus
Mikhail Mikhailov (TAU)
mikhail.mikhailov at tuni.fi
Mon Feb 13 22:24:10 CET 2023
Hi again,
> That means the start tags of these XML elements contain attribute-value pairs, which you're ignoring – cwb-encode simply warns you about this fact.
I re-ran the command with the attribute names declared: .... -xsBC9 -c utf8 -P pos -P lemma -S text:code+title -S p:id -S s:id
And the programme issued the same warnings.
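Before adjusting the -S declarations, it can help to check which attribute names really occur in the start tags of the vrt file. The following is only a minimal sketch in Python, not part of CWB: the function name tag_attributes and the sample lines are made up for illustration, and it assumes that every structural tag sits on a line of its own with quoted attribute values. The resulting list can then be compared with the declaration format printed by cwb-encode -h. As far as I recall, that format puts an embedding level before the attribute names (something like text:0+code+title), but this is from memory and should be verified against the help output.

```python
import re
from collections import defaultdict

# A start tag alone on its line: element name, then an optional attribute string.
START_TAG = re.compile(r"^<([A-Za-z_][\w.-]*)(\s[^<>]*)?>$")
# One name="value" or name='value' pair; only the name is captured.
ATTR = re.compile(r"([\w:.-]+)\s*=\s*(?:\"[^\"]*\"|'[^']*')")


def tag_attributes(lines):
    """Map each element name to the set of attribute names seen in its start tags."""
    seen = defaultdict(set)
    for line in lines:
        match = START_TAG.match(line.strip())
        if match:
            # End tags (</s>) and token lines never match START_TAG.
            seen[match.group(1)].update(ATTR.findall(match.group(2) or ""))
    return dict(seen)


if __name__ == "__main__":
    # Hypothetical sample; for a real file, pass an open file handle instead.
    sample = [
        '<text code="A1" title="Demo">',
        '<p id="1">',
        '<s id="1">',
        "Hello\tUH\thello",
        "</s>",
        "</p>",
        "</text>",
    ]
    for element, names in tag_attributes(sample).items():
        print(element, sorted(names))
```

For a real corpus, the same function can be called on a file opened with encoding="utf-8", since it only iterates over lines.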
> Are you sure there isn't any error message?
No, there is no error message; it terminates silently.
> A first step would be to re-run cwb-encode with the -v option added (at the start, not after the attribute flags). This should print how many tokens have been read and encoded from the vrt file.
I ran the same command with the -v option.
Now it prints the following at the end:
Total size: 31747 tokens (0.0M)
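To judge whether that total covers the whole file, the number of token lines in the vrt file can be counted independently. The sketch below is a rough approximation in Python and not how cwb-encode itself counts: the function name count_tokens is hypothetical, blank lines are skipped, and every line that starts with "<" and ends with ">" is assumed to be a tag or comment, so a token that happens to look like a tag would be missed.

```python
def count_tokens(lines):
    """Count lines that are neither blank nor shaped like an XML tag or comment."""
    total = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith("<") and stripped.endswith(">"):
            continue  # assumed to be a structural tag, comment or declaration
        total += 1
    return total


if __name__ == "__main__":
    # Hypothetical sample; for a real file, pass an open file handle instead.
    sample = ['<s id="1">', "Hello\tUH\thello", "", "world\tNN\tworld", "</s>"]
    print(count_tokens(sample))
```

If the count for the real file is close to the figure reported with -v, the input was probably read to the end and the problem lies elsewhere; a much larger count would suggest that encoding stopped early.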
Best,
Mikhail
________________________________
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> on behalf of Stephanie Evert <stefanML at collocations.de>
Sent: Monday, February 13, 2023 22:39
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CWB: problems with indexing a corpus
> I am trying to process a vrt file with cwb-encode.
> The file has pos tagging and I used the examples from CWB-manuals as a template.
>
> I run the command
> cwb-encode -f /path_to_file.vrt -d /path/datafiles -R /path/registry/corpus_name -9 -c utf8 -P pos -P lemma -S text -S p -S s
>
> and I am getting the following warnings:
> > Annotations of s-attribute <text> not stored (file /xxx.vrt, line #1, warning issued only once).
> > Annotations of s-attribute <p> not stored (file /xx.vrt, line #3, warning issued only once).
> > Annotations of s-attribute <s> not stored (file /xx.vrt, line #4, warning issued only once).
That means the start tags of these XML elements contain attribute-value pairs, which you're ignoring – cwb-encode simply warns you about this fact.
> And the programme terminates without producing any result.
That sounds like an error, though, and completely unrelated to the warnings. After successful completion of the command, your data directory /path/datafiles should be populated with index files.
Are you sure there isn't any error message?
A first step would be to re-run cwb-encode with the -v option added (at the start, not after the attribute flags). This should print how many tokens have been read and encoded from the vrt file.
Best,
Stephanie