[CWB] Question about translations for sentences
Stefan Evert
stefanML at collocations.de
Tue Dec 8 15:48:07 CET 2020
In addition to what Andrew explained, you should also (when you can afford the time :) …
> Thanks a lot. However (maybe this is because I am using a version of cqp which is too old? 3.0.0)
1) Get a current version of CWB (3.4.27 at the moment). There are a lot of improvements and bug fixes that haven't been ported back to the old 3.0 branch.
You'll need to check CWB out from the SVN repository and compile it from source, but that's not too difficult (internal note: I guess we should provide some instructions on the Web site). The one exception is Ubuntu 20.04, where the install script is broken.
> The corpus is encoded with eg.
> <mwe lema=one=example=of lema pos=N>
2) Encode your XML tags as proper XML, i.e. with attribute values quoted:
<mwe lema="some noun" pos="N">
…
</mwe>
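If the input files already exist with unquoted attributes, a small script can do the quoting. This is only a rough sketch and not part of CWB: the function name quote_attributes is made up, and it assumes one tag per line, attributes of the form name=value, and values that do not themselves contain something that looks like " name=". It has not been tested against your actual data, so check the output before encoding.

```python
import re
from xml.sax.saxutils import escape

# Opening tag with attributes, e.g. <mwe lema=some noun pos=N>
# (closing tags such as </mwe> do not match and are left alone)
OPEN_TAG = re.compile(r"<(\w+)\s+([^<>]*?)\s*>")
# name=value, where the value runs up to the next " name=" or the end of the tag
ATTR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def quote_attributes(line):
    """Return line with all attribute values in opening tags double-quoted."""
    def fix(match):
        pairs = ATTR.findall(match.group(2))
        if not pairs:
            # Nothing that looks like name=value: keep the tag unchanged
            return match.group(0)
        attrs = []
        for name, value in pairs:
            # Drop existing quotes (if any), then escape XML special characters
            safe = escape(value.strip().strip('"'), {'"': "&quot;"})
            attrs.append(f'{name}="{safe}"')
        return "<" + match.group(1) + " " + " ".join(attrs) + ">"
    return OPEN_TAG.sub(fix, line)


print(quote_attributes("<mwe lema=some noun pos=N>"))
```

Lines that are already quoted correctly pass through unchanged, so the script can be run over a whole file without harm.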
> and created with the flag -V mwe.
3) Encode with -S mwe:0+lema+pos
This will split out the annotations on <mwe> tags into separate attributes mwe_lema and mwe_pos; the ":0" checks that your open and close tags are properly balanced and will ignore any nested <mwe> regions (with warnings).
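To make the shape of the -S specification explicit, here is a small sketch that assembles a cwb-encode command line. The helper cwb_encode_command and all paths are hypothetical; only the options -d, -f, -R and -S are taken from cwb-encode itself. A real corpus will also need its positional attributes declared with -P, and the encoded data still has to be indexed afterwards (e.g. with cwb-makeall).

```python
import shlex


def cwb_encode_command(data_dir, registry_file, vrt_file,
                       s_attr="mwe", annotations=("lema", "pos")):
    """Build (but do not run) a cwb-encode call with annotation splitting."""
    # ":0" = no nested regions allowed; "+name" = split this XML attribute
    # out into its own s-attribute (mwe_lema, mwe_pos)
    spec = f"{s_attr}:0" + "".join(f"+{name}" for name in annotations)
    return ["cwb-encode", "-d", data_dir, "-f", vrt_file,
            "-R", registry_file, "-S", spec]


# Hypothetical paths, for illustration only
cmd = cwb_encode_command("/corpora/data/demo", "/corpora/registry/demo", "demo.vrt")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell once the paths have been adjusted.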
> However, when I query
> [ ] :: match.mwe="/.*/";
Then you can match lemma and pos directly:
… :: match.mwe_lema=".+ness" & match.mwe_pos = "N";
Best,
Stefan