[CWB] Restrictions on lemma annotation

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Jun 2 10:57:10 CEST 2025


Hi Graham

This isn’t a restriction on the lemma format. It’s simply that CQP doesn’t, by default, understand things like | as meaning an alternative in its input data.

Thus, what gets indexed is the string “eau|eaux” – so that’s what you have to search for.

In CQL

[lemma="eau\|eaux"]

Note that the pipe has to be escaped because you are searching for the pipe, not separating queriable alternatives.
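
To make the effect of the escape concrete, here is a small sketch in Python rather than CQP. It assumes that Python's re.fullmatch is a fair stand-in for the way CQP matches a regular expression against a whole attribute value; the variable names are purely illustrative.

```python
import re

# The literal string stored in the lemma attribute.
indexed_value = "eau|eaux"

# Unescaped: the pipe is alternation, so the pattern means "eau" OR "eaux".
unescaped = re.compile(r"eau|eaux")
# Escaped: the pipe is an ordinary character to be matched literally.
escaped = re.compile(r"eau\|eaux")

# fullmatch requires the whole value to match, as an approximation of CQP.
print(bool(unescaped.fullmatch(indexed_value)))  # False
print(bool(escaped.fullmatch(indexed_value)))    # True
print(bool(unescaped.fullmatch("eau")))          # True
```

The unescaped pattern only finds values that are exactly "eau" or exactly "eaux", which is why it misses the indexed string "eau|eaux".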

In CEQL

{eau\|eaux}

The pipe is escaped for the same reason. Or, more concisely for this specific example:

[lemma="eaux?"]

{eau[x,]}

(Alternatively, just put * wildcards at the start and end of every lemma query, though that will probably cost you precision in the query results.)

HOWEVER, there is a way to get the lemma field to behave like I think you expect it to (though you would need to recode to add leading and trailing pipes to each lemma value), which is to create the p-attribute as a feature set. See encoding manual Sec 6, and CQP manual Sec 6.6. Note that the special CQP functions for feature sets aren’t accessible via CEQL.
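
The recoding step could be sketched roughly as follows. This is only an illustration under several assumptions: the lemma is the third tab-separated column of the vrt file, structural tags are the only lines that start with "<", and the function names to_feature_set and recode_vrt_line are made up for this example.

```python
def to_feature_set(value: str) -> str:
    """Turn 'eau|eaux' into '|eau|eaux|' (members sorted, duplicates dropped)."""
    members = sorted({m for m in value.split("|") if m})
    if not members:
        return "|"  # a lone pipe is the empty feature set
    return "|" + "|".join(members) + "|"


def recode_vrt_line(line: str, column: int = 2) -> str:
    """Recode one token line; XML tags and blank lines pass through unchanged."""
    stripped = line.rstrip("\n")
    if not stripped or stripped.startswith("<"):
        return line
    fields = stripped.split("\t")
    if len(fields) > column:
        # column is zero-based, so 2 means the third column (assumed lemma).
        fields[column] = to_feature_set(fields[column])
    return "\t".join(fields) + "\n"


print(recode_vrt_line("eau\tNc\teau|eaux\n"), end="")
```

The attribute would then be declared as a feature set when encoding (with cwb-encode this is done by appending a slash to the attribute name, e.g. -P lemma/), after which a CQP query such as [lemma contains "eau"] should find the token. Check the details against the manual sections mentioned above before relying on this.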

Hope that helps

Best

Andrew.


From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Graham Ranger -- UAPV
Sent: 31 May 2025 10:43
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] Restrictions on lemma annotation

Hello,
In a corpus I'm setting up, using treetagger with a parameter file for classical French, there are a number of alternative lemmata, i.e. things like:
eau    Nc    eau|eaux [Nc: common noun]
I'm not entirely sure why, since there is no ambiguity here, but as a result it is impossible to search for the lemma "eau".
Are there any solutions other than simply removing the pipe and what comes after it from column three of the vrt file, so that only the first choice of lemma can be queried?
Many thanks in advance.
Graham.