[CWB] How to search reduplication tokens with CQP Syntax?

Stephanie Evert stefanML at collocations.de
Sun Mar 30 16:54:04 CEST 2025


> This works for me:
> n1:[word="\b.*?\b"] n2:[word="\b.*?\b"]:: n1.word=n2.word

This is the correct (and only) approach in principle, but …

> But there might be better ways of doing it. 


… there are more elegant queries.  In particular, \b usually doesn't make sense in CQP queries because a single token isn't supposed to contain multiple words – so why search for a word boundary?  Simplified query:

	n1:[] n2:[] :: n1.word = n2.word

This is the most explicit and readable version of the query, I guess. One would think that checking the constraint directly in the query

	[] [word = match.word]

should be faster, but actually it's slower than the first query in my tests.
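
For readers who want to see exactly what both queries are meant to match, here is a minimal sketch in Python rather than CQP. It only illustrates the matching condition (two adjacent tokens with identical word forms) and says nothing about how CQP evaluates either query internally. The function name and the sample token list are made up for illustration.

```python
def find_reduplications(tokens):
    """Return (position, word) for every token that is immediately repeated."""
    hits = []
    # Compare each token with its right-hand neighbour, like n1.word = n2.word.
    for i in range(len(tokens) - 1):
        if tokens[i] == tokens[i + 1]:
            hits.append((i, tokens[i]))
    return hits


# Hypothetical sample data, one string per corpus token.
tokens = ["it", "was", "a", "very", "very", "long", "long", "day"]
print(find_reduplications(tokens))  # [(3, 'very'), (5, 'long')]
```

The comparison is plain string equality, so "Very very" would not count as a match here.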

If you can somehow limit the relevant tokens, e.g. to a specific part of speech, you can probably speed up the query because fewer positions have to be tested. Something like:

	[pos="JJ.*"] [word = match.word]
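
The idea of testing fewer positions can be sketched in Python as follows. This is only an illustration of the principle, not a model of CQP's actual evaluation strategy; the function name, the counter and the tagged sample tokens are assumptions made for the example. The POS pattern is matched against the whole tag, as a CQP regular expression would be.

```python
import re


def find_reduplications_by_pos(tokens, pos_pattern):
    """Find repeated words, comparing only where the first token's POS matches.

    tokens is a list of (word, pos) pairs. Returns the hits together with the
    number of word comparisons that were actually carried out.
    """
    pos_re = re.compile(pos_pattern)
    hits = []
    compared = 0
    for i in range(len(tokens) - 1):
        word, pos = tokens[i]
        # Skip positions whose tag does not match the pattern as a whole.
        if not pos_re.fullmatch(pos):
            continue
        compared += 1
        if tokens[i + 1][0] == word:
            hits.append((i, word))
    return hits, compared


# Hypothetical sample data with Penn-style tags.
tokens = [("it", "PRP"), ("was", "VBD"), ("a", "DT"), ("very", "RB"),
          ("very", "RB"), ("long", "JJ"), ("long", "JJ"), ("day", "NN")]
hits, compared = find_reduplications_by_pos(tokens, "JJ.*")
print(hits, compared)  # [(5, 'long')] 2
```

With the restriction to adjectives, only two of the seven candidate positions are compared, and the adverb pair "very very" is no longer reported.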

Best,
Stephanie



