[CWB] How to search reduplication tokens with CQP Syntax?
Stephanie Evert
stefanML at collocations.de
Sun Mar 30 16:54:04 CEST 2025
> This works for me:
> n1:[word="\b.*?\b"] n2:[word="\b.*?\b"]:: n1.word=n2.word
This is the correct (and only) approach in principle, but …
> But there might be better ways of doing it.
… there are more elegant queries. In particular, \b usually doesn't make sense in CQP queries because a single token isn't supposed to contain multiple words – so why search for a word boundary? Simplified query:
n1:[] n2:[] :: n1.word = n2.word
This is the most explicit and readable version of the query, I guess. One would think that checking the constraint directly in the query
[] [word = match.word]
should be faster, but actually it's slower than the first query in my tests.
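For readers less familiar with CQP labels, here is a minimal Python sketch of what both queries are looking for: every pair of adjacent tokens with identical word forms. It only illustrates the matching condition, not how CQP evaluates a query internally; the function name and the token list are made up for the example.

```python
def find_reduplications(words):
    """Return (position, word) for each token that is immediately repeated."""
    hits = []
    for i in range(len(words) - 1):
        # Same test as  n1.word = n2.word  for the adjacent positions i and i+1
        if words[i] == words[i + 1]:
            hits.append((i, words[i]))
    return hits


# Hypothetical toy corpus, one word form per token
tokens = ["it", "was", "very", "very", "long", "long", "ago"]
print(find_reduplications(tokens))  # [(2, 'very'), (4, 'long')]
```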
If you can somehow limit the relevant tokens, e.g. to a specific part of speech, you can probably speed up the query because fewer positions have to be tested. Something like
[pos="JJ.*"] [word = match.word]
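The effect of such a restriction can be sketched in Python as well, assuming each token is a (word, pos) pair: only positions whose tag matches the pattern are compared with their right neighbour at all. Again this is just an illustration under that assumption, with a hypothetical function name and toy data; it says nothing about the actual speed-up in CQP.

```python
import re


def find_reduplications_by_pos(tokens, pos_pattern):
    """tokens: list of (word, pos) pairs; pos_pattern must match the whole tag."""
    tag = re.compile(pos_pattern)
    hits = []
    for i in range(len(tokens) - 1):
        word, pos = tokens[i]
        # Skip positions whose first token has the wrong part of speech
        if not tag.fullmatch(pos):
            continue
        # Only the remaining candidates need the word comparison
        if tokens[i + 1][0] == word:
            hits.append((i, word))
    return hits


# Hypothetical tagged toy corpus
tagged = [("a", "DT"), ("big", "JJ"), ("big", "JJ"), ("dog", "NN"), ("dog", "NN")]
print(find_reduplications_by_pos(tagged, "JJ.*"))  # [(1, 'big')]
```

Note that "dog dog" is not reported here, because the first token of that pair is not tagged as an adjective.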
Best,
Stephanie