[CWB] Regular expressions with word groups
Maarten Janssen
maartenpt at gmail.com
Tue Jul 28 22:41:22 CEST 2020
I did not look in detail at the implementation in CWB, but if these were normal regular expressions, your query
[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
should match
cayo *muerto* en tierra
Namely, "muerto" matches the first part of the query, and nothing matches the second. There is no indication of how long the second part should be; add a word requirement after it and it even becomes ill-defined what you would mean by it. It would be different if you were looking for a single specific word after it that cannot be one of several, like [!(word="en ?tierra" | word="ca[buv]allo")], but your second part spans a variable number of words. What you are looking for is a negative look-ahead, which you cannot get by negating the parts of what you are looking for, and given how query matches work in CWB I would be very surprised if there is a negative look-ahead...
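
To illustrate what I mean by a negative look-ahead, here is a minimal sketch in ordinary Python regular expressions over a plain string. It is not CQP and says nothing about what CWB supports; the names FOLLOWERS, PATTERN and hits are made up for the example, and the pos="S.*" restriction is dropped because a plain string carries no tags.

```python
import re

# Word sequences that must NOT follow the hit (hypothetical string version of
# the negated part of the query; the pos="S.*" condition is left out).
FOLLOWERS = r"(?:d\w*\s+\w*el\s+ca[buv]allo\w*|entierra\b|en\s+tierra\b)"

# First part of the query, followed by a zero-width negative look-ahead.
# The \b before the look-ahead stops the engine from backtracking into the
# middle of a word just to make the look-ahead succeed.
PATTERN = re.compile(
    r"\b(?:f[ei]rid\w*|muert[ao]\w*)\b(?!\s+" + FOLLOWERS + r")"
)

def hits(text):
    """Return the matched words that are not followed by an excluded sequence."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(hits("cayo muerto en tierra"))      # → []
print(hits("cayo muerto en la batalla"))  # → ['muerto']
```

The look-ahead consumes nothing: it only checks what comes next and then rejects or accepts the hit, which is different from a negated group that is free to match nothing at all.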