[CWB] Regular expressions with word groups

Maarten Janssen maartenpt at gmail.com
Tue Jul 28 22:41:22 CEST 2020


I did not look in detail at the implementation in CWB - but if these were normal regular expressions, your query 

[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))

should match 

cayo *muerto* en tierra

Namely, "muerto" matches the first part of the query, and nothing is needed for the second part: there is no indication of how long the second part should be. Add a word requirement after it and it even becomes ill-defined what you would mean by it. It would be different if you were looking for a single following word that cannot be one of several, like [!(word="en ?tierra" | word="ca[buv]allo")], but your second part has a variable length in words. What you are looking for is a negative look-ahead, which you cannot get by negating the parts of what you are looking for, and given how query matches work in CWB I would be very surprised if there is a negative look-ahead...
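
To illustrate what I mean by a negative look-ahead, here is a rough sketch in ordinary Python regular expressions over a plain string, not in CQP. It only shows the idea on whitespace-separated text; the names PATTERN and hits are made up for this example, the pos="S.*" constraint is dropped because a plain string has no tags, and I am not claiming anything about how CWB evaluates queries.

```python
import re

# Assumption: plain text with words separated by whitespace, no annotation.
PATTERN = re.compile(
    # First part of the query: the word we want to find.
    r"\b(?:f[ei]rid\w*|muert[ao]\w*)\b"
    # Negative look-ahead: reject the hit if one of the unwanted
    # continuations (of one, two or three words) follows directly.
    r"(?!\s+(?:d\w*\s+\w*el\s+ca[buv]allo\w*|entierra|en\s+tierra)\b)"
)

def hits(sentence):
    """Return the words matched by the first part that are not
    followed by any of the excluded continuations."""
    return [m.group(0) for m in PATTERN.finditer(sentence)]

print(hits("cayo muerto en tierra"))      # excluded continuation follows
print(hits("cayo muerto en la batalla"))  # no excluded continuation
```

The look-ahead checks what follows without consuming it, so it does not matter that the excluded continuations have different lengths. That is exactly what negating the second part of the query does not give you.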

