[CWB] Regular expressions with word groups

Josep M. Fontana josepm.fontana at upf.edu
Tue Jul 28 21:22:40 CEST 2020


Hi Andrés,

Yes, you understand correctly. In this search that I gave as an example:

[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & 
(word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))

I want an expression matching any of the patterns within the first pair 
of square brackets '[ ]' followed by any words or word groups that are 
NOT either "del cauallo" (or any of its variations), "entierra" or "en 
tierra".

If I place the ! before the parenthesis enclosing the rest of the word 
groups, I still get "muerto en tierra". If I place it before the 
parenthesis enclosing the word group "en tierra", like this ➝ 
!([word="en"][word="tierra"]), I still get "muerto en tierra".

I assume the example that appears in the manual (i.e. [(lemma="go") & 
!(word="went"%c | word="gone"%c)]; ) works but I'm wondering whether the 
'!' operator only works when it appears within a group enclosed in '[ ]'.
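
As a rough illustration of what the manual's example does, here is a toy 
Python model (not CQP itself). It assumes each token is a small dict with 
hypothetical "word" and "lemma" fields, and uses re.fullmatch because CQP 
regular expressions have to match the whole attribute value. The point it 
shows is that this '!' negates a Boolean condition on a single token:

```python
import re

def token_matches(token):
    """Toy equivalent of [(lemma="go") & !(word="went"%c | word="gone"%c)].

    Both conditions are tested on the SAME token; the negation never
    looks at neighbouring tokens.
    """
    is_go = re.fullmatch(r"go", token["lemma"]) is not None
    # %c in CQP means "ignore case"; modelled here with re.IGNORECASE.
    excluded = re.fullmatch(r"went|gone", token["word"], flags=re.IGNORECASE) is not None
    return is_go and not excluded

# Invented mini-corpus, for illustration only.
tokens = [
    {"word": "goes", "lemma": "go"},
    {"word": "Went", "lemma": "go"},
    {"word": "gone", "lemma": "go"},
    {"word": "going", "lemma": "go"},
    {"word": "walks", "lemma": "walk"},
]

hits = [t["word"] for t in tokens if token_matches(t)]
print(hits)
```

Whether the same operator can take scope over a sequence of several 
tokens is exactly the open question here; the sketch does not settle it.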

The kind of regular expression I'm using cannot all be included within 
'[ ]'. When I don't use the ! operator it works perfectly and I get 
"muerto en tierra", "muerto entierra", "muerto del cavallo", etc. Why 
can't I negate the whole thing by placing a single ! somewhere? I've 
tried placing the ! after the first parenthesis and it doesn't work either.
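
The intended result can be spelled out in Python rather than CQP: find 
every token that matches the first pattern, then discard the hits whose 
following tokens match any of the three alternatives. This is only a 
sketch of the desired semantics. The helper names, the mini-corpus and 
its "pos" tags are invented for illustration, and nothing here describes 
how CQP itself treats '!' outside '[ ]'.

```python
import re

def m(pattern, value):
    """Whole-string regex match, as for CQP attribute values."""
    return re.fullmatch(pattern, value) is not None

def first_matches(tok):
    # [(word="f[ei]rid.*")|(word="muert[ao].*")]
    return m(r"f[ei]rid.*", tok["word"]) or m(r"muert[ao].*", tok["word"])

# Each alternative is a list of per-token tests, one per token position.
ALTERNATIVES = [
    # [(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"]
    [lambda t: m(r"S.*", t["pos"]) and m(r"d.*", t["word"]),
     lambda t: m(r".*el", t["word"]),
     lambda t: m(r"ca[buv]allo.*", t["word"])],
    # [word="entierra"]
    [lambda t: m(r"entierra", t["word"])],
    # [word="en"][word="tierra"]
    [lambda t: m(r"en", t["word"]), lambda t: m(r"tierra", t["word"])],
]

def followed_by_excluded(tokens, i):
    """True if the tokens right after position i match any alternative."""
    for alt in ALTERNATIVES:
        window = tokens[i + 1 : i + 1 + len(alt)]
        if len(window) == len(alt) and all(test(t) for test, t in zip(alt, window)):
            return True
    return False

def search(tokens):
    """Positions of first-pattern hits NOT followed by an excluded group."""
    return [i for i, t in enumerate(tokens)
            if first_matches(t) and not followed_by_excluded(tokens, i)]

# Invented example sentence; "SP" is a made-up preposition tag.
words = "cayo muerto en tierra y fue ferido en batalla".split()
tokens = [{"word": w, "pos": "SP" if w == "en" else "X"} for w in words]

print(search(tokens))  # "muerto en tierra" is dropped, "ferido en batalla" is kept
```

Note that the exclusion is decided per hit of the first pattern, by 
looking ahead at a fixed number of tokens for each alternative.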

JM


> Maybe I'm wrong, but I understand that you want nothing from the ! on....
>
> check this regex from the manual:
>
> [(lemma="go") & !(word="went"%c | word="gone"%c)];
>
>
> I hope it is what you're looking for; if so, it is just a matter of 
> notation.
>
>
>
> On Tue, July 28, 2020, 19:56, Josep M. Fontana wrote:
>
> Thanks Andrews and/or Andreses for your quick responses,
>
> I have had problems with both of your suggestions. In principle I would 
> like to use a single ! operator for the whole regular expression 
> pattern rather than having to add it to every relevant subpattern.
>
> But wherever I place the '!' I don't seem to get the desired results. 
> So, if I do the following:
>
> [(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & 
> (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
>
> one of the first hits I get is:
>
> cayo*muerto*en tierra
>
> This should be out because the last of the word groups that I have is 
> ([word="en"][word="tierra"]). Shouldn't the ! operator have scope over 
> the last group as well?
>
> Josep M.
>
>
>> Hey Josep,
>>
>> I'm fairly sure you just use ! if you're not testing a specific 
>> annotation. Place ! before the left parenthesis of the group you're 
>> trying to test. So if you're trying to test the negation of that 
>> entire group, just add ! to the front of it. See 2.6 in the CQP Tutorial.
>>
>> All the best,
>>
>> Andrew
>>
>>
>>
>> On Tue, Jul 28, 2020 at 11:15 AM Josep M. Fontana 
>> <josepm.fontana at upf.edu> wrote:
>>
>>     Hi,
>>
>>     I don't know whether this is the right forum to ask this particular
>>     kind of question but I figure there are enough people here with
>>     sufficient experience to lend me a hand with this problem. If you
>>     cannot answer the question but you can point me to some other
>>     forum/group where I can find help, I would appreciate it.
>>
>>     So, I have the following regular expression to identify a set of
>>     expressions that can appear in a particular position in the text.
>>     What I would like to do is to create the negation of this regular
>>     expression. That is, any string/expression/group of expressions
>>     that does NOT contain the expressions in these groups.
>>
>>     I know how to use the != operator for a particular item (word, pos,
>>     lemma) but where would one insert this operator to have scope over
>>     the whole group of expressions that match this pattern? Thanks in
>>     advance.
>>
>>     (([(pos="S.*") &
>>     (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
>>
>>
>>     Josep M.
>>
>>
>>     _______________________________________________
>>     CWB mailing list
>>     CWB at sslmit.unibo.it <mailto:CWB at sslmit.unibo.it>
>>     http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>>
>>
>
>
>
> _______________________
>             andrés chandía
> chandia.net <http://www.chandia.net> <https://twitter.com/chandianet>
> Düngupeyem <http://chandia.net/content/dungupeyem> | IECMap 
> <http://chandia.net/content/iecmap> | ISECMap 
> <http://chandia.net/content/isecmap> | NMT 
> <http://chandia.net/content/nmt> | Corlexim <http://corlexim.cl>
>
> Desarrollador de:
> Parles.upf <https://parles.upf.edu> | IWCH <https://iwch.upf.edu> | 
> Amind terapia <http://amindterapia.com> | ONG Mapuche koyaktu 
> <http://koyaktumapuche.net> | Nocando 
> <https://parles.upf.edu/llocs/nocando> | IAC <https://iac.upf.edu> | 
> CddZ <https://iac.upf.edu/cddz> | ISAC <https://iac.upf.edu/isac> | 
> CatCg <http://catcg.upf.edu>