[CWB] Regular expressions with word groups

"Andrés Chandía" andres at chandia.net
Tue Jul 28 21:38:20 CEST 2020



I can see a difference from the example in the manual:

You have !([word="en"][word="tierra"]), while, following the example, you should have !(word="en" word="tierra").

The [ ] square brackets enclose the whole expression, not each part, at least in the example...

Try that!

On Tue, 28 July 2020, 21:22, Josep M. Fontana wrote:

Hi Andrés,

Yes, you understand correctly. In this search that I gave as an example:

[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))

I want an expression matching any of the patterns within the first pair of square brackets '[ ]', followed by any words or word groups that are NOT "del cauallo" (or any of its variations), "entierra" or "en tierra".
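To pin down the intended semantics, here is a minimal Python sketch (not CQP) of the filter described above: a hit is a token matching the first pattern that is NOT immediately followed by any of the three excluded word groups. It assumes, purely for illustration, that a sentence is a list of hypothetical (word, pos) pairs and that "del" is tokenized as "de" + "el" so the three-token group can apply; as in CQP, each regex must match the whole attribute value. It does not say how to write this in CQP syntax.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical token model for illustration only: (word, pos).
Token = Tuple[str, str]
TokenTest = Callable[[Token], bool]


def word(pattern: str) -> TokenTest:
    """Like word="...": the regex must match the entire word."""
    return lambda tok: re.fullmatch(pattern, tok[0]) is not None


def pos(pattern: str) -> TokenTest:
    """Like pos="...": the regex must match the entire tag."""
    return lambda tok: re.fullmatch(pattern, tok[1]) is not None


def both(a: TokenTest, b: TokenTest) -> TokenTest:
    return lambda tok: a(tok) and b(tok)


def either(a: TokenTest, b: TokenTest) -> TokenTest:
    return lambda tok: a(tok) or b(tok)


# [(word="f[ei]rid.*")|(word="muert[ao].*")]
FIRST = either(word(r"f[ei]rid.*"), word(r"muert[ao].*"))

# The word groups that must NOT follow the first token.
EXCLUDED: List[List[TokenTest]] = [
    [both(pos(r"S.*"), word(r"d.*")), word(r".*el"), word(r"ca[buv]allo.*")],
    [word(r"entierra")],
    [word(r"en"), word(r"tierra")],
]


def group_at(tokens: Sequence[Token], i: int, group: List[TokenTest]) -> bool:
    """True if the tests in `group` match tokens[i], tokens[i+1], ... in order."""
    if i + len(group) > len(tokens):
        return False
    return all(test(tokens[i + k]) for k, test in enumerate(group))


def hits(tokens: Sequence[Token]) -> List[int]:
    """Positions of FIRST tokens that are not followed by any excluded group."""
    return [
        i
        for i, tok in enumerate(tokens)
        if FIRST(tok) and not any(group_at(tokens, i + 1, g) for g in EXCLUDED)
    ]


if __name__ == "__main__":
    sentence = [("cayo", "V"), ("muerto", "A"), ("en", "SPS00"), ("tierra", "N")]
    print(hits(sentence))  # -> []
```

Because the excluded groups are checked against the tokens to the right of the hit rather than consumed as part of it, this behaves like a negative lookahead: "muerto en el campo" is kept, while "muerto en tierra", "muerta entierra" and "ferido de el cauallo" are dropped.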

     

If I place the ! before the parenthesis enclosing the rest of the word groups, I still get "muerto en tierra". If I place it before the parenthesis enclosing the word group "en tierra" (like this ➝ !([word="en"][word="tierra"])), I still get "muerto en tierra".

I assume the example that appears in the manual (i.e. [(lemma="go") & !(word="went"%c | word="gone"%c)];) works, but I'm wondering whether the '!' operator only works when it appears within a group enclosed in '[ ]'.
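In the manual example, the ! negates a boolean condition on the attributes of a single token, all inside one [ ]. For comparison only, here is a minimal Python sketch of that one-token test; representing a token as a hypothetical (word, lemma) pair is an assumption for illustration, re.fullmatch stands in for CQP's whole-value matching, and re.IGNORECASE stands in for the %c flag.

```python
import re


def go_but_not_went_or_gone(word: str, lemma: str) -> bool:
    """Single-token test mirroring
    [(lemma="go") & !(word="went"%c | word="gone"%c)]
    """
    is_go = re.fullmatch(r"go", lemma) is not None
    went_or_gone = (
        re.fullmatch(r"went", word, re.IGNORECASE) is not None
        or re.fullmatch(r"gone", word, re.IGNORECASE) is not None
    )
    # In this rendering the negation flips a condition on one token's
    # attributes only; nothing here looks at neighbouring tokens.
    return is_go and not went_or_gone


if __name__ == "__main__":
    for w, l in [("goes", "go"), ("Went", "go"), ("gone", "go"), ("ran", "run")]:
        print(w, go_but_not_went_or_gone(w, l))
    # goes True / Went False / gone False / ran False
```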

The kind of regular expression I'm using cannot be entirely enclosed within '[ ]'. When I don't use the ! operator it works perfectly and I get "muerto en tierra", "muerto entierra", "muerto del cavallo", etc. Why can't I negate the whole thing by placing a single ! somewhere? I've tried placing the ! after the first parenthesis and it doesn't work either.

JM
     


     

Maybe I'm wrong, but I understand that you want to exclude everything from the ! onwards...

Check this regex from the manual:

 [(lemma="go") & !(word="went"%c | word="gone"%c)];

I hope it is what you're looking for; if so, it is just a matter of notation...

On Tue, 28 July 2020, 19:56, Josep M. Fontana wrote:

Thanks Andrews and/or Andreses for your quick responses,
         

I have had problems with both of your suggestions. In principle I would like to use a single ! operator for the whole regular expression pattern rather than having to add it to every relevant subpattern.

But wherever I place the '!', I don't seem to get the desired results. So, if I do the following:

[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))

One of the first hits I get is:

 cayo muerto en tierra

This should be excluded, because the last of the word groups that I have is ([word="en"][word="tierra"]). Shouldn't the ! operator have scope over the last group as well?

Josep M.
         

         


Hey Josep,


           
I'm fairly sure you just use ! if you're not testing a specific annotation. Place ! before the left parenthesis of the group you're trying to test. So if you're trying to test the negation of that entire group, just add ! to the front of it. See 2.6 in the CQP Tutorial.

All the best,

           
Andrew

           

           



On Tue, Jul 28, 2020 at 11:15 AM Josep M. Fontana <josepm.fontana at upf.edu> wrote:
Hi,
                 
I don't know whether this is the right forum to ask this particular kind of question, but I figure there are enough people here with sufficient experience to lend me a hand with this problem. If you cannot answer the question but can point me to some other forum/group where I can find help, I would appreciate it.

So, I have the following regular expression to identify a set of expressions that can appear in a particular position in the text. What I would like to do is to create the negation of this regular expression, that is, any string/expression/group of expressions that does NOT contain the expressions in these groups.

I know how to use the != operator for a particular item (word, pos, lemma), but where would one insert this operator to have scope over the whole group of expressions that match this pattern? Thanks in advance.

(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))

Josep M.
               
 
                 
                
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb
              
             


           
       
       
_______________________
andrés chandía
Düngupeyem | IECMap | ISECMap | NMT | Corlexim
Developer of:
Parles.upf | IWCH | Amind terapia | ONG Mapuche koyaktu | Nocando | IAC | CddZ | ISAC | CatCg
Do not print unnecessarily. Take care of the environment!
