[CWB] Regular expressions with word groups

"Andrés Chandía" andres at chandia.net
Wed Jul 29 11:52:32 CEST 2020



No, I didn't suggest what you say. I was just calling your attention to the difference
between your RegEx and the one from the manual...


Manual

[(lemma="go") & !(word="went"%c | word="gone"%c)];
 


Yours
([word="en"][word="tierra"])


To match yours to the manual one, the regex should be:
!(word="en" word="tierra")
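
As an illustration outside CQP, the negative look-ahead that Maarten mentions in the quoted message below can be sketched in plain Python. This is only a sketch under assumptions: the tokens are joined with single spaces, the pos="S.*" constraint is dropped because plain strings carry no tags, and the names HEAD, EXCLUDED, PATTERN and heads_not_followed are made up for this example. It is not CQP syntax and says nothing about what CWB itself supports.

```python
import re

# Hypothetical illustration in plain Python regex, not CQP.
# First part of the query: a "ferido"/"muerto"-like word.
HEAD = r"(?:f[ei]rid\w*|muert[ao]\w*)"
# Sequences that must NOT follow it (the pos constraint is omitted here).
EXCLUDED = r"(?:en tierra|entierra|d\w* \w*el ca[buv]allo\w*)"
# (?! ...) is a negative look-ahead: it checks what follows without consuming it.
PATTERN = re.compile(rf"\b({HEAD})\b(?! {EXCLUDED}\b)")


def heads_not_followed(tokens):
    """Return the head words that are not followed by an excluded sequence."""
    text = " ".join(tokens)  # assumption: one space between tokens
    return [m.group(1) for m in PATTERN.finditer(text)]


print(heads_not_followed(["cayo", "muerto", "en", "tierra"]))
print(heads_not_followed(["cayo", "muerto", "en", "el", "campo"]))
```

With this sketch the first call finds nothing, because "muerto" is followed by "en tierra", while the second call returns "muerto". The look-ahead handles excluded sequences of different lengths, which is what negating the individual parts of the pattern cannot do.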






On Tue, 28 July 2020, 23:42, Josep M. Fontana wrote:
> Thanks Maarten and to everybody who responded.
>
> Yes. What you say makes total sense. I had assumed that since in essence
> the complex pattern involving grouped expressions is all within a single
> pair of parentheses '( )', and that allows one to treat it as if it were a
> single word within '[ ]', the ! operator would work the same way it works
> when it is associated with any expression inside square brackets.
>
> Andrés seems to suggest enclosing everything within square brackets, but
> that doesn't work. In principle it shouldn't work, because the convention
> is that square brackets enclose a word. So it makes sense that we can't
> do that. Once we are allowed to use parentheses to form groups of sequences
> of words that are treated essentially as a single unit, however, why can't
> we use the same operators we use with single expressions enclosed within
> '[ ]'? I don't see why it shouldn't be possible.
>
> If no one has asked this before, it must mean that there are not that
> many people who would need to do this kind of search, and of course I
> have no idea how hard this might be to implement. Having said this,
> however, I certainly think that this would be very useful. I find the
> idea of doing a diff, as Andrew suggests, a bit impractical.
>
> JM
> 
> On 28/07/2020 22:41, Maarten Janssen wrote:
>> I did not look in detail at the implementation in CWB - but if these were
>> normal regular expressions, your query
>>
>> [(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
>>
>> should match
>>
>> cayo *muerto* en tierra
>>
>> Namely - "muerto" for the first part of the query, and nothing for the
>> second. There is no indication of how long the second part should be. Add
>> a word requirement after it and it even becomes ill-defined what you would
>> mean by it. It would be different if you were looking for a specific word
>> after it that cannot be one of several, like
>> [!(word="en ?tierra" | word="ca[buv]allo")] - but your second part has a
>> variable word length. What you are looking for is a negative look-ahead,
>> which you cannot do by negating the parts of what you are looking for -
>> and given how query matches work in CWB I would be very surprised if
>> there is a negative look-ahead...
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb



_______________________

andrés chandía

Düngupeyem | IECMap | ISECMap | NMT | Corlexim

Developer of:
Parles.upf | IWCH | Amind terapia | ONG Mapuche koyaktu | Nocando | IAC | CddZ | ISAC | CatCg