[CWB] Regular expressions with word groups
Josep M. Fontana
josepm.fontana at upf.edu
Tue Jul 28 21:22:40 CEST 2020
Hi Andrés,
Yes, you understand correctly. In this search that I gave as an example:
[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") &
(word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
I want an expression matching any of the patterns within the first pair
of square brackets '[ ]', followed by any word or word group that is
NOT "del cauallo" (or any of its variations), "entierra" or "en
tierra".
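To pin down the intended result independently of CQP syntax, here is a
minimal Python sketch of the filter described above. It is not CQP and
says nothing about how CQP evaluates '!'. The token dictionaries, the
tag "SPS00" and all function names are hypothetical, chosen only for
illustration. Patterns are applied with re.fullmatch because CQP regular
expressions must match the whole token.

```python
import re

HEAD = r"f[ei]rid.*|muert[ao].*"

def attr_matches(tokens, k, attr, pattern):
    """True if token k exists and its attribute fully matches the pattern."""
    return k < len(tokens) and re.fullmatch(pattern, tokens[k][attr]) is not None

def excluded_follows(tokens, k):
    """True if one of the three unwanted continuations starts at position k."""
    del_caballo = (attr_matches(tokens, k, "pos", r"S.*")
                   and attr_matches(tokens, k, "word", r"d.*")
                   and attr_matches(tokens, k + 1, "word", r".*el")
                   and attr_matches(tokens, k + 2, "word", r"ca[buv]allo.*"))
    entierra = attr_matches(tokens, k, "word", r"entierra")
    en_tierra = (attr_matches(tokens, k, "word", r"en")
                 and attr_matches(tokens, k + 1, "word", r"tierra"))
    return del_caballo or entierra or en_tierra

def wanted_hits(tokens):
    """Positions of head words NOT followed by an unwanted continuation."""
    return [i for i in range(len(tokens))
            if attr_matches(tokens, i, "word", HEAD)
            and not excluded_follows(tokens, i + 1)]

def toks(*pairs):
    """Build a toy token list from (word, pos) pairs (hypothetical tags)."""
    return [{"word": w, "pos": p} for w, p in pairs]
```

With the toy sentence "cayo muerto en tierra", wanted_hits returns no
position at all, which is the outcome the query is meant to produce.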
If I place the ! before the parenthesis enclosing the rest of the word
groups, I still get "muerto en tierra". If I place it before the
parenthesis enclosing the word group "en tierra" (like this ➝
!([word="en"][word="tierra"])), I still get "muerto en tierra".
I assume the example that appears in the manual (i.e. [(lemma="go") &
!(word="went"%c | word="gone"%c)]; ) works but I'm wondering whether the
'!' operator only works when it appears within a group enclosed in '[ ]'.
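For comparison, the manual's example can be read as a Boolean test on
the attributes of a single token. A rough Python paraphrase, assuming
tokens are dictionaries with "word" and "lemma" keys (a hypothetical
representation) and approximating the %c flag (ignore case) with
lower():

```python
def go_but_not_went_or_gone(tok):
    """Paraphrase of [(lemma="go") & !(word="went"%c | word="gone"%c)]."""
    word = tok["word"].lower()  # rough stand-in for the %c (ignore case) flag
    return tok["lemma"] == "go" and not (word == "went" or word == "gone")
```

In this paraphrase the negation applies to conditions on one token, not
to a sequence of tokens.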
The kind of regular expression I'm using cannot all be included within
'[ ]'. When I don't use the ! operator it works perfectly and I get
"muerto en tierra", "muerto entierra", "muerto del cavallo", etc. Why
can't I negate the whole thing by placing a single ! somewhere? I've
tried placing the ! after the first parenthesis and it doesn't work either.
JM
> Maybe I'm wrong, but I understand that you want to exclude everything from the ! onward.
>
> check this regex from the manual:
>
> [(lemma="go") & !(word="went"%c | word="gone"%c)];
>
>
> I hope this is what you're looking for; if so, it is just a matter of
> notation.
>
>
>
> On Tue, 28 July 2020, 19:56, Josep M. Fontana wrote:
>
> Thanks Andrews and/or Andreses for your quick responses,
>
> I have had problems with both of your suggestions. In principle I would
> like to use a single ! operator for the whole regular expression
> pattern rather than having to add it to every relevant subpattern.
>
> But wherever I place the '!' I don't seem to get the desired results.
> So, if I do the following:
>
> [(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") &
> (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
>
> one of the first hits I get is:
>
> cayo *muerto* en tierra
>
> This should be out because the last of the word groups that I have is
> ([word="en"][word="tierra"]). Shouldn't the ! operator have scope over
> the last group as well?
>
> Josep M.
>
>
>> Hey Josep,
>>
>> I'm fairly sure you just use ! if you're not testing a specific
>> annotation. Place ! before the left parenthesis of the group you're
>> trying to test. So if you're trying to test the negation of that
>> entire group, just add ! to the front of it. See 2.6 in the CQP Tutorial.
>>
>> All the best,
>>
>> Andrew
>>
>>
>>
>> On Tue, Jul 28, 2020 at 11:15 AM Josep M. Fontana
>> <josepm.fontana at upf.edu> wrote:
>>
>> Hi,
>>
>> I don't know whether this is the right forum to ask this particular kind
>> of question, but I figure there are enough people here with sufficient
>> experience to lend me a hand with this problem. If you cannot answer the
>> question but can point me to some other forum/group where I can find
>> help, I would appreciate it.
>>
>> So, I have the following regular expression to identify a set of
>> expressions that can appear in a particular position in the text. What I
>> would like to do is to create the negation of this regular expression,
>> that is, to match any string/expression/group of expressions that does NOT
>> contain the expressions in these groups.
>>
>> I know how to use the != operator for a particular item (word, pos,
>> lemma), but where would one insert this operator to have scope over the
>> whole group of expressions that match this pattern? Thanks in advance.
>>
>> (([(pos="S.*") &
>> (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
>>
>>
>> Josep M.
>>
>>
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it <mailto:CWB at sslmit.unibo.it>
>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>>
>>
>
>
>
> _______________________
> andrés chandía
> chandia.net <http://www.chandia.net> <https://twitter.com/chandianet>
>