[CWB] Regular expressions with word groups

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Jul 28 22:01:21 CEST 2020


Practically speaking, probably the easiest way to do what you want is to diff the results of a query that matches (wanted+unwanted) against the results of a query that matches just (unwanted).
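To make the logic of this suggestion concrete, here is a minimal sketch in Python rather than CQP. It only models the set difference on an invented token list. The token list, the function names, the tokenisation of "de el cavallo", and the dropped pos constraint are all assumptions made for illustration; each pattern is matched against a whole token, as in the queries quoted further down.

```python
import re

# Invented toy "corpus"; the tokenisation is an assumption for illustration only.
TOKENS = ("cayo muerto en tierra ferido de el cavallo "
          "muerto fue luego muerta entierra ferido").split()

# Head word: the first [ ... ] of the query in this thread.
HEAD = re.compile(r"f[ei]rid.*|muert[ao].*")

# Unwanted continuations, one regex per token (pos constraint omitted here).
UNWANTED = [
    [r"d.*", r".*el", r"ca[buv]allo.*"],
    [r"entierra"],
    [r"en", r"tierra"],
]


def seq_matches(tokens, start, patterns):
    """True if every pattern fully matches the token at its position from start."""
    window = tokens[start:start + len(patterns)]
    return len(window) == len(patterns) and all(
        re.fullmatch(p, t) for p, t in zip(patterns, window)
    )


def head_positions(tokens):
    """The (wanted+unwanted) query: every position holding a head word."""
    return {i for i, t in enumerate(tokens) if HEAD.fullmatch(t)}


def unwanted_positions(tokens):
    """The (unwanted) query: head words followed by an excluded sequence."""
    return {
        i for i in head_positions(tokens)
        if any(seq_matches(tokens, i + 1, pats) for pats in UNWANTED)
    }


def wanted_positions(tokens):
    """The diff: head words that are not followed by an excluded sequence."""
    return head_positions(tokens) - unwanted_positions(tokens)


if __name__ == "__main__":
    for i in sorted(wanted_positions(TOKENS)):
        print(i, TOKENS[i])
```

In CQP terms, the two position sets stand for two saved query results and the subtraction stands for the diff between them. The sketch compares head-word positions only, so in this model both results have to be reduced to the same span before they can be subtracted.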

best

Andrew.

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Josep M. Fontana
Sent: 28 July 2020 20:23
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] Regular expressions with word groups


Hi Andrés,

Yes, you understand correctly. In this search that I gave as an example:

[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))

I want an expression matching any of the patterns within the first pair of square brackets '[ ]' followed by any words or word groups that are NOT "del cauallo" (or any of its variations), "entierra" or "en tierra".

If I place the ! before the parenthesis enclosing the rest of the word groups, I still get "muerto en tierra". If I place it before the parenthesis enclosing the word group "en tierra" (like this ➝ !([word="en"][word="tierra"])), I still get "muerto en tierra".

I assume the example that appears in the manual (i.e. [(lemma="go") & !(word="went"%c | word="gone"%c)]; ) works but I'm wondering whether the '!' operator only works when it appears within a group enclosed in '[ ]'.

The kind of regular expression I'm using cannot all be included within '[ ]'. When I don't use the ! operator it works perfectly and I get "muerto en tierra", "muerto entierra", "muerto del cavallo", etc. Why can't I negate the whole thing by placing a single ! somewhere? I've tried placing the ! after the first parenthesis and it doesn't work either.

JM


Maybe I'm wrong, but as I understand it, you want to exclude everything from the ! onwards.

check this regex from the manual:

[(lemma="go") & !(word="went"%c | word="gone"%c)];


I hope this is what you're looking for; if so, it is just a matter of notation.



On Tue, 28 July 2020, 19:56, Josep M. Fontana wrote:


Thanks Andrews and/or Andreses for your quick responses,

I have had problems with both of your suggestions. In principle I would like to use a single ! operator for the whole regular expression pattern rather than having to add it to every relevant subpattern.

But wherever I place the '!', I don't seem to get the desired results. So, if I do the following:

[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))

one of the first hits I get is:

 cayo  muerto  en tierra

This should be out because the last of the word groups that I have is ([word="en"][word="tierra"]). Shouldn't the ! operator have scope over the last group as well?

Josep M.

Hey Josep,

I'm fairly sure you just use ! if you're not testing a specific annotation. Place ! before the left parenthesis of the group you're trying to test. So if you're trying to test the negation of that entire group, just add ! to the front of it. See 2.6 in the CQP Tutorial.

All the best,

Andrew



On Tue, Jul 28, 2020 at 11:15 AM Josep M. Fontana <josepm.fontana at upf.edu> wrote:
Hi,

I don't know whether this is the right forum to ask this particular kind
of question but I figure there are enough people here with sufficient
experience to lend me a hand with this problem. If you cannot answer the
question but you can point me to some other forum/group where I can find
help, I would appreciate it.

So, I have the following regular expression to identify a set of
expressions that can appear in a particular position in the text. What I
would like to do is to create the negation of this regular expression.
That is, any string/expression/group of expressions that does NOT
contain the expressions in these groups.

I know how to use the != operator for a particular item (word, pos,
lemma) but where would one insert this operator to have scope over the
whole group of expressions that match this pattern? Thanks in advance.

(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))


Josep M.


_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb






_______________________
            andrés chandía

