[CWB] Regular expressions with word groups
"Andrés Chandía"
andres at chandia.net
Tue Jul 28 21:38:20 CEST 2020
I can see a difference from the example in the manual:
you have !([word="en"][word="tierra"])
while, following the example, you should have !(word="en"
word="tierra")
The [ ] square brackets enclose the whole expression, not each part, at least in the
example...
Try that!
On Tue, 28 July 2020, 21:22, Josep M. Fontana wrote:
Hi Andrés,
Yes, you understand correctly. In this search that I gave as an example:
[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
I want an expression matching any of the patterns within the first pair of square
brackets '[ ]' followed by any words or word groups that are NOT either "del
cauallo" (or any of its variations), "entierra" or "en tierra".
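To pin down the intended result independently of CQP syntax, here is a minimal sketch in Python (not CQP). It assumes a hypothetical token representation, one dict per corpus position with "word" and "pos" keys, and uses re.fullmatch because CQP regular expressions must match the whole attribute value. The function and variable names are illustrative only, and the sketch says nothing about how CQP itself interprets ! in front of a parenthesised group.

```python
import re

def token_matches(token, pattern):
    """True if every attribute regex in `pattern` matches the whole attribute value."""
    return all(
        re.fullmatch(rx, token.get(attr, "")) is not None
        for attr, rx in pattern.items()
    )

def sequence_matches_at(tokens, start, sequence):
    """True if `sequence` (a list of token patterns) matches the tokens from `start`."""
    if start + len(sequence) > len(tokens):
        return False
    return all(token_matches(tokens[start + i], p) for i, p in enumerate(sequence))

# First word group of the query: [(word="f[ei]rid.*")|(word="muert[ao].*")]
TRIGGER_WORD = r"f[ei]rid.*|muert[ao].*"

# The three alternatives that must NOT follow the trigger word.
EXCLUDED = [
    [{"pos": r"S.*", "word": r"d.*"}, {"word": r".*el"}, {"word": r"ca[buv]allo.*"}],
    [{"word": r"entierra"}],
    [{"word": r"en"}, {"word": r"tierra"}],
]

def hits_not_followed_by_excluded(tokens):
    """Positions of trigger words whose following tokens start none of EXCLUDED."""
    hits = []
    for i, tok in enumerate(tokens):
        if re.fullmatch(TRIGGER_WORD, tok["word"]) is None:
            continue
        if any(sequence_matches_at(tokens, i + 1, seq) for seq in EXCLUDED):
            continue
        hits.append(i)
    return hits
```

Under these assumptions, a token list for "cayo muerto en tierra" yields no hits, while "fue ferido ayer" yields the position of "ferido". Note that the exclusion is checked as a whole sequence starting right after the trigger word, which is different from negating a condition on a single token.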
If I place the ! before the parenthesis enclosing the rest of the word groups, I still
get "muerto en tierra". If I place it before the parenthesis enclosing the
word group "en tierra", like this: !([word="en"][word="tierra"]), I still get
"muerto en tierra".
I assume the example that appears in the manual (i.e. [(lemma="go") &
!(word="went"%c | word="gone"%c)]; ) works but I'm wondering whether
the '!' operator only works when it appears within a group enclosed in '[ ]'.
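For comparison, the manual's example reads as an ordinary Boolean condition on a single token. A minimal Python sketch of that reading follows, assuming a hypothetical token dict with "word" and "lemma" keys; %c is rendered as case-insensitive matching. It only mirrors how the expression reads inside '[ ]' and makes no claim about what CQP does with ! outside '[ ]'.

```python
import re

def matches_manual_example(token):
    """lemma is 'go' AND NOT (word is 'went' or 'gone', ignoring case)."""
    is_go = re.fullmatch(r"go", token["lemma"]) is not None
    # %c in the query corresponds to case-insensitive matching here.
    is_excluded = re.fullmatch(r"went|gone", token["word"], re.IGNORECASE) is not None
    return is_go and not is_excluded
```

Here the negation applies to one token's attributes, so there is no question of how many tokens the negated part should cover.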
The kind of regular expression I'm using cannot be all included within '[ ]'. When I
don't use the ! operator it works perfectly and I get "muerto en tierra",
"muerto entierra", "muerto del cavallo", etc. Why can't I negate the
whole thing by placing a single ! somewhere? I've tried placing the ! after the first
parenthesis and it doesn't work either.
JM
Maybe I'm wrong, but as I understand it, you want none of what follows the !.
Check this regex from the manual:
[(lemma="go") & !(word="went"%c | word="gone"%c)];
I hope it is what you're looking for; if so, it is just a matter of
notation.
On Tue, 28 July 2020, 19:56, Josep M. Fontana wrote:
Thanks Andrews and/or Andreses for your quick responses,
I have had problems with both of your suggestions. In principle I would like to
use a single ! operator for the whole regular expression pattern rather
than having to add it to every relevant subpattern.
But wherever I place the '!' I don't seem to get the desired results. So, if I do
the following:
[(word="f[ei]rid.*")|(word="muert[ao].*")] !(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
one of the first hits I get is:
cayo muerto en tierra
This should be out because the last of the word groups that I have is
([word="en"][word="tierra"]). Shouldn't the ! operator have
scope over the last group as well?
Josep M.
Hey Josep,
I'm fairly sure you just use ! if you're not testing a specific
annotation. Place ! before the left parenthesis of the group you're
trying to test. So if you're trying to test the negation of that entire group,
just add ! to the front of it. See 2.6 in the CQP Tutorial.
All the best,
Andrew
On Tue, Jul 28, 2020 at 11:15 AM Josep M. Fontana <josepm.fontana at upf.edu> wrote:
Hi,
I don't know whether this is the right forum to ask this particular kind of
question, but I figure there are enough people here with sufficient experience
to lend me a hand with this problem. If you cannot answer the question but you
can point me to some other forum/group where I can find help, I would
appreciate it.
So, I have the following regular expression to identify a set of expressions
that can appear in a particular position in the text. What I would like to do
is to create the negation of this regular expression. That is, any
string/expression/group of expressions that does NOT contain the expressions in
these groups.
I know how to use the != operator for a particular item (word, pos, lemma) but
where would one insert this operator to have scope over the whole group of
expressions that match this pattern? Thanks in advance.
(([(pos="S.*") & (word="d.*")][word=".*el"][word="ca[buv]allo.*"])|[word="entierra"]|([word="en"][word="tierra"]))
Josep M.
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb
_______________________
andrés chandía
Düngupeyem | IECMap | ISECMap | NMT | Corlexim
Developer of:
Parles.upf | IWCH | Amind terapia | ONG Mapuche koyaktu | Nocando | IAC | CddZ | ISAC | CatCg
Do not print unnecessarily. Protect the environment!