[CWB] empty element
Hardie, Andrew
a.hardie at lancaster.ac.uk
Mon May 11 20:19:35 CEST 2020
The alternative solution is simply to represent empty tags as opening tags. There will then be an implicit closing tag before the next tag of the same sort, which you can ignore. You can then run a query such as
<pause> []
to get words after a pause (after a "begin-pause" literally, but you know the begin-pause actually represents the point-position of the pause).
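As a rough illustration of the conversion step, here is a minimal Python sketch that rewrites empty elements as bare opening tags before encoding. It assumes a verticalised input with one token or one tag per line; the function name open_tags_only and the default tag list are made up for this example and are not part of CWB.

```python
import re

# A line that consists solely of an empty element, written either as
# <pause/> or as <pause></pause>. Attributes are not handled in this sketch.
EMPTY_TAG = re.compile(r"^<(\w+)\s*/>$|^<(\w+)></\2>$")

def open_tags_only(lines, tags=("pause",)):
    """Rewrite empty elements named in `tags` as bare opening tags.

    Hypothetical helper: `lines` is assumed to be verticalised text
    with one token or one XML tag per line.
    """
    out = []
    for line in lines:
        m = EMPTY_TAG.match(line.strip())
        name = (m.group(1) or m.group(2)) if m else None
        if name in tags:
            out.append(f"<{name}>")   # keep only the "begin-pause" marker
        else:
            out.append(line)          # tokens and other tags pass through
    return out
```

For example, the input lines well, <pause/>, I, think would come out as well, <pause>, I, think, so that the query above finds "I" as the first token after the pause.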
Andrew
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Stefan Evert
Sent: 11 May 2020 16:15
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] empty element
> I have problems with a <pause></pause> xml tag in a spoken corpus.
> If I run a query, for example to look at all the words following or preceding a pause, I get no results, both in CWB and in CQPweb. I guess that the problem is that it is an empty element, without any text inside the xml tags.
Exactly: CWB doesn't support empty XML elements; all s-attribute regions must enclose one or more tokens. And for good reason, as empty elements are a major pain in the corpus.
> How do you suggest to solve this problem?
BNCweb solves this problem by encoding such empty tags before the current token as a p-attribute, either in XML notation, e.g.
<pause/><noise/>
or as a feature set
|noise|pause|
so it is easier to query for a specific tag, e.g. with
[tags_before contains "pause"]
In fact, BNCweb stores _all_ XML tags (not just empty ones) before and after the current position in two separate p-attributes, which makes it a lot easier to reconstruct the original XML markup in the context display.
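To make the feature-set variant concrete, here is a minimal Python sketch of how such a column might be produced before encoding. It is not BNCweb's actual code: the function name attach_tags_before is hypothetical, the input is assumed to be verticalised text with one token or tag per line, and only self-closing tags such as <pause/> are folded into the extra column.

```python
import re

# A line that consists solely of a self-closing tag such as <pause/>.
EMPTY_TAG = re.compile(r"^<(\w+)\s*/>$")

def attach_tags_before(lines):
    """Fold empty tags into a feature-set column on the following token.

    Hypothetical helper. Assumes literal '<' tokens are escaped in the
    input, so any other line starting with '<' is a structural tag.
    Empty tags with no following token are silently dropped.
    """
    out = []
    pending = set()
    for line in lines:
        line = line.rstrip("\n")
        m = EMPTY_TAG.match(line.strip())
        if m:
            pending.add(m.group(1))      # remember the tag for the next token
        elif line.startswith("<"):
            out.append(line)             # ordinary s-attribute tags pass through
        else:
            # Feature-set notation: sorted values between pipes, "|" if empty.
            feature_set = "|" + "".join(f"{t}|" for t in sorted(pending))
            out.append(f"{line}\t{feature_set}")
            pending.clear()
    return out
```

With the input lines well, <pause/>, <noise/>, I the token "I" receives the value |noise|pause| and "well" receives the empty set |, which is the shape the tags_before query above expects.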
Best,
Stefan
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb