[CWB] empty element

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon May 11 20:19:35 CEST 2020


The alternative solution is simply to represent empty tags as opening tags. There will then be an implicit closing tag before the next tag of the same sort, which you can ignore. You can then search for, for instance,

    <pause> []

to get words after a pause (after a "begin-pause" literally, but you know the begin-pause actually represents the point-position of the pause).
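
If the corpus is already in verticalised form (one token or tag per line), the rewrite described above could be done in a preprocessing pass. Here is a minimal Python sketch, assuming the empty elements appear either as `<pause/>` or as `<pause></pause>` on a single line; the function name `open_empty_tags` and the `names` parameter are hypothetical, not part of CWB.

```python
import re

# Matches an empty element written as <name/> or <name></name>,
# with optional attributes (an assumption about the input format).
EMPTY_TAG = re.compile(
    r"<(?P<name>[A-Za-z_][\w.-]*)(?P<attrs>(?:\s[^<>]*?)?)\s*(?:/>|></(?P=name)>)"
)

def open_empty_tags(line, names=("pause",)):
    """Rewrite empty elements listed in `names` as bare opening tags."""
    def repl(match):
        if match.group("name") not in names:
            return match.group(0)  # leave other empty elements untouched
        return "<{}{}>".format(match.group("name"), match.group("attrs"))
    return EMPTY_TAG.sub(repl, line)

if __name__ == "__main__":
    for raw in ["<u>", "well", "<pause/>", "I", "think", "</u>"]:
        print(open_empty_tags(raw))
```

The resulting `<pause>` lines would then be encoded as an ordinary s-attribute, relying on the implicit closing tag mentioned above.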

Andrew

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Stefan Evert
Sent: 11 May 2020 16:15
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] empty element


> I have problems with a <pause></pause> xml tag in a spoken corpus.
> If I run a query, for example to look at all the words following or preceding a pause, I get no results, both in CWB and in CQPweb. I guess that the problem is that it is an empty element, without any text inside the xml tags.

Exactly: CWB doesn't support empty XML elements; all s-attribute regions must enclose one or more tokens. And for good reason, as empty elements are a major pain in the corpus.

> How do you suggest to solve this problem?

BNCweb solves this problem by encoding such empty tags before the current token as a p-attribute, either in XML notation, e.g.

        <pause/><noise/>

or as a feature set

        |noise|pause|

so it is easier to query for a specific tag, e.g. with

        [tags_before contains "pause"]

In fact, BNCweb stores _all_ XML tags (not just empty ones) before and after the current position in two separate p-attributes, which makes it a lot easier to reconstruct the original XML markup in the context display.
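
To illustrate the feature-set variant, here is a minimal Python sketch that folds empty tags in a verticalised file into an extra `tags_before` column on the following token. It is not BNCweb's actual code; the function name `add_tags_before` is hypothetical, and it assumes each empty tag sits on its own line in the form `<name/>` and that no token starts with `<`.

```python
import re

# An empty element on its own line, e.g. <pause/> (assumed input format).
EMPTY_TAG = re.compile(r"^<([A-Za-z_][\w.-]*)\s*/>$")

def add_tags_before(lines):
    """Append a feature-set column listing the empty tags before each token."""
    pending = set()
    out = []
    for line in lines:
        match = EMPTY_TAG.match(line.strip())
        if match:
            pending.add(match.group(1))  # remember the tag for the next token
            continue
        if line.startswith("<"):
            out.append(line)  # ordinary structural tag, passed through
            continue
        # Feature-set notation: "|" for the empty set, else |a|b| in sorted order.
        feature_set = "|" + "".join(name + "|" for name in sorted(pending))
        out.append(line + "\t" + feature_set)
        pending.clear()
    # Simplification: empty tags with no following token are dropped.
    return out

if __name__ == "__main__":
    sample = ["<u>", "well", "<pause/>", "<noise/>", "I", "think", "</u>"]
    print("\n".join(add_tags_before(sample)))
```

With the extra column encoded as a feature-set p-attribute (as far as I know, declared to cwb-encode with a trailing slash, i.e. `-P tags_before/`), the `[tags_before contains "pause"]` query above would match the token `I` in this sample.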

Best,
Stefan

_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb

