[CWB] empty element

Stefania Spina stefania.spina at unistrapg.it
Mon May 11 23:04:36 CEST 2020


Thank you Stefan and Andrew!
And will it also work if <pause> has an attribute?
<pause dur="short"></pause>

Best,
Stefania

---
*Prof. Stefania Spina*
Università per Stranieri di Perugia
Delegata alla Ricerca
stefania.spina at unistrapg.it
https://www.researchgate.net/profile/Stefania_Spina2
<https://unistrapg.academia.edu/StefaniaSpina>



On Mon, 11 May 2020 at 20:19, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:

> The alternative solution is just to represent empty tags as opening tags.
> There will then be an implicit closing tag before the next tag of the same
> sort, which you can ignore. Then you can search for, for instance,
>
>     <pause> []
>
> to get words after a pause (literally, after a "begin-pause", but you know
> that the begin-pause actually represents the point position of the pause).
>
> Andrew
>
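Andrew's suggestion amounts to a small preprocessing step before encoding. The Python sketch below is a hypothetical illustration, not part of CWB: it assumes the corpus is in the usual one-token-per-line vertical format with each XML tag on a line of its own, and the function name `empty_tags_to_opening` and its `names` parameter are made up for the example. It rewrites `<pause/>`, `<pause></pause>` and their variants with attributes (such as `<pause dur="short"></pause>`) as bare opening tags, keeping the attributes, and drops a closing tag that immediately follows its own opening tag, leaving each region to be closed implicitly as Andrew describes.

```python
import re

# Empty element written as <pause/> or <pause dur="short" />
SELF_CLOSING = re.compile(r'^<([A-Za-z_][\w.-]*)((?:\s[^>]*?)?)\s*/>$')
# Empty element written as <pause></pause> or <pause dur="short"></pause>
EMPTY_PAIR = re.compile(r'^<([A-Za-z_][\w.-]*)((?:\s[^>]*)?)></\1>$')
# Plain opening tag on its own line, e.g. <pause dur="long">
OPENING = re.compile(r'^<([A-Za-z_][\w.-]*)((?:\s[^>]*)?)>$')


def empty_tags_to_opening(lines, names=("pause",)):
    """Hypothetical helper: rewrite empty elements in vertical-format lines
    as bare opening tags, keeping any attributes."""
    closers = {f"</{n}>" for n in names}
    out = []
    for line in lines:
        text = line.rstrip("\n")
        tag = text.strip()
        m = SELF_CLOSING.match(tag) or EMPTY_PAIR.match(tag)
        if m and m.group(1) in names:
            # <pause dur="short"/> -> <pause dur="short">
            out.append(f"<{m.group(1)}{m.group(2).rstrip()}>")
            continue
        if tag in closers and out:
            prev = OPENING.match(out[-1].strip())
            if prev and f"</{prev.group(1)}>" == tag:
                # Opening tag directly followed by its own closing tag:
                # drop the closing tag so the region stays open until
                # it is closed implicitly by the next tag of the same sort
                continue
        out.append(text)  # tokens and all other tags pass through unchanged
    return out
```

Only elements listed in `names` are touched, so other empty elements (e.g. `<noise/>`) are left for separate treatment. After conversion, Andrew's query `<pause> []` should find the token that follows each pause.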
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf
> Of Stefan Evert
> Sent: 11 May 2020 16:15
> To: CWBdev Mailing List <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] empty element
>
>
> > I have problems with a <pause></pause> XML tag in a spoken corpus.
> > If I run a query, for example to look at all the words following or
> > preceding a pause, I get no results in both CWB and CQPweb. I guess
> > that the problem is that it is an empty element, without any text inside
> > the XML tags.
>
> Exactly: CWB doesn't support empty XML elements; all s-attribute regions
> must enclose one or more tokens. And for good reason, as empty elements
> are a major pain in the corpus.
>
> > How do you suggest to solve this problem?
>
> BNCweb solves this problem by storing such empty tags as a p-attribute of
> the token that follows them, either in XML notation, e.g.
>
>         <pause/><noise/>
>
> or as a feature set
>
>         |noise|pause|
>
> so it is easier to query for a specific tag, e.g. with
>
>         [tags_before contains "pause"]
>
> In fact, BNCweb stores _all_ XML tags (not just empty ones) before and
> after the current position in two separate p-attributes, which makes it a
> lot easier to reconstruct the original XML markup in the context display.
>
> Best,
> Stefan
>
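Stefan's `tags_before` idea can likewise be sketched as a conversion step. This is a hypothetical Python illustration, not BNCweb's actual code: it assumes vertical-format input with each tag on its own line, collects the names of the empty elements that appear before a token, and appends them to that token as an extra column in CWB feature-set notation (`|noise|pause|`, or `|` for the empty set). The function name `add_tags_before` is made up; attributes such as `dur="short"` are dropped, and empty elements after the last token are discarded. The new column would then be encoded as a feature-set p-attribute named `tags_before`.

```python
import re

# Empty element on a line of its own: <pause/>, <noise type="cough" />, <pause></pause>
EMPTY_ELEMENT = re.compile(r'^<([A-Za-z_][\w.-]*)(?:\s[^>]*?)?\s*(?:/>|></\1>)$')
# Any other tag line: <s>, </s>, <u who="A">, ...
TAG = re.compile(r'^</?[A-Za-z_][^>]*>$')


def add_tags_before(lines):
    """Hypothetical helper: turn empty elements into a feature-set column
    on the token that follows them."""
    out, pending = [], set()
    for line in lines:
        text = line.rstrip("\n")
        stripped = text.strip()
        if not stripped:
            continue                          # skip blank lines
        m = EMPTY_ELEMENT.match(stripped)
        if m:
            pending.add(m.group(1))           # keep only the element name
        elif TAG.match(stripped):
            out.append(text)                  # non-empty regions stay s-attributes
        else:
            # Feature set: sorted names between bars, "|" if there are none
            feats = "|" + "".join(f"{name}|" for name in sorted(pending))
            out.append(f"{text}\t{feats}")
            pending.clear()
    return out
```

With the column encoded this way, `[tags_before contains "pause"]` should match the word right after a pause, and `[] [tags_before contains "pause"]` should match pairs whose first token is the word right before one.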
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>

