[CWB] empty element
Hardie, Andrew
a.hardie at lancaster.ac.uk
Tue May 12 08:58:07 CEST 2020
>>>> Do you remember whether cwb-encode will read <pause dur=short/> as an open tag?
Yes. Line 1886ff:
    if (buf[j-1] == '/') {
      j--; /* empty tag: remove "/" from annotation string and handle as an open tag */
      /* Note that this implicitly closes the previous instance of the empty tag:
       * - this means that we can work with empty elements by looking just at the "open-point" of each range;
       * - it also means that empty tags with metadata at the start of each text will automatically extend over the full text.
       * However, the approach sketched here only works with "flat" s-attributes declared without recursion (even without :0). */
    }
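
To make the consequence of that comment concrete, here is a minimal Python sketch of how a flat s-attribute ends up with ranges when every open (or empty) tag implicitly closes the previous one. It is a simplified model written for illustration, not a port of the cwb-encode code: the function name flat_ranges, the regular expression, and the choice to drop ranges that contain no tokens are all assumptions.

```python
import re

# Simplified pattern for an open or empty tag such as <pause dur=short/>.
# Illustration only; this is not how cwb-encode actually parses its input.
TAG = re.compile(r"<(?P<name>\w+)(?:\s+(?P<annot>[^>]*?))?\s*(?P<empty>/)?>$")


def flat_ranges(lines, element="pause"):
    """Return (start, end, annotation) ranges for one flat s-attribute.

    `lines` is vertical-format input: one token or one XML tag per line.
    """
    ranges = []
    cpos = 0            # corpus position of the next token
    open_start = None   # start of the currently open range, if any
    open_annot = ""     # raw annotation string of that range

    def close(end):
        nonlocal open_start
        # Assumption: a range without any tokens is silently dropped.
        if open_start is not None and end >= open_start:
            ranges.append((open_start, end, open_annot))
        open_start = None

    for line in lines:
        line = line.strip()
        m = TAG.match(line)
        if m and m.group("name") == element:
            # An open tag and an empty tag are treated alike: both close
            # the previous range and start a new one at this position.
            close(cpos - 1)
            open_start = cpos
            open_annot = (m.group("annot") or "").strip()
        elif line.startswith("<"):
            continue    # other tags and close tags are ignored in this model
        else:
            cpos += 1   # an ordinary token line
    close(cpos - 1)     # the last range runs to the end of the input
    return ranges
```

With the input lines `<pause dur=short/>`, `well`, `I`, `<pause dur=long/>`, `think`, `so`, this model yields one range over positions 0-1 and one over positions 2-3, so a query only needs to look at the start point of each range.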
>>>> Perhaps worth a new flag (-E pause+dur) which only accepts empty elements and doesn't allow nesting?
I think that sort of additional complication should be left till v4, no?
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Stefan Evert
Sent: 12 May 2020 07:22
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] empty element
> On 11 May 2020, at 23:04, Stefania Spina <stefania.spina at unistrapg.it> wrote:
>
> Thank you Stefan and Andrew!
> And will it also work if <pause> has an attribute?
> <pause dur="short"></pause>
Yes. In the BNCweb solution, you would have to search the tags_before attribute for the full shape of the XML tag, e.g.
    [tags_before = '.*<pause\s*dur="short">.*']
which becomes much more complicated if <pause> can carry several attributes in varying order. For the feature set, we devised a special encoding, e.g. something like
    |pause|pause_dur=short|
that you can search with
    [tags_before contains "pause_dur=short"]
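
As an illustration of how such a feature-set value could be produced in a preprocessing step, here is a short Python sketch. It only mirrors the "something like" shape shown above; the function names tag_features and feature_set and the exact naming scheme (element name, then element_attribute=value) are assumptions, not the actual BNCweb conversion.

```python
import re

# Element name at the start of a tag, e.g. "pause" in <pause dur="short">.
NAME = re.compile(r"<\s*([\w:-]+)")
# One attribute with a double-quoted, single-quoted or unquoted value.
ATTR = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s/>]+))""")


def tag_features(tag):
    """Turn one XML start tag into a list of feature strings."""
    m = NAME.match(tag)
    if not m:
        return []
    name = m.group(1)
    feats = [name]
    for key, dq, sq, bare in ATTR.findall(tag):
        feats.append(f"{name}_{key}={dq or sq or bare}")
    return feats


def feature_set(tags):
    """Build a pipe-delimited feature set from the tags before a token."""
    feats = set()
    for tag in tags:
        feats.update(tag_features(tag))
    # Sorting makes the result independent of attribute order in the tag.
    return "|" + "".join(f + "|" for f in sorted(feats))
```

Because each attribute becomes its own feature, `<pause dur="short" type="filled">` and `<pause type="filled" dur="short">` give the same value, and a `contains` test on a single feature does not need to know which other attributes are present.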
In Andrew's solution, which I like better, you'll have to make sure to (i) remove any close tags (so that each range extends to the next <pause> item) and (ii) encode with the declaration
    -S pause+dur
Omitting a nesting specifier (such as ":0") ensures that ranges are automatically closed when the next open tag is encountered.
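
Step (i) could be done with a small filter over the vertical-format input before encoding. The following Python sketch assumes one tag per line and a hypothetical helper name strip_close_tags; it leaves all other tags untouched.

```python
def strip_close_tags(lines, element="pause"):
    """Yield the input lines, skipping close tags of the given element.

    Assumes vertical format, i.e. each XML tag stands on a line of its own.
    """
    close_tag = f"</{element}>"
    for line in lines:
        if line.strip() == close_tag:
            continue    # drop </pause> so the range stays open
        yield line
```

The filtered lines would then be passed to cwb-encode with the -S pause+dur declaration given above.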
@Andrew: Do you remember whether cwb-encode will read
    <pause dur=short/>
as an open tag? Perhaps worth a new flag (-E pause+dur) which only accepts empty elements and doesn't allow nesting?
Best,
Stefan
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb