[CWB] empty element

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue May 12 08:58:07 CEST 2020


>>>>Do you remember whether cwb-encode will read   <pause dur=short/>   as an open tag?

Yes. See line 1886ff. in cwb-encode:

                    if (buf[j-1] == '/') {
                      j--; /* empty tag: remove "/" from annotation string and handle as an open tag */
                      /* Note that this implicitly closes the previous instance of the empty tag:
                       *  - this means that we can work with empty elements by looking just at the "open-point" of each range;
                       *  - it also means that empty tags with metadata at the start of each text will automatically extend over the full text.
                       * However, the approach sketched here only works with "flat" s-attributes declared without recursion (even without :0). */
                    }
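
The effect described in that comment can be illustrated with a small Python sketch. This is only a simplified model of a "flat" s-attribute, not the cwb-encode implementation: the helper name flat_ranges is hypothetical, the input is assumed to be vertical text with one tag or token per line, and it is an assumption here that a range still open at the end of the input is closed at the last token and that empty ranges are dropped.

```python
import re

# Hypothetical helper, a simplified model only (not the cwb-encode code).
# Matches <name ...>, <name .../> and </name> when alone on a line.
TAG = re.compile(r'<(/?)(\w+)([^>]*?)(/?)>')

def flat_ranges(lines, name):
    """Return (start, end, annotation) ranges for a flat s-attribute."""
    ranges = []
    cpos = 0                      # corpus position of the next token
    open_start = None             # start of the currently open range, if any
    open_annot = None

    def close(end):
        nonlocal open_start
        # Assumption: ranges that would contain no tokens are dropped.
        if open_start is not None and end >= open_start:
            ranges.append((open_start, end, open_annot))
        open_start = None

    for line in lines:
        line = line.strip()
        m = TAG.fullmatch(line)
        if m and m.group(2) == name:
            close(cpos - 1)       # an open/empty tag implicitly closes the previous range
            if not m.group(1):    # not a close tag: start a new range here
                open_start = cpos
                open_annot = m.group(3).strip()
        elif line and not line.startswith('<'):
            cpos += 1             # ordinary token line
    close(cpos - 1)               # assumption: open range ends at the last token
    return ranges

vrt = ["<pause dur=short/>", "well", "yes", "<pause dur=long/>", "okay"]
print(flat_ranges(vrt, "pause"))
```

In this model each empty tag only marks the "open-point" of a range, and the range runs up to the token before the next <pause> tag.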

>>>> Perhaps worth a new flag (-E pause+dur) which only accepts empty elements and doesn't allow nesting?

I think that sort of additional complication should be left till v4, no?

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Stefan Evert
Sent: 12 May 2020 07:22
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] empty element


> On 11 May 2020, at 23:04, Stefania Spina <stefania.spina at unistrapg.it> wrote:
>
> Thank you Stefan and Andrew!
> And will it also work if <pause> has an attribute?
> <pause dur="short"></pause>

Yes.  In the BNCweb solution, you would have to search the tags_before attribute for the full shape of the XML tag, e.g.

        [tags_before = '.*<pause\s+dur="short">.*']

which becomes much more complicated if <pause> can carry multiple attributes in varying order.  For the feature set, we devised a special encoding, e.g. something like

        |pause|pause_dur=short|

that you can search with

        [tags_before contains "pause_dur=short"]
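
As a rough illustration of such an encoding, the following Python sketch turns the tags preceding a token into a feature-set string of the shape shown above. The function name feature_set and the exact naming scheme (tag, then tag_attribute=value) are assumptions modelled on the example, not the actual BNCweb code; values containing "|" are not handled.

```python
import re

# Hypothetical helpers: the naming scheme follows the example above and is
# not taken from the BNCweb implementation.
TAG_RE = re.compile(r'<(\w+)([^>]*?)/?>')
ATTR_RE = re.compile(r'''(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'/>]+))''')

def feature_set(tags):
    """Encode a list of open/empty XML tags as a |-delimited feature set."""
    feats = set()
    for tag in tags:
        m = TAG_RE.fullmatch(tag.strip())
        if not m:
            continue                      # skips close tags and non-tag strings
        name, rest = m.groups()
        feats.add(name)
        for a in ATTR_RE.finditer(rest):
            value = a.group(2) or a.group(3) or a.group(4) or ""
            feats.add(f"{name}_{a.group(1)}={value}")
    # Sorting makes the result independent of attribute order in the tag.
    return "|" + "|".join(sorted(feats)) + "|" if feats else "|"

print(feature_set(['<pause dur="short">']))
```

Because every attribute becomes its own set member, a query with "contains" does not depend on the order in which the attributes were written in the tag.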

In Andrew's solution, which I like better, you'll have to make sure to (i) remove any close tags (so that each range extends to the next <pause> item) and (ii) encode with the declaration

        -S pause+dur

Omitting a nesting specifier (such as ":0") ensures that ranges are automatically closed when the next open tag is encountered.
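
Step (i) could be done with a small filter over the input before encoding. A minimal Python sketch, assuming vertical text with one tag or token per line (the function name drop_close_tags is hypothetical):

```python
import re

def drop_close_tags(lines, name):
    """Remove lines consisting only of the close tag </name>.

    Assumes one tag or token per line; open tags and tokens are kept as-is.
    """
    close = re.compile(r'\s*</' + re.escape(name) + r'\s*>\s*')
    return [ln for ln in lines if not close.fullmatch(ln)]

vrt = ['<pause dur="short">', '</pause>', 'well', 'yes']
print(drop_close_tags(vrt, 'pause'))
```

The filtered lines can then be passed to cwb-encode with the -S declaration given above.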

@Andrew: Do you remember whether cwb-encode will read

        <pause dur=short/>

as an open tag?  Perhaps worth a new flag (-E pause+dur) which only accepts empty elements and doesn't allow nesting?

Best,
Stefan

_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb

