[CWB] empty element

Stefania Spina stefania.spina at unistrapg.it
Tue May 12 09:18:15 CEST 2020


Thank you, it works perfectly now.
In my first try, without the close tag, I had kept the :0 encoding, and it
only found the first occurrence of <pause>.

Thank you so much for your help,
Stefania

---
*Prof. Stefania Spina*
Università per Stranieri di Perugia
Delegate for Research
stefania.spina at unistrapg.it
https://www.researchgate.net/profile/Stefania_Spina2
<https://unistrapg.academia.edu/StefaniaSpina>



On Tue, 12 May 2020 at 08:58, Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:

> >>>> Do you remember whether cwb-encode will read <pause dur=short/> as
> >>>> an open tag?
>
> Yes. Line 1886ff:
>
>     if (buf[j-1] == '/') {
>       j--; /* empty tag: remove "/" from annotation string and handle as an open tag */
>       /* Note that this implicitly closes the previous instance of the empty tag:
>        *  - this means that we can work with empty elements by looking just at
>        *    the "open-point" of each range;
>        *  - it also means that empty tags with metadata at the start of each
>        *    text will automatically extend over the full text.
>        * However, the approach sketched here only works with "flat"
>        * s-attributes declared without recursion (even without :0). */
>     }
>
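The following standalone C sketch is only meant to make the behaviour described in that excerpt concrete; it is not the actual cwb-encode code, and the names `strip_empty_tag_slash`, `Range` and `flat_ranges_from_open_points` are hypothetical. It removes a trailing "/" from a tag's annotation string, as the excerpt does, and then shows how a flat (non-recursive) s-attribute turns the open points of successive `<pause/>` tags into ranges, each closed just before the next open point. For simplicity it assumes that the last range runs to the final token and that zero-length ranges are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if the tag annotation (text between '<' and '>')
 * ends in '/', remove it so the tag can be treated as an open tag.
 * Trimming blanks before the '/' is an extra step of this sketch.
 * Modifies buf in place; returns 1 for an empty tag, 0 otherwise. */
int strip_empty_tag_slash(char *buf)
{
    assert(buf != NULL);
    size_t j = strlen(buf);
    if (j == 0 || buf[j - 1] != '/')
        return 0;
    j--;                                /* drop the "/" */
    while (j > 0 && buf[j - 1] == ' ')  /* drop blanks before it */
        j--;
    buf[j] = '\0';
    return 1;
}

/* An s-attribute region: inclusive range of corpus positions. */
typedef struct {
    int start;
    int end;
} Range;

/* Given the sorted corpus positions at which successive empty tags open
 * (the position of the token following each <pause/>), write the ranges
 * a flat s-attribute would hold: each range is implicitly closed just
 * before the next open point. Assumptions of this sketch: the last range
 * ends at the final token and zero-length ranges are skipped.
 * Returns the number of ranges written (at most n_open). */
int flat_ranges_from_open_points(const int *open, int n_open,
                                 int n_tokens, Range *out)
{
    int n_out = 0;
    for (int i = 0; i < n_open; i++) {
        int end = (i + 1 < n_open) ? open[i + 1] - 1 : n_tokens - 1;
        if (end >= open[i]) {
            out[n_out].start = open[i];
            out[n_out].end = end;
            n_out++;
        }
    }
    return n_out;
}
```

With 10 tokens and `<pause/>` tags before tokens 2, 5 and 8, for instance, this yields the ranges [2,4], [5,7] and [8,9]; tokens 0 and 1 fall outside any pause region, and the start of each range marks where a pause occurred.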
> >>>> Perhaps worth a new flag (-E pause+dur) which only accepts empty
> >>>> elements and doesn't allow nesting?
>
> I think that sort of additional complication should be left till v4, no?
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it>
> On Behalf Of Stefan Evert
> Sent: 12 May 2020 07:22
> To: CWBdev Mailing List <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] empty element
>
>
> > On 11 May 2020, at 23:04, Stefania Spina <stefania.spina at unistrapg.it> wrote:
> >
> > Thank you Stefan and Andrew!
> > And will it also work if <pause> has an attribute?
> > <pause dur="short"></pause>
>
> Yes.  In the BNCweb solution, you would have to search the tags_before
> attribute for the full shape of the XML tag, e.g.
>
>         [tags_before = '.*<pause\s*dur="short">.*']
>
> which becomes much more complicated if there could be multiple attributes
> in <pause> in different order.  For the feature set, we devised a special
> encoding, e.g. something like
>
>         |pause|pause_dur=short|
>
> that you can search with
>
>         [tags_before contains "pause_dur=short"]
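Unlike the regular-expression query above, `contains` checks whole members of the `|`-delimited feature set, so other attributes and their order do not get in the way. The C sketch below illustrates that membership test; `fs_contains` is a hypothetical name, and CQP itself matches members against a regular expression rather than the exact string comparison used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `value` is a member of the feature set `set`, written in
 * the |a|b|c| notation (the empty set is "|"), and 0 otherwise.
 * Members are compared as exact strings, a simplification of CQP. */
int fs_contains(const char *set, const char *value)
{
    assert(set != NULL && value != NULL);
    const size_t vlen = strlen(value);
    if (set[0] != '|')
        return 0;                          /* not a well-formed set */
    const char *p = set + 1;               /* first member starts here */
    while (*p != '\0') {
        const char *bar = strchr(p, '|');  /* end of the current member */
        if (bar == NULL)
            return 0;                      /* unterminated member */
        if ((size_t)(bar - p) == vlen && strncmp(p, value, vlen) == 0)
            return 1;
        p = bar + 1;                       /* move on to the next member */
    }
    return 0;
}
```

For example, `fs_contains("|pause|pause_dur=short|", "pause_dur=sh")` returns 0, although a plain substring search would match.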
>
> In Andrew's solution, which I like better, you'll have to make sure to (i)
> remove any close tags (so the range extends to the next <pause> item) and
> (ii) encode with declaration
>
>         -S pause+dur
>
> Omitting a nesting specifier (such as ":0") ensures that ranges are
> automatically closed when the next open tag is encountered.
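Step (i) can be done with any text filter. Below is a rough C sketch, assuming one-tag-per-line (VRT-style) input; the names `rewrite_pause_line` and `filter_stream` are hypothetical. It drops bare `</pause>` lines and, if a `<pause ...></pause>` pair was written on a single line, keeps only the open tag, so that the filtered file can then be encoded with the `-S pause+dur` declaration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical preprocessing step: remove </pause> close tags so that
 * each <pause> range extends up to the next <pause> once encoded.
 * Strips the line terminator in place. Returns 0 if the line should be
 * dropped, 1 if it should be kept (possibly shortened). */
int rewrite_pause_line(char *line)
{
    static const char close_tag[] = "</pause>";
    const size_t clen = sizeof close_tag - 1;   /* 8 characters */
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    /* strip LF or CRLF only, so empty trailing columns are preserved */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';

    if (strcmp(line, close_tag) == 0)
        return 0;                 /* bare close tag: drop the line */

    /* open and close tag on one line, e.g. <pause dur="short"></pause>:
     * cut off the close tag and keep the open tag */
    if (len > clen && strncmp(line, "<pause", 6) == 0
        && (line[6] == ' ' || line[6] == '>')
        && strcmp(line + len - clen, close_tag) == 0)
        line[len - clen] = '\0';
    return 1;
}

/* Copy `in` to `out`, applying rewrite_pause_line to every line.
 * Assumes no input line is longer than the buffer. */
void filter_stream(FILE *in, FILE *out)
{
    char buf[4096];
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (rewrite_pause_line(buf))
            fprintf(out, "%s\n", buf);
    }
}
```

Token lines pass through unchanged apart from normalising the line ending; only the `pause` element is touched, so other structural tags keep their close tags.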
>
> @Andrew: Do you remember whether cwb-encode will read
>
>         <pause dur=short/>
>
> as an open tag?  Perhaps worth a new flag (-E pause+dur) which only
> accepts empty elements and doesn't allow nesting?
>
> Best,
> Stefan
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>

