[CWB] empty element
Stefan Evert
stefanML at collocations.de
Tue May 12 08:22:22 CEST 2020
> On 11 May 2020, at 23:04, Stefania Spina <stefania.spina at unistrapg.it> wrote:
>
> Thank you Stefan and Andrew!
> And will it also work if <pause> has an attribute?
> <pause dur="short"></pause>
Yes. In the BNCweb solution, you would have to search the tags_before attribute for the full shape of the XML tag, e.g.
[tags_before = '.*<pause\s*dur="short">.*']
which becomes much more complicated if <pause> can carry several attributes in varying order. For the feature-set approach, we devised a special encoding, something like
|pause|pause_dur=short|
that you can search with
[tags_before contains "pause_dur=short"]
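To illustrate how such a feature set could be produced at indexing time, here is a minimal Python sketch. It is not the actual BNCweb code; the function name tags_to_feature_set, the name_attr=value naming scheme and the alphabetical sorting are assumptions modelled on the example above. It also assumes attribute values contain no "|" and are double-quoted.

```python
import re

# Matches an XML start tag (or empty-element tag) with double-quoted attributes.
TAG_RE = re.compile(r'<([A-Za-z][\w.-]*)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*/?>')
ATTR_RE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')


def tags_to_feature_set(tags):
    """Turn a string of XML start tags into a feature-set value like |a|b|.

    Hypothetical helper: each tag contributes its element name plus one
    'name_attr=value' item per attribute, so attribute order in the
    original tag no longer matters.
    """
    features = set()
    for name, attrs in TAG_RE.findall(tags):
        features.add(name)
        for key, value in ATTR_RE.findall(attrs):
            features.add(f"{name}_{key}={value}")
    if not features:
        return "|"  # empty feature set
    return "|" + "|".join(sorted(features)) + "|"


if __name__ == "__main__":
    print(tags_to_feature_set('<pause dur="short">'))
    print(tags_to_feature_set('<pause type="filled" dur="short">'))
```

Because the items are collected in a set and sorted, both attribute orders of the same tag yield the same value, which is what keeps the "contains" query simple.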
In Andrew's solution, which I like better, you'll have to make sure to (i) remove any close tags (so that each range extends to the next <pause> item) and (ii) encode with the declaration
-S pause+dur
Omitting a nesting specifier (such as ":0") ensures that ranges are automatically closed when the next open tag is encountered.
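Step (i) can be done with a small preprocessing pass over the input before it goes to cwb-encode. The following Python sketch is only an illustration under some assumptions: the helper name open_tags_only is made up, the input is assumed to be in one-item-per-line format, and empty-element tags are rewritten as plain open tags so that the result does not depend on how cwb-encode treats the "/>" form.

```python
import re


def open_tags_only(lines, element="pause"):
    """Drop close tags of `element` and rewrite <element .../> as <element ...>.

    Hypothetical preprocessing helper; lines that become empty because a
    close tag was removed are dropped, all other lines are kept unchanged.
    """
    name = re.escape(element)
    close_re = re.compile(rf"</{name}\s*>")
    empty_re = re.compile(rf"<({name}\b[^<>]*?)\s*/>")
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        new = empty_re.sub(r"<\1>", line)   # <pause dur="x"/> -> <pause dur="x">
        new = close_re.sub("", new)         # remove </pause>
        if new != line and not new.strip():
            continue                        # line held nothing but a close tag
        out.append(new)
    return out


if __name__ == "__main__":
    sample = ['<pause dur="short"></pause>', "hello",
              '<pause dur="long"/>', "world", "</pause>"]
    for line in open_tags_only(sample):
        print(line)
```

The output then contains only open <pause ...> tags, so with the declaration above each range should run up to the next <pause> item.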
@Andrew: Do you remember whether cwb-encode will read
<pause dur=short/>
as an open tag? Perhaps worth a new flag (-E pause+dur) which only accepts empty elements and doesn't allow nesting?
Best,
Stefan