[CWB] empty element

Stefan Evert stefanML at collocations.de
Tue May 12 08:22:22 CEST 2020


> On 11 May 2020, at 23:04, Stefania Spina <stefania.spina at unistrapg.it> wrote:
> 
> Thank you Stefan and Andrew!
> And will it also work if <pause> has an attribute?
> <pause dur="short"></pause>

Yes.  In the BNCweb solution, you would have to search the tags_before attribute for the full shape of the XML tag, e.g.

	[tags_before = '.*<pause\s*dur="short">.*']

which becomes much more complicated if <pause> can carry several attributes in varying order.  For the feature set, we devised a special encoding, e.g. something like

	|pause|pause_dur=short|

that you can search with

	[tags_before contains "pause_dur=short"]
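
A minimal sketch of how such a feature-set value might be generated from a start tag, in Python.  The function name tag_to_features and the NAME_ATTR=VALUE naming are assumptions modelled on the example above, not the actual BNCweb code.  Entries are sorted so that the result does not depend on attribute order; values containing "|" or whitespace are not handled.

```python
import re

# Hypothetical helper, not part of CWB or BNCweb.
START_TAG = re.compile(r'<\s*([\w:-]+)((?:\s+[\w:-]+\s*=\s*"[^"]*")*)\s*/?>')
ATTRIBUTE = re.compile(r'([\w:-]+)\s*=\s*"([^"]*)"')


def tag_to_features(tag: str) -> str:
    """Turn an XML start tag into a feature-set string like |pause|pause_dur=short|."""
    match = START_TAG.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a start tag with quoted attributes: {tag!r}")
    name, attr_text = match.group(1), match.group(2)
    # One entry for the element itself, plus one per attribute-value pair.
    features = {name}
    for key, value in ATTRIBUTE.findall(attr_text):
        features.add(f"{name}_{key}={value}")
    # Sorting makes the output independent of the attribute order in the input.
    return "|" + "|".join(sorted(features)) + "|"


print(tag_to_features('<pause dur="short">'))
```

With this encoding, a single "contains" test finds the attribute value regardless of what other attributes the tag carries.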

In Andrew's solution, which I like better, you'll have to make sure to (i) remove any close tags (so the range extends to the next <pause> item) and (ii) encode with the declaration

	-S pause+dur

Omitting a nesting specifier (such as ":0") ensures that ranges are automatically closed when the next open tag is encountered.
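
Step (i) could be done with a small preprocessing filter before encoding.  The following is only a sketch in Python, assuming the input is line-based text in which </pause> may appear either on its own line or directly after the open tag; the function name strip_close_tags is made up for illustration.  It leaves self-closing tags such as <pause .../> untouched, since how cwb-encode reads those is an open question here.

```python
import re

# Matches a </pause> close tag, allowing optional whitespace before ">".
CLOSE_PAUSE = re.compile(r"</pause\s*>")


def strip_close_tags(lines):
    """Yield the input lines with every </pause> removed.

    Lines that consisted only of a close tag are dropped entirely.
    """
    for line in lines:
        text = line.rstrip("\n")
        cleaned = CLOSE_PAUSE.sub("", text)
        if cleaned != text and not cleaned.strip():
            continue  # the line held nothing but the close tag
        yield cleaned


sample = ['<pause dur="short"></pause>\n', "hello\n", "</pause>\n", "world\n"]
for out in strip_close_tags(sample):
    print(out)
```

The filtered output can then be passed to cwb-encode with the declaration shown above.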

@Andrew: Do you remember whether cwb-encode will read

	<pause dur=short/>

as an open tag?  Perhaps worth a new flag (-E pause+dur) which only accepts empty elements and doesn't allow nesting?

Best,
Stefan


