[CWB] cqp sorting by s-attribute?
Maarten Janssen
maartenpt at gmail.com
Tue Jun 11 09:50:49 CEST 2019
That is indeed an inconvenient feature of CQP, and in fact one of the three reasons that led me to write my own implementation (which, as Stefan already hints, is currently not efficient enough, at least for generic queries). Sorting on the command line will sometimes work, but it quickly becomes nasty.

For historical corpora there is a quick hack. You typically have only one thing you want to sort on: the year. So you can just hard-code the year as a p-attribute on every token, and then you will be able to sort on it. And since you should always search “within text”, the problem of the tokens in a match not all having the same year should not come up (which would be different if you were to use the same trick for, say, NP-based features).

Of course, you will have to assign an actual number to the year, since you cannot sort on, say, [ca 1500-1543]. You would have to add an explicit value, what TEITOK calls a “best guess year”, which is typically the midpoint of the range, but would be 1540 for a date like “probably around 1540 but could be as early as 1500”. I don’t think CWB has any feature to do this dynamically, so you would just have to modify the VRT:
<text year="[ca 1500-1543]">
My PP my
little AQ little
text NN text
</text>
=>
<text year="[ca 1500-1543]">
My PP my 1540
little AQ little 1540
text NN text 1540
</text>
Matches = [word="text"]; sort Matches by year;
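
The VRT change above could be scripted along these lines. This is only a rough sketch, not something CWB or TEITOK provides: the function names and the `overrides` table are made up for illustration, and it assumes tab-separated token lines and a `year` attribute on `<text>`. Hand-assigned best guesses (like 1540 above) go in `overrides`; anything else falls back to the midpoint of the four-digit years found in the attribute.

```python
import re

YEAR_ATTR = re.compile(r'year="([^"]*)"')
TEXT_OPEN = re.compile(r"<text[\s>]")


def midpoint_year(raw):
    """Fallback best guess: midpoint of the 4-digit years in the raw value."""
    years = [int(y) for y in re.findall(r"\d{4}", raw)]
    if not years:
        return None
    return (min(years) + max(years)) // 2


def add_year_column(lines, overrides=None):
    """Append a best-guess year column to every token line inside <text>.

    `overrides` (hypothetical) maps a raw year attribute value to a
    hand-assigned best guess; other values fall back to midpoint_year().
    """
    overrides = overrides or {}
    current = None  # best-guess year of the <text> we are inside, if any
    out = []
    for line in lines:
        stripped = line.rstrip("\n")
        if TEXT_OPEN.match(stripped):
            m = YEAR_ATTR.search(stripped)
            raw = m.group(1) if m else ""
            current = overrides.get(raw, midpoint_year(raw))
            out.append(stripped)
        elif stripped.startswith("</text>"):
            current = None
            out.append(stripped)
        elif stripped.startswith("<") or not stripped or current is None:
            # Other structural tags, blank lines, and tokens without a usable
            # year are passed through unchanged (assumes a literal "<" token
            # is escaped in the VRT).
            out.append(stripped)
        else:
            out.append(f"{stripped}\t{current}")
    return out


if __name__ == "__main__":
    vrt = [
        '<text year="[ca 1500-1543]">',
        "My\tPP\tmy",
        "little\tAQ\tlittle",
        "text\tNN\ttext",
        "</text>",
    ]
    print("\n".join(add_year_column(vrt, {"[ca 1500-1543]": 1540})))
```

The extra column then has to be declared as a p-attribute when the corpus is encoded (with cwb-encode that should be an additional -P year, assuming the attribute is called year as in the query above).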