[CWB] cqp sorting by s-attribute?
Stefan Evert
stefanML at collocations.de
Tue Jun 11 09:05:06 CEST 2019
> I was wondering if there is some way to sort cqp KWIC results by s-attributes (text_id, text_date) instead of sorting them by p-attributes (word, lemma, etc.). I tried
>
> CORPUS> sort Last by text_date;
That's not possible because s-attributes are implemented in an entirely different way than p-attributes, so the sorting code would have to be rewritten completely (and it would be less efficient).
S-attributes also used not to work with "group", but special-case code was added there at some point (which uses a trick I'd rather not speak about to achieve efficient counting).
If you want your query sorted by s-attributes, you will have to rely on external tools. The basic procedure is as follows (assuming a named query Query rather than implicit Last):
tabulate Query match, matchend, match text_date > "query.txt";
Then open the file "query.txt" with spreadsheet software (preferably LibreOffice; with MS Excel, make sure to select "Open File" from the menu so you'll get the import dialog to read TAB-delimited data properly). You can now sort on the third column (or whatever other criteria you want to add), then remove everything except for the first two columns and save them as a TAB-delimited file (say "query_sorted.txt").
It is important to make sure that only the "match" and "matchend" columns are left in this file, so it can be imported back into CQP. In CQP, the next step is:
undump Sorted < "query_sorted.txt";
You should now see that the query results are sorted by date:
set PrintStructures story_title;
cat Sorted;
If you're familiar with Unix command line tools, you will be able to do the sort much more easily with a combination of "sort" and "cut". This can even be included in a pipe run from within CQP.
If your query has target and/or keyword anchors, you will have to add them to the file and make sure they're read back in.
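For those who prefer a script to a spreadsheet, the sort-and-trim step can also be sketched in a few lines of Python. This is only an illustration, not part of CQP: the function names and the n_keep parameter are made up for the example, the files are assumed to be UTF-8, and the sort key is compared as a plain string, so it only gives chronological order if text_date is in an ISO-like format such as 2019-06-11.

```python
def sort_tabulate_lines(lines, key_col=2, n_keep=2):
    """Sort TAB-delimited rows by column key_col (0-based) and
    keep only the first n_keep columns of each row."""
    # Split each non-empty line into its TAB-separated fields.
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    # Plain string comparison; list.sort() is stable, so rows with the
    # same key stay in their original (corpus) order.
    rows.sort(key=lambda row: row[key_col])
    return ["\t".join(row[:n_keep]) for row in rows]


def sort_tabulate_file(in_path, out_path, key_col=2, n_keep=2):
    """Read in_path, sort as above, and write the result to out_path.
    Assumption: both files are UTF-8 encoded."""
    with open(in_path, encoding="utf-8") as src:
        sorted_lines = sort_tabulate_lines(src, key_col, n_keep)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(line + "\n" for line in sorted_lines)


# Small in-memory demonstration with invented corpus positions and dates.
sample = [
    "120\t122\t2019-03-01\n",
    "15\t15\t2018-11-20\n",
    "987\t990\t2019-01-05\n",
]
for line in sort_tabulate_lines(sample):
    print(line)
```

Calling sort_tabulate_file("query.txt", "query_sorted.txt") would then produce the file to undump. If the tabulate output also contains anchor columns before the sort key (for example match, matchend, target, keyword, text_date), something like key_col=4, n_keep=4 should keep them in the output.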
Hope this helps,
Stefan