[CWB] [CQPweb] Expanding existing corpora

Scott Sadowsky ssadowsky at gmail.com
Sat Jun 15 11:19:37 CEST 2019


I have a situation which is probably not the norm for most users here. I
have a corpus which I will be putting online gradually, in 20 or 30
installments over the next two years or so, as each batch of texts is
reviewed a second time for personally identifying or sensitive information
and any such material is redacted (it's a speech corpus).

When a new batch of texts is ready, I process, tag and compile all the files
that are fit for public consumption into a CQP corpus, upload the new set
of CQP files to the server (replacing the old ones), and then re-run the
frequency and STTR calculation scripts on the server. This updates the
frequencies shown everywhere I've looked (test query results, corpus
metadata, etc.), so far, so good.

The one thing I haven't been able to update, however, is the set of values
of the text metadata and word-level annotation variables (as seen in the
selection boxes of restricted queries and subcorpus creation).

Thus, if the first version of the corpus only had four of six socioeconomic
statuses (say 1, 2, 3, 6) and a new version includes one or more speakers
of SES 4, this new SES doesn't show up anywhere.
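As a quick diagnostic (not a fix), here is a minimal sketch that lists which metadata values appear in a new batch but not in the previous one, i.e. the values one would expect to show up in the selection boxes. It assumes the source files are in CWB vertical (VRT) format with text metadata as double-quoted attributes on `<text>` tags; the attribute name `ses`, the sample lines, and the helper names are hypothetical. It only detects new values; it does not make CQPweb rescan anything.

```python
import re
from collections import defaultdict

# Matches an opening <text ...> tag at the start of a (stripped) VRT line.
TEXT_TAG = re.compile(r'^<text\b([^>]*)>')
# Matches key="value" pairs inside that tag (assumes double-quoted values).
ATTR = re.compile(r'(\w+)="([^"]*)"')


def attribute_values(lines):
    """Collect the set of values seen for each attribute on <text> tags."""
    values = defaultdict(set)
    for line in lines:
        m = TEXT_TAG.match(line.strip())
        if m:
            for key, val in ATTR.findall(m.group(1)):
                values[key].add(val)
    return values


def new_values(old_lines, new_lines):
    """Return {attribute: sorted values present in new_lines but not old_lines}."""
    old = attribute_values(old_lines)
    new = attribute_values(new_lines)
    result = {}
    for key, vals in new.items():
        added = vals - old.get(key, set())
        if added:
            result[key] = sorted(added)
    return result


if __name__ == "__main__":
    # Hypothetical sample: the old version has SES 1 and 6; the new batch adds SES 4.
    old = ['<text id="t1" ses="1">', 'hola', '</text>',
           '<text id="t2" ses="6">', 'chao', '</text>']
    new = old + ['<text id="t3" ses="4">', 'bueno', '</text>']
    print(new_values(old, new))  # {'id': ['t3'], 'ses': ['4']}
```

In this scenario the sketch would report `ses: ['4']`, which is exactly the value that currently fails to appear in the restricted-query and subcorpus menus.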

*Is there any way to update a corpus so that it rescans metadata like p-
and s-attributes and their values?* My goal is to avoid having to recreate
the corpus from scratch over and over.

Thanks in advance,
Scott

NOTE: Unless I've misunderstood something, I'm *not* adding new p- or
s-attributes, but rather new *values* for existing p-attributes.