[CWB] How to append corpus data into an existing corpora?

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Oct 16 14:31:51 CEST 2023


I mean it cannot be done at all. You need to start over. As you indicate … because this…

>> we can instead only run cwb-encode command to re-index and overwrite the existing corpora index

= starting over. So it’s starting over whether you do it via the web UI or the CLI.

But overwriting the existing index is a bad idea, because any saved queries that referenced the index will still point there – but now they are no longer pointing at the same data.

Better to have parallel names with a changeable suffix:

mycorpus-01
mycorpus-02
…

or

mycorpus-20231015
mycorpus-20231016
…

That way there will be no confusion about which corpus any given saved query is associated with (whether or not you opt to delete older indexes).
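To illustrate the naming scheme and the "start over" step, here is a minimal Python sketch. It picks the next suffixed name and builds (without running) the commands that would index the old and new VRT files together into a fresh corpus. The helper names, the paths and the attribute declarations (`-P pos`, `-S text:0+id`) are assumptions made for the example; the flags passed to cwb-encode must match how your own corpus is actually annotated.

```python
import re
import shlex
from datetime import date


def next_corpus_name(base, existing):
    """Return base-NN, one higher than the largest numeric suffix in use."""
    pattern = re.compile(rf"^{re.escape(base)}-(\d+)$")
    used = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    number = max(used, default=0) + 1
    return f"{base}-{number:02d}"


def dated_corpus_name(base, day):
    """Return base-YYYYMMDD for the given date."""
    return f"{base}-{day:%Y%m%d}"


def build_reindex_commands(name, vrt_files, data_root, registry_dir, attr_flags):
    """Build the commands for a fresh index; nothing is executed here.

    All VRT files (original and new) are passed again, because an existing
    index cannot be extended. The data directory must exist beforehand.
    """
    encode = [
        "cwb-encode",
        "-d", f"{data_root}/{name}",      # data directory of the NEW corpus
        "-R", f"{registry_dir}/{name}",   # registry file of the NEW corpus
        "-c", "utf8",                     # assumption: input is UTF-8
        "-xsB",                           # assumption: XML-aware VRT input
    ]
    for vrt in vrt_files:
        encode += ["-f", str(vrt)]
    encode += list(attr_flags)            # corpus-specific attribute flags
    makeall = ["cwb-makeall", "-r", registry_dir, "-V", name.upper()]
    return [encode, makeall]


if __name__ == "__main__":
    name = next_corpus_name("mycorpus", ["mycorpus-01", "mycorpus-02"])
    commands = build_reindex_commands(
        name,
        ["original.vrt", "additions.vrt"],   # hypothetical file names
        "/corpora/data",                     # hypothetical paths
        "/corpora/registry",
        ["-P", "pos", "-S", "text:0+id"],    # hypothetical attributes
    )
    for command in commands:
        print(shlex.join(command))
```

The older index is left untouched, so saved queries that refer to it keep pointing at the data they were run on.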

best

Andrew.

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of ???
Sent: Monday, October 16, 2023 12:46 PM
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] CWB Digest, Vol 199, Issue 5

Thank you, Andrew! Do you mean we cannot do it on the admin-ui webpage, and that we can instead only run cwb-encode command to re-index and overwrite the existing corpora index? If so, it really sucks. It cannot be done by adding more files via the web UI.




Vincent Zhang

From: cwb-request at sslmit.unibo.it
Date: 2023-10-16 18:00:01
To: cwb at sslmit.unibo.it
Subject: CWB Digest, Vol 199, Issue 5

>Today's Topics:
>
>   1. How to append corpus data into an existing corpora?
>      (wzzhang at shisu.edu.cn)
>   2. Re: How to append corpus data into an existing corpora?
>      (Hardie, Andrew)
>
>----------------------------------------------------------------------
>

>Message: 1
>Date: Mon, 16 Oct 2023 13:59:39 +0800
>From: "wzzhang at shisu.edu.cn" <wzzhang at shisu.edu.cn>
>To: cwb <cwb at sslmit.unibo.it>
>Subject: [CWB] How to append corpus data into an existing corpora?
>Message-ID: <202310161358581732745 at shisu.edu.cn>
>Content-Type: text/plain; charset="gb2312"
>
>Hello everyone,
>I could not find any way to append a new VRT file to an existing corpus. If this feature is lacking, how can a corpus be sustainably improved?
>
>Vincent Zhang
>Institute of Corpus Studies and Applications, Shanghai International Studies University
>-------------- next part --------------
>An HTML attachment was scrubbed...
>URL: <http://liste.sslmit.unibo.it/pipermail/cwb/attachments/20231016/ef192825/attachment-0001.html>

>

>------------------------------

>

>Message: 2
>Date: Mon, 16 Oct 2023 06:19:46 +0000
>From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
>To: Open source development of the Corpus WorkBench
>       <cwb at sslmit.unibo.it>
>Subject: Re: [CWB] How to append corpus data into an existing corpora?
>Message-ID:
> <LO4P265MB3485AD0D1262A6549EBA62EECBD7A at LO4P265MB3485.GBRP265.PROD.OUTLOOK.COM>
>Content-Type: text/plain; charset="us-ascii"
>
>That's because you can't do it.
>
>You have to create a new corpus index from your original files with your new files appended to them.
>
>Each CWB index then corresponds to the state of your corpus at some particular moment in time. (This is actually desirable from the point of view of replicability of results.)
>
>best
>
>Andrew.

>


>

>------------------------------

>

