[CWB] How to append corpus data into an existing corpora?
Hardie, Andrew
a.hardie at lancaster.ac.uk
Mon Oct 16 14:31:51 CEST 2023
I mean it cannot be done at all. You need to start over. As you indicate … because this…
>> we can instead only run cwb-encode command to re-index and overwrite the existing corpora index
=starting over. So it’s starting over whether you do it via the web UI or the CLI.
But overwriting the existing index is a bad idea, because any saved queries that referenced the index will still point there – but now they are no longer pointing at the same data.
Better to have parallel names with a changeable suffix:
mycorpus-01
mycorpus-02
…
or
mycorpus-20231015
mycorpus-20231016
…
That way there is no confusion about which corpus any given saved query is associated with (whether or not you opt to delete older indexes).
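To make the naming scheme concrete, here is a minimal sketch in Python that only assembles the command lines for a fresh, date-suffixed index; it does not run anything. The function names, the paths, and the attribute declarations in the usage lines are hypothetical placeholders, and it assumes that a hyphenated handle is acceptable in your setup and that cwb-makeall wants the corpus ID in upper case. Check the flags against your installed CWB version before relying on it.

```python
from datetime import date
from pathlib import PurePosixPath
import shlex


def versioned_name(base, day, sep="-"):
    """Return a date-suffixed handle, e.g. 'mycorpus-20231016'."""
    return f"{base}{sep}{day:%Y%m%d}"


def build_commands(base, day, vrt_files, data_root, registry_dir, p_attrs, s_attrs):
    """Assemble (but do not run) the commands for a brand-new index.

    vrt_files must list ALL input files, old and new, because the
    index is rebuilt from scratch rather than appended to.
    """
    name = versioned_name(base, day)
    data_dir = PurePosixPath(data_root) / name          # one data directory per version
    registry_file = PurePosixPath(registry_dir) / name  # one registry entry per version

    encode = ["cwb-encode", "-d", str(data_dir), "-R", str(registry_file),
              "-c", "utf8", "-xsB"]
    for vrt in vrt_files:
        encode += ["-f", str(vrt)]      # -f repeated once per input file
    for att in p_attrs:
        encode += ["-P", att]           # positional attributes
    for att in s_attrs:
        encode += ["-S", att]           # structural attributes

    makeall = ["cwb-makeall", "-r", str(registry_dir), "-V", name.upper()]
    return encode, makeall


# Hypothetical usage: file names, paths and attributes are placeholders.
encode_cmd, makeall_cmd = build_commands(
    "mycorpus", date(2023, 10, 16),
    ["original.vrt", "additions.vrt"],
    "/corpora/data", "/corpora/registry",
    p_attrs=["pos"], s_attrs=["text:0+id"],
)
print(shlex.join(encode_cmd))
print(shlex.join(makeall_cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) once the new data directory has been created. The previous index is left untouched, so saved queries made against it still refer to the data they were run on.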
best
Andrew.
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of ???
Sent: Monday, October 16, 2023 12:46 PM
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] CWB Digest, Vol 199, Issue 5
Thank you, Andrew! Do you mean we cannot do this on the admin UI web page, and we can instead only run cwb-encode command to re-index and overwrite the existing corpora index? If so, it really sucks. It cannot be done by adding more files via the web UI.
Vincent Zhang
From: cwb-request at sslmit.unibo.it
Date: 2023-10-16 18:00:01
To: cwb at sslmit.unibo.it
Subject: CWB Digest, Vol 199, Issue 5
>
>Today's Topics:
>
> 1. How to append corpus data into an existing corpora?
> (wzzhang at shisu.edu.cn)
> 2. Re: How to append corpus data into an existing corpora?
> (Hardie, Andrew)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Mon, 16 Oct 2023 13:59:39 +0800
>From: "wzzhang at shisu.edu.cn" <wzzhang at shisu.edu.cn>
>To: cwb <cwb at sslmit.unibo.it>
>Subject: [CWB] How to append corpus data into an existing corpora?
>Message-ID: <202310161358581732745 at shisu.edu.cn>
>Content-Type: text/plain; charset="gb2312"
>
>Hello everyone,
>I cannot find any way to append a new VRT file to an existing corpus. If CWB lacks this feature, how can a corpus be improved sustainably over time?
>
>
>
>Vincent Zhang
>Institute of Corpus Studies and Applications, Shanghai International Studies University
>
>------------------------------
>
>Message: 2
>Date: Mon, 16 Oct 2023 06:19:46 +0000
>From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
>To: Open source development of the Corpus WorkBench
> <cwb at sslmit.unibo.it>
>Subject: Re: [CWB] How to append corpus data into an existing corpora?
>Message-ID:
> <LO4P265MB3485AD0D1262A6549EBA62EECBD7A at LO4P265MB3485.GBRP265.PROD.OUTLOOK.COM>
>
>Content-Type: text/plain; charset="us-ascii"
>
>That's because you can't do it.
>
>You have to create a new corpus index from your original files with your new files appended to them.
>
>Each CWB index then corresponds to the state of your corpus at some particular moment in time. (This is actually desirable from the point of view of replicability of results.)
>
>best
>
>Andrew.
>
>------------------------------
>
>End of CWB Digest, Vol 199, Issue 5
>***********************************