[CWB] Installation of a corpus and subsequent problems...
Hardie, Andrew
a.hardie at lancaster.ac.uk
Wed May 14 22:40:57 CEST 2025
The CQP parser objects to IDs that start with a digit. Hence the error (“<1984_1> is not a valid corpus name”), which CQP raises as soon as it starts up and reads your registry.
The immediate solution for you: manually delete the file /var/cqpweb/registry/1984_1 and then reindex with a corpus ID that starts with a letter.
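To check whether other registry entries have the same problem before deleting anything, a minimal sketch along these lines may help. It only lists files; it does not delete them. The function name digit_initial_entries is hypothetical, and the directory /var/cqpweb/registry is taken from the error message quoted in this thread.

```python
import string
from pathlib import Path


def digit_initial_entries(registry_dir):
    """Return the names of registry files whose first character is a digit.

    Hypothetical helper: it only inspects filenames and never opens
    or modifies the registry files themselves.
    """
    return sorted(
        p.name
        for p in Path(registry_dir).iterdir()
        if p.is_file() and p.name[0] in string.digits
    )


if __name__ == "__main__":
    # Path assumed from the error message; adjust for your installation.
    registry = Path("/var/cqpweb/registry")
    if registry.is_dir():
        for name in digit_initial_entries(registry):
            print(f"Suspect registry entry: {registry / name}")
```

Any file it reports would then be a candidate for manual removal, as with 1984_1 above.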
Long-term, this is a bug, because (a) neither the CQPweb UI nor the backend checks for IDs that start with a digit before feeding them to cwb-encode; and (b) cwb-encode doesn’t check the registry filename, which determines the corpus ID, to make sure it starts with a letter. I’ll see about fixing those.
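Until such a check exists, the missing validation can be approximated on the user's side before indexing. The sketch here is an illustration only, not the actual CQPweb or cwb-encode code: the function name starts_with_letter is hypothetical, and it tests only the leading-letter rule described in this message. The full CWB naming rules may well be stricter.

```python
import string


def starts_with_letter(corpus_id: str) -> bool:
    """Return True if corpus_id is non-empty and begins with an ASCII letter.

    Assumption: only the leading-character rule from this thread is
    checked; other constraints on corpus IDs are not covered here.
    """
    return bool(corpus_id) and corpus_id[0] in string.ascii_letters


if __name__ == "__main__":
    for candidate in ("1984_1", "orwell_1984"):
        verdict = "ok" if starts_with_letter(candidate) else "rejected"
        print(f"{candidate}: {verdict}")
```

With this check, the ID 1984_1 from the original report would be rejected before a registry file is ever written.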
Best
Andrew.
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Graham Ranger -- UAPV
Sent: 14 May 2025 20:09
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] Installation of a corpus and subsequent problems...
Hello to everyone,
The title says it all, or almost.
1) I attempted to install a corpus with a set of xml tags, etc. but ran into an error;
2) I then attempted to install a mini-corpus, in an effort at debugging, and ran into the same error (something about extra material after xml tags, repeated for every line of the corpus) -- I can't be more precise for reasons which will soon become clear;
3) I then attempted to delete the corpus which, although not created, was occupying a registry entry, and now have another error message: "**** CQP ERROR **** cl_new_corpus: <1984_1> is not a valid corpus name REGISTRY ERROR (/var/cqpweb/registry/1984_1): syntax error REGISTRY ERROR (/var/cqpweb/registry/1984_1): Error parsing the main Registry structure. CQPweb encountered an error and could not continue."
4) I am now unable to execute any queries or do anything much with cqpweb... On executing a query, for example, I get this error message:
CQP reports an error! The CQP program sent back these error messages:
**** CQP ERROR ****
CQP Error:
No corpus activated
CQP Error:
CQP Syntax Error: syntax error
[r] Registry <--
Ignoring subsequent input until next ';'...
I'm going to ask for the server to be restored to a previous state, which should provide a fix, but won't get me any further with installing the corpus I wished to set up. If there's a simpler way to repair the registry entries, I'd be interested.
The "toy corpus" which managed to break the server was as follows:
<text id="1984_1">
<title>Nineteen eighty-four</title>
<div1 type="part" n="1">
<head>PART 1</head>
<div2 type="chapter" n="1">
<head>1</head>
<p>It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him.</p>
</div2>
</div1>
</text>
Many thanks in advance for any help with this!
Best regards,
Graham.