From graham.ranger at univ-avignon.fr  Wed May 14 21:09:21 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger -- UAPV)
Date: Wed, 14 May 2025 21:09:21 +0200
Subject: [CWB] Installation of a corpus and subsequent problems...
Message-ID: <9a10486e-eec1-40a0-8a54-66857588c268@univ-avignon.fr>

Hello to everyone,
The title says it all, or almost.

1) I attempted to install a corpus with a set of xml tags, etc. but ran into an error;
2) I then attempted to install a mini-corpus, in an effort at debugging, and ran into the same error (something about extra material after xml tags, repeated for every line of the corpus) -- I can't be more precise for reasons which will soon become clear;
3) I then attempted to delete the corpus which, although not created, was occupying a registry entry, and now have another error message:

"**** CQP ERROR ****
cl_new_corpus: <1984_1> is not a valid corpus name
REGISTRY ERROR (/var/cqpweb/registry/1984_1): syntax error
REGISTRY ERROR (/var/cqpweb/registry/1984_1): Error parsing the main Registry structure.
CQPweb encountered an error and could not continue."

4) I am now unable to execute any queries or do anything much with cqpweb... On executing a query, for example, I get this error message:

CQP reports an error!
The CQP program sent back these error messages:
**** CQP ERROR ****
CQP Error:
No corpus activated
CQP Error:
CQP Syntax Error: syntax error
[r] Registry <--
Ignoring subsequent input until next ';'...

I'm going to ask for the server to be restored to a previous state, which should provide a fix, but won't get me any further with installing the corpus I wished to set up. If there's a simpler way to repair the registry entries, I'd be interested.

The "toy corpus" which managed to break the server was as follows:

Nineteen eighty-four
PART 1
1

It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him.

Many thanks in advance for any help with this!
Best regards,
Graham.

From a.hardie at lancaster.ac.uk  Wed May 14 22:40:57 2025
From: a.hardie at lancaster.ac.uk (Hardie, Andrew)
Date: Wed, 14 May 2025 20:40:57 +0000
Subject: [CWB] Installation of a corpus and subsequent problems...
In-Reply-To: <9a10486e-eec1-40a0-8a54-66857588c268@univ-avignon.fr>
References: <9a10486e-eec1-40a0-8a54-66857588c268@univ-avignon.fr>
Message-ID:

The CQP parser objects to IDs that start with a digit. Thus the problem ("<1984_1> is not a valid corpus name") that causes it to yield an error as soon as it starts up and reads your registry.

The immediate solution for you: manually delete the file /var/cqpweb/registry/1984_1

And reindex with a corpus ID that starts with a letter.

Long-term, this is a bug, because (a) CQPweb (neither the UI nor the backend) doesn't check for IDs that start with a digit before feeding them to cwb-encode; and (b) cwb-encode doesn't check the registry filename, which determines the corpus ID, to make sure it starts with a letter. I'll see about fixing those.

Best

Andrew.

From graham.ranger at univ-avignon.fr  Thu May 15 13:33:16 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger)
Date: Thu, 15 May 2025 13:33:16 +0200
Subject: [CWB] Installation of a corpus and subsequent problems...
In-Reply-To: <9a10486e-eec1-40a0-8a54-66857588c268@univ-avignon.fr>
References: <9a10486e-eec1-40a0-8a54-66857588c268@univ-avignon.fr>
Message-ID:

Hello again,
I think (shamefacedly) that the errors were simply due to feeding cqpweb a non-verticalised file. I had the fuzzy feeling that I had done this in the past, but I must have been wrong, as all the files in the upload area are in fact verticalised.
I'll test with a properly verticalised file and, if all goes well, not report back.
Best,
Graham.

From graham.ranger at univ-avignon.fr  Thu May 15 21:31:17 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger)
Date: Thu, 15 May 2025 21:31:17 +0200
Subject: [CWB] [SPAM]Re: Installation of a corpus and subsequent problems...
In-Reply-To:
References: <9a10486e-eec1-40a0-8a54-66857588c268@univ-avignon.fr>
Message-ID: <05bba0b9-671b-4d69-9587-072e7d170e59@univ-avignon.fr>

Thanks for your answer, Andrew. And for the coming fix.
My own message earlier ignored your answer, which had landed in my spam folder. Apologies.
Best,
Graham.

From graham.ranger at univ-avignon.fr  Thu May 22 13:38:21 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger -- UAPV)
Date: Thu, 22 May 2025 13:38:21 +0200
Subject: [CWB] Parallel corpus alignment question
Message-ID: <9911a128-a56f-48be-a162-245df0529b72@univ-avignon.fr>

Hello to all,
I'm currently trying to set up a parallel corpus including a source text and four different translations.
The method I use to set up a parallel corpus is this (copied and adapted from the cqp / cwb manuals):

To set up parallel corpora:

1) Get them installed on cqpweb with the different xml tags declared, etc.
2) Use cwb-align to generate an alignment file suffixed .align, i.e.
cwb-align -r /var/cqpweb/registry/ -o test.align TEST_EN TEST_FR s
This indicates the registry directory explicitly with the -r option.
3) Modify the registry files using nano to indicate the other aligned corpus. This means modifying /var/cqpweb/registry/"my_corpus" and appending ALIGNED "other_corpus".
4) Use cwb-align-encode to point to the alignment file. This needs to be done as admin, i.e. with su, and using the -d and -r options to point to the data and registry directories. The second command does the same thing backwards, i.e. reads the alignments the other way round, with the -R switch.
cwb-align-encode -d /var/cqpweb/index/test_en/ -r /var/cqpweb/registry/ test.align
cwb-align-encode -d /var/cqpweb/index/test_fr/ -r /var/cqpweb/registry/ -R test.align
5) Test it out in cqpweb.

Now, my question is: can I set up a parallel corpus in such a way that a search in the source will display all the aligned translations simultaneously?
If so, is it just a question of following this how-to for each source-target pair, and then declaring multiple alignments in cqpweb, or do I align all the texts from the CLI?
I hope the question is clear and thank you in advance for any guidance.
Best,
Graham.

From graham.ranger at univ-avignon.fr  Thu May 22 14:03:49 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger -- UAPV)
Date: Thu, 22 May 2025 14:03:49 +0200
Subject: [CWB] Parallel corpus alignment question
In-Reply-To: <9911a128-a56f-48be-a162-245df0529b72@univ-avignon.fr>
References: <9911a128-a56f-48be-a162-245df0529b72@univ-avignon.fr>
Message-ID: <4e7a0d03-2694-4bf3-98da-da5f84512346@univ-avignon.fr>

And a follow-up question... could somebody tell me what the admin password is for cqpwebinabox? (I'm trying to do this on a VM with cqpwebinabox, before putting it on a public server.)
Thanks again!
Graham.

From graham.ranger at univ-avignon.fr  Thu May 22 14:27:18 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger -- UAPV)
Date: Thu, 22 May 2025 14:27:18 +0200
Subject: [CWB] Parallel corpus alignment question
In-Reply-To: <4e7a0d03-2694-4bf3-98da-da5f84512346@univ-avignon.fr>
References: <9911a128-a56f-48be-a162-245df0529b72@univ-avignon.fr> <4e7a0d03-2694-4bf3-98da-da5f84512346@univ-avignon.fr>
Message-ID: <0a2e7df0-ccf2-4435-8aaf-3a42141d1df3@univ-avignon.fr>

OK... I'll answer my own questions (this is becoming a habit, apologies!). (Please correct me, elaborate, etc. if necessary.)

1) The admin pw is "user", but it seems to work only with sudo, not with su.
2) I have aligned source texts with target texts; cqpweb doesn't appear to be able to display all the alignments simultaneously, but it does enable quick switching, which is good.

Apologies for troubling you but perhaps this might be of help to other cqpweb[inabox] users.
Best,
Graham.

From graham.ranger at univ-avignon.fr  Sat May 31 11:43:10 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger -- UAPV)
Date: Sat, 31 May 2025 11:43:10 +0200
Subject: [CWB] Restrictions on lemma annotation
Message-ID: <51da41f3-ee8b-45b4-a28d-a55879b63bfd@univ-avignon.fr>

Hello,
In a corpus I'm setting up, using treetagger with a parameter file for classical French, there are a number of alternative lemmata, i.e. things like:

eau    Nc    eau|eaux    [Nc: common noun]

I'm not entirely sure why, since there is no ambiguity here, but as a result it is impossible to search for the lemma "eau". Are there any solutions other than simply removing the pipe and what comes after it from column three of the vrt file, so that only the first choice of lemma can be queried?
Many thanks in advance.
Graham.

From chozelinek at gmail.com  Sat May 31 13:14:03 2025
From: chozelinek at gmail.com (=?UTF-8?B?Sm9zw6kgTWFudWVsIE1hcnTDrW5leiBNYXJ0w61uZXo=?=)
Date: Sat, 31 May 2025 13:14:03 +0200
Subject: [CWB] Error with categorise query
Message-ID:

Hi there,
I'm using an install of CQPweb version v3.2.44. I'm trying to create a categorised query. When I configure the different labels and choose the default one, I click on submit and I get a cryptic 500 error message:

This page isn't working
150.128.97.141 is currently unable to handle this request.
HTTP ERROR 500

Can anyone guide me on how to resolve this? This feature was working in the past. Something has gone wrong. But I cannot figure it out.
Thanks in advance for your help

--
José Manuel Martínez Martínez
https://chozelinek.github.io
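
Regarding the "Restrictions on lemma annotation" question in this archive (entries such as eau|eaux in column three), the workaround mentioned there, keeping only the first lemma alternative, could be sketched roughly as follows. This is only a minimal sketch and not part of CWB, CQPweb or TreeTagger: the function name keep_first_lemma is hypothetical, and it assumes a tab-separated .vrt file with the lemma in the third column and structural (XML) lines starting with "<".

```python
def keep_first_lemma(line: str, lemma_col: int = 2, sep: str = "|") -> str:
    """Return one .vrt line with only the first lemma alternative kept.

    Assumptions (not taken from any CWB tool): columns are tab-separated,
    the lemma sits at index `lemma_col`, and XML tag lines start with "<".
    """
    # Pass blank lines and structural (XML) lines through unchanged.
    if not line.strip() or line.startswith("<"):
        return line

    # Preserve whatever line ending the input had.
    body = line.rstrip("\n")
    ending = line[len(body):]

    cols = body.split("\t")
    if len(cols) > lemma_col:
        first = cols[lemma_col].split(sep, 1)[0]
        # A lemma that begins with the separator (e.g. a literal "|" token)
        # would become empty, so it is left untouched.
        if first:
            cols[lemma_col] = first
    return "\t".join(cols) + ending
```

Applied line by line to a copy of the .vrt file before re-indexing, this would turn "eau<TAB>Nc<TAB>eau|eaux" into "eau<TAB>Nc<TAB>eau". The discarded alternatives are lost, so working on a copy of the original file seems advisable.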