<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Dear all,<br>
I am trying to index a corpus with the following structure, but the
indexing does not work. Here is an extract of the corpus:<br>
<br>
<small>&lt;text id="FR_DI_2000_1" organisation="CERD"
country="Francia" type="Documento informativo" year="2000"
signature="CERD/C/SR.1373"&gt;<br>
&lt;s id="1"&gt;<br>
&lt;seg lang="fr"&gt;<br>
La<br>
séance<br>
est<br>
ouverte<br>
à<br>
10h05<br>
.<br>
&lt;/seg&gt;<br>
&lt;seg lang="es"&gt;<br>
Se<br>
declara<br>
abierta<br>
la<br>
sesión<br>
a<br>
las<br>
10.05<br>
horas<br>
.<br>
&lt;/seg&gt;<br>
&lt;/s&gt;<br>
...<br>
&lt;/text&gt;<br>
</small><br>
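To narrow the problem down, it may help to check the input file itself before looking at cwb-encode. The following Python sketch (the function name check_vrt is hypothetical, not part of CWB) counts how many regions each element opens and reports start/end tags that are not properly nested. It assumes one tag or one token per line, as in the extract above, and it flags only a &lt;seg&gt; opened inside another &lt;seg&gt; (genuine nesting), not the sibling repetition shown in the extract. It does not reproduce cwb-encode's own behaviour.<br>
<pre>
import re
from collections import Counter

# A line that begins with an XML start or end tag ("\x3c" is the less-than sign).
TAG = re.compile(r"^\x3c(/?)([A-Za-z_][\w.-]*)")


def check_vrt(lines):
    """Count regions per element and report nesting problems in VRT input.

    Sketch only: assumes one tag or one token per line, as in the extract
    above, and ignores attributes, comments and XML declarations.
    """
    stack = []          # currently open elements, innermost last
    counts = Counter()  # number of regions opened per element name
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        match = TAG.match(raw.strip())
        if not match:
            continue  # a token line
        closing, name = match.groups()
        if not closing:
            if name in stack:
                # Same element opened inside itself: genuine nesting,
                # not the sibling repetition shown in the extract.
                problems.append(f"line {lineno}: {name} nested inside {name}")
            stack.append(name)
            counts[name] += 1
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"line {lineno}: unexpected end tag for {name}")
    problems.extend(f"unclosed element: {name}" for name in stack)
    return counts, problems
</pre>
It can be run on the uploaded file with, for example, check_vrt(open("/export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt", encoding="utf-8")); an empty problem list together with plausible counts for s and seg would suggest that the input structure itself is not the cause.<br>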
The corresponding data files on disk remain empty (0 bytes) for the &lt;s&gt; and &lt;seg&gt; attributes:<br>
<pre>> ll /export/data/CQPweb_data/corpus/test_pb_fr_es/
total 120
drwxr-xr-x 2 www-data www-data 4096 Jun 6 12:18 ./
drwxrwxr-x 58 www-data letrint 4096 Jun 6 12:18 ../
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg_lang.avs
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg_lang.avx
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg_lang.rng
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg.rng
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s_id.avs
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s_id.avx
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s_id.rng
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s.rng
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_country.avs
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_country.avx
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_country.rng
-rw-r--r-- 1 www-data www-data 13 Jun 6 12:18 text_id.avs
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_id.avx
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_id.rng
...</pre>
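With many attributes the listing gets long, so a small helper can summarise which attributes ended up with only empty files. This is a minimal Python sketch; the function name empty_attributes is hypothetical, and it assumes the naming visible in the listing above, where each data file is called attribute.suffix (e.g. seg_lang.avs), so the attribute name is the part before the first dot.<br>
<pre>
import os
from collections import defaultdict


def empty_attributes(directory):
    """Return the sorted attribute names whose data files are all zero bytes.

    Assumes each file is named attribute.suffix (e.g. seg_lang.avs), so the
    attribute name is taken as the part before the first dot.
    """
    sizes = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                attribute = entry.name.split(".", 1)[0]
                sizes[attribute].append(entry.stat().st_size)
    return sorted(name for name, found in sizes.items() if all(size == 0 for size in found))
</pre>
For example, print(empty_attributes("/export/data/CQPweb_data/corpus/test_pb_fr_es/")) prints the list of affected attributes.<br>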
<br>
The indexing commands are as follows:<br>
<pre>> cwb-encode -xsB -c utf8 -d /export/data/CQPweb_data/corpus/test_pb_fr_es -f /export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt -R "/export/data/CQPweb_data/registry/test_pb_fr_es" -S text+id+organisation+country+type+year+signature -S s+id -S seg+lang 2>&amp;1
> cwb-makeall -r "/export/data/CQPweb_data/registry" -V TEST_PB_FR_ES 2>&amp;1</pre>
I suspect that, because the &lt;seg&gt; element occurs twice within each &lt;s&gt; element, this corpus cannot be indexed correctly, but I would like your opinion on that.<br>
If it is possible, what would the correct indexing command be?<br>
Thank you for your help. Best regards,<br>
<pre class="moz-signature" cols="72">--
Baudrion Philippe
Correspondant Informatique
UNIVERSITE DE GENEVE
Faculté de traduction et d'interprétation
40, bd. du Pont d'Arve
1211 GENEVE 4
Tél +41 22 379 94 95
</pre>
</body>
</html>