<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Thank you Susanne for your quick answer.<br>
Until now I have only tried automatic indexing through CQPweb.<br>
I guess I will need to dig a bit deeper into the CWB encoding options
to get it working.<br>
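A minimal, untested sketch of the suggestion quoted below, assuming it
means appending :0 (nesting depth 0) to each structural attribute in the
original cwb-encode call. Paths and attribute names are copied from that
call; whether this fills the empty s and seg files has not been verified
here.<br>
<pre># Same call as in the original message, with :0 added after each
# structural attribute name (assumption: none of these regions nests
# inside a region of the same name).
cwb-encode -xsB -c utf8 \
  -d /export/data/CQPweb_data/corpus/test_pb_fr_es \
  -f /export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt \
  -R /export/data/CQPweb_data/registry/test_pb_fr_es \
  -S text:0+id+organisation+country+type+year+signature \
  -S s:0+id \
  -S seg:0+lang

# Rebuild the index files afterwards, as before.
cwb-makeall -r /export/data/CQPweb_data/registry -V TEST_PB_FR_ES
</pre>
If this helps, the seg_lang.* and s_id.* files in the listing below
should no longer be 0 bytes after re-encoding.<br>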
Thank you for putting me on the right track, Philippe <br>
<br>
<div class="moz-cite-prefix">On 06/10/2016 02:54 PM, Susanne Flach
wrote:<br>
</div>
<blockquote
cite="mid:0D46CF34-D737-498E-AB4B-4577B0F4C045@fu-berlin.de"
type="cite">
Dear Philippe,
<div class=""><br class="">
</div>
<div class="">Have you tried declaring nested XML elements with :0
as described in Sec 4?</div>
<div class=""><a moz-do-not-send="true"
href="http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial/node5.html"
class="">http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial/node5.html</a></div>
<div class=""><br class="">
</div>
<div class="">I’ve never had your problem, but I have always used
the :0.</div>
<div class=""><br class="">
</div>
<div class="">Best,</div>
<div class="">Susanne<br class="">
<div apple-content-edited="true" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;" class=""><br
class="">
--<br class="">
Susanne Flach, M.A.<br class="">
Arbeitsbereich Linguistik<br class="">
Institut für Englische Philologie<br class="">
Freie Universität Berlin<br class="">
Habelschwerdter Allee 45<br class="">
14195 Berlin<br class="">
<br class="">
</div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;" class="">NEU! <a
moz-do-not-send="true"
href="http://userpage.fu-berlin.de/%7Eflach/corpling/"
class="">Korpustutorium mit CQP</a><br class="">
<br class="">
<a moz-do-not-send="true"
href="http://userpage.fu-berlin.de/%7Eflach/" class="">http://userpage.fu-berlin.de/~flach/</a><br
class="">
<br class="">
Raum JK29/223<br class="">
Telefon +49 30 838 72311</div>
</div>
</div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 10 Jun 2016, at 14:39, Philippe Baudrion
<<a moz-do-not-send="true"
href="mailto:Philippe.Baudrion@unige.ch" class="">Philippe.Baudrion@unige.ch</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class=""> Dear all,<br
class="">
I am trying to index the following corpus structure but
it is not working. Here is an extract of the corpus:<br
class="">
<br class="">
<small class=""><text id="FR_DI_2000_1"
organisation="CERD" country="Francia" type="Documento
informativo" year="2000"
signature="CERD/C/SR.1373"><br class="">
<s id="1"><br class="">
<seg lang="fr"><br class="">
La<br class="">
séance<br class="">
est<br class="">
ouverte<br class="">
à<br class="">
10h05<br class="">
.<br class="">
</seg><br class="">
<seg lang="es"><br class="">
Se<br class="">
declara<br class="">
abierta<br class="">
la<br class="">
sesión<br class="">
a<br class="">
las<br class="">
10.05<br class="">
horas<br class="">
.<br class="">
</seg><br class="">
</s><br class="">
...<br class="">
</text><br class="">
</small><br class="">
The corresponding files for the s and seg attributes remain empty on disk:<br
class="">
<pre class="">> ll /export/data/CQPweb_data/corpus/test_pb_fr_es/
total 120
drwxr-xr-x 2 www-data www-data 4096 Jun 6 12:18 ./
drwxrwxr-x 58 www-data letrint 4096 Jun 6 12:18 ../
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg_lang.avs
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg_lang.avx
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg_lang.rng
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 seg.rng
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s_id.avs
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s_id.avx
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s_id.rng
-rw-r--r-- 1 www-data www-data 0 Jun 6 12:18 s.rng
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_country.avs
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_country.avx
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_country.rng
-rw-r--r-- 1 www-data www-data 13 Jun 6 12:18 text_id.avs
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_id.avx
-rw-r--r-- 1 www-data www-data 8 Jun 6 12:18 text_id.rng
...</pre>
<br class="">
The indexing command is as follows:<br class="">
<pre class=""><pre class="">> cwb-encode -xsB -c utf8 -d /export/data/CQPweb_data/corpus/test_pb_fr_es -f /export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt -R "/export/data/CQPweb_data/registry/test_pb_fr_es" -S text+id+organisation+country+type+year+signature -S s+id -S seg+lang 2>&1
> cwb-makeall -r "/export/data/CQPweb_data/registry" -V TEST_PB_FR_ES 2>&1
<big class="">I guess due to the redundence of the <seg> element it is impossible to correctely index that corpus, but I want to have your opinion on that.
In case it is possible, what would then be the correct indexing command.
Thank you for your help, greetings,
</big></pre></pre>
<pre class="moz-signature" cols="72">--
Baudrion Philippe
Correspondant Informatique
UNIVERSITE DE GENEVE
Faculté de traduction et d'interprétation
40, bd. du Pont d'Arve
1211 GENEVE 4
Tél +41 22 379 94 95
</pre>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Baudrion Philippe
Correspondant Informatique
UNIVERSITE DE GENEVE
Faculté de traduction et d'interprétation
40, bd. du Pont d'Arve
1211 GENEVE 4
Tél +41 22 379 94 95
</pre>
</body>
</html>