<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<font size="+1">Hi<font size="+1">, Pavel:<br>
<font size="+1">My quick-and-dirty way is to<font size="+1"> run a
query <font size="+1">with the restrictions that define the <font
size="+1">subcorpus. Take EUROPARL-EN: the element
speaker has <font size="+1"><font size="+1"><font
size="+1"><font size="+1">an attri<font size="+1">bute
called language, indicating the source
language of th<font size="+1">e tokens
contained in that element</font>. If I only
want the <font size="+1">tokens <font
size="+1">whose source language is German, I run th<font
size="+1">is query:<br>
<br>
[]<font size="+1"> <font size="+1">::
ma<font size="+1">tch.speaker<font
size="+1">_language="<font
size="+1">DE</font>";<br>
<font size="+1"><br>
If you then run<font size="+1">:<br>
<font size="+1"><br>
size Last<font
size="+1">;<br>
<font size="+1"><br>
You get the size in<font
size="+1"> tok<font
size="+1">ens<font
size="+1">, in
this case
5532412.<br>
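<br>
A small aside, as a sketch rather than a definitive recipe: size reports the number of matches, and that equals the number of tokens here only because [] matches exactly one token. The same count can be kept under a name instead of relying on Last (German is a name made up for illustration):<br>
<br>
# store the single-token matches in a named subcorpus<br>
German = [] :: match.speaker_language="DE";<br>
# number of matches = number of tokens, since every match is one token<br>
size German;<br>
<br>
If the subcorpus already exists and its matches span several tokens, I would expect that activating it and matching every token inside it works as well (Sub is a hypothetical name, and I have not checked this on every CQP version):<br>
<br>
# make the named subcorpus the current search space<br>
Sub;<br>
# match every single token within its ranges<br>
[];<br>
size Last;<br>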
</font><br>
<font size="+1">To
compute the
same for all
the subc<font
size="+1">orpora
at once<font
size="+1"> (in
<font
size="+1">my
case, one subc<font
size="+1">orpus
per source
language)</font>, I run:<br>
<br>
<font
size="+1">
[]<font
size="+1">;<br>
<br>
<font
size="+1">
group<font
size="+1">
Last match <font
size="+1">speaker_language;<br>
<br>
<font
size="+1">Then
you get a
table <font
size="+1">similar
to:<br>
<br>
<font
size="+1">DE
5532412<br>
FR 4921250<br>
NL 3003754<br>
ES 2772929<br>
IT 2407213<br>
PT 1665839<br>
EL 1382710<br>
SV 1378828<br>
DA 698575<br>
FI 571006<br>
PL 363083<br>
<font
size="+1">...
...<br>
<br>
<font
size="+1">I
hope it hel<font
size="+1">ps<font
size="+1"><font
size="+1">!</font></font></font></font><br>
</font></font></font></font><br>
</font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font>Best,<br>
<br>
<font size="+1">jmm</font><br>
<br>
</font></font></font></font></font></font></font></font></font></font></font></font></font></font>
<div class="moz-cite-prefix">On 15/04/13 20:53, Pavel Vondřička
wrote:<br>
</div>
<blockquote cite="mid:1507813.WnCAretq9Z@platyz" type="cite">
<pre wrap="">Excuse me, but is there any way to get the size of a subcorpus in tokens?
Somehow I cannot find such a basic thing, sorry.
Thanks,
Pavel
</pre>
</blockquote>
<br>
</body>
</html>