<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi,<br>
<br>
On 17/10/2014 20:28, Teresa Molés Cases wrote:<br>
</div>
<blockquote
cite="mid:25135581-1827-4799-8A17-5BFA30E13CA2@gmail.com"
type="cite">
<div>I have a question regarding the counting of tokens in CQP. I
know that the exact query would be <span style="font-size:
11px;">DICKENS> Q1 = []; size Q1;</span></div>
<div><br>
</div>
<div>But I have also read that this search would count not only
tokens but also punctuation marks. Is that right?</div>
</blockquote>
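Yes: in a typical CWB corpus, punctuation marks are encoded as tokens
of their own, so <span style="font-size: 11px;">[]</span> matches them too.<br>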
<blockquote
cite="mid:25135581-1827-4799-8A17-5BFA30E13CA2@gmail.com"
type="cite">
<div> Is it possible in CQP to count just tokens (not including
punctuation marks)?<br>
</div>
</blockquote>
Sure: in your query, ask for anything that is not a punctuation mark
instead of any token.<br>
For example: <span style="font-size: 11px;">DICKENS> Q1 =
[word!="."&word!="''|``"|word="[ai]"%c]; size Q1;</span><br>
(To formulate such a query, you need to know the surface forms of the
punctuation marks in your corpus.)<br>
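<br>
If your CQP build supports POSIX character classes in regular
expressions (an assumption to check against your CWB version), a more
compact sketch is to exclude every token made up only of punctuation
characters. The name Q2 is arbitrary:<br>
<pre>DICKENS&gt; Q2 = [word != "[[:punct:]]+"]; size Q2;</pre>
This is only a rough equivalent: tokens that mix letters and
punctuation, such as "don't" or "e.g.", are still counted, and
non-ASCII punctuation may not be covered, depending on the corpus
encoding.<br>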
<br>
Of course, it would be better to run a tagger or a syntactic
analyzer on your sources before encoding them for CQP, so that a
dedicated property (and not only the 'word' form) can be used to
filter out punctuation.<br>
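<br>
As a sketch of that approach, assume the corpus was encoded with a
hypothetical positional attribute named <span style="font-size:
11px;">pos</span> in which the tagger marks every punctuation token as
<span style="font-size: 11px;">PUNCT</span>. Both names are
assumptions, and the actual tag values depend on the tagset of your
tagger:<br>
<pre>DICKENS&gt; Q3 = [pos != "PUNCT"]; size Q3;</pre>
With a tagset that uses several punctuation tags, the condition would
need one alternative per tag.<br>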
<br>
You can also remove punctuation from the sources before running
cwb-encode and cwb-makeall, in which case your original query will work.<br>
But a corpus without punctuation is difficult to read. Another
strategy is to keep two versions of your corpus, one with
punctuation and one without, and to choose between them depending on
the queries you need to run.<br>
<br>
Best,<br>
Serge<br>
<pre class="moz-signature" cols="72">--
Dr. Serge Heiden, <a class="moz-txt-link-abbreviated" href="mailto:slh@ens-lyon.fr">slh@ens-lyon.fr</a>, <a class="moz-txt-link-freetext" href="http://textometrie.ens-lyon.fr">http://textometrie.ens-lyon.fr</a>
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33622003883</pre>
</body>
</html>