<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Hi,<br>

      <br>

      On 17/10/2014 20:28, Teresa Molés Cases wrote:<br>

    </div>

    <blockquote

      cite="mid:25135581-1827-4799-8A17-5BFA30E13CA2@gmail.com"

      type="cite">


      <div>I have a question regarding the counting of tokens in CQP. I

        know that the exact query would be <span style="font-size:

          11px;">DICKENS&gt; Q1 = []; size Q1;</span></div>

      <div><br>

      </div>

      <div>But I have also read that this search would count not only

        tokens but also punctuation marks. Is that right?</div>

    </blockquote>

    Yes: the [] pattern matches any token, punctuation marks included.<br>

    <blockquote

      cite="mid:25135581-1827-4799-8A17-5BFA30E13CA2@gmail.com"

      type="cite">

      <div> Is it possible in CQP to count just tokens (not including

        punctuation marks)?<br>

      </div>

    </blockquote>

    Sure: instead of matching any token with [], make your query ask
    for tokens that are not punctuation marks.<br>

    For example: <span style="font-size: 11px;">DICKENS&gt; Q1 =

      [word!="."&amp;word!="''|``"|word="[ai]"%c]; size Q1;</span><br>

    (to formulate such a query, you need to know the surface forms that
    punctuation marks take in your corpus)<br>
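    <br>
    If your sources are in the one-token-per-line (vertical) format that
    cwb-encode reads, you do not have to guess these forms. The minimal
    Python sketch below is an illustration, not part of CQP or CWB: it
    assumes the word form is the first tab-separated column, and it treats
    a form as punctuation when every character is in a Unicode punctuation
    or symbol category, so that quote forms such as `` are caught as well.
    The function names are hypothetical.<br>
    <pre>
```python
import unicodedata
from collections import Counter


def is_punctuation(form):
    # Assumption: a form counts as punctuation when it is non-empty and every
    # character belongs to a Unicode punctuation (P*) or symbol (S*) category.
    return bool(form) and all(unicodedata.category(ch)[0] in "PS" for ch in form)


def punctuation_forms(lines):
    # Count the distinct punctuation-only word forms in vertical input,
    # taking the first tab-separated column of each line as the word form.
    # Structural tag lines contain letters, so they are never counted.
    counts = Counter()
    for line in lines:
        form = line.rstrip("\n").split("\t")[0]
        if is_punctuation(form):
            counts[form] += 1
    return counts
```
</pre>
    For instance, punctuation_forms(open("dickens.vrt", encoding="utf-8")).most_common(),
    with dickens.vrt standing in for your own source file, lists each
    candidate form with its frequency; you can then turn the forms you
    consider punctuation into word!= conditions in your query.<br>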

    <br>

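    Of course, it would be better to run a part-of-speech tagger or a
    syntactic parser on your sources before encoding them for CQP, so
    that the corpus carries a property (a part-of-speech attribute, for
    instance) that can be used to filter out punctuation, rather than
    relying on 'word' forms alone.<br>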

    <br>

    You can also remove punctuation from the sources before running
    cwb-encode and cwb-makeall, in which case your original query will
    work.<br>
    But a corpus without punctuation is difficult to read. Another
    strategy is to build two versions of your corpus, one with
    punctuation and one without, and to use one or the other depending
    on the queries you need to run.<br>
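    <br>
    Removing punctuation before encoding can be sketched as a small
    filter over the vertical source files. The minimal Python sketch
    below is a hypothetical helper, not a CWB tool: it assumes the word
    form is the first tab-separated column, uses a simple heuristic
    (every character in a Unicode punctuation or symbol category) as a
    stand-in for whatever definition of punctuation suits your corpus,
    and passes structural tag lines and blank lines through unchanged.
    The original files stay untouched, so they remain available for the
    version with punctuation.<br>
    <pre>
```python
import unicodedata


def is_punctuation(form):
    # Assumption: a form counts as punctuation when it is non-empty and every
    # character belongs to a Unicode punctuation (P*) or symbol (S*) category.
    return bool(form) and all(unicodedata.category(ch)[0] in "PS" for ch in form)


def drop_punctuation(lines):
    # Yield the lines of vertical input, skipping token lines whose word form
    # (first tab-separated column) is punctuation. Structural tag lines contain
    # letters and blank lines have an empty form, so both are kept.
    for line in lines:
        form = line.rstrip("\n").split("\t")[0]
        if not is_punctuation(form):
            yield line


def write_without_punctuation(src_path, dst_path, encoding="utf-8"):
    # Write a punctuation-free copy of src_path to dst_path, ready for
    # cwb-encode. The encoding default is an assumption; match your corpus.
    with open(src_path, encoding=encoding) as src:
        with open(dst_path, "w", encoding=encoding) as dst:
            dst.writelines(drop_punctuation(src))
```
</pre>
    For example, write_without_punctuation("dickens.vrt", "dickens-nopunct.vrt"),
    with hypothetical file names, produces the second source file, which
    you would then encode as a separate corpus with cwb-encode and
    cwb-makeall.<br>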

    <br>

    Best,<br>

    Serge<br>

    <pre class="moz-signature" cols="72">-- 

Dr. Serge Heiden, <a class="moz-txt-link-abbreviated" href="mailto:slh@ens-lyon.fr">slh@ens-lyon.fr</a>, <a class="moz-txt-link-freetext" href="http://textometrie.ens-lyon.fr">http://textometrie.ens-lyon.fr</a>

ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française

15, parvis René Descartes 69342 Lyon BP7000 Cedex, tel. +33622003883</pre>

  </body>

</html>