<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Thanks a lot for your help, Serge! I will study all this information and I hope to solve the problem.<div><br></div><div>Best,</div><div><br></div><div>Teresa<br><div><br></div><div><br><div><div>On 18/10/2014, at 13:43, Serge Heiden &lt;<a href="mailto:slh@ens-lyon.fr">slh@ens-lyon.fr</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">


  <div bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Teresa,<br>

      I also forgot to mention Unicode punctuation character classes.<br>

      If your corpus is encoded in Unicode, you can use punctuation
      character classes on word forms in your queries.<br>

      For example, a search for [word="\p{P}+"] should give you all the
      punctuation marks in your corpus.<br>

      And [word!="\p{P}+"] should give you your tokens.<br>
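      A minimal Python sketch of what such a class covers, assuming the tokens are available as plain strings (the helper name is_punctuation_token is made up for illustration). It approximates \p{P}+ with the Unicode general categories starting with "P"; symbols such as the backtick or the dollar sign belong to the "S" categories, so they are not matched:<br>
<pre>
```python
import unicodedata


def is_punctuation_token(token):
    # Approximation of the \p{P}+ class: the token is non-empty and
    # every character has a Unicode general category starting with "P".
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


# Invented sample tokens, only for illustration.
tokens = ["Oliver", ",", "''", "``", "...", "don't", "$"]
punct = [t for t in tokens if is_punctuation_token(t)]
words = [t for t in tokens if not is_punctuation_token(t)]
```
</pre>
      Here punct is [",", "''", "..."], while "``" and "$" stay in words, so a corpus that uses `` as an opening quote may need an extra condition in the query.<br>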

      Best,<br>

      Serge<br>

      <br>

      On 18/10/2014 13:23, Serge Heiden wrote:<br>

    </div>

    <blockquote cite="mid:54424DBA.9040804@ens-lyon.fr" type="cite">


      <div class="moz-cite-prefix">Teresa,<br>

        <br>

        You need to know how your corpus has been tokenized (segmented

        into a sequence of tokens and punctuation marks to use your

        terminology), which is a process done before and outside of CQP.<br>

        If your corpus provides word properties giving information about

        punctuation status or equivalent you should also be able to

        access such information.<br>

        If your corpus has no documentation about that, you should ask

        the provider of the corpus.<br>

        As a last resort, and as an approximation at least for roman
        languages, you can search your corpus for frequent words with a
        short form.<br>

        For example, the most frequent words matching [word="."] are
        mostly punctuation marks, mixed with some grammatical words
        (auxiliaries, pronouns...).<br>

        Then you can explore frequent words of length two: [word=".."],
        etc.<br>

        This is why I suggested searching for words longer than one
        character: [word!="."]<br>
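        The same exploration can be sketched outside CQP. A hedged Python example, assuming the corpus tokens have been exported to a list of strings (the function name and the sample data are invented for illustration):<br>
<pre>
```python
from collections import Counter


def frequent_forms_of_length(tokens, length, top=10):
    # Count only the word forms with exactly `length` characters,
    # then return the `top` most frequent ones as (form, count) pairs.
    counts = Counter(t for t in tokens if len(t) == length)
    return counts.most_common(top)


# Invented sample, standing in for an exported token list.
sample = ["I", "saw", "a", "cat", ",", "a", "dog", ",", "and", "it", ".", ","]
one_char = frequent_forms_of_length(sample, 1)
two_char = frequent_forms_of_length(sample, 2)
```
</pre>
        In this sample the comma is the most frequent one-character form, but "a" and "I" appear in the same list, which is the mix of punctuation marks and grammatical words mentioned above.<br>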

        <br>

        Best,<br>

        Serge<br>

        <br>

        On 18/10/2014 12:44, Teresa Molés Cases wrote:<br>

      </div>

      <blockquote cite="mid:238E2159-C903-4434-A4D9-54D672B202BC@gmail.com" type="cite">


        Hi Serge,

        <div><br>

        </div>

        <div>Thanks a lot for your answer, but this query does not
          seem to work on my corpus. Could you please tell me how I can
          find out the surface forms of the punctuation marks in
          my corpus? If it is not too much trouble, of course.</div>

        <div><br>

        </div>

        <div>Thanks a lot! Best,</div>

        <div><br>

        </div>

        <div>Teresa</div>

        <div><br>

        </div>

        <div><br>

          <div>

            <div>On 17/10/2014, at 23:10, Serge Heiden &lt;<a moz-do-not-send="true" href="mailto:slh@ens-lyon.fr">slh@ens-lyon.fr</a>&gt;
              wrote:</div>

            <br class="Apple-interchange-newline">

            <blockquote type="cite">


              <div bgcolor="#FFFFFF" text="#000000">

                <div class="moz-cite-prefix">Hi,<br>

                  <br>

                  On 17/10/2014 20:28, Teresa Molés Cases wrote:<br>

                </div>

                <blockquote cite="mid:25135581-1827-4799-8A17-5BFA30E13CA2@gmail.com" type="cite">


                  <div>I have a question regarding the counting of

                    tokens in CQP. I know that the exact query would be&nbsp;<span style="font-size: 11px;">DICKENS&gt; Q1 = []; size

                      Q1;</span></div>

                  <div><br>

                  </div>

                  <div>But I have also read that this search would count

                    not only tokens but also punctuation marks. Is that

                    right?</div>

                </blockquote>

                Yes<br>

                <blockquote cite="mid:25135581-1827-4799-8A17-5BFA30E13CA2@gmail.com" type="cite">

                  <div> Is it possible in CQP to count just tokens (not

                    including punctuation marks)?<br>

                  </div>

                </blockquote>

                Sure, just ask for something other than a
                punctuation mark in your query instead of any
                "word"/token.<br>

                For example: <span style="font-size: 11px;">DICKENS&gt;
                  Q1 = [(word!="." &amp; word!="''|``") | word="[ai]"%c]; size
                  Q1;</span><br>

                (to formulate such a query, you need to know the surface
                forms of the punctuation marks in your corpus)<br>
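                To make the logic of this query explicit, here is a rough Python equivalent, assuming that the regular expression has to match the whole word form (the function name is_counted is hypothetical):<br>
<pre>
```python
import re


def is_counted(word):
    # word != "."      : the form is not exactly one character long
    not_single_char = re.fullmatch(r".", word) is None
    # word != "''|``"  : the form is not a two-character quote token
    not_double_quote = re.fullmatch(r"''|``", word) is None
    # word = "[ai]"%c  : one-letter words "a" or "i", ignoring case
    one_letter_word = re.fullmatch(r"[ai]", word, re.IGNORECASE) is not None
    return (not_single_char and not_double_quote) or one_letter_word


# Invented sample tokens, only for illustration.
sample = ["I", "saw", "a", "cat", ".", "''", "--"]
counted = [w for w in sample if is_counted(w)]
```
</pre>
                Note that a multi-character punctuation token such as "--" still passes this filter, which is why the actual surface forms in the corpus matter.<br>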

                <br>

                Of course, it would be better to run a tagger or a
                syntactic analyzer on your sources before CQP, to give it
                a property that could be used to filter punctuation (and
                not only 'word' forms).<br>

                <br>

                You can also filter punctuation out of the sources before
                running cwb-encode and cwb-makeall, in which case your original
                query will work.<br>

                But a corpus without punctuation is difficult to read.
                Another strategy is to have two versions of your corpus:
                one with punctuation and one without, depending on the queries
                you need to run.<br>
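                A minimal sketch of the two-version strategy in Python, assuming the sources are in one-token-per-line format with tab-separated annotations and the word form in the first column (the function names and the sample tags are invented for illustration):<br>
<pre>
```python
import unicodedata


def is_punctuation(word):
    # Approximation of \p{P}+ : non-empty, and every character has a
    # Unicode general category starting with "P".
    return bool(word) and all(
        unicodedata.category(ch).startswith("P") for ch in word
    )


def split_versions(vertical_lines):
    # Return two lists of lines: the full corpus, and a copy in which
    # the lines whose first column is a punctuation token are dropped.
    with_punct = list(vertical_lines)
    without_punct = [
        line for line in with_punct
        if not is_punctuation(line.split("\t")[0])
    ]
    return with_punct, without_punct


# Invented sample in word TAB tag format.
lines = ["It\tPP", "was\tVBD", "late\tJJ", ".\tSENT"]
full, filtered = split_versions(lines)
```
</pre>
                Each list could then be written to its own file and encoded as a separate corpus; on the filtered one, the original [] query counts only non-punctuation tokens.<br>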

                <br>

                Best,<br>

                Serge<br>

                <pre class="moz-signature" cols="72">-- 

Dr. Serge Heiden, <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:slh@ens-lyon.fr">slh@ens-lyon.fr</a>, <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://textometrie.ens-lyon.fr/">http://textometrie.ens-lyon.fr</a>

ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française

15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33622003883</pre>

              </div>

              _______________________________________________<br>

              CWB mailing list<br>

              <a moz-do-not-send="true" href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><br>

              <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a><br>

            </blockquote>

          </div>

          <br>

          <div apple-content-edited="true"> <span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;">

                <div style="word-wrap: break-word; -webkit-nbsp-mode:

                  space; -webkit-line-break: after-white-space; ">

                  <div>Teresa Molés Cases</div>

                  <div>Traductora EN/DE/FR &gt; ES/CAT</div>

                  <div><a moz-do-not-send="true" href="mailto:teresamoles@gmail.com">teresamoles@gmail.com</a></div>

                  <div>667848390</div>

                  <div><br>

                  </div>

                </div>

              </span><br class="Apple-interchange-newline">

            </span><br class="Apple-interchange-newline">

          </div>

          <br>

        </div>

        <br>


      </blockquote>

      <br>

      <br>


    </blockquote>

    <br>

    <br>


  </div>

</blockquote></div><br><div apple-content-edited="true">


</div>

<br></div></div></body></html>