<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
        {mso-style-priority:99;
        mso-style-link:"Balloon Text Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:8.0pt;
        font-family:"Tahoma","sans-serif";}
span.BalloonTextChar
        {mso-style-name:"Balloon Text Char";
        mso-style-priority:99;
        mso-style-link:"Balloon Text";
        font-family:"Tahoma","sans-serif";
        mso-fareast-language:EN-GB;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Verdana","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">No, that’s not possible, though I would consider adding it if the demand was high.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> cwb-bounces@sslmit.unibo.it [mailto:cwb-bounces@sslmit.unibo.it]
<b>On Behalf Of </b>Stefania Spina<br>
<b>Sent:</b> 21 June 2015 12:10<br>
<b>To:</b> Open source development of the Corpus WorkBench<br>
<b>Subject:</b> Re: [CWB] problems with Cqpweb and frequency lists<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Thank you Andrew.<o:p></o:p></p>
<div>
<p class="MsoNormal">Or, as a third choice, prevent users from viewing frequency lists. Is this possible in Cqpweb? I mean, setting privileges so as users can make queries and use all the CQP functions, except viewing frequency lists?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Thank you again,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Stefania<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">2015-06-20 15:14 GMT+02:00 Hardie, Andrew <<a href="mailto:a.hardie@lancaster.ac.uk" target="_blank">a.hardie@lancaster.ac.uk</a>>:<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Hi Stefania,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">This is a known problem which arises from the fact that the available MySQL collations for sorting-and-merging
strings considered “equal” even if they are not do not match the case/diacritic folding in CWB.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">There are two choices made possible by the available collations: the behaviour you have currently,
in which all accents and case distinctions are ignored when collating; OR, a collation which doesn’t merge
<i>anything</i>, i.e. it treats accented characters as distinct, but also treats case distinctions as significant.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">You can engage the latter mode under the “Corpus settings” option in the main screen menu. If you
set “</span><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:black;background:#D5D5D5">Corpus requires case-sensitive collation for string comparison and searches </span><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">”
to “yes”, you will switch the collation over to the case/diacritic-sensitive mode. Please note well the warning about the need to rebuild all frequency tables.
</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">I have a long-term idea for a solution to this but unfortunately (a) I don’t know yet whether it
will work, (b) even if does, it will take a long time to implement. </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">The solution in question involves MySQL custom collations and the big open question is the impact
they have on performance. If anyone has experience with custom collations, your input here would be welcome.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">best</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Andrew.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">
<a href="mailto:cwb-bounces@sslmit.unibo.it" target="_blank">cwb-bounces@sslmit.unibo.it</a> [mailto:<a href="mailto:cwb-bounces@sslmit.unibo.it" target="_blank">cwb-bounces@sslmit.unibo.it</a>]
<b>On Behalf Of </b>Stefania Spina<br>
<b>Sent:</b> 20 June 2015 10:21<br>
<b>To:</b> cwb<br>
<b>Subject:</b> [CWB] problems with Cqpweb and frequency lists</span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hello,<o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I have an Italian corpus indexed in Cqpweb (v3.1.13); the corpus is encoded in iso-8859-1.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">When I use frequency lists, it seems that accented and non-accented characters are not properly distinguished. For example, in the word frequency list, the word "è" combines the
frequency values of "è" and "e", and the unaccented word "e" is not included in the frequency list.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">This does not happen in the queries, where accented and non accented characters are perfectly distinguished.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Is there a way I can solve this problem?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thank you for your help,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Stefania<br clear="all">
<o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">--
<o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">Stefania Spina<br>
Università per Stranieri di Perugia<br>
Dipartimento di Scienze Umane e Sociali<br>
<a href="mailto:stefania.spina@unistrapg.it" target="_blank">stefania.spina@unistrapg.it</a><br>
<a href="https://unistrapg.academia.edu/StefaniaSpina" target="_blank">https://unistrapg.academia.edu/StefaniaSpina</a><o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
_______________________________________________<br>
CWB mailing list<br>
<a href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><br>
<a href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb" target="_blank">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a><o:p></o:p></p>
</div>
<p class="MsoNormal"><br>
<br clear="all">
<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal">-- <o:p></o:p></p>
<div>
<p class="MsoNormal">Stefania Spina<br>
Università per Stranieri di Perugia<br>
Dipartimento di Scienze Umane e Sociali<br>
<a href="mailto:stefania.spina@unistrapg.it" target="_blank">stefania.spina@unistrapg.it</a><br>
<a href="https://unistrapg.academia.edu/StefaniaSpina" target="_blank">https://unistrapg.academia.edu/StefaniaSpina</a><o:p></o:p></p>
</div>
</div>
</div>
</div>
</body>
</html>