<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">OK. Thanks! Following what you said, I went
to the corpus project's page and found the tagset
(<a class="moz-txt-link-freetext" href="http://www.linguist.is/icelandic_treebank/Tagset">http://www.linguist.is/icelandic_treebank/Tagset</a>). Now I'm a
happy camper. <br>
</div>
<blockquote
cite="mid:28078EC3FBF1B940A3EF3D0D19BE351D0E8E7A@EX-0-MB1.lancs.local"
type="cite">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New","serif";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        color:black;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Verdana","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Hi
Josep,<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">The
issue here is that the Icelandic corpus on Ray’s server has
been installed as if it had been tagged by the Lancaster
tagger combination of CLAWS + USAS (which uses the CLAWS7
tagset) whereas in fact it hasn’t. It couldn’t have been, in fact,
since C7 is a tagset for English, not Icelandic.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">This
is my fault, indirectly. Way back when CQPweb was only used
here at Lancaster, corpus installation had to be done
manually, which was a very time-consuming process. To speed
things up, I created the indexing web-forms, which have two
settings for p-attributes: “default”, i.e. assume it has been
tagged by CLAWS and USAS, or “custom”, i.e. specify the
p-attributes yourself. In retrospect this was clearly the
Wrong Thing, as nowhere else but Lancaster is CLAWS+USAS the
“default”, making it too easy for superusers elsewhere to do
the wrong thing in the web forms. I
<i>am</i> going to replace this system with something more
site-neutral, when I get the time...<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Anyway,
the upshot: if you leave “default” specified when indexing a
corpus, then CQPweb will believe it has CLAWS7 tags and USAS
semantic tags, even if it doesn’t. The way to get around
this is to ignore what CQPweb says the tags are and to look
at what they really are (e.g. by going to the frequency list page and
looking at a frequency list of the part-of-speech tag attribute).
<o:p></o:p></span></p>
<p class="MsoNormal"><a moz-do-not-send="true"
href="http://124.193.83.252/cqp/IcePaHC/freqlist.php?flTable=__entire_corpus&flAtt=pos&flFilterType=begin&flFilterString=&flFreqLimit1=&flFreqLimit2=&pp=50&flOrder=desc&uT=y">http://124.193.83.252/cqp/IcePaHC/freqlist.php?flTable=__entire_corpus&flAtt=pos&flFilterType=begin&flFilterString=&flFreqLimit1=&flFreqLimit2=&pp=50&flOrder=desc&uT=y</a><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p></o:p></span></p>
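<p class="MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif';color:#1F497D">As a rough sketch of how a link like the one above is put together, the Python below rebuilds it from its query parameters. The parameter names and values are copied from that link; the helper name <i>freqlist_url</i> is made up for illustration, and whether other corpora or other CQPweb versions accept exactly the same parameters is an assumption.<o:p></o:p></span></p>
<pre>
```python
from urllib.parse import urlencode


def freqlist_url(corpus_base, attribute="pos", per_page=50):
    """Build a CQPweb frequency-list URL for one annotation attribute.

    corpus_base is the corpus directory on the server, without a
    trailing slash. All parameter names below are taken from the link
    in this message; this helper itself is hypothetical.
    """
    params = [
        ("flTable", "__entire_corpus"),  # whole corpus, no subcorpus
        ("flAtt", attribute),            # attribute to count, e.g. pos
        ("flFilterType", "begin"),
        ("flFilterString", ""),          # empty string: no filtering
        ("flFreqLimit1", ""),
        ("flFreqLimit2", ""),
        ("pp", str(per_page)),           # rows per page
        ("flOrder", "desc"),             # most frequent first
        ("uT", "y"),
    ]
    return corpus_base + "/freqlist.php?" + urlencode(params)


print(freqlist_url("http://124.193.83.252/cqp/IcePaHC"))
```
</pre>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif';color:#1F497D">Passing a different attribute name would list the values of that attribute instead, assuming the corpus was indexed with it.<o:p></o:p></span></p>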
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Once
you know what tags to use, the simple query syntax
<i>will</i> work. (I just tried <b>_Q-A</b>, for instance,
and it works. Not that I have any idea what Q-A means in
this tagset!)<o:p></o:p></span></p>
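<p class="MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif';color:#1F497D">For comparison with CQP syntax: a tag-only simple query corresponds roughly to one token constraint per tag. The sketch below assumes the tag attribute is named pos (as in the frequency-list link) and that each _TAG token maps to [pos="TAG"]; the function name is hypothetical, and it deliberately ignores word forms, wildcards and everything else the real simple query language handles.<o:p></o:p></span></p>
<pre>
```python
def simple_tags_to_cqp(query, attribute="pos"):
    """Rewrite a tag-only simple query such as '_JJ _NN1' in CQP syntax.

    Only bare _TAG tokens are supported. This is an illustration of the
    assumed mapping, not CQPweb's own query translator.
    """
    constraints = []
    for token in query.split():
        if not token.startswith("_") or len(token) == 1:
            raise ValueError("only bare _TAG tokens are handled: " + repr(token))
        tag = token[1:]  # drop the leading underscore
        constraints.append('[{}="{}"]'.format(attribute, tag))
    return " ".join(constraints)


print(simple_tags_to_cqp("_Q-A"))
print(simple_tags_to_cqp("_JJ _NN1"))
```
</pre>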
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">“show
+pos” doesn’t work because the interface only allows
<i>queries</i> to be specified by the user. Other CQP
commands are blocked. (In fact, CQPweb
<i>always</i> uses show +pos or equivalent, but the tags are
rendered in the tooltip that pops up over the central link of a
concordance, not in the main concordance itself.)<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">best<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Andrew.<o:p></o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"
lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"
lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:cwb-bounces@sslmit.unibo.it">cwb-bounces@sslmit.unibo.it</a>
[<a class="moz-txt-link-freetext" href="mailto:cwb-bounces@sslmit.unibo.it">mailto:cwb-bounces@sslmit.unibo.it</a>] <b>On Behalf Of </b>Josep
M. Fontana<br>
<b>Sent:</b> 25 October 2012 12:04<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:cwb@sslmit.unibo.it">cwb@sslmit.unibo.it</a><br>
<b>Subject:</b> Re: [CWB] Announcement: Another
CWB/CQPweb setup in China<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi,<br>
<br>
I am a little (or quite) confused about the syntax of CQPweb
queries (simple query language). I went to the wonderful
resource Ray Wu has made available so that I could see how
it works since we are in the process of installing CQPweb as
an interface for our corpora. I wasn't able to complete any
search using the simple query language, though. I'm sure it
is something very simple that I am missing. From what I
understand reading the document 'simple query language
syntax', I should be able to do the following in the simple
query mode:<br>
<br>
_JJ _NN1 <br>
<br>
which would supposedly look for sequences of an adjective
followed by a noun according to the CLAWS tag set.
<br>
<br>
OK, I'm conducting the searches in the Old Icelandic Corpus,
which has supposedly been tagged using the CLAWS7 tagset
(according to the information in "View corpus metadata").
When I do this, however, I get a message saying "Your query
had no results. There are no matches for your query." This
is very puzzling because you would expect there to
be occurrences of adjectives followed by nouns. Doing it in the
opposite order (_NN1 _JJ) gives me the same result. What is
even more puzzling is that I also get nothing using single
POS labels such as _NN1 by itself or _JJ. <br>
<br>
Am I doing something wrong or is this due to the fact that
this particular corpus uses a completely different tagset?
When you access a CQPweb corpus, is there any way to
retrieve the tags that have been used in the corpus? The
only relevant info I find in this corpus is the link to the
CLAWS7 tagset but, as I said, this doesn't seem to be the
right information. Going into the CQP syntax mode and doing
"show +pos" doesn't work.
<br>
<br>
<br>
JM<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span
style="font-size:10.5pt;font-family:"Arial","sans-serif"">Dear
members,<br>
<br>
We are pleased to announce another CWB/CQPweb setup in
China and we dub it BFSU CQPweb. It is closely modelled
after Hardie's own (sorry Andrew, we're badly in need of
imagination) and currently features more than 20
corpora, including two Brown family cousins (CLOB and
Crown) developed at Beijing Foreign Studies University by
Dr. Xu Jiajing and Professor Liang Maocheng.
<br>
<br>
You may access it from <a moz-do-not-send="true"
href="http://124.193.83.252/cqp/">http://124.193.83.252/cqp/</a>
using test/test as username/password.
<br>
<br>
We'd like to take this opportunity to thank the CWB team
for their wonderful work and generosity. It is great fun
to build our work on their shoulders.<o:p></o:p></span></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span
style="font-size:10.5pt;font-family:"Arial","sans-serif"">Best,<br>
Ray<o:p></o:p></span></p>
</div>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>CWB mailing list<o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a><o:p></o:p></pre>
</div>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</blockquote>
<br>
</body>
</html>