[CWB] POS tags have first character cut off
Scott Sadowsky
ssadowsky at gmail.com
Fri Jun 7 10:46:31 CEST 2019
Perfect. I've got it set up and working now. The debug info here shows that
what's retrieved is correct -- I've highlighted one case in green, but they
all look to be correct. However, at some point after that the POS tags are
mangled (example in red), with the first character disappearing. And except
for the SQL query, there doesn't seem to be much info about what happens
between these two points.
CQP << set PrintStructures "text_id";
CQP --------------------------------------
CQP << set LeftKWICDelim '--%%%--';
CQP --------------------------------------
CQP << set RightKWICDelim '--%%%--';
CQP --------------------------------------
CQP << cat g3sl04ac0luF 5 5;
CQP >> 5338158: : gente/NCFS000 que/PR0CN00 no/RN
conoce/VMIP3S0 pero/CC con/SP algunas/PI0FP00 sí/RG porque/CS ha/VAIP3S0
ido/VMP00SM harta/AQ0FS00 gente/NCFS000 ,/Fc y/CC ha/VAIP3S0 pasado/VMP00SM
hasta/SP adentro/RG ,/Fc y/CC no/RN le/PP3CSD0 ha/VAIP3S0 hecho/VMP00SM
nada/PI0CS00 pero/CC hay/VMIP3S0 gente/NCFS000 que/PR0CN00 la/PP3FSA0
ve/VMIP3S0 y/CC le/PP3CSD0 ladra/VMIP3S0 ,/Fc le/PP3CSD0 ladra/VMIP3S0 ,/Fc
hasta/SP que/PR0CN00 tie/NCFS000 .../Fs tiene/VMIP3S0 que/CS ir/VMN0000
se/PP3CN00 no/RN más/RG como/CS que/CS no/RN le/PP3CSD0 gusta/VMIP3S0
porque/CS dicen/VMIP3P0 que/CS los/DA0MP0 perros/NCMP000 presienten/VMIP3P0
la/DA0FS0 maldad/NCFS000 ¿/Fia una/DI0FS0 cosa/NCFS000 así/RG ?/Fit ¡/Faa
chuta/I !/Fat entonces/RG hay/VMIP3S0 harta/AQ0FS00 gente/NCFS000
mala/AQ0FS00 sí/RG pero/CC igual/RG ahora/RG como/CS que/CS ya/RG
dejó/VMIS3S0 eso/PD00S00 ,/Fc es/VMIP2S0 que/CS antes/RG lo/PP3MSA0
hacía/VMII3S0 como/CS por/SP jugar/VMN0000 ,/Fc yo/PP1CSN0 creo/VMIP1S0
,/Fc a/SP donde/PR00000 era/VSII3S0 --%%%--perrito/NCMS000--%%%--
solo/AQ0MS00 y/CC nuevo/AQ0MS00 entonces/RG como/CS que/CS lo/PP3MSA0
hacía/VMII3S0 por/SP jugar/VMN0000 creo/VMIP1S0 yo/PP1CSN0 sí/RG es/VSIP3S0
que/CS en/VMIP3P0 realidad/NCFS000 como/CS que/CS no/RN se/P00CN00
las/PP3FPA0 comía/VMII3S0 ,/Fc sino/CC que/PR0CN00 le/PP3CSD0
mandaba/VMII3S0 manotazos/NCMP000 ,/Fc entonces/RG ahí/RG le/PP3CSD0 ,/Fc
como/CS que/CS ,/Fc las/PP3FPA0 dejaba/VMII3S0 aturdía/VMII3S0 ,/Fc
después/RG las/PP3FPA0 agarraba/VMII3S0 para/SP jugar/VMN0000 ,/Fc
entonces/RG ,/Fc las/DA0FP0 zamarreaba/VMII3S0 y/CC ahí/RG a/SP
donde/PR00000 no/RN las/PP3FPA0 dejamos/VMIP1P0 ahí/RG no/RN más/RG sí/RG
es/VSIP3S0 que/CS parece/VMIP3S0 que/CS las/DA0FP0 enterraba/VMII3S0 así/RG
que/CS no/RN sé/VMIP1S0 sí/RG sí/RG pero/CC ahora/RG ya/RG no/RN ,/Fc a/SP
el/DA0MS0 ot/NCFS000 ..../Fz bueno/I ,/Fc que/CS igual/RG mi/DP1CSS
papi/NCMS000 tenía/VMII3S0 tres/Z corderos/NCMP000 ,/Fc y/CC mordió/VMIS3S0
a/SP dos/Z ,/Fc pero/CC les/PP3CPD0
CQP --------------------------------------
About to run SQL:
select * from xml_visualisations where corpus = 'coscach' and (
(in_context = 1) )
/* from User: scott | Function: get_all_xml_visualisations() |
2019-Jun-07 05:38 */
SQL ran successfully in 0.001 seconds.
Displaying extended context for query match in text
*MMMLP_F_2_19_Ca_interview*
URL PRINTINPUT:::Passing through contextSize
(100)....................................... AND USING IT.
(context-ui.php, 598)
gente_CFS000 que_R0CN00 no_N conoce_MIP3S0 pero_C con_P algunas_I0FP00 sí_G
porque_S ha_AIP3S0 ido_MP00SM harta_Q0FS00 gente_CFS000 ,_c y_C ha_AIP3S0
pasado_MP00SM hasta_P adentro_G ,_c y_C no_N le_P3CSD0 ha_AIP3S0hecho_MP00SM
nada_I0CS00 pero_C hay_MIP3S0 gente_CFS000 que_R0CN00 la_P3FSA0 ve_MIP3S0
y_C le_P3CSD0 ladra_MIP3S0 ,_c le_P3CSD0 ladra_MIP3S0 ,_c hasta_P
que_R0CN00 tie_CFS000 ..._s
tiene_MIP3S0 que_S ir_MN0000 se_P3CN00 no_N más_G como_S que_S no_N
le_P3CSD0 gusta_MIP3S0 porque_S dicen_MIP3P0 que_S los_A0MP0 perros_CMP000
presienten_MIP3P0 la_A0FS0 maldad_CFS000 ¿_ia una_I0FS0 cosa_CFS000 así_G
?_it
¡_aa chuta_ !_at
entonces_G hay_MIP3S0 harta_Q0FS00 gente_CFS000 mala_Q0FS00 sí_G pero_C
igual_G ahora_G como_S que_S ya_G dejó_MIS3S0 eso_D00S00 ,_c es_MIP2S0
que_S antes_G lo_P3MSA0 hacía_MII3S0 como_S por_P jugar_MN0000 ,_c yo_P1CSN0
creo_MIP1S0 ,_c a_P donde_R00000 era_SII3S0 *perrito_CMS000* solo_Q0MS00 y_C
nuevo_Q0MS00 entonces_G como_S que_S lo_P3MSA0 hacía_MII3S0 por_P
jugar_MN0000 creo_MIP1S0 yo_P1CSN0 sí_G es_SIP3S0 que_S en_MIP3P0 realidad_CFS000
como_S que_S no_N se_00CN00 las_P3FPA0 comía_MII3S0 ,_c sino_C que_R0CN00
le_P3CSD0 mandaba_MII3S0 manotazos_CMP000 ,_c entonces_G ahí_G le_P3CSD0
,_c como_S que_S ,_c las_P3FPA0 dejaba_MII3S0 aturdía_MII3S0 ,_c después_Glas_P3FPA0
agarraba_MII3S0 para_P jugar_MN0000 ,_c entonces_G ,_c las_A0FP0
zamarreaba_MII3S0 y_C ahí_G a_P donde_R00000 no_N las_P3FPA0 dejamos_MIP1P0
ahí_G no_N más_G sí_G es_SIP3S0 que_S parece_MIP3S0 que_S las_A0FP0
enterraba_MII3S0 así_G que_S no_N sé_MIP1S0 sí_G sí_G pero_C ahora_G ya_G
no_N ,_c a_P el_A0MS0 ot_CFS000 ...._z bueno_ ,_c que_S igual_G mi_P1CSS
papi_CMS000 tenía_MII3S0 tres_ corderos_CMP000 ,_c y_C mordió_MIS3S0 a_P
dos_ ,_c pero_Cles_P3CPD0
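
To check that the mangling is systematic rather than sporadic, the two dumps above can be compared token by token. What follows is a minimal Python sketch, not part of CQPweb: the function names are hypothetical, and it assumes that the raw CQP line uses `word/TAG`, that the rendered text uses `word_TAG`, and that the KWIC delimiters are the `--%%%--` strings set in the debug output.

```python
KWIC_DELIM = "--%%%--"  # value given to LeftKWICDelim / RightKWICDelim above


def parse_cqp_tokens(line):
    """Return (word, tag) pairs from a 'word/TAG' line as printed by CQP."""
    line = line.replace(KWIC_DELIM, " ")
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if sep:  # skips the 'CQP >>' prefix and the corpus position
            pairs.append((word, tag))
    return pairs


def parse_rendered_tokens(line):
    """Return (word, tag) pairs from the 'word_TAG' text shown in the browser."""
    line = line.replace("*", " ")  # asterisks mark the highlighted match
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if sep:
            pairs.append((word, tag))
    return pairs


def first_char_dropped(cqp_line, rendered_line):
    """True if every rendered tag is the CQP tag minus its first character."""
    cqp = parse_cqp_tokens(cqp_line)
    rendered = parse_rendered_tokens(rendered_line)
    if len(cqp) != len(rendered):
        return False
    return all(
        w_cqp == w_out and t_cqp[1:] == t_out
        for (w_cqp, t_cqp), (w_out, t_out) in zip(cqp, rendered)
    )


# A few tokens copied from the two dumps above.
cqp_sample = ("CQP >> 5338158: : era/VSII3S0 --%%%--perrito/NCMS000--%%%-- "
              "solo/AQ0MS00 ¡/Faa chuta/I !/Fat")
rendered_sample = "era_SII3S0 *perrito_CMS000* solo_Q0MS00 ¡_aa chuta_ !_at"
print(first_char_dropped(cqp_sample, rendered_sample))
```

Single-character tags such as `I` or `Z` come out empty (`chuta_`, `tres_`), which the slice comparison also covers. Tokens that run together in the pasted text (for example `ha_AIP3S0hecho_MP00SM`) would need to be separated before a full-line comparison.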
Cheers,
Scott
On Fri, Jun 7, 2019 at 4:38 AM Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:
> “your config file” == config.inc.php. If you set a configuration variable
> in there (see tables in the admin manual), the system will pass it round to
> all the other places where it’s used (which are the *other* strings your
> grep retrieved).
>
>
>
> config.inc.php is created by the autoconfig script. It’s not under version
> control and is unique to your machine.
>
>
>
> >> Also, anything in particular I should be looking for?
>
>
>
> What the original concordance lines from CQP look like prior to the system
> reformatting them.
>
>
>
> They should be printed out as the CQP child process produces them. They
> will be on lines beginning
>
>
>
> *CQP >>*
>
>
>
> You may find it easier to read them in “view source”.
>
>
>
> SQL query debugs will be printed too. Ignore them in this case.
>
>
>
> best
>
>
>
> Andrew.
>
>
>
> *From:* cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> *On
> Behalf Of *Scott Sadowsky
> *Sent:* 07 June 2019 09:30
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it
> >
> *Subject:* Re: [CWB] POS tags have first character cut off
>
>
>
> Will do. I've found that string in all of the following files:
>
>
>
> offline-freqlists.php
> autosetup.php
> upgrade-database.php
> defaults.php
> general-lib.php
> config.inc.php
>
>
>
> Which one is the relevant one in this case?
>
>
>
> Also, anything in particular I should be looking for?
>
>
>
> Cheers,
>
> Scott
>
>
>
> On Fri, Jun 7, 2019 at 4:14 AM Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
> I think you should set $print_debug_messages to true in your config file.
> This should reveal what is happening behind the scenes, and at what stage
> the initial character goes missing.
>
>
>
> best
>
>
>
> Andrew.
>
>
>
> *From:* cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> *On
> Behalf Of *Scott Sadowsky
> *Sent:* 07 June 2019 09:08
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it
> >
> *Subject:* Re: [CWB] POS tags have first character cut off
>
>
>
> On Fri, Jun 7, 2019 at 3:46 AM Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
>
>
> Hi Andrew,
>
>
>
> Can I pls check that I am correct about what you’re looking at:
>
> · Show tags is ON
>
> · Show alt view is ON (and configured to POS)?
>
>
>
> That's exactly right.
>
>
>
> Note, however, that the first letter of each tag is also cut off when I
> choose "Show Tags" with POS tags selected:
>
>
>
>
> gente_CFS000 que_R0CN00 no_N conoce_MIP3S0 pero_C con_P algunas_I0FP00 sí_G porque_S ha_AIP3S0 ido_MP00SM harta_Q0FS00 gente_CFS000 ,_c y_C ha_AIP3S0 pasado_MP00SM hasta_P adentro_G ,_c y_C no_N le_P3CSD0 ha_AIP3S0hecho_MP00SM nada_I0CS00 pero_C hay_MIP3S0 gente_CFS000 que_R0CN00 la_P3FSA0 ve_MIP3S0 y_C le_P3CSD0 ladra_MIP3S0
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
--
Dr. Scott Sadowsky
Profesor Asistente de Lingüística
Pontificia Universidad Católica de Chile
ssadowsky gmail com
scsadowsky uc cl
http://sadowsky.cl/