[CWB] CWB-CL test failure (encoding?)
Stefan Evert
stefanML at collocations.de
Thu Aug 3 13:32:23 CEST 2017
Hi Piotr!
> I have CQPweb (3.2.27), cwb (3.4.12) and the CWB Perl module installed from trunk (rev. 982), and wanted to compile the remaining Perl modules, but I'm getting something that looks like an encoding problem when testing CWB-CL, and I have no idea of how to go about fixing that. I'll be grateful for hints.
The weird characters are probably just placeholders inserted by Perl when printing Unicode strings on an output stream that's not known to be Unicode. On my Mac, I get plain question marks.
> Nearly-PostScriptum: I've just had the last look around to make sure that I haven't missed any troubleshooting hints and noticed the warning that "This version of CWB/Perl (...) is not compatible with the current beta track CWB 3.4.x" -- is this what I'm up against, please?
That's incorrect. In fact, the current SVN trunk version of CWB/Perl is the one that _is_ compatible with CWB 3.4.x (and may no longer be compatible with CWB 3.0). Where did you find the warning?
In a nutshell: Go ahead and make install. Three test errors are expected at the moment.
Long story:
Some time ago, I started testing CWB regexps against Perl regular expressions as a gold standard (which makes sense now that CWB has switched to PCRE). This helped to dig out a few bugs in case-insensitive regexp matching, so I want to extend this to a larger range of regexps.
Unfortunately, PCRE isn't 100% compatible with Perl regexps, and these discrepancies lead to the test failures with CWB-CL. The difference is that /daß/ case-insensitively matches both "daß" and "dass" in Perl, but only "daß" in PCRE. Of course, /DASS/ fails to case-insensitively match "daß", so I would argue that the Perl behaviour is less consistent than PCRE's. I would also argue that (a) Unicode and (b) natural language are a complete mess and both should be abandoned. :-)
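To make the two folding strategies concrete, here is a minimal sketch in Python (not Perl or PCRE, so it only illustrates the idea). It assumes that Python's re module, which maps case one character at a time, is a fair stand-in for the "only daß" behaviour described above, and that str.casefold(), which folds "ß" to "ss", is a fair stand-in for full case folding. The helper names simple_ci_match and full_fold_equal are made up for this example.

```python
import re


def simple_ci_match(pattern: str, text: str) -> bool:
    """Case-insensitive full match with Python's re module.

    re.IGNORECASE maps case one character at a time, so "ß" is
    never treated as equivalent to the two-character string "ss".
    """
    return re.fullmatch(pattern, text, flags=re.IGNORECASE) is not None


def full_fold_equal(a: str, b: str) -> bool:
    """Compare two strings after full Unicode case folding.

    str.casefold() folds "ß" to "ss", and it is applied to both
    sides, so the comparison is symmetric.
    """
    return a.casefold() == b.casefold()


# (pattern, text) pairs taken from the discussion above
pairs = [("daß", "daß"), ("daß", "dass"), ("DASS", "daß")]

for pattern, text in pairs:
    print(
        f"/{pattern}/i vs {text!r}: "
        f"simple={simple_ci_match(pattern, text)}, "
        f"full={full_fold_equal(pattern, text)}"
    )
```

With one-to-one case mapping, only the first pair matches. With full folding applied to both sides, all three pairs compare equal, which is the symmetric behaviour one might expect if multi-character folds were handled consistently.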
The problematic tests will be skipped in the 3.5 release version. At the moment, I'm contemplating whether I can modify them so it's still possible to validate PCRE matching in this case without the false positive from Perl.
Best,
Stefan