[CWB] number and <text_id> tag inside a word search
Stefan Evert
stefanML at collocations.de
Mon Feb 22 22:46:25 CET 2016
> But note, Daniel, that cwb-encode is actually already programmed to delete the EF-BB-BF sequence if it finds it at the start of a file –
Great! I should keep track more closely of your progress with CWB. :-)
> but only when the corpus encoding is declared to be UTF-8. You disabled this check by using “-c latin1”.
Then the input file is simply invalid – or in other words, CQPweb gave you what you asked for.
Kudos for spotting this flag, which I had overlooked completely.
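For anyone who runs into the same thing, here is a minimal Python sketch of what goes wrong and how one might strip the EF-BB-BF sequence before encoding. This is not cwb-encode's own code; the helper name strip_utf8_bom and the sample bytes are made up for illustration.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical input: a UTF-8 file that starts with a BOM.
sample = codecs.BOM_UTF8 + b"hello\n"

# Read as Latin-1, the three BOM bytes are ordinary characters,
# so they end up glued to the first token of the file.
as_latin1 = sample.decode("latin-1")

# Stripping the BOM beforehand leaves only the real content.
cleaned = strip_utf8_bom(sample)
```

With the input declared as Latin-1, the three bytes decode to U+00EF, U+00BB and U+00BF, which is presumably the junk that showed up in front of the first word.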
Best,
Stefan