[CWB] number and <text_id> tag inside a word search

Stefan Evert stefanML at collocations.de
Mon Feb 22 22:46:25 CET 2016


> But note, Daniel, that cwb-encode is actually already programmed to delete the EF-BB-BF sequence if it finds it at the start of a file –

Great! I should keep track more closely of your progress with CWB. :-)

> but only when the corpus encoding is declared to be UTF-8. You disabled this check by using “-c latin1”.

Then the input file is simply invalid – or in other words, CQPweb gave you what you asked for.
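
To illustrate the point, here is a minimal Python sketch (not the actual cwb-encode code, which I am not quoting here) of what happens to the EF-BB-BF sequence. The function name strip_utf8_bom and the sample line are made up for the example; the only assumption is that the file starts with a UTF-8 byte-order mark.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration only; it is not part of CWB.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A made-up input line preceded by a UTF-8 byte-order mark.
raw = codecs.BOM_UTF8 + "word\n".encode("utf-8")

# Read as Latin-1, the three BOM bytes become three ordinary characters
# (U+00EF, U+00BB, U+00BF) glued to the front of the first token.
as_latin1 = raw.decode("latin-1")

# Read as UTF-8 after stripping the BOM, the first token is clean.
cleaned = strip_utf8_bom(raw).decode("utf-8")

print(repr(as_latin1))
print(repr(cleaned))
```

So with a Latin-1 declaration the three bytes are perfectly legal characters, and nothing can tell them apart from real corpus data.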

Kudos for spotting this flag, which I had overlooked completely.

Best,
Stefan


