<div dir="ltr"><div><div><div><div><div><div>Dear CQP experts,<br><br></div>I would like to set up a parallel German-English corpus and have two related questions:<br><br></div>1. I understand that the main difficulty here is aligning the corpus. Is it possible to import existing alignments (e.g. a translation memory or the output of other tools) into CWB? So far, I have managed to align and encode a mere 7 sentences with cwb-align and related tools. Beyond that, my sentence splitter rarely produces exactly the same number of sentences on both sides, which made it very hard for me to encode the corpus. Any hints or best practices?<br><br></div>2. Maybe this is a naive question and not entirely related to CWB: Is there a way to handle German characters (ä and the like) properly on the console, that is, to ensure that they can be searched for and displayed correctly? My registry file specifies &quot;charset = &#39;utf8&#39;&quot;, but searching for umlauts etc. triggers an error: &quot;Query includes a character ... that is invalid in the encoding specified for this corpus.&quot; I am currently working on Windows.<br><br></div>Thanks in advance for your advice.<br><br></div>All the best,<br></div>Anne-Kathrin Schumann<br></div>