[CWB] Virtual seminar: Embedding CWB in a CL Workflow | Finite State Queries
Stefan Evert
stefanML at collocations.de
Sat Jan 16 19:28:10 CET 2021
Dear CWB aficionados,
as part of the colloquium of my research group, there will be an online presentation concerned with applications of CWB and a glimpse into the query “engine room” of CQP.
Philipp Heinrich & Stefan Evert (CCL, FAU Erlangen-Nürnberg)
News from the Corpus Workbench (CWB):
Embedding CWB in a CL Workflow | Finite State Queries
Wednesday, 27 January 2021, 16:15–17:45 CET
https://www.linguistik.phil.fau.de/teaching/oberseminar/#2021_01_27
The presentation will be given in English via a regular Zoom videoconference. Anybody interested is welcome to attend. Please send us an e-mail (to stefan.evert at fau.de) in order to obtain the Zoom link (which we don't want to share on any public Web page).
Feel free to share this e-mail with anyone else who might be interested.
Best & hope to see some of you at the presentation,
Stefan
ABSTRACT
Many powerful corpus query engines – notably the IMS Open Corpus
Workbench (CWB), the (No)Sketch Engine, and several other tools inspired
by them – offer a query language based on generalised regular expressions
(formulated over complex token descriptions rather than individual
characters). This enables researchers to locate lexico-grammatical
patterns of interest and collect corpus instances in a concordance. Many
applications of corpus linguistics – notably corpus-based discourse
analysis and computational lexicography – additionally require
collocations or word sketches, as well as dispersion and keyword analyses
(based on metadata annotation included in the corpus).
The first part of the talk gives a practical introduction to cwb-ccc, an
open-source Python package that translates CWB query results into pandas
dataframes and then performs collocation analyses for different contexts.
It also offers keyword analysis for subcorpora defined by metadata
constraints.
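
To illustrate what a collocation analysis over a context window involves, here is a minimal pure-Python sketch. It does not use cwb-ccc or pandas and does not reflect the package's actual API; the function name `window_collocates`, the toy token list, and the simple expected-frequency estimate are assumptions made purely for illustration.

```python
from collections import Counter

def window_collocates(tokens, node_positions, window=2):
    """Compare observed and expected collocate frequencies around node matches.

    Hypothetical helper for illustration only (not part of cwb-ccc).
    Returns {word: (observed, expected)} for every word seen in a window.
    """
    nodes = set(node_positions)
    in_window = set()
    for m in nodes:
        lo = max(0, m - window)
        hi = min(len(tokens), m + window + 1)
        # Collect context positions, excluding the node matches themselves.
        in_window.update(p for p in range(lo, hi) if p not in nodes)

    observed = Counter(tokens[p] for p in in_window)
    corpus_freq = Counter(tokens)
    n = len(tokens)
    # Expected frequency if words were spread evenly over the corpus:
    # corpus frequency scaled by the share of tokens inside the windows.
    return {
        w: (o, corpus_freq[w] * len(in_window) / n)
        for w, o in observed.items()
    }

# Toy corpus; the node is every occurrence of "tea".
toy_tokens = "strong tea and strong coffee but weak tea".split()
tea_positions = [i for i, w in enumerate(toy_tokens) if w == "tea"]
tea_collocates = window_collocates(toy_tokens, tea_positions, window=1)
```

A real analysis would feed such observed and expected counts into an association measure and would typically hold them in a dataframe rather than a plain dictionary.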
The second part of the talk gives the first publicly available
introduction to the CWB implementation of corpus queries by
non-deterministic simulation of finite-state automata. It also addresses
pitfalls and limitations of finite-state queries, in particular certain
corner cases that may not be evaluated correctly.
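
As a rough illustration of the general technique named above, the following sketch simulates a small non-deterministic automaton over tokens by tracking the set of active states. It is not the CWB implementation; the pattern representation, the function names, and the toy sentence are assumptions chosen for the example.

```python
def compile_pattern(pattern):
    """Turn (predicate, quantifier) pairs into states (pred, skippable, loops).

    Quantifiers: "1" (exactly once), "?", "*", "+".
    """
    states = []
    for pred, quant in pattern:
        if quant == "1":
            states.append((pred, False, False))
        elif quant == "?":
            states.append((pred, True, False))
        elif quant == "*":
            states.append((pred, True, True))
        elif quant == "+":
            # x+ is expanded to x followed by x*
            states.append((pred, False, False))
            states.append((pred, True, True))
        else:
            raise ValueError(f"unknown quantifier: {quant!r}")
    return states

def closure(states, active):
    """Add every state reachable by skipping optional elements."""
    result, stack = set(), list(active)
    while stack:
        i = stack.pop()
        if i in result:
            continue
        result.add(i)
        if i < len(states) and states[i][1]:
            stack.append(i + 1)
    return result

def match_ends(states, tokens, start):
    """Return all end positions (exclusive) of matches beginning at `start`."""
    accept = len(states)
    active = closure(states, {0})
    ends, pos = [], start
    while active:
        if accept in active:
            ends.append(pos)
        if pos >= len(tokens):
            break
        nxt = set()
        for i in active:
            if i == accept:
                continue
            pred, _, loops = states[i]
            if pred(tokens[pos]):
                nxt.add(i + 1)      # move on to the next pattern element
                if loops:
                    nxt.add(i)      # or stay and match this element again
        active = closure(states, nxt)
        pos += 1
    return ends

def pos_is(tag):
    """Token predicate: part-of-speech attribute equals `tag`."""
    return lambda tok: tok["pos"] == tag

# Toy sentence with word and part-of-speech annotation.
sentence = [
    {"word": w, "pos": p}
    for w, p in [("the", "DT"), ("very", "RB"), ("very", "RB"),
                 ("big", "JJ"), ("dog", "NN"), ("barked", "VBD")]
]

# Roughly: determiner, any number of adverbs, one or more adjectives, noun.
np_states = compile_pattern(
    [(pos_is("DT"), "1"), (pos_is("RB"), "*"), (pos_is("JJ"), "+"), (pos_is("NN"), "1")]
)
np_ends = match_ends(np_states, sentence, 0)

# A pattern ending in an optional loop has several possible match ends.
dt_rb_states = compile_pattern([(pos_is("DT"), "1"), (pos_is("RB"), "*")])
dt_rb_ends = match_ends(dt_rb_states, sentence, 0)
```

Because the simulation keeps all active states in parallel, it yields every possible end point for a given start; a query engine then has to decide which of these to report, which is one place where corner cases can arise.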