[CWB] Input for collostructional analysis

Susanne Flach susanne.flach at fu-berlin.de
Fri Mar 18 12:58:12 CET 2022


Hi Elif,

I’d agree that for Step 3 it’ll be much easier to read in a separate overall corpus frequency list and use join.freqs() from {collostructions} to merge the two lists; the function has an argument that lets you keep only the items in your list from Step 2 (i.e. the verbs that occur in the construction). If the corpus frequency list is too large for that, I’d use list.txt and do some setdiff/awk magic outside CWB/{collostructions}, although I have never had to do that with any CWB corpus (of the sizes hosted at the FU).
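To make the Step 3 join concrete outside R, here is a minimal Python sketch. It is not the {collostructions} API and does not reproduce join.freqs(); it only illustrates the same idea. It assumes, hypothetically, that both frequency lists are plain-text files with one word<TAB>frequency pair per line, and the function names are made up for illustration. The corpus list is streamed line by line and only the words that occur in the construction list are kept (the setdiff-style filtering mentioned above), so a large corpus list never has to be held in memory in full.

```python
from typing import Dict, Iterable, List, Tuple


def parse_freq_line(line: str) -> Tuple[str, int]:
    """Split one 'word<TAB>frequency' line (hypothetical file format)."""
    word, freq = line.rstrip("\r\n").split("\t")
    return word, int(freq)


def join_cxn_with_corpus(
    cxn_lines: Iterable[str], corpus_lines: Iterable[str]
) -> Tuple[List[Tuple[str, int, int]], List[str]]:
    """Return (rows, missing).

    rows:    (word, frequency in construction, corpus frequency) for every
             construction item that also appears in the corpus list
    missing: construction items with no entry in the corpus list
    """
    # Steps 1 and 2: the collexemes and their frequencies in the construction.
    cxn: Dict[str, int] = dict(
        parse_freq_line(line) for line in cxn_lines if line.strip()
    )

    # Step 3: stream the corpus list and keep only words found in the
    # construction list; everything else is discarded immediately.
    corpus: Dict[str, int] = {}
    for line in corpus_lines:
        if not line.strip():
            continue
        word, freq = parse_freq_line(line)
        if word in cxn:
            corpus[word] = freq

    # Three-column rows, in the order of the construction list.
    rows = [(w, f, corpus[w]) for w, f in cxn.items() if w in corpus]
    # The set difference: items that could not be matched, often a sign of
    # mismatched tokenisation, case, or encoding between the two lists.
    missing = sorted(set(cxn) - set(corpus))
    return rows, missing
```

Passing two open text files (e.g. `open("cxn_freqs.tsv", encoding="utf-8")` and `open("corpus_freqs.tsv", encoding="utf-8")`, both file names hypothetical) and writing each row back out tab-separated gives the three-column file; a non-empty `missing` list is worth checking before running the analysis.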

Even if this were possible with cwb-lexdecode via an inclusion list (list.txt), it would not give you any advantage, as far as I can see. FWIW, I keep frequency lists for every corpus and/or word class so that I can load them into R whenever I need them for exactly that purpose.
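For orientation, this is roughly how one row of such a three-column file is used in a simple collexeme analysis: together with two totals that the file itself does not contain (the overall frequency of the construction and the corpus size, counted in the same unit as the corpus frequencies, e.g. all tokens or all tokens of the relevant word class), it fills a 2×2 contingency table from which an association measure is then computed. This is a generic Python sketch with made-up names (`Table2x2`, `collexeme_table`, `expected_a`), not {collostructions} code, and it stops before the association measure itself.

```python
from typing import NamedTuple


class Table2x2(NamedTuple):
    a: int  # word inside the construction (column 2 of the input file)
    b: int  # other words inside the construction
    c: int  # word outside the construction
    d: int  # other words outside the construction


def collexeme_table(
    f_cxn: int, f_corpus: int, cxn_total: int, corpus_size: int
) -> Table2x2:
    """Build the 2x2 table for one collexeme from its two frequencies and
    two totals (construction frequency and corpus size, same unit)."""
    a = f_cxn
    b = cxn_total - f_cxn
    c = f_corpus - f_cxn
    d = corpus_size - a - b - c
    # A negative cell means the input frequencies and totals do not fit together.
    if min(a, b, c, d) < 0:
        raise ValueError("frequencies are inconsistent with the totals")
    return Table2x2(a, b, c, d)


def expected_a(t: Table2x2) -> float:
    """Frequency of the word in the construction expected under independence."""
    n = t.a + t.b + t.c + t.d
    return (t.a + t.b) * (t.a + t.c) / n
```

With made-up numbers (12 occurrences in the construction, 340 in the corpus, a construction total of 500 and a corpus of 100,000 tokens), `collexeme_table(12, 340, 500, 100000)` gives a=12, b=488, c=328, d=99172, and `expected_a` gives 1.7, so the word occurs about seven times more often in the construction than independence would predict.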

Feel free to contact me off-list if you have further questions about {collostructions} (susanne.flach at es.uzh.ch).

Best,
Susanne

—
Dr. Susanne Flach
Arbeitsbereich Linguistik
Institut für Englische Philologie
Freie Universität Berlin
Habelschwerdter Allee 45
14195 Berlin
sfla.ch

** Since February 2020 I have been at
Universität Zürich
(susanne.flach at es.uzh.ch) **

Corpus tutorial with CQP <http://userpage.fu-berlin.de/~flach/corpling/>

> On 18 Mar 2022, at 12:05, Kara, Elif <elif.kara at fu-berlin.de> wrote:
> 
> Dear all,
> 
> I would like to create an input file for collostructional analysis. Is there an efficient way of exporting a file with three columns: 1) the collocates occurring in a particular construction (a complex query), 2) their frequencies within that construction, and 3) their overall corpus frequencies?
> 
> If this isn’t possible in a single step: I already have a list containing 1) and 2) — is there a way of querying the corpus frequencies of the words using a word list?
> 
> I have tried:
> cwb-lexdecode -r <REGISTRY> -F list.txt -f0 -P word <MYCORPUS>
> but this returns no matches, which can't be right (my list contains one word per line). I’m using my university’s CWB v3.0.0 installation from the command line.
> 
> Apologies if the question is basic, but I am new to corpus linguistics and at a loss! Any help is greatly appreciated!
> 
> Best
> Elif
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb