[CWB] Maximum corpus size

Austin Yang austin.yang.2014 at gmail.com
Tue Feb 7 02:19:01 CET 2023


Dear Stephanie,
Thanks for your response and clarification! Best of luck with implementing
the Ziggurat backend!


Best,
Austin Yang (楊承洋)
MS in Cognitive Neuroscience, NCU
BS in Psychology, CYCU


On Mon, Feb 6, 2023 at 6:43 PM Stephanie Evert <stefanML at collocations.de>
wrote:

> Dear Austin,
>
> I think you've been misreading the encoding tutorial, which says that
>
>         The maximum corpus size is 2,147,483,647 tokens (the largest value
>         that can be stored as a signed 32-bit integer). In the CWB source
>         code, this is represented by the macro CL_MAX_CORPUS_SIZE.
>
>         https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial/B.html
>
> So the maximum size is a hard upper limit, and there is no indication here
> that it would be sensible to modify CL_MAX_CORPUS_SIZE in the source code.
>
> Such limitations will be lifted by the new Ziggurat backend, once we
> finally get round to implementing it.  Things are progressing, though, so
> I'm inclined to say “stay tuned”.
>
> Best,
> Stephanie
>
>
> > On 6 Feb 2023, at 09:53, Austin Yang <austin.yang.2014 at gmail.com> wrote:
> >
> > Dear all,
> > I'm trying to encode a corpus larger than 2 GiB. The CWB encoding
> > tutorial notes that this is possible by changing CL_MAX_CORPUS_SIZE in
> > the CWB source code. I increased the parameter (CL_MAX_CORPUS_SIZE) in
> > the cl.h file (I'm not sure whether this is the CWB source code the
> > tutorial refers to) by a factor of 10, but the CQPweb site still shows
> > a maximum of 2,147,483,647 tokens. Did I miss something in the
> > tutorial? Any comments would be greatly appreciated!
> >
> > CWB version 3.5.0
> >
> >
> > Best,
> > Austin Yang (楊承洋)
> > MS in Cognitive Neuroscience, NCU
> > BS in Psychology, CYCU
> > _______________________________________________
> > CWB mailing list
> > CWB at sslmit.unibo.it
> > http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>
>
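
The signed 32-bit limit quoted from the tutorial can be checked with a short
C sketch. This is only an illustration, not code from CWB: the macro name
MAX_TOKENS_SKETCH and the helper fits_in_int32() are hypothetical, and the
sketch assumes (as the tutorial's wording suggests) that token counts have to
fit into a signed 32-bit integer.

```c
#include <stdint.h>

/* Hypothetical stand-in for the limit quoted in the tutorial
 * (2,147,483,647 tokens). This is NOT the CWB macro itself. */
#define MAX_TOKENS_SKETCH INT32_C(2147483647)

/* Return 1 if n can be stored in a signed 32-bit integer, 0 otherwise.
 * The argument is 64 bits wide so that larger values can be tested
 * without overflowing. */
static int fits_in_int32(int64_t n)
{
    return n >= INT32_MIN && n <= INT32_MAX;
}
```

Here fits_in_int32(MAX_TOKENS_SKETCH) returns 1, whereas
fits_in_int32((int64_t)MAX_TOKENS_SKETCH * 10) returns 0: under the stated
assumption, a limit ten times larger cannot be represented in a signed 32-bit
integer at all, which is consistent with the limit being described as a hard
upper bound.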