[CWB] What can I do to reduce/control disk I/O in cqpweb?
Jörg Knappen
j.knappen at mx.uni-saarland.de
Tue Sep 5 12:20:38 CEST 2023
Andrew,
thank you very much for the comprehensive answer. An upgrade of the
"disks" for the VMs is indeed planned, but has not been carried out yet. So
for the purpose of the course we will resort to your last, very useful hint
and seed the cache with the queries we want to teach.
Greetings from Saarbrücken,
Jörg Knappen
On 2023-09-05 11:32, Hardie, Andrew wrote:
> Hi Jörg,
>
> There's no single solution because so much varies depending on the
> exact hardware, number / size of corpora, usage pattern, etc.
>
> You mention "frequency breakdown and distribution". This is a strong
> hint as to the bottleneck, as both of these involve creating large
> InnoDB tables in MySQL, reading data from disk and then writing it all
> back to disk. Over the last few years I have refactored processes to
> involve fewer disk reads in these operations, but there are limits to
> what can be done.
>
> You should first check the MySQL data load settings in the cqpweb
> config:
>
> - $sql_has_file_access should be TRUE (but see section 1.12.4 of the
> manual for the preconditions for this.)
>
> - $sql_local_infile_disabled should be FALSE (if this is TRUE, it is a
> particular cause of slowdown)
>
> If those aren't the settings, changing them may improve matters.
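For reference, here is a minimal sketch of how the two settings above might look in the CQPweb configuration file. The file location (assumed here to be lib/config.inc.php) and the surrounding layout are assumptions. Only the variable names and values come from the advice above.

```php
<?php
// Hypothetical excerpt from the CQPweb configuration file (location assumed:
// lib/config.inc.php). Only the two MySQL data-load settings discussed above
// are shown; everything else in the file is left out.

// Should be TRUE, but only once the preconditions described in
// section 1.12.4 of the CQPweb manual are met.
$sql_has_file_access = true;

// Should be FALSE; leaving this TRUE is a particular cause of slowdown.
$sql_local_infile_disabled = false;
```

PHP treats true/false and TRUE/FALSE identically, so either spelling works in the config file.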
>
> System-level things that MIGHT help:
>
> 1) Put the MySQL database storage onto a different physical drive than
> the CQPweb-devoted storage folders, to prevent interrupts and slow
> repositioning of the drive read head when both are in use at once.
> (Make it separate from the OS storage too if possible; my preferred
> setup for a multiuser server is 4 physical drives, 2 smallish for the
> OS & swap, 2 big for CQPweb data and the MySQL storage.)
>
> 2) Allocate more DB cache space, so that the results of these big
> operations stay in the cache for longer once computed; that should help
> with classes where people are doing the same searches over and over.
>
> 3) Get a faster hard drive. (You probably knew about this solution.)
>
> If tinkering with hardware is not an option (e.g. if you're on a VM,
> or you are limited in disk space), then you can reduce the problem for
> classes by running, shortly before the class, the big operations you
> know will come up in it, so that their results are already in the cache.
>
> Sorry I couldn't help more,
>
> best
>
> Andrew.
>
>
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On
> Behalf Of Jörg Knappen
> Sent: Monday, September 4, 2023 5:54 PM
> To: Open source development of the Corpus WorkBench
> <cwb at sslmit.unibo.it>
> Subject: [CWB] What can I do to reduce/control disk I/O in cqpweb?
>
>
> I am running cqpweb (currently version 3.3.17) on a virtual machine
> with Ubuntu 22.04 LTS. Recently, we have observed severe slowdowns of
> cqpweb caused by disk I/O (memory and CPU usage were not the bottlenecks).
>
> What can I do to reduce disk I/O or to control it somehow? The
> slowdowns already occurred with modest-sized corpora on frequency
> breakdown and distribution. Things that were no problem in the past
> (like holding a cqpweb course with 30 participants all doing frequency
> tables and distribution more or less simultaneously) now seem impossible.
>
> Greetings from Saarbrücken,
>
> --Jörg Knappen
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb