[CWB] What can I do to reduce/control disk I/O in cqpweb?

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Sep 5 11:32:43 CEST 2023


Hi Jörg,

There's no single solution because so much varies depending on the exact hardware, number / size of corpora, usage pattern, etc.

You mention "frequency breakdown and distribution". This is a strong hint as to the bottleneck, as these both involve creation of large InnoDB tables in MySQL, reading data from disk and then writing it all to disk again. Over the last few years I have refactored processes to involve fewer disk reads in these operations, but there are limits to what can be done.

You should first check the MySQL data load settings in the cqpweb config:

- $sql_has_file_access should be TRUE (but see manual sec 1.12.4 for the preconditions for this.)

- $sql_local_infile_disabled should be FALSE (this being TRUE is an especial cause of slowdown)

If those aren't the settings, changing them may improve matters.
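As an illustration only, the check of those two settings could be scripted along the following lines. This is a rough sketch, not part of CQPweb: it assumes the config file assigns the variables with plain PHP statements such as `$sql_has_file_access = true;`, and the helper name `check_sql_load_settings` is made up for this example. If a variable is assigned more than once, only the first assignment is looked at.

```python
import re

# Recommended values, as stated in the message above.
RECOMMENDED = {
    "sql_has_file_access": True,
    "sql_local_infile_disabled": False,
}


def check_sql_load_settings(config_text):
    """Return a list of warnings for settings that differ from the
    recommended values or are not set explicitly.

    Assumption: settings appear as simple PHP assignments at the start
    of a line, e.g. `$sql_has_file_access = true;`. Commented-out lines
    (starting with `//` or `#`) are therefore ignored.
    """
    warnings = []
    for name, wanted in RECOMMENDED.items():
        pattern = r"^\s*\$" + re.escape(name) + r"\s*=\s*(true|false)\s*;"
        match = re.search(pattern, config_text, re.IGNORECASE | re.MULTILINE)
        if match is None:
            warnings.append(f"${name} is not set explicitly")
            continue
        actual = match.group(1).lower() == "true"
        if actual != wanted:
            warnings.append(
                f"${name} is {match.group(1)}, recommended {str(wanted).upper()}"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical config excerpt, for demonstration only.
    sample = "$sql_has_file_access = false;\n$sql_local_infile_disabled = false;\n"
    for warning in check_sql_load_settings(sample):
        print(warning)
```

In practice the text passed in would be the contents of your own CQPweb config file; an empty result means both settings already have the recommended values.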

System-level things that MIGHT help:

1) Put the MySQL database storage onto a different physical drive than the CQPweb-devoted storage folders, to prevent interrupts and slow repositioning of the drive read head when both are in use at once. (Make it separate from the OS storage too if possible; my preferred setup for a multiuser server is 4 physical drives, 2 smallish for the OS & swap, 2 big for CQPweb data and the MySQL storage.)

2) Allocate more DB cache space (so that these big operations, once done, will stay in cache for longer - that should help with classes where people are doing the same searches over and over).

3) Get a faster HD.  (You probably knew about this solution.)
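On point 2: for InnoDB tables the relevant cache is normally the InnoDB buffer pool, whose size is set by the MySQL server variable `innodb_buffer_pool_size` (an assumption about which cache is meant; the text above only says "DB cache space"). Before raising it, one can get a rough idea of how often InnoDB has to go to disk from two counters reported by `SHOW GLOBAL STATUS`: `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that could not be served from the pool). A minimal sketch, with an invented function name and invented sample numbers:

```python
def buffer_pool_miss_ratio(read_requests, disk_reads):
    """Fraction of InnoDB logical reads that had to be served from disk.

    read_requests: value of Innodb_buffer_pool_read_requests
    disk_reads:    value of Innodb_buffer_pool_reads
    Both counters are cumulative since server start, so this is a
    long-run average, not a picture of the current load.
    """
    if read_requests <= 0:
        return 0.0
    return disk_reads / read_requests


if __name__ == "__main__":
    # Invented numbers, for demonstration only.
    ratio = buffer_pool_miss_ratio(read_requests=1_000_000, disk_reads=50_000)
    print(f"miss ratio: {ratio:.1%}")
```

A miss ratio that stays high while people are running frequency breakdowns suggests that a larger buffer pool might help; a very low one suggests the bottleneck is more likely the disk writes, which extra cache will not remove.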

If tinkering with hardware is not an option, e.g. if you're on a VM, or you are disk-space-limited, then you can reduce the problem for classes by running big operations known to come up in the class shortly in advance, to put them in cache.

Sorry I couldn't help more,

best

Andrew.



-----Original Message-----
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Jörg Knappen
Sent: Monday, September 4, 2023 5:54 PM
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] What can I do to reduce/control disk I/O in cqpweb?


I am running cqpweb (currently version 3.3.17) on a virtual machine with Ubuntu 22.04 LTS. Recently, we observed severe slowdowns of cqpweb caused by disk I/O (memory and CPU usage weren't the bottlenecks).

What can I do to reduce disk I/O or to control it somehow? The slowdowns already occurred with modest-sized corpora on frequency breakdown and distribution. Things that were no problem in the past (like holding a cqpweb course with 30 participants all doing frequency tables and distribution more or less simultaneously) seem impossible right now.

Greetings from Saarbrücken,

--Jörg Knappen