I've been looking at the diskd stuff a little for someone to see if I can
mitigate the crashes under load. I didn't feel like trying to fix the
main code paths to support reentry :)
So here's what I've got thus far:
http://www.creative.net.au/diffs/20080119-diskd-2.diff
* Track the number of opened storeIOStates per swap dir;
* Limit magic1 to the number of open files in that swapdir, rather than
the number of away messages;
* Disable using diskd for unlink; just use unlinkd.
These are all an attempt to constrain the queue size to be somewhat related
to the number of open storeIOStates for a given swapdir.
Unfortunately it's not -quite- related, as I'm still seeing 3x and 4x as many
away messages as storeIOStates for a given swapdir, but it doesn't reach
magic2 anywhere near as often now and doesn't end up having to call
storeDirCallback() recursively under high load. magic1 = 64 and magic2 = 128 here.
I think that's about as good a solution as I can come up with in the short
term. I'm not going to commit it in its entirety - I may just commit the
unlinkd change, as that by itself may mitigate the issues enough to be
worth it - but if diskd is going to hang around in the future then it needs
to become a way of dispatching queued disk events rather than being the
queue itself (ie, how aio works.)
(Hopefully it works for the poor guy who is stuck with diskd and the
crashes!)
Adrian
--
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -

Received on Sat Jan 19 2008 - 01:00:55 MST