On Wed, Oct 20, 1999 at 02:38:19AM +0000, Dancer wrote:
...
> Yesterday, we _did_ manage to jam one of them up, but good...the
> incoming request rate was on the order of 90-100/second, and the
> service-times shot up, and things began to bog down. Still doing
> post-mortem analysis on the logs, but it looks like what happened was
> that (with the incoming request rate exceeding the handling capacity)
> the number of simultaneous connections climbed, causing further
> overhead, and shortchanged the filtering and authentication processes
> (who are large CPU consumers).
My experience is that this kind of phenomenon can happen with pretty
much any connection-oriented IP service - or more generally, any
transaction-processing type application on nearly any OS - when the OS
and hardware are simply pushed to the limits of what they can handle.
You see a phenomenon where the response time slows fairly linearly and
continuously until it hits some critical level and then falls right off
a cliff.
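To make the cliff concrete, here is a minimal toy model in Python. It
is only a sketch, not a measurement of any real server: the capacity of
100 requests/second, the per-connection overhead factor, and the name
simulate() are all assumptions picked for illustration. It models only
the feedback loop described above, where every open connection eats a
little of the capacity needed to close connections.

```python
def simulate(arrival_rate, base_capacity=100.0, overhead=0.002, seconds=600):
    """Return the backlog of open connections after `seconds` one-second ticks.

    All parameters are hypothetical; the model is deliberately crude.
    """
    backlog = 0.0
    for _ in range(seconds):
        # New requests arrive and join whatever is still waiting.
        backlog += arrival_rate
        # Assumed overhead term: each open connection costs some capacity.
        capacity = base_capacity / (1.0 + overhead * backlog)
        # Serve as many requests as the reduced capacity allows.
        backlog -= min(backlog, capacity)
    return backlog


if __name__ == "__main__":
    for rate in (60, 80, 85, 90, 100):
        print(f"{rate:>3} req/s -> backlog after 10 min: {simulate(rate):.0f}")
```

With these made-up numbers the backlog stays at zero up to the mid-80s
of requests per second and then grows without bound, because once a
backlog exists the remaining capacity only shrinks further.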
Right now we're going through similar issues on one of our main
servers, which handles both a lot of mail and many web servers. Similar
story - once it gets past a certain point, mail starts stacking up,
more and more HTTP connects start piling up, and it totally thrashes
until the connection rate drops enough.
It's a testimony to BSD UNIX that I've seen it go through loads of
around 400 and come back to normal without crashing. Most OSes I've
worked with over the years simply crash and burn at that point.
> I'm not yet quite sure how to avoid this, but I have some ideas about
> having the helper apps call sched_yield() about halfway through certain
> CPU-bound routines, to help share the cycles...I might be talking crazy,
> though. Some experiments are in order.
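The idea quoted above could be tried out with something like the
following Python sketch. The helper apps themselves are not shown in
this thread, so checksum_with_yield() is a hypothetical stand-in for a
CPU-bound filtering or authentication routine. os.sched_yield() is the
Python wrapper for the same system call and exists only on platforms
that provide it, hence the hasattr() guard. Whether yielding actually
helps under load is exactly what the experiments would have to show.

```python
import os


def checksum_with_yield(data: bytes) -> int:
    """Hypothetical CPU-bound routine that yields the CPU once, halfway through."""
    midpoint = len(data) // 2
    total = 0
    for i, byte in enumerate(data):
        if i == midpoint and hasattr(os, "sched_yield"):
            # Voluntarily give up the processor so other runnable
            # processes get a turn before we finish.
            os.sched_yield()
        # Stand-in for the real work: a simple rolling checksum.
        total = (total * 31 + byte) % 65521
    return total


if __name__ == "__main__":
    print(checksum_with_yield(b"example request payload"))
```

The yield does not change the result of the routine, only when the
process is willing to be descheduled.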
The only real solution I know of once you hit this kind of point is
to start spreading out the load more, or, in some cases, total
application redesign. You can usually tweak a bit more performance out
of even well-designed apps, but it'll only get you so far and it's not
repeatable.
In our case, we're trying to split the workload off onto several
other servers, hopefully without breaking anything in the process.
-- Clifton
-- Clifton Royston -- LavaNet Systems Architect -- cliftonr@lava.net
"An absolute monarch would be absolutely wise and good. But no man is
strong enough to have no interest. Therefore the best king would be Pure
Chance. It is Pure Chance that rules the Universe; therefore, and only
therefore, life is good." - AC

Received on Wed Oct 20 1999 - 00:21:13 MDT