Re: [PATCH] icap_oldest_service_failure option

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 19 Feb 2010 21:32:25 -0700

On 02/19/2010 06:19 PM, Amos Jeffries wrote:
> Alex Rousskov wrote:
>> Added icap_oldest_service_failure option to forget old ICAP errors.
>>
>> A busy or remote ICAP server may produce a steady but shallow stream of
>> errors. Any ICAP server may become nearly unusable in a short period of
>> time, producing a burst of errors. To avoid disabling a generally usable
>> service, it is important to distinguish these two cases. Just counting
>> the number of errors and suspending the service after
>> icap_service_failure_limit is reached often either suspends the service
>> in both cases or never suspends it at all, depending on the option
>> value.
>>
>> One way to distinguish a large burst of errors from a steady but shallow
>> error stream is to forget about old errors. The added
>> icap_oldest_service_failure option instructs Squid to ignore errors that
>> are "too old" to be counted as a part of a burst.
>>
>> Another way to look at this feature is to say that the combination of
>> the old icap_service_failure_limit and the new
>> icap_oldest_service_failure limits the ICAP error _rate_. For example,
>> # suspend service usage after 10 failures in 5 seconds:
>> icap_service_failure_limit 10
>> icap_oldest_service_failure 5 seconds
>>
>> Squid does not remember every transaction error that occurred within the
>> allowed "oldest error" time period. That would result in a precise
>> but too expensive implementation, especially during error bursts on a
>> busy server. Instead, Squid divides the period into ten slots, counts the
>> number of errors that occurred in each slot, and forgets the oldest
>> slot(s) as needed. Thus, the algorithm has about 90% precision as far as
>> timing of the failures is concerned. That 90% precision ought to be good
>> enough for any deployment.
>>
>> The patch is for Squid v3.1+ but we will port to trunk if approved.
>>
>
> +1. I definitely like the idea.
>
> Would it be possible to deflate the options a little bit though?
>
> If what is being achieved really is meant to be a rate, I would expect
> that icap_service_failure_limit could be extended instead of adding a new
> option, to make it easier for users to understand what's going to
> happen.
>
> Something like:
> 'icap_service_failure_limit' number [ ('/'|'per') period ]

It would be a little tricky to explain that, although it does look like a
rate, it is not really one: a 1/sec value may have a very different
effect from 100/100sec even though the average rate is the same. I am
happy to do it if others think one compound option is better than two
simple ones in this case.

Thank you,

Alex.
Received on Sat Feb 20 2010 - 04:32:46 MST
