Re: [squid-users] Complicate ACL affect performance?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 18 Oct 2008 23:54:52 +1300

Henrik K wrote:
> On Sat, Oct 18, 2008 at 12:44:46PM +0300, Henrik K wrote:
>> On Fri, Oct 17, 2008 at 10:24:21PM +0200, Henrik Nordstrom wrote:
>>> On tor, 2008-10-16 at 12:02 +0300, Henrik K wrote:
>>>
>>>> Optimizing 1000 x "www.foo.bar/<randomstuff>" into a _single_
>>>> "www.foobar.com/(r(egex|and(om)?)|fuba[rz])" regex is nowhere near linear.
>>>> Even if it's all random servers, there are only ~30 characters from which
>>>> branches are created.
>>> Right.
>>>
>>> Would be interesting to see how 50K dstdomain compares to 50k host
>>> patterns merged into a single dstdom_regex pattern in terms of CPU
>>> usage. Probably a little tweaking of Squid is needed to support such
>>> large patterns, but that's trivial. (squid.conf parser is limited to
>>> 4096 characters per line, including folding)
>> Not sure what the splay code does in Squid, didn't have time to grab it.
>> But a simple test with Perl:
>>
>> - Grepped some hostnames from wwwlogs etc
>> - Regexp::Assemble'd 50000 unique hostnames (= 560kB regex, took 22 sec)
>> - Ran 100000 hostnames through it in 4 seconds (25000 hosts/sec on a 2.8 GHz CPU)
>>
>> It's pretty powerful stuff.
>
> Oops, I did it slightly wrong.
>
> Done correctly, using ^hostname$ instead of a plain hostname in the regex,
> the run takes 1.2 seconds, that's 80000+ hosts/sec.
>
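Henrik's test uses the Perl module Regexp::Assemble. To illustrate why merging many hostnames into one pattern grows far more slowly than the hostname count (shared prefixes share a single branch), here is a minimal Python sketch. It assumes a plain character trie is a fair stand-in; it is not Regexp::Assemble's actual algorithm, and build_trie, trie_to_regex and assemble are hypothetical names. The result is anchored as ^hostname$, as in the corrected test.

```python
import re


def build_trie(words):
    """Build a nested-dict character trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Emit a regex fragment matching exactly the words stored below `node`."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]  # a single path needs no group
    group = "(?:" + "|".join(branches) + ")"
    # "?" makes the rest optional when a shorter word also ends at this node
    return group + "?" if ends_here else group


def assemble(hostnames):
    """Hypothetical helper: one ^...$-anchored regex for a list of hostnames."""
    if not hostnames:
        raise ValueError("need at least one hostname")
    return "^" + trie_to_regex(build_trie(set(hostnames))) + "$"


hosts = ["www.example.com", "www.example.org", "mail.example.com"]
pattern = re.compile(assemble(hosts))
print(pattern.pattern)
print(bool(pattern.match("www.example.org")))           # True
print(bool(pattern.match("www.example.org.evil.net")))  # False: rejected by $
```

Without the trailing $, the last host would match because it merely starts with a listed name, which is one reason the anchored and unanchored runs are not comparable.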

Still slightly off. For a fair comparison with Squid's splay tree, the test
would have to leave out the ^, so that each pattern matches any *.example.com$
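The point here is that a dstdomain entry written as .example.com matches example.com and every host under it, so a regex standing in for it cannot be anchored with ^ on the left. A minimal Python sketch, assuming lowercase hostnames: the hypothetical dstdomain_regex uses (?:^|\.) rather than dropping ^ outright, so that notexample.com is not matched, and suffix_lookup is a set-based stand-in for a tree lookup, not Squid's splay tree code.

```python
import re


def dstdomain_regex(domains):
    """Hypothetical helper: match each domain and any subdomain of it,
    roughly what a dstdomain entry written as ".example.com" matches."""
    alternation = "|".join(re.escape(d) for d in sorted(set(domains)))
    # (?:^|\.) instead of a bare ^: the domain must start the host or
    # follow a dot, so "notexample.com" does not match "example.com".
    return re.compile(r"(?:^|\.)(?:" + alternation + r")$")


def suffix_lookup(host, domains):
    """Set-based stand-in for a tree lookup: test the host, then each parent."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


domains = {"example.com", "example.org"}
regex = dstdomain_regex(domains)
for host in ["example.com", "www.example.com", "notexample.com",
             "example.com.evil.net"]:
    # Both checks should agree: True, True, False, False
    print(host, bool(regex.search(host)), suffix_lookup(host, domains))
```

Note that the regex needs re.search rather than re.match, since the domain may start anywhere after a dot. A benchmark along the lines suggested would time both checks over the same host list.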

Amos

-- 
Please use Squid 2.7.STABLE4 or 3.0.STABLE9
Received on Sat Oct 18 2008 - 10:55:00 MDT
