On Fri, Oct 17, 2008 at 10:24:21PM +0200, Henrik Nordstrom wrote:
> On tor, 2008-10-16 at 12:02 +0300, Henrik K wrote:
>
> > Optimizing 1000 x "www.foo.bar/<randomstuff>" into a _single_
> > "www.foobar.com/(r(egex|and(om)?)|fuba[rz])" regex is nowhere near linear.
> > Even if it's all random servers, there are only ~30 characters from which
> > branches are created from.
>
> Right.
>
> Would be interesting to see how 50K dstdomain compares to 50k host
> patterns merged into a single dstdomain_regex pattern in terms of CPU
> usage. Probably a little tweaking of Squid is needed to support such
> large patterns, but that's trivial. (squid.conf parser is limited to
> 4096 characters per line, including folding)

I'm not sure what the splay code does in Squid; I didn't have time to grab it.
But here is a simple test with Perl:

- Grepped some hostnames from www logs etc.
- Regexp::Assemble'd 50000 unique hostnames (= 560 kB regex, took 22 sec)
- Ran 100000 hostnames against it in 4 seconds (25000 hosts/sec on a 2.8 GHz CPU)

It's pretty powerful stuff.
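
For readers who want to try the idea without Perl, here is a minimal sketch in Python of the same kind of prefix merging. It is not Regexp::Assemble and does not reproduce the figures above; it only illustrates how many hostnames can collapse into one pattern with shared prefixes. The names build_trie, trie_to_regex and assemble, and the sample hostnames, are made up for this example.

```python
import re

def build_trie(words):
    """Insert each word into a nested-dict trie; the key '' marks end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Turn a trie node into a regex fragment that shares common prefixes."""
    is_end = "" in node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        # Single continuation: no alternation group needed.
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word also ends here, everything that follows is optional.
    return body + "?" if is_end else body

def assemble(hosts):
    """Compile one pattern that matches exactly the given hostnames (hypothetical helper)."""
    return re.compile(trie_to_regex(build_trie(hosts)))

if __name__ == "__main__":
    hosts = ["www.foo.com", "www.foobar.com", "mail.foo.com"]  # sample data, not from the logs
    pattern = assemble(hosts)
    for candidate in ["www.foobar.com", "www.foo.org"]:
        print(candidate, bool(pattern.fullmatch(candidate)))
```

This only merges common prefixes; Regexp::Assemble does more than that, so the resulting pattern size and speed will differ from the test described above.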
Received on Sat Oct 18 2008 - 09:44:54 MDT