> >Just on a further note: a group called n2h2.com basically does this as
> >a job, hiring people to go searching the net for this trash (interesting
> >job ;) so that they can offer it as a service to schools and so on.
> >They use a modified squid to handle the number of sites they block.
> >
> >They have a list of 180 000 sites that have "dirty pix/bomb-details" etc.
>
> 180,000 regex entries would grind your proxy to a screaming halt!
They run a 'modified squid' according to their info page.
I suspect one of the modifications is a hash table lookup for pattern
matching in place of a linear search. You can anchor on any fixed
part of a URL, look that part up in the hash table, and do a linear
search only over the small subset of patterns that have no fixed
part, like ".*xxx.*". The obvious fixed part to anchor on is the
site name or IP address, but any directory path element would do too.
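
The anchoring idea above can be sketched roughly as follows. This is
only an illustration in Python, not n2h2's actual modification and not
squid code; the class name BlockList and its methods are hypothetical,
and it assumes full URLs that include a scheme such as http://.

```python
import re
from urllib.parse import urlsplit


class BlockList:
    """Hypothetical block list: hashed lookups first, regex scan last."""

    def __init__(self):
        self.blocked_hosts = set()       # fixed site names or IPs (hashed)
        self.blocked_path_parts = set()  # fixed directory elements (hashed)
        self.patterns = []               # the few unanchored regexes

    def add_host(self, host):
        self.blocked_hosts.add(host.lower())

    def add_path_part(self, part):
        self.blocked_path_parts.add(part.lower())

    def add_pattern(self, regex):
        self.patterns.append(re.compile(regex, re.IGNORECASE))

    def is_blocked(self, url):
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # Hash lookup: cost does not grow with the number of listed sites.
        if host in self.blocked_hosts:
            return True
        # Hash lookup per path element, e.g. "bombs" in /files/bombs/a.txt.
        for element in parts.path.lower().split("/"):
            if element and element in self.blocked_path_parts:
                return True
        # Linear search only over the small set of unanchored patterns.
        return any(p.search(url) for p in self.patterns)


blocklist = BlockList()
blocklist.add_host("badsite.example")
blocklist.add_path_part("bombs")
blocklist.add_pattern(r"xxx")
print(blocklist.is_blocked("http://badsite.example/index.html"))
print(blocklist.is_blocked("http://ok.example/index.html"))
```

With this layout, 180 000 fixed site names cost one set lookup per
request, and only the handful of ".*xxx.*" style entries are still
tried one by one.
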
I haven't studied the release conditions of squid recently so I
don't know if it's a condition of commercial use that modifications
are fed back to the squid team. It would be nice if they were.
I hope n2h2 is playing fair.
G
Received on Wed Jun 04 1997 - 20:44:00 MDT