Hi,
Is there a simple way to process files that are requested through Squid?
I'd like to try constructing a database containing links, word counts, etc.,
for pages that I view. The simplest way I can think of to do this is to
point my browser at a proxy and process data there. Squid seems the obvious
choice for a proxy (but see last point below).
Looking for similar functionality in other code working with Squid, I found
Viralator, which checks downloads for viruses
(http://viralator.loddington.com/). It intercepts requests using Squirm,
pulls the file using wget, and then resupplies it (after scanning) via Apache.
This seems very complicated, and it may only work correctly for downloads
rather than page views (I'm not clear on the details yet), although I could
drop Apache when working on the machine hosting Squid.
Instead, I was wondering if Squid had support for plugin modules (that might
be intended to support filters, for example), but I haven't been able to find
anything.
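The nearest thing I have come across is the redirector interface that Squirm
itself plugs into (the redirect_program directive in squid.conf, if I've
understood the docs): Squid writes one line per request to the helper's stdin
and reads back a rewritten URL, or a blank line to leave the request alone.
A helper that just logs and passes through would at least capture every URL.
A minimal sketch, with the line format as an assumption from the 2.x docs:

#!/usr/bin/env python3
# Minimal Squid redirector helper, assuming the Squid 2.x helper
# protocol: one request per line on stdin ("URL client_ip/fqdn ident
# method"), and one reply per line on stdout (a rewritten URL, or a
# blank line meaning "leave it alone"). Hooked up in squid.conf with:
#   redirect_program /usr/local/bin/logurls.py
import sys

LOGFILE = "/tmp/requested-urls.log"  # hypothetical location

def main():
    with open(LOGFILE, "a") as log:
        for line in sys.stdin:
            fields = line.split()
            if fields:
                log.write(fields[0] + "\n")  # first field is the URL
                log.flush()
            sys.stdout.write("\n")   # blank line = URL unchanged
            sys.stdout.flush()       # Squid blocks until it sees a reply

if __name__ == "__main__":
    main()

The catch is that a redirector only ever sees URLs, never page content, so
the word counts would still need a separate fetch (which is presumably why
Viralator resorts to wget).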
Another approach might be to scan the files cached by Squid (i.e. as files on
the local disk, not as streamed data). But this presumably won't work with
dynamic pages, and it might be difficult to associate URLs with files (also,
it forces caching when, for single-person use, proxy-only might be
sufficient). And how would this be triggered for new files?
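On the trigger question: one idea would be to tail store.log, which (if I've
read the docs right) records an entry each time Squid swaps an object out to
disk, including the swap file number and the URL, so it would also answer the
URL-to-file association. A rough sketch, with the field positions as
assumptions from the 2.x format:

#!/usr/bin/env python3
# Sketch: tail Squid's store.log to learn which URLs have just been
# written to the cache. Field positions are an assumption based on the
# Squid 2.x store.log format (timestamp, action, swap file number, ...,
# URL last); check against a real log before trusting it.
import time

STORE_LOG = "/var/log/squid/store.log"  # assumed default location

def follow(path):
    """Yield lines appended to a file, tail -f style."""
    with open(path) as f:
        f.seek(0, 2)               # start at the current end of file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)    # nothing new yet; wait and retry
                continue
            yield line

for line in follow(STORE_LOG):
    fields = line.split()
    if len(fields) < 3 or fields[1] != "SWAPOUT":
        continue                   # only objects written out to disk
    file_number, url = fields[2], fields[-1]
    print(file_number, url)        # hand off to the indexer here

The dynamic-page and forced-caching objections still stand, of course.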
Does anyone have any suggestions on the best way forward? Perhaps there's a
simpler proxy that I could use instead? There are certainly a lot of simple
HTTP proxies out there, but I'm not sure how closely they follow the spec.
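For the record, here's roughly the smallest thing I can imagine doing what I
want, sketched in Python using only the standard library. It handles plain
HTTP GETs (no HTTPS CONNECT, no keep-alive, no caching) and cheerfully
ignores most of the spec, which is exactly the corner-cutting I'm worried
about in the simple proxies:

#!/usr/bin/env python3
# Toy logging proxy: point the browser at localhost:3128 and every HTML
# page passes through count_words() before being relayed. A starting
# point for experimenting, not a spec-compliant proxy.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen
from urllib.error import URLError

def count_words(url, text):
    # Placeholder for the real indexer (links, word counts, ...).
    print(url, len(text.split()), "words")

class LoggingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            # In proxy mode the request line carries the absolute URL,
            # so self.path is the full http://... address.
            with urlopen(self.path) as upstream:
                body = upstream.read()
                status = upstream.getcode()
                ctype = upstream.headers.get("Content-Type",
                                             "application/octet-stream")
        except URLError as err:
            self.send_error(502, str(err))  # upstream fetch failed
            return
        if ctype.startswith("text/html"):
            count_words(self.path, body.decode("latin-1", "replace"))
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("localhost", 3128), LoggingProxy).serve_forever()

Even this toy shows the attraction: the page body is right there to index
before it is relayed.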
Any help appreciated,
Thanks,
Andrew
-- http://www.acooke.org