Thanks. I don't want to become involved in altering Squid itself because
it's such a big project that to do anything useful would take a lot of time
(not bad in itself, but time I would like to spend on the application rather
than the enabling technology).
I looked at Dan's Guardian, but I think it would also need patching (no
plugin support, etc.), and the licence and censorship implications made me
uneasy. Then I started wondering why DG needed Squid at all and realised
that it is probably acting as a tunnel - the browser connects to DG
expecting a proxy, and DG forwards transparently to Squid.
This architecture (a tunnel in front of Squid) is something I can implement
myself (I have years of experience writing multithreaded socket code), while
leaving Squid to handle most of the nasty HTTP details that were worrying me
(downgrading versions, managing persistence, etc.). Of course I'll still
need some parsing to separate headers from body, but the logic should
(I hope) be much simpler - just blank lines and Content-Length.
From another POV, the tunnel can process the body and Squid can process the
headers.
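Roughly what I have in mind - a quick, untested sketch in Python (the port
numbers and process_body are placeholders of mine; it assumes one request per
connection, no request body, and a Content-Length delimited response, i.e. no
chunked encoding - exactly the awkward cases I'm hoping Squid will normalise
away):

import socket
import threading

LISTEN_PORT = 8080                      # browser's proxy setting points here
SQUID_HOST, SQUID_PORT = 'localhost', 3128

def read_headers(sock):
    # read up to and including the blank line ending the headers;
    # return (headers, any body bytes already read past them)
    data = b''
    while b'\r\n\r\n' not in data:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    head, sep, rest = data.partition(b'\r\n\r\n')
    return head + sep, rest

def content_length(headers):
    for line in headers.split(b'\r\n'):
        if line.lower().startswith(b'content-length:'):
            return int(line.split(b':', 1)[1].strip())
    return 0                            # no body, or connection-delimited

def process_body(body):
    # link extraction, word counts etc. would go here
    return body

def handle(browser):
    squid = socket.create_connection((SQUID_HOST, SQUID_PORT))
    try:
        # request: headers pass through to Squid untouched
        req_head, req_rest = read_headers(browser)
        squid.sendall(req_head + req_rest)
        # response: split on the blank line, then read Content-Length bytes
        resp_head, body = read_headers(squid)
        want = content_length(resp_head)
        while len(body) < want:
            chunk = squid.recv(4096)
            if not chunk:
                break
            body += chunk
        browser.sendall(resp_head + process_body(body))
    finally:
        squid.close()
        browser.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('', LISTEN_PORT))
server.listen(5)
while True:
    conn, addr = server.accept()
    threading.Thread(target=handle, args=(conn,)).start()

Persistent connections and chunked encoding would break this as it stands,
which is exactly why I want Squid sitting behind it.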
So thanks again - that pointed me in the right direction.
Cheers,
Andrew
On Saturday 27 July 2002 10:14 pm, you wrote:
> Squid currently has no facilities for processing content beyond the HTTP
> headers, in plug-in form or otherwise. There have been a few hacks
> along the way that do some specialized form of filtering (like stripping
> the animation bit from GIFs, or stripping out JavaScript), but those projects
> never really went anywhere and have long been unsupported.
>
> Robert has done some promising work on generic content processing in
> Squid, but ran into some roadblocks that he didn't have time to address.
> You may want to start from there and tackle the issues he ran into,
> if you have the time and inclination.
>
> ICAP provides support for similar things in limited circumstances (it is
> targeted at content providers who want to customize or aggregate
> content or provide additional services nearer to the client). Geetha
> (and Ralf? I think) has been doing lots of cool stuff in that area, but
> I don't think it will address your needs at all in its existing form.
>
> Dan's Guardian does content processing, and so might provide a good
> starting point (note the request attached to its GPL license, however,
> before embarking on any commercial work with it). It is a standalone
> proxy these days, obviously much simpler in implementation than
> Squid... I do not know how compliant it is with regard to the HTTP
> protocols, but I haven't heard any particularly alarming things about it
> and Dan seems to be a skilled programmer, so I'd suspect it is a good
> choice for your project.
>
> andrew cooke wrote:
> > Hi,
> >
> > Is there a simple way to process files that are requested through Squid?
> >
> > I'd like to try constructing a database containing links, word counts
> > etc, for pages that I view. The simplest way I can think of to do this
> > is to point my browser at a proxy and process data there. Squid seems
> > the obvious choice for a proxy (but see last point below).
> >
> > Looking for similar functionality in other code working with Squid, I
> > found Viralator, which checks downloads for viruses
> > (http://viralator.loddington.com/). It intercepts requests using Squirm,
> > pulls the file using wget, and then resupplies it (after scanning) via
> > Apache. This seems very complicated, and may only work correctly for
> > downloads rather than page views - I'm not clear about the details yet
> > (although I could drop Apache when working on the machine hosting Squid).
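> >
> > From what I can tell, Squirm is a Squid "redirector": Squid hands it one
> > request per line on stdin (roughly "URL client-ip/fqdn ident method") and
> > reads back a replacement URL, or a blank line to leave the request alone.
> > So even without Viralator, a tiny redirector could log every URL I visit.
> > An untested sketch in Python (the logging is my own idea, not Squirm's):
> >
> > import sys
> >
> > while True:
> >     line = sys.stdin.readline()
> >     if not line:                    # Squid closed the pipe
> >         break
> >     parts = line.split()
> >     if not parts:
> >         continue
> >     url = parts[0]
> >     sys.stderr.write(url + '\n')    # a database insert could go here
> >     sys.stdout.write('\n')          # blank line = do not rewrite
> >     sys.stdout.flush()              # Squid waits for each reply
> >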
> >
> > Instead, I was wondering if Squid had support for plugin modules (that
> > might be intended to support filters, for example), but I haven't been
> > able to find anything.
> >
> > Another approach might be to scan the files cached by Squid (i.e. as
> > files on the local disk, not streamed data). But this presumably won't
> > work with dynamic pages, and it might be difficult to associate URLs with
> > files (also, it forces caching when, for single-person use, proxy-only
> > might be sufficient). And how would this be triggered for new files?
> >
> > Does anyone have any suggestions on the best way forward? Perhaps
> > there's a simpler proxy that I could use instead? There are certainly a
> > lot of simple HTTP proxies out there, but I'm not sure how closely they
> > follow the spec.
> >
> > Any help appreciated,
> > Thanks,
> > Andrew
--
http://www.acooke.org