HTTP + khttpd/TUX

From: Joe Cooper <joe@dont-contact.us>
Date: Fri, 03 Nov 2000 18:15:25 -0600

(Sorry, I originally sent this to squid-users...that's what I get for
trying to think after midnight. Sorry if you get it twice.)

Hey folks,

Moez and I have been having discussions about the http layer, and ways
to help accelerate Squid without losing functionality or starting over
on the whole codebase. This message is to sum up what has been
discussed so far and hopefully get some ideas from you guys on how best
to make it workable (and whether it is in fact workable).

A few things you need to know before I can really describe what I'm
talking about:

The reiser_raw/butterfly storeio module uses no in-core swap.state, as
has been discussed here. Squid does not keep track of which objects are
on disk...they are mapped to disk based on the way they hash, and they
are then looked up via a normal file seek. Hits are loaded, misses are
handled normally. Old objects are purged over time via a mechanism
called passive garbage collection, wherein during rebalancing of the
ReiserFS tree (which happens incrementally as files are written) if an
object is found to be relatively old and space is needed it is removed.
(Simplification...but you get the idea well enough to understand what
I'm really wanting to discuss, hopefully. If not, sizif or Nikita will
have to explain it further because that's as far as I understand it.)
The side effect of this is that if another process wanted to get a file
from this partition, and serve it, it wouldn't need to have access to
Squid's swap.state to find the files. It would do the same kind of seek
that the butterfly storeio does.
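
To make the idea concrete, here is a minimal sketch of an index-free
lookup, where an object's location is derived from its hash alone, so any
process that knows the hash rule can find it. The real butterfly/reiser_raw
hash function and on-disk layout are not described in this message; the MD5
digest, the two-level directory layout, and the names object_path, store
and lookup are all assumptions made only for illustration.

```python
import hashlib
import os
import tempfile


def object_path(cache_dir, url):
    # Assumption: location is a pure function of the URL's hash.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest[:2], digest[2:])


def store(cache_dir, url, body):
    # Write the object where its hash says it belongs; no index is updated.
    path = object_path(cache_dir, url)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(body)


def lookup(cache_dir, url):
    # A hit is simply "the file is there"; a miss is "it is not".
    try:
        with open(object_path(cache_dir, url), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    store(cache_dir, "http://example.com/", b"hello")
    print(lookup(cache_dir, "http://example.com/"))
    print(lookup(cache_dir, "http://example.com/missing"))
```

Because lookup needs nothing but the cache directory and the URL, a second
process could serve hits without ever talking to the one that stored them.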

Now...onto the loopy ideas:

I've been looking over the Linux khttpd and TUX code, and thinking about
how we can most easily and effectively leverage the performance of
these http stacks. What I've come up with is this: Let khttpd or TUX
handle cache hits, and (just as khttpd passes dynamic content requests
onto Apache or some other web server) for misses, pass them over to
Squid to handle.

From what I can tell, this wouldn't be _that_ difficult a task. The
khttpd would need to be extended to support the butterfly/reiser_raw API
so that it can seek objects in the cache dirs. And it would have to
know how to handle a miss...which means simply passing the request
untouched on to the Squid port.
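
The hit/miss split described above might look roughly like the following
user-space sketch. It is not khttpd or TUX code: handle_request and
request_url are hypothetical names, and the lookup and forward_miss
callables stand in for the butterfly/reiser_raw seek and for a connection
to the Squid port. The sketch also assumes proxy-style requests, where the
request line carries the absolute URL.

```python
def request_url(raw_request):
    # Pull the URL out of the request line, e.g.
    # b"GET http://example.com/ HTTP/1.0".
    request_line = raw_request.split(b"\r\n", 1)[0].decode("latin-1")
    _method, url, _version = request_line.split(" ", 2)
    return url


def handle_request(raw_request, lookup, forward_miss):
    # lookup(url) returns the cached bytes, or None on a miss.
    body = lookup(request_url(raw_request))
    if body is not None:
        # Hit: served by the fast front end, Squid never sees it.
        return body
    # Miss: hand the original bytes over unmodified.
    return forward_miss(raw_request)


if __name__ == "__main__":
    cache = {"http://example.com/a": b"cached object"}
    hit = b"GET http://example.com/a HTTP/1.0\r\n\r\n"
    miss = b"GET http://example.com/b HTTP/1.0\r\n\r\n"
    print(handle_request(hit, cache.get, lambda raw: b"fetched by Squid"))
    print(handle_request(miss, cache.get, lambda raw: b"fetched by Squid"))
```

Note that nothing on the hit path consults Squid, which is why anything
Squid normally does per request would only run for misses in this layout.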

This could work for the traditional Squid (with swap.state in core) but
only through a shared memory method, I assume.

Also, Squid would need to be modified so that it no longer tries to
satisfy cache hits...it only acts as a miss engine, of sorts, while the
much faster and more efficient khttpd code handles the hits.

This leads to more asynchronicity, which is probably a good thing. The
problem I see is that this isn't necessarily the Right Way to turn TUX
or khttpd into a web caching proxy. But it's the only way I can see to
easily maintain all of the current Squid features (mostly...ACLs and
such will present issues, as they can only be checked for misses, unless
we add ACL code to the khttpd) while moving in the direction of
TUX/khttpd.

I welcome others' thoughts or ideas for how best to achieve the goal of
improving Squid's http level efficiency...TUX has obviously been done
very right (it's the fastest web server available), and I think we can
learn quite a bit about efficient web service by looking at how it does
what it does. BTW, I mention khttpd because we don't need a lot of the
functionality of TUX: we don't deal with dynamic content anyway, and
handling it is one of the improvements of TUX over khttpd. khttpd is
the basis of TUX.
                                  --
                     Joe Cooper <joe@swelltech.com>
                 Affordable Web Caching Proxy Appliances
                        http://www.swelltech.com
Received on Fri Nov 03 2000 - 17:08:43 MST
