RE: [squid-users] Squid to cache a DB? from Robert Collins on 2001-08-17 (squid-users)

From: Robert Collins <robert.collins@dont-contact.us>
Date: 18 Aug 2001 11:43:30 +1000

On 17 Aug 2001 12:05:43 -0700, sean.upton@uniontrib.com wrote:
> Robert Collins wrote:
> >The _only_ content worth accelerating is static
> >content. Dynamic content - changing content - will never have the same
> >hit ratio in a http-accelerator, and thus does not make as effective use
> >of the acceleration. There is a class of semi-dynamic data that is also
> >worth accelerating, but that is a different discussion.
>
> I'm not sure if I completely agree with this... I get the feeling that for
> purely static content, unless you can benefit from a cache hierarchy, an
> accelerated http server like TUX w/ Zero Copy kernel patches is going to
> serve those static files quicker (or for that matter a farm of nodes like
> this).

Sure. That's getting into the boundary between web server and
content distribution, though :]. I'm not saying that squid is a
panacea. Let's split this logically into
 client-side nodes and
 data storing nodes.
And let's also propose that we have 50Gb of data, of which 5Gb is in
active use - but we can't predict day to day which 5Gb :}.

* Flexible disk allocation
Tux, while soooo fast, doesn't adjust its content automatically, so you
need a lovely disk virtualization environment or many servers with >50Gb
of disk. (This may or may not be an issue - the nice thing about the
1ru servers available today is that while you only get 36-72Gb per node
(depending on whether you mirror or not), you can get a ton of
them into a rack.)
Squid - give each node 10Gb to play with, and you'll stay very close to
optimal. Now squid isn't as fast as tux - but it's being worked on :}.
For smaller datasets squid will probably be as cost-effective - but on
to the next point -

* Content management
Tux - rsync or virtualised disks or something similar to one of those
two.
Squid - nothing needed.
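
To make the "auto-adjusts" point concrete, here is a rough sketch, not
squid's actual replacement code. It assumes a plain LRU policy and
equal-sized objects, and scales the numbers above down to 500 objects in
total, 50 active per day, and room for 100 in the cache (standing in for
50Gb / 5Gb / 10Gb). The names LRUCache and simulate are made up for the
illustration.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used object."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def request(self, key):
        """Return True on a hit; on a miss, store the key and evict if full."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used
        return False


def simulate(days=5, requests_per_day=20000, total=500, active=50,
             capacity=100, seed=1):
    """Return the hit ratio for each day; the active set changes daily."""
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    ratios = []
    for _ in range(days):
        hot = rng.sample(range(total), active)  # today's unpredictable set
        hits = sum(cache.request(rng.choice(hot))
                   for _ in range(requests_per_day))
        ratios.append(hits / requests_per_day)
    return ratios


if __name__ == "__main__":
    for day, ratio in enumerate(simulate(), start=1):
        print(f"day {day}: hit ratio {ratio:.4f}")
```

Nobody tells the cache which objects are active on a given day; each
active object costs at most one miss and is then served from the cache
for the rest of that day.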

> Or for static text/html, Apache with mod_gzip. There must be a

mod_gzip is neat. I've got alpha-quality transfer-encoding support
running in squid (transfer-encoding rather than content-encoding,
because altering the content encoding is not recommended for proxies).
In acceleration mode, however, it's much more allowable - someday I'll
get onto that :].

> reason some of us running caching accelerators are doing just that, given
> all the other options available out there: that reason is "predictable"
> dynamic content, of which much is, in fact, cacheable. Perhaps this is
> what you mean by 'semi-dynamic' data? IMHO, using Squid as an accelerator
> provides the best balance for accelerating the widest range of content for
> many applications, including static and dynamic content.

Yes. Search result caching for example, or the data from a discussion
group - it changes, but not on every request. I agree completely. The
point I was making about static data is that it has a moderate
management overhead for all non-acceleration solutions, and acceleration
solutions quite neatly auto-adjust to changing conditions.
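
For content that changes, but not on every request, the app has to tell
the accelerator how long a response may be reused. A minimal sketch of
the standard HTTP headers involved follows; the helper name
cache_headers and the 5 minute lifetime are assumptions for the example,
not anything from squid or a particular app server.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Build response headers that let a shared cache reuse a page.

    max_age_seconds is how long the generated page may be served without
    going back to the app server; now is a Unix timestamp and defaults to
    the current time.
    """
    now = time.time() if now is None else now
    return {
        # HTTP/1.1 caches use max-age; "public" allows shared caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires carries the same lifetime for HTTP/1.0 caches.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


if __name__ == "__main__":
    # e.g. a search result list that is allowed to be 5 minutes stale
    for name, value in cache_headers(300).items():
        print(f"{name}: {value}")
```

How long a lifetime is acceptable depends on how stale a page is allowed
to get, which is an application decision.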

> My company, for example, uses app servers that dynamically publish content,
> which generally is the same for all users who browse or search the site.
> Everything, for example, in one of our newest applications is
> cache-friendly: search results and browsing are all dynamic, CPU-intensive
> database driven events, and we use GET requests for everything, which means
> near everything is cacheable.
>
> The difficulty, of course, is that a certain _class_ of dynamic data is not
> cacheable: anything heavily personalized; some of this limitation can be
> overcome. Small amounts of personalization can (in a limited sense) be done
> on the client-side with Javascript and cookies. For example, in e-commerce,
> someone's shopping cart view page is NOT cached, and it sets a cookie for
> the number of items in the cart every time it is refreshed. Other 'catalog
> viewing' pages (i.e. looking at an entry for a book on Amazon) on the site
> can be cached, but a message at the top of the page saying 'you have 7 items
> in your cart' could be done from the client side (via scripting) from a
> cached page because of a previously set cookie... I guess what I am saying
> is that caching requires app design considerations in dynamic content, but
> that this is a very appropriate use-case for a proxy cache as an http
> accelerator.

Yes. Here you've really crossed the boundary into truly dynamic data.

> And the HIT ratios are good: we have an online newspaper classified ad
> search system that searches about 18-20k ads at any given time... that setup
> behind squid as an accelerator averages about an 88% HIT ratio (including,
> of course, images); I would estimate that at least 80% of the most popular
> 'entry' and 'browse' page views are cached, and 30% of search result lists
> are cached. I don't think this is too bad (especially since most page views
> in our application are search/browse result lists involving catalog queries
> / BTree traversals in an object database), because the ones that do get HITs
> are the most demanded by our users: the most popular content will also be
> the fastest.

Excellent stats there.
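
As a back-of-the-envelope check on what a hit ratio like that buys,
using the quoted 88% figure (which includes images, so the saving on
the expensive database-driven pages alone will differ). The function
names are made up for the example.

```python
def origin_requests(hit_ratio, requests):
    """Requests that still reach the app server behind the accelerator."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return requests * (1.0 - hit_ratio)


def offload_factor(hit_ratio):
    """How many client requests are served per request the origin sees."""
    if not 0.0 <= hit_ratio < 1.0:
        raise ValueError("hit_ratio must be at least 0 and below 1")
    return 1.0 / (1.0 - hit_ratio)


if __name__ == "__main__":
    print(origin_requests(0.88, 1000))  # roughly 120 of every 1000
    print(offload_factor(0.88))         # roughly 8.3
```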

> One might say, caching like this should be done within the app server you
> are using.

That's only got marginal use - if you find you have to scale to
multiple app servers, you lose 50% of that benefit :].
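
A toy model of why per-app-server caches lose out once requests are
spread over several servers. It assumes strict round-robin balancing and
caches that never expire anything; the exact numbers depend entirely on
those assumptions and on the request pattern, so only the direction of
the effect is meant to carry over. The function name hit_ratio is made
up for the example.

```python
from itertools import cycle


def hit_ratio(urls, n_servers, shared_cache):
    """Hit ratio for round-robin requests over n_servers app servers.

    With shared_cache=True, one cache sits in front of all servers (the
    accelerator case); otherwise each server only has its own cache.
    """
    caches = [set()] if shared_cache else [set() for _ in range(n_servers)]
    servers = cycle(range(n_servers))
    hits = 0
    for url in urls:
        server = next(servers)  # round-robin load balancing
        cache = caches[0] if shared_cache else caches[server]
        if url in cache:
            hits += 1
        else:
            cache.add(url)  # miss: generate the page and remember it
    return hits / len(urls)


if __name__ == "__main__":
    # 10 distinct pages, each requested 4 times in a row
    urls = [f"/page/{i}" for i in range(10) for _ in range(4)]
    print("front cache:", hit_ratio(urls, 2, shared_cache=True))
    print("per-server: ", hit_ratio(urls, 2, shared_cache=False))
```

In this setup every page has to be generated once per app server instead
of once overall, so each extra server adds misses the front cache would
not have had.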

> Sure, but why not cached at the proxy too? The app server we
> use (Zope) has cache managers for both internal RAM-based caching of
> executed code, as well as cache managers for HTTP headers used in an http
> accelerator like Squid.
>
> I guess I see a lot of value in using Squid as an accelerator for dynamic
> content. I'm sure others' mileage varies...

I was a bit harsh in my comment - but what you are describing as dynamic
I would describe as semi-dynamic - this is my other discussion. And I
see *huge* value in accelerating dynamically generated content that does
not change on every request.

Rob

> Sean
>
> =========================
> Sean Upton
> Senior Programmer/Analyst
> SignOnSanDiego.com
> The San Diego Union-Tribune
> 619.718.5241
> sean.upton@uniontrib.com
> =========================
Received on Fri Aug 17 2001 - 19:43:12 MDT
