Re: [squid-users] If not modified since is causing near-hits

From: David Raccah <david_at_raccah.org>
Date: Mon, 3 May 2010 23:34:32 -0700

Thanks for the help. I mistyped the header name. Essentially, we have
crawlers coming to our web pages, and they are sending the
If-Modified-Since header.

The system is designed in a classic L1/L2 architecture. The L1 is
primarily a router and the L2 boxes contain the disk and memory cache.
If the data is not found on the L2, the L2 calls the origin server,
which is slow-ish.

Based on the squid mgr info, most of the requests which reach the L2
squid have the If-Modified-Since header. If the value that is being
passed is older than the one in the cache, L2 will respond with a
TCP_HIT. This is the happy path.

But if the value that is being passed in is equal to the one in the
cache (when the same robot comes back a few days later and is checking
for updates), the L2 goes to the origin server.

So the question is: can we configure the L2 to trust its own cache
and answer If-Modified-Since requests from what is stored locally,
without contacting the origin? Of course, if there is no actual hit,
it should still go to the origin server.
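
To make the request concrete, here is a minimal sketch in Python of the
decision I would like the L2 to make for an object it already holds and
considers fresh. The function name and the bare status-code return value
are made up for illustration; this is not Squid code or Squid
configuration, only the comparison I am hoping can be done locally.

```python
from email.utils import parsedate_to_datetime


def answer_from_cache(cached_last_modified, if_modified_since=None):
    """Pick the status a fresh cache entry could return without
    contacting the origin server (hypothetical helper)."""
    if if_modified_since is None:
        # Plain GET: serve the stored object.
        return 200
    try:
        client_date = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparsable date: treat the request as unconditional.
        return 200
    cached_date = parsedate_to_datetime(cached_last_modified)
    if cached_date <= client_date:
        # The client's copy is at least as new as ours: Not Modified.
        return 304
    # Our copy is newer than the client's: send the stored object.
    return 200
```

The case that hurts us today is the equal-dates one, which in this
sketch is a 304 answered locally.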

Thanks!

On Mon, May 3, 2010 at 9:42 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On Mon, 3 May 2010 13:28:21 -0700, David Raccah <raccah_at_gmail.com> wrote:
>> Hello,
>>
>> Please excuse the newbie.  I checked most of the search engines on
>> squid pages and could not find what I was looking for.  Though it may
>> be because I did not use the correct keywords.
>>
>> So we have a large set of squid boxes sitting in front of some slow
>> running code.  The data is mostly static, so we use squid as a proxy
>> and it caches the data.  The TTL on the cache for now is 1 week or
>> more, and so we are saving the backend/origin from being pounded and
>> love it!!!  However, we are seeing a large number of near-hit instead
>> of pure hits.  For us a near-hit is equal to a miss, because it causes
>> the cache (L1 and L2) to go to the origin/backend.  We are using HTCP
>> to clear the cache when there is a change (much like wikipedia does),
>> so we can trust that our L2 is as close to fresh as possible.
>>
>> So:
>>
>> 1) Since we can guarantee that the L2 will have the latest
>> information, is there a way to ignore the "if-not-modified" header?
>>
>
> Depends on where it is being generated and exactly which of the If-*
> headers it is.
> (there is no if-not-modified header).
>
>
>> 2) is there a way to declare the L2 cache as the origin-server instead
>> of just a parent cache - not a great approach, but need to mitigate
>> going to the origin if the L2 has a hit?
>
> Yes. Setting "originserver" on the parent cache_peer.
> However I think ICP/HTCP are not sent to origin servers.
>
>>
>> 3) is there a utility to update the timestamp of the cached objects.
>
> Maybe the squidpurge tool. I have not yet looked at it closely.
>
>
> Amos
>
Received on Tue May 04 2010 - 06:34:39 MDT