Re: Validation code.

From: Robert Collins <robert.collins@dont-contact.us>
Date: Fri, 1 Dec 2000 10:05:17 +1100

----- Original Message -----
From: "Duane Wessels" <wessels@squid-cache.org>
To: "Robert Collins" <robert.collins@itdomain.com.au>
Cc: <squid-dev@squid-cache.org>
Sent: Friday, December 01, 2000 9:41 AM
Subject: Re: Validation code.

>
>
> On Sun, 26 Nov 2000, Robert Collins wrote:
>
> > so the cleanup code that calls doublecheck could call it on a new file
> > being written, thus causing a problem.
>
> Can you be more specific. What bad things are going to happen in this
> case? Unlinking an open file is not bad (at least not on unix) because
> any other thread that has the file open for reading and writing can
> continue reading and writing.

We lose the swapfile, and if it's a big one, that will affect our hit ratio. (For example, I reboot squid and a client starts
downloading a service pack. Yes, the file is written, but it's released as soon as the client finishes.)
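
Something along these lines is what I mean for protecting entries that are still being written - just a sketch, not the real store
code, and the type and flag names (StoreEntry, swap_status_t, SWAPOUT_WRITING) are made up for the example:

    /* Sketch only: skip entries that are still being swapped out, so the
     * cleanup doublecheck can't unlink a swapfile a client is still filling. */
    #include <stdio.h>

    typedef enum { SWAPOUT_NONE, SWAPOUT_WRITING, SWAPOUT_DONE } swap_status_t;

    typedef struct {
        int swap_file_number;
        swap_status_t swap_status;
    } StoreEntry;

    /* Return 1 if the entry is safe to doublecheck/unlink, 0 otherwise. */
    static int
    storeCleanupSafeToCheck(const StoreEntry *e)
    {
        if (e->swap_status == SWAPOUT_WRITING)
            return 0;           /* still being filled by a client download */
        return 1;
    }

    int
    main(void)
    {
        StoreEntry e = { 42, SWAPOUT_WRITING };
        printf("entry %d safe to check: %d\n",
               e.swap_file_number, storeCleanupSafeToCheck(&e));
        return 0;
    }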

>
> > Finally is there any reason the storeCleanup code can't be part of the
> > rebuild? Now the disk checking is a background task, the Cleanup routine
> > just counts the file sizes and sets the VALIDATED bit.
>
> Actually I was thinking that storeCleanup() can probably go away
> entirely. The VALIDATED bit is less useful now that the
> swapin code is more robust (checking size, MD5, URL).

Do you mean the swap file size check or the sane metadata check? It was the file size checking that you pointed out as slowing
the whole thing down - so I got rid of that again and put it in the background check.
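
Roughly what I mean by the background check - only a sketch, and the names (DiskEntry, backgroundCheckOne) are invented for the
example, not what the patch actually uses:

    /* Sketch of a background size check: stat() one swap file per call and
     * compare it with the size recorded for the entry, instead of doing
     * this for every object during the rebuild itself. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    typedef struct {
        const char *path;       /* path of the swap file on disk */
        off_t expected_size;    /* object size recorded in swap.state */
    } DiskEntry;

    /* Return 1 if the file matches and can be marked VALIDATED,
     * 0 if the entry should be released. */
    static int
    backgroundCheckOne(const DiskEntry *de)
    {
        struct stat sb;
        if (stat(de->path, &sb) < 0)
            return 0;           /* file missing */
        if (sb.st_size != de->expected_size)
            return 0;           /* truncated or corrupt */
        return 1;
    }

    int
    main(void)
    {
        DiskEntry de = { "/var/spool/squid/00/00/00000001", 1024 };
        printf("valid: %d\n", backgroundCheckOne(&de));
        return 0;
    }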

> Also I think there is really no difference between a clean and
> dirty rebuild, so that can disappear as well.
>

In reverse order (it'll make more sense).

The clean and dirty rebuilds seem quite different to me: the clean code just reads the file into memory. It doesn't perform any
checks for duplicate entries, wrong MD5s, etc., so it is very fast. The rebuild from dirty checks for previously read-in entries,
conflicting URIs, cancelled downloads, and so on. Getting rid of the distinction between clean and dirty means those checks will
occur every time - slowing it down.

As an example, my workstation running win2k rebuilds around 6000 entries per second on a clean rebuild, and around 5000 on a dirty
one. But it may make more of a difference for a large store on a big server.
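
To make the difference concrete, here's a toy sketch of why the dirty path costs more - the hash table is replaced by a linear
array and the names are made up, but the point is that every record has to be checked against what's already in memory before it
can be inserted:

    /* Sketch: dirty rebuild must resolve collisions (same MD5 key seen
     * twice, e.g. a replaced or cancelled object) by keeping the newer
     * record, whereas a clean rebuild can just insert. */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define KEY_LEN 16
    #define MAX_ENTRIES 1024

    typedef struct {
        unsigned char key[KEY_LEN]; /* MD5 of the URI */
        time_t timestamp;           /* when the record was logged */
    } RebuildEntry;

    static RebuildEntry index_[MAX_ENTRIES];
    static int index_count = 0;

    static void
    dirtyRebuildAdd(const RebuildEntry *e)
    {
        int i;
        for (i = 0; i < index_count; i++) {
            if (memcmp(index_[i].key, e->key, KEY_LEN) == 0) {
                if (e->timestamp > index_[i].timestamp)
                    index_[i] = *e;     /* newer record supersedes the old */
                return;                 /* older duplicate: drop it */
            }
        }
        if (index_count < MAX_ENTRIES)
            index_[index_count++] = *e; /* no conflict: plain insert */
    }

    int
    main(void)
    {
        RebuildEntry older = { {0}, 1000 };
        RebuildEntry newer = { {0}, 2000 };
        dirtyRebuildAdd(&older);
        dirtyRebuildAdd(&newer);        /* same key: collision */
        printf("entries: %d, kept timestamp: %ld\n",
               index_count, (long)index_[0].timestamp);
        return 0;
    }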

On to the VALIDATED bit: if we can assume that an object is valid once it's in memory and the store log has been completely
rebuilt, then let's get rid of it. I do suggest that we only bring the store dir online once the rebuild is finished - to prevent
trying to serve out a stale hit (which the rebuild from dirty code corrects by the time the log is fully read).

Maybe on a rebuild from directories we could mark the store as hit-only immediately, because we know that there will be no
collisions between the objects, and then allow writes to occur once the directories are checked?

This will still allow removal of the storeCleanup() routine, and should provide earlier hits on rebuilds than we get today.
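
A very rough sketch of the hit-only gating I have in mind (the flag and function names are made up for the example, not actual
store code):

    /* Sketch: allow hits as soon as the index is usable, but hold off new
     * swapouts until the rebuild/validation pass has finished. */
    #include <stdio.h>

    static int store_rebuilding = 1;   /* set while the dirs are being scanned */

    static int
    storeMayServeHit(void)
    {
        return 1;                      /* hits are safe: no key collisions */
    }

    static int
    storeMayStartSwapout(void)
    {
        return !store_rebuilding;      /* defer writes until rebuild completes */
    }

    int
    main(void)
    {
        printf("hit: %d, swapout: %d\n",
               storeMayServeHit(), storeMayStartSwapout());
        store_rebuilding = 0;          /* rebuild finished: writes allowed */
        printf("hit: %d, swapout: %d\n",
               storeMayServeHit(), storeMayStartSwapout());
        return 0;
    }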

Comments? I'm happy to do this as part of the store_check stuff... or maybe it should be a 2.5 project?
Received on Thu Nov 30 2000 - 15:57:38 MST
