On 3 Nov 2000, at 23:00, Henrik Nordstrom <hno@hem.passagen.se> wrote:
> > I expect FS meta hitrate be low - it is increasingly easy to build boxes
> > that have disk/ram ratio of 200 and more (200G disk, 1G ram).
>
> Yes, a general purpose FS with general purpose tuning will have a way too
> low hit rate, and this is partly experienced in the reiserfs-raw
> approach (all-misses being a worse load to handle than all-hits), but
> still I think they have proven the concept.
I agree reiserfs is an interesting concept. Yet I'm preaching ;) that
squid MUST NOT (in RFC speak) lose speed in miss detection.
I think we must have a way to detect a squidFS miss with at least 99%
correctness without touching disks; otherwise it IMHO isn't good as
a general solution.
If we can do this with "microdigests" (I'm unfamiliar with digests and
their issues), and can instantly add any new object to the microdigest,
then I agree this could be a good approach to go for.
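
To make the "microdigest" idea concrete, here is a minimal sketch of an
in-RAM Bloom filter that accepts new objects instantly and answers
miss queries without touching disk. The class name MicroDigest, the
default size, and the choice of slicing the MD5 into four 32-bit words
are assumptions made for illustration, not a description of any
existing squid code.

```python
import hashlib


class MicroDigest:
    """Tiny Bloom filter: answers "definitely absent" or "possibly present"."""

    def __init__(self, n_bits: int = 1 << 20) -> None:
        # n_bits is a hypothetical size; a real digest would be sized
        # from the expected object count.
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, url: str):
        # Split the 16-byte MD5 of the key into four 32-bit words and
        # use each word as one hash function.
        md5 = hashlib.md5(url.encode("utf-8")).digest()
        for i in range(0, 16, 4):
            yield int.from_bytes(md5[i:i + 4], "big") % self.n_bits

    def add(self, url: str) -> None:
        # Appending a new object only sets bits in RAM; no disk access.
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        # False means a certain miss; True means "probably cached".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


digest = MicroDigest()
digest.add("http://example.com/a")
print(digest.might_contain("http://example.com/a"))
```

A filter like this never reports a miss for an object that was added,
so the only errors are false hits. It cannot forget an object, though,
so removals by the replacement policy would need a rebuild or a
counting variant.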
> > ICP - shouldn't ever touch disks to return hit/miss. 2 boxes with equal
> > load and average hitrate of 30% would see 3 times more ICP traffic than
> > actual fetches from peers. We'd rather not allow this to touch disks.
>
> Well, usually I do not consider ICP as an useable option, and take the
> freedom not to consider it's requirements when thinking about store
> implementation. It should be possible to implement ICP with reasonable
> performance if the normal cache operations can be implemented with good
> performance.
I think you are underestimating the importance of ICP. Yes, it has lots
of problems, yet it's the best we have so far if we need to implement a
tightly coupled (in terms of cache content) cache cluster.
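
The "3 times more ICP traffic" figure quoted above can be reproduced
with a very simple model. The sketch below assumes that every local
miss sends exactly one ICP query to the peer, and that the peer can
serve a queried object with the same probability as the average hit
rate. Both are simplifying assumptions, and the function name is
made up for illustration.

```python
def icp_queries_per_peer_fetch(hit_rate: float) -> float:
    """ICP queries sent for each object actually fetched from the peer."""
    # Fraction of requests that miss locally and trigger an ICP query.
    local_misses = 1.0 - hit_rate
    # Fraction of requests the peer then serves (assumed same hit rate).
    peer_fetches = local_misses * hit_rate
    return local_misses / peer_fetches


print(round(icp_queries_per_peer_fetch(0.30), 2))
```

Under these assumptions the ratio is simply 1 / hit_rate, so a 30%
hit rate gives a bit over three queries per peer fetch.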
> > Based on what do we generate a digest, at startup? First it sounds like we
> > need to access/stat every object on disks to get the MD5 key for digest.
> > ICP shouldn't return hits for stale objects, so object timestamps are
> > needed during ICP request. refcounts and timestamps are also needed for
> > the replacement policys to work.
>
> With a design like the ReiserFS-raw/butterfly we cannot generate a
> digest in a practical manner simply due to the fact that doing so would
> require reading in each and every object in the cache..
So we still have to keep swapstate.
> What is possible is to have the filesystem generate a digest-like hint
> map to speed up availability lookups.
What would it possibly look like? Besides digests, I mean.
> > long time ago Squid moved from diskbased (ala apache) lookups to rambased
> > lookups, now it seems we are moving towards diskbased again,
>
> Hmm.. Squid has never had a diskbased lookup, and I don't think Harvest
> cached had either..
Yes, squid was rambased from the start. I meant that before that we had
only apache-style diskbased caches around.
> > although with much more efficient disk access, and to reduce ram usage.
> > Still, disk performance hasn't changed much, and if we make squid
> > performance dependant on disks too much, then I'm really worried.
>
> Hopefully it is possible combine the best of both worlds. Use ram for
> the things really needing high speed answers, and disk for most else.
>
> In a normal Squid index only a very small fraction is ever needed. Of
> the entries accessed, only very few fields are actually needed to be
> available in a speedy manner.
Indeed. The MD5 key and some criteria to determine staleness are the
only things needed. All else can be pushed to disk and fetched only for
pending HTTP requests. Not even the full MD5 is needed: we can get away
with a subset of the MD5 key; e.g. 4-8 bytes instead of 16 bytes would
give reasonably high correctness.
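
As a rough check on the truncated-key idea, the sketch below estimates
how often a request for an uncached object would match some stored key
prefix by accident. It assumes MD5 output is uniformly distributed and
uses a hypothetical cache of 10 million objects; the function name is
invented for illustration.

```python
import math


def false_hit_probability(n_objects: int, key_bytes: int) -> float:
    """Chance that an uncached key matches at least one stored prefix."""
    # Probability that one stored prefix equals the looked-up prefix.
    p_single = 2.0 ** (-8 * key_bytes)
    # 1 - (1 - p_single) ** n_objects, computed stably for tiny p_single.
    return -math.expm1(n_objects * math.log1p(-p_single))


# 10 million objects is an assumed cache size, not a measured one.
for key_bytes in (4, 8, 16):
    p = false_hit_probability(10_000_000, key_bytes)
    print(f"{key_bytes:2d}-byte key: false-hit probability {p:.3e}")
```

With these assumptions even a 4-byte prefix stays well under a 1%
false-hit rate, and an 8-byte prefix makes accidental matches
negligible.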
> > I'm worried, probably because I don't see how you solve them.
> > But I'm curious to know.
>
> My idea (which Alex seems to share) is that the in-core requirement of
> metadata can most likely be compressed into a very compact hint map (ala
> a cache digests) which tells us within a reasonably high probability if
> the object is in the cache or not. Sort of like a (but vastly different)
> inode bit map.
That would be very nice. As Alex said we can use incremental digests.
Or did you mean something else?
------------------------------------
Andres Kroonmaa <andre@online.ee>
Delfi Online
Tel: 6501 731, Fax: 6501 708
Pärnu mnt. 158, Tallinn,
11317 Estonia
Received on Mon Nov 06 2000 - 09:34:12 MST