Squid's neighbor selection algorithm is roughly as follows:
- Send ICP queries
- Begin fetching upon receipt of first HIT, or
- Wait for all (MISS) replies to arrive, or
- Timeout after T seconds (default 2).
- Fetch from the parent with best weighted RTT.
- Fetch from the source if not behind a firewall.
- Pick one of the parents via various means.
This algorithm has a couple of shortcomings
- Doesn't work well in mulicast world. i.e. with multicast
we probably don't want to send MISS replies, only HITs.
- The first HIT reply is not necessarily the best choice.
Maybe we want to wait for further HIT replies.
- When sending a lot of ICP queries (e.g. more than 4)
there is a higher chance that at least one reply will
never arrive so we suffer the timeout. Maybe want
to wait for configurable number/percent of replies
before starting the fetch.
- Squid currently only chooses *one* place to fetch the object
from. If that one place fails, it gives up.
A proposal
It would be nice to support some missing features above, e.g.:
- Wait for some number of replies, then make decision about
where to fetch object.
- Generate a list of places to fetch the object from, try
them in order with a fallback to the source after some
timeout.
- There can be many other factors in the decision about where
to fetch an object from:
- RTT between local cache and peer cache
- RTT between peer cache and source
- age of the object in peer cache
- more...?
I would like to have a mathematical algorithm which could be used to
rank the candidate sources. Each factor in the
decision can be assigned a relative weight. For example
- HIT or MISS: HIT=2, MISS=0
- PARENT or SIBLING: PARENT=1, SIBLING=0
- Weighted RTT between local cache and peer cache. Scaled from 0 to 1
based on the minimum and maximum weights for all sources.
- RTT between peer cache and source. Scaled from 0 to 1
based on the minimum and maximum RTT's for all sources.
- Age, scaled 0 to 1...
We also probably need at least two more configuration options:
- Begin fetching after receiving X HITS
- Begin fetching after receiving Y replies.
- Some configurations will send a different number of queries for different
URLs. May need to specify above as percents also/instead.
Ranking candidate sources in this manner will certainly require more CPU cycles.
Is the tradeoff worth it?
neighbor-selection-proposal.html,v 1.3 1997/01/10 21:01:19 wessels Exp