Hi Andre, hi all:
Let's have another try ;-)
Suppose we have three Squid boxes in a cluster (let's call them A, B and
C), all configured to talk to each other through ICP. Here is the
problem we ran into:
A client sent an HTTP request to A. A did not have the corresponding
object in its local cache, so it queried B and C through ICP. Sibling B
replied with a UDP_MISS, which is normal behavior. What confused us
was what machine C logged:
1186606560.930 0 IP_OF_MACHINE_A UDP_HIT/000 115 ICP_QUERY
http://www.example.com/dynamic.js - NONE/- -
1186606560.932 0 IP_OF_MACHINE_A TCP_MISS/504 1949 GET
http://www.example.com/dynamic.js - NONE/- text/html
What followed the UDP_HIT was a TCP_MISS/504, which means that machine C
had that object in its local cache, but A failed to fetch it due to some
weird timeout error.
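
To spot this pattern across a whole access.log rather than by eye, here is a
minimal Python sketch. It assumes the native access.log layout seen in the two
lines above (timestamp, elapsed, client, result/status, bytes, method, URL,
...); the names AccessEntry, parse_access_line and hit_then_504 are
hypothetical helpers of mine, not anything shipped with Squid.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class AccessEntry(NamedTuple):
    timestamp_ms: int  # epoch time in whole milliseconds
    client: str
    result: str        # e.g. "UDP_HIT" or "TCP_MISS"
    status: int        # HTTP status; 0 for ICP replies
    method: str
    url: str


def parse_access_line(line: str) -> AccessEntry:
    """Parse one native-format access.log line (layout assumed from the sample)."""
    fields = line.split()
    # Keep the timestamp as an integer to avoid float rounding on the gap.
    secs, _, millis = fields[0].partition(".")
    timestamp_ms = int(secs) * 1000 + int(millis.ljust(3, "0")[:3])
    result, status = fields[3].split("/")
    return AccessEntry(timestamp_ms, fields[2], result, int(status),
                       fields[5], fields[6])


def hit_then_504(entries: Iterable[AccessEntry],
                 max_gap_ms: int = 1000) -> Iterator[Tuple[str, int]]:
    """Yield (url, gap_ms) when a UDP_HIT is followed by a TCP_MISS/504
    from the same peer for the same URL within max_gap_ms."""
    last_hit = {}
    for entry in entries:
        key = (entry.client, entry.url)
        if entry.result == "UDP_HIT":
            last_hit[key] = entry.timestamp_ms
        elif entry.result == "TCP_MISS" and entry.status == 504 and key in last_hit:
            gap_ms = entry.timestamp_ms - last_hit[key]
            if 0 <= gap_ms <= max_gap_ms:
                yield entry.url, gap_ms


# The two lines logged by machine C, with the mail line wrapping undone.
LOG_FROM_C = [
    "1186606560.930 0 IP_OF_MACHINE_A UDP_HIT/000 115 ICP_QUERY "
    "http://www.example.com/dynamic.js - NONE/- -",
    "1186606560.932 0 IP_OF_MACHINE_A TCP_MISS/504 1949 GET "
    "http://www.example.com/dynamic.js - NONE/- text/html",
]

pairs = list(hit_then_504(parse_access_line(line) for line in LOG_FROM_C))
print(pairs)
```

The max_gap_ms cutoff of one second is an arbitrary choice of mine; it only
keeps unrelated hits and errors for the same URL from being paired up.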
I'm not sure where this 504 came from, and I don't think it's a
configuration problem, because it was logged just 2 ms after the
corresponding UDP_HIT message, and I have never set any timeout-related
value that low.
Then, machine C released the object (504 error message instead of the
expected content?) from memory:
1186606560.932 RELEASE -1 FFFFFFFF
381F892DF3928A903A3DF921D2FF27A9 504 1186606560 0 1186606560
text/html 1650/1896 GET http://www.example.com/dynamic.js
Below are the corresponding logs from machine A:
access.log:
1186606561.024 93 IP_OF_CLIENT_MACHINE TCP_MISS/200 10939 GET
http://www.example.com/dynamic.js - DIRECT/IP_OF_BACKEND_SERVER
application/x-javascript
store.log:
1186606561.024 RELEASE -1 FFFFFFFF
9771AFBBB9036CA86486A7DE01F33538 200 1186606560 -1 1186649760
application/x-javascript -1/10675 GET
http://www.example.com/dynamic.js
This means machine A fetched the object from the backend server, served
it to the requesting client, and then released it from memory
*immediately*.
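
For anyone who wants to compare the two RELEASE entries field by field, here
is a small Python sketch. The field names are my assumption, read off the
layout of the store.log lines above (time, action, dir number, file number,
key, status, date, last-modified, expires, type, expected/real length, method,
URL). Treating "-1 FFFFFFFF" as "never written to disk" is likewise my reading,
not something I have confirmed in the source.

```python
from typing import NamedTuple, Optional


class StoreEntry(NamedTuple):
    timestamp: str
    action: str
    dir_number: int
    file_number: str
    key: str
    status: int
    date: int
    last_modified: int
    expires: int
    content_type: str
    expected_len: int
    real_len: int
    method: str
    url: str


def parse_store_line(line: str) -> StoreEntry:
    """Parse one store.log line (field order assumed from the samples)."""
    f = line.split()
    expected_len, real_len = f[10].split("/")
    return StoreEntry(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]),
                      int(f[6]), int(f[7]), int(f[8]), f[9],
                      int(expected_len), int(real_len), f[11], f[12])


def memory_only(entry: StoreEntry) -> bool:
    """True if the entry carries no swap dir/file number (assumed meaning)."""
    return entry.dir_number == -1 and entry.file_number == "FFFFFFFF"


def freshness_seconds(entry: StoreEntry) -> Optional[int]:
    """Expires minus Date, or None when either value is missing (-1)."""
    if entry.date < 0 or entry.expires < 0:
        return None
    return entry.expires - entry.date


# The RELEASE lines from machine C and machine A, with wrapping undone.
on_c = parse_store_line(
    "1186606560.932 RELEASE -1 FFFFFFFF 381F892DF3928A903A3DF921D2FF27A9 "
    "504 1186606560 0 1186606560 text/html 1650/1896 "
    "GET http://www.example.com/dynamic.js")
on_a = parse_store_line(
    "1186606561.024 RELEASE -1 FFFFFFFF 9771AFBBB9036CA86486A7DE01F33538 "
    "200 1186606560 -1 1186649760 application/x-javascript -1/10675 "
    "GET http://www.example.com/dynamic.js")

for name, entry in (("C", on_c), ("A", on_a)):
    print(name, entry.status, memory_only(entry), freshness_seconds(entry))
```

Read this way, the 504 on C expires at its own Date, while the 200 on A
carries an Expires 12 hours after its Date and an unknown expected length
(-1), yet both were released without a swap file number.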
Squid-2.5.STABLE14[1] on Linux 2.6.18-4-amd64; A, B and C are all
connected to the same switch, so there is little chance for that to be a
network problem.
Timeout-related settings:
icp_query_timeout 50
maximum_icp_query_timeout 50
forward_timeout 4 minutes
connect_timeout 1 minute
peer_connect_timeout 30 seconds
read_timeout 15 minutes
request_timeout 5 minutes
persistent_request_timeout 1 minute
pconn_timeout 120 seconds
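
To make the comparison with the 2 ms gap explicit, here is a short Python
sketch that converts the settings above to milliseconds. It assumes the bare
numbers on the two ICP directives are milliseconds, which is how I understand
the squid.conf documentation; the UNIT_MS table and to_ms helper are
hypothetical and only cover the units that appear in this list.

```python
SETTINGS = """\
icp_query_timeout 50
maximum_icp_query_timeout 50
forward_timeout 4 minutes
connect_timeout 1 minute
peer_connect_timeout 30 seconds
read_timeout 15 minutes
request_timeout 5 minutes
persistent_request_timeout 1 minute
pconn_timeout 120 seconds
"""

# Only the units used above; not a full squid.conf time-unit parser.
UNIT_MS = {"second": 1000, "seconds": 1000, "minute": 60000, "minutes": 60000}

OBSERVED_GAP_MS = 2  # UDP_HIT at .930, TCP_MISS/504 at .932


def to_ms(value: str, unit: str = "") -> int:
    """Convert a setting to milliseconds; a bare number is assumed to be ms."""
    return int(value) * UNIT_MS[unit] if unit else int(value)


timeouts_ms = {}
for line in SETTINGS.splitlines():
    name, *rest = line.split()
    timeouts_ms[name] = to_ms(*rest)

for name, ms in sorted(timeouts_ms.items(), key=lambda item: item[1]):
    print(f"{name:28s} {ms:>8d} ms  exceeds gap: {ms > OBSERVED_GAP_MS}")
```

Under that assumption the smallest configured timeout is 50 ms, still well
above the observed gap.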
Does anyone have a clue? Thanks very much!
- Ding Deng
[1] Yes, we know that we should try v2.6 first and see if the problem
still occurs, but it's difficult to do that in a production environment
(you know that, right? ;-), and our boss is way harder to persuade than
you may imagine ;-(
"andre wang" <andre.ease@gmail.com> writes:
> HI ALL:
>
> We are running Squid 2.5STABLE14 on Linux machines trying to run a
> cluster of caches in a siblings peering arrangement using multicast
> for ICP queries. The caches seem to be talking to each other fine.
>
> When a client sends an HTTP request for an object that isn't cached on
> the configured cache, the cache sends out an ICP multicast query. All
> other caches receive this fine and respond, either with UDP_MISS or
> UDP_HIT. The problem is, if the other caches respond with a UDP_HIT
> the original cache still fetches the object directly, rather than
> fetching the object from the sibling. Why?
>
> And I have checked the access.log, and got these:
>
> On the first cache (172.19.0.229):
> 1187773057.113 3 222.220.132.48 TCP_MISS/200 315 GET http://XXXXX - DIRECT/XXXX
>
> On the sibling cache (172.19.0.228):
> 1187773057.002 0 172.19.0.229 UDP_HIT/000 108 ICP_QUERY http://XXXXXX - NONE/- -
>
> Any idea?
> Thanks
Received on Thu Aug 23 2007 - 02:40:37 MDT