On Mon, Oct 27, 2008 at 2:56 PM, Henrik Nordstrom
<henrik_at_henriknordstrom.net> wrote:
> On mån, 2008-10-27 at 12:23 -0700, Neil Harkins wrote:
>> The timeout is because the Content-Length header is bigger than the
>> payload it sent.
>> Every http client/server will hang in that situation. This isn't
>> simply a misreported
>> HIT<->MISS in the log, this is absolutely a significant bug where
>> collapsed forwarding is
>> mixing up the metadata from the two branches of our Vary:
>> Accept-Encoding (gzip and not),
>> i.e. giving the headers and content as non-gzip, but the amount of
>> payload it reads from
>> the cache and sends is based on the gzip size. Disabling
>> collapsed_forwarding fixed it.
>
> Please file a bug report on this. Preferably including "squid -k debug"
> cache.log output and "tcpdump -s 0 -w traffic.pcap" traces.
I'd like to help and see this get fixed, but as I said earlier,
it happens on about 16% of our test requests, only when
there's 750~1050 reqs/second going through the box,
and pretty much disappears under 500 reqs/s (off-peak).
I managed to catch only 1 instance of the timeout
in a 17 Gig cache.log with debugging enabled.
Is this except significant?:
2008/10/24 17:09:23| storeLocateVaryRead: accept-encoding="gzip"
0x3b28978 seen_offset=204 buf_offset=0 size=85
2008/10/24 17:09:23| storeLocateVaryRead: Key: 968D51EAA0C2BCF5688EAB92E8F56EE4
2008/10/24 17:09:23| storeLocateVaryRead: 0x3b28978 seen_offset=289 buf_offset=0
2008/10/24 17:09:23| storeClientCopy:
4F3F9F8F4461796C3C469E610B6E2D5C, seen 289, want 289, size 4096, cb
0x46169d, cbdata 0x3b28978
[snip]
2008/10/24 17:09:23| storeLocateVaryRead: accept-encoding="" 0x3b28978
seen_offset=204 buf_offset=0 size=178
2008/10/24 17:09:23| storeLocateVaryRead: Key: 968D51EAA0C2BCF5688EAB92E8F56EE4
2008/10/24 17:09:23| storeLocateVaryRead: MATCH!
968D51EAA0C2BCF5688EAB92E8F56EE4 (null)
2008/10/24 17:09:23| storeClientCopy:
4F3F9F8F4461796C3C469E610B6E2D5C, seen 382, want 382, size 4096, cb
0x46169d, cbdata 0x3b28978
Note that I've since changed our load balancer to rewrite "Accept-Encoding: "
to "Accept-Encoding: identity" in case squid didn't like a null header,
(although the example in the RFC implies that "Accept-Encoding: " is valid:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3)
but the timeouts still happened, although I didn't grab debugging then.
-neil
Received on Mon Oct 27 2008 - 22:56:18 MDT
This archive was generated by hypermail 2.2.0 : Tue Oct 28 2008 - 12:00:03 MDT