On 28.02.2012 10:36, Francis Fauteux wrote:
> We are using a collection of Squid instances (version 2.7.stable9) as
> caching proxies behind a single gateway IP, processing requests from
> a large number of users.
>
By gateway IP, do you mean a NAT gateway, or is each Squid instance
setting its outgoing IP to the same value?
If this is a NAT gateway, could it simply be a NAT table tuple
limitation?
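To make the tuple question concrete: a NAT gateway tracks each TCP
connection by its (source IP, source port, destination IP, destination
port) tuple. When every connection leaves from one gateway IP toward one
server IP and port, only the translated source port can vary. A rough
Python sketch of that ceiling follows. The function name and the port
range are assumptions chosen for illustration, not values taken from
your setup.

```python
# Rough upper bound on simultaneous NAT mappings from ONE gateway IP
# toward ONE destination (single server IP and port). Only the translated
# source port differs between mappings, so the port range is the ceiling.

def max_nat_mappings(port_low: int, port_high: int) -> int:
    """Count the distinct source ports in the inclusive range."""
    if port_low > port_high:
        raise ValueError("port_low must not exceed port_high")
    return port_high - port_low + 1


# Assumed example range; a real gateway may be configured differently.
ASSUMED_PORT_RANGE = (32768, 60999)

if __name__ == "__main__":
    print(max_nat_mappings(*ASSUMED_PORT_RANGE))
```

If the number of concurrent connections to one busy site approaches
that ceiling, new connections can stall or fail at the gateway, which
may look like throttling from the outside.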
> We've observed that a number of websites throttle our usage when
> requests targeting a given domain are processed by two or more
> instances. This does not occur when all requests to this domain are
> processed by a single instance.
>
> We are trying to find the root cause for this behaviour, and the fact
> that it does not occur with a single Squid instance may help us
> diagnose. From the origin server's perspective, only two changes are
> visible between using a single instance and using two or more:
>
> * The value injected in the 'Via' header differs between Squid
> instances. The web server may not expect requests coming from a single
> IP to contain different values for the HTTP 'Via' header. This is
> something we can investigate ourselves, but input would be welcome.
If the web server is in fact doing such checks, it is in violation of
the HTTP specification. HTTP is message-based in the same way that TCP
is packet-based. Which route the message/packet took is mostly
irrelevant, although they could be checking it for security access;
that should not have side effects like this. Your multiple instances
could even share the same packet connection and expect it to work (er,
Squid does pipelining multiplexing).
There is no way the server can rely on a specific one of these chaining
scenarios:
client->A->server
client->B->server
client->A->B->server
Speaking of those scenarios, it seems more likely to me that the third
scenario is happening to you. Each layer of proxying adds latency, so
messages doing the A->B hop could appear slower (throttled?) than when
it's not present. The CARP design is specifically tuned to make such
multi-hop layering efficient, but generic peer clusters doing it can
slow things down.
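One way to check which scenario a request followed is to look at the
Via header as it arrives at the origin (or in a capture at the
gateway): each proxy on the path appends one element. The sketch below
is only an illustration in Python; the function name and the host names
in the sample header are hypothetical, and the simple comma split
assumes that no Via comment contains a comma.

```python
from typing import List


def via_hops(via_header: str) -> List[str]:
    """Return the 'received-by' token of each Via element, in order."""
    hops = []
    # Naive split: assumes no comma appears inside a "(comment)" part.
    for element in via_header.split(","):
        parts = element.split()
        # Each element looks like "<protocol-version> <received-by> [comment]".
        if len(parts) >= 2:
            hops.append(parts[1])
    return hops


if __name__ == "__main__":
    # Hypothetical header for the client->A->B->server case.
    sample = ("1.0 squid-a.example:3128 (squid/2.7.STABLE9), "
              "1.0 squid-b.example:3128 (squid/2.7.STABLE9)")
    hops = via_hops(sample)
    print(len(hops), hops)
```

Two or more hops on requests for the affected domains would point at
the A->B layering rather than at anything the origin server is doing.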
>
> * If each Squid limits the number of connections to a given server,
> using several instances may cause the origin server to see a number of
> connections which exceeds what they expect to see from a single IP.
> This is the question for this forum: does Squid actually limit the
> number of per-server connections? Is this number configurable (either
> in squid.conf or by rebuilding)?
The default is not to limit. You can configure a limit on client
connections if you wish.
If a limit is relevant here, it would be in the form of a per-client
connection limit enforced at the server end.
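To illustrate how such a server-side limit would interact with several
instances sharing one IP, here is a minimal Python sketch. It is a toy
model, not how any particular web server implements limiting; the class
name, the limit of 8, the 6 connections per instance, and the
192.0.2.1 address are all assumptions made up for the example.

```python
from collections import Counter


class PerIpConnectionLimiter:
    """Toy model of an origin server that caps open connections per client IP."""

    def __init__(self, max_per_ip: int) -> None:
        self.max_per_ip = max_per_ip
        self.open_by_ip = Counter()

    def try_open(self, client_ip: str) -> bool:
        """Accept the connection unless the IP is already at its cap."""
        if self.open_by_ip[client_ip] >= self.max_per_ip:
            return False
        self.open_by_ip[client_ip] += 1
        return True


def refused_connections(instances: int, conns_per_instance: int,
                        max_per_ip: int, gateway_ip: str = "192.0.2.1") -> int:
    """Count refusals when every instance connects from the same gateway IP."""
    limiter = PerIpConnectionLimiter(max_per_ip)
    refused = 0
    for _ in range(instances):
        for _ in range(conns_per_instance):
            if not limiter.try_open(gateway_ip):
                refused += 1
    return refused


if __name__ == "__main__":
    # Assumed numbers: cap of 8 per IP, each instance opens 6 connections.
    print(refused_connections(1, 6, 8))  # one instance stays under the cap
    print(refused_connections(2, 6, 8))  # two instances together exceed it
```

With these assumed numbers a single instance never hits the cap, while
two instances behind the same IP do, which matches the pattern you
describe.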
>
> Note that each affected website resolves to a single IP; Squid
> instances are not receiving different IPs from DNS servers.
>
Amos
Received on Mon Feb 27 2012 - 22:31:06 MST