On 3 Jun 2000, at 0:05, Henrik Nordstrom <hno@hem.passagen.se> wrote:
> Andres Kroonmaa wrote:
>
> > I believe that the assumption that tags found in HTML will be
> > fetched shortly has a very high probability of being true.
> >
> > What thoughts do you have about hacking such a feature into Squid?
>
> Basically don't like it.
>
> However, some measures are acceptable, I think:
>
> a) Making sure the destination host is known (DNS lookup)
>
> b) Make sure there is a connection ready to handle the request when (if)
> it arrives. But watch out for initial request timeouts...
>
> I don't think prefetching of the actual objects is a good idea, and
> certainly not doing it in quick parallel bursts. Doing so only adds
> to the overall overload of the networks, i.e. you gain some benefit
> at the cost of all others. Evil greediness.
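Henrik's measure (a), pre-resolving the hostnames found in a page, could look roughly like this sketch. This is not Squid code: the host list, port, and use of plain blocking getaddrinfo() are all illustrative assumptions (a real proxy would hand the names to its own asynchronous DNS cache).

```c
/* Sketch (not Squid code): pre-resolve hostnames parsed out of an HTML
 * page so the DNS answer is ready before the follow-up request arrives.
 * In a real proxy the result would go into the proxy's own DNS cache. */
#include <stdio.h>
#include <string.h>
#include <netdb.h>

/* Resolve one host; returns 1 on success, 0 on failure. */
static int prelookup(const char *host)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* we will want a TCP connection */
    if (getaddrinfo(host, "80", &hints, &res) != 0)
        return 0;
    freeaddrinfo(res);
    return 1;
}

int main(void)
{
    /* In practice these would come from parsed <img>/<a> tags. */
    const char *hosts[] = { "localhost" };
    for (size_t i = 0; i < sizeof(hosts) / sizeof(hosts[0]); i++)
        printf("%s: %s\n", hosts[i],
               prelookup(hosts[i]) ? "resolved" : "failed");
    return 0;
}
```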
That was my first reaction too. But they have a point; let's see.
The first purpose of caching was to conserve bandwidth: to move more
content over the same link and so get more for less. The end user's
perceived speedup occurs only for the roughly 30% of content that is
cacheable, and works well only for small user groups.
What end users no longer care about is available bandwidth; all
they care about is decent performance for content that is NOT already
cached. They don't measure bits/sec, they measure their wasted time.
Bandwidth itself is also of much less concern with high-speed SAT links;
the bigger concern is the unavoidable latency between user and origin
server. A very simple calculation: given a 300 ms RTT and 50 GIFs on a
page, fetching them one at a time takes at least 15 seconds even in
theory. In reality, anywhere from 15 to 60+ seconds. This is simply
becoming unacceptable.
A cache hierarchy adds more latency per object; link congestion, ditto.
The point is to hide the dirty work from the end user by looking one
step ahead. Basically, this is an acceleration method for misses that
makes them look like hits to the user.
Parallel fetching is very good for SAT links, which are very difficult
to saturate with serial access: a 20 Mbps SAT link with a 300 ms RTT
requires over 750 KB of TCP buffering to saturate. Only very many
concurrent users can saturate the link, far more than the 20 Mbps link
can satisfy from a bandwidth point of view. So nasty oversubscription
is built into high-latency links, no matter how fast, and the only way
to utilise them fully is with parallel streams. HTTP is full of
parallelism. Unfortunately, browsers and their OSes are written by
people who think the whole internet is composed of gigabit Ethernet,
so I don't expect browsers to do this by themselves any time soon.
As for adding to network overload, this isn't entirely true. Every
added parallel TCP connection costs only a handful of extra small
packets, three for the handshake and a few more for the teardown,
a couple of hundred bytes in total. All other traffic is the same.
This can't be serious overhead.
What would happen is that a single person could fetch more in less
time, surf faster, go more places. This could be a problem for a link
owner, although it shouldn't be. Here one has to make clear whether
the goal is to conserve bandwidth (by slowing users down) or to
provide faster web access. People are not robots: when they fetch a
page they won't blast onward right away, they'll read it for at least
a few seconds. I don't recall any studies on the average time spent on
a page, but I believe this time largely flattens the sudden bursts of
traffic. So for "normal" users, parallel fetches would not increase
network overload. But people would be happier.
You can view it as gathering long, low spikes of traffic into short,
high spikes, while the average does not change. Given that you have to
download 60 KB of content from a web page, it matters little to the
network whether you do it in 2 seconds or 200, but it matters to the
user.
Of course, prefetching could be made somewhat conservative. Fetching
the HTML doesn't always mean the GIFs will be fetched (lynx, WAP?),
but for now it is a near certainty. Besides, while fetching the parent
HTML we can use the time taken to judge whether there is any point in
prefetching at all. Perhaps the server is so close and fast that it
gains us nothing.
------------------------------------
Andres Kroonmaa <andre@online.ee>
Network Development Manager
Delfi Online
Tel: 6501 731, Fax: 6501 708
Pärnu mnt. 158, Tallinn,
11317 Estonia
Received on Fri Jun 02 2000 - 17:44:18 MDT
This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:28 MST