On Thu, Apr 17, 2008 at 08:11:51AM +0800, Adrian Chadd wrote:
> The problem with caching Youtube (and other CDN content) is that
> the same content is found at lots of different URLs/hosts. This
> unfortunately means you'll end up caching multiple copies of the
> same content and (almost!) never see hits.
>
> Squid-2.7 -should- be quite stable. I'd suggest just running it from
> source. Hopefully Henrik will find some spare time to roll 2.6.STABLE19
> and 2.7.STABLE1 soon so 2.7 will appear in distributions.
Thanks Adrian. FYI, I got this working with 2.7 (latest), based on the
instructions you provided earlier. Here is my final config and the
Perl script used to generate the store URL:
http_port 3128
append_domain .esri.com
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
maximum_object_size 4194240 KB
maximum_object_size_in_memory 1024 KB
access_log /usr/local/squid/var/logs/access.log squid
# Some refresh patterns including YouTube -- although YouTube probably needs to
# be adjusted.
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i \.flv$ 10080 90% 999999 ignore-no-cache override-expire ignore-private
refresh_pattern ^http://sjl-v[0-9]+\.sjl\.youtube\.com 10080 90% 999999 ignore-no-cache override-expire ignore-private
refresh_pattern get_video\?video_id 10080 90% 999999 ignore-no-cache override-expire ignore-private
refresh_pattern youtube\.com/get_video\? 10080 90% 999999 ignore-no-cache override-expire ignore-private
refresh_pattern . 0 20% 4320
acl all src 0.0.0.0/0.0.0.0
acl esri src 10.0.0.0/255.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
# Some YouTube ACLs
acl youtube dstdomain .youtube.com .googlevideo.com .video.google.com .video.google.com.au
acl youtubeip dst 74.125.15.0/24
acl youtubeip dst 64.15.0.0/16
cache allow youtube
cache allow youtubeip
cache allow esri
# These are from http://wiki.squid-cache.org/Features/StoreUrlRewrite
acl store_rewrite_list dstdomain mt.google.com mt0.google.com mt1.google.com mt2.google.com
acl store_rewrite_list dstdomain mt3.google.com
acl store_rewrite_list dstdomain kh.google.com kh0.google.com kh1.google.com kh2.google.com
acl store_rewrite_list dstdomain kh3.google.com
acl store_rewrite_list dstdomain kh.google.com.au kh0.google.com.au kh1.google.com.au
acl store_rewrite_list dstdomain kh2.google.com.au kh3.google.com.au
# This needs to be narrowed down quite a bit!
acl store_rewrite_list dstdomain .youtube.com
storeurl_access allow store_rewrite_list
storeurl_access deny all
storeurl_rewrite_program /usr/local/bin/store_url_rewrite
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access allow esri
http_access deny all
http_reply_access allow all
icp_access allow all
coredump_dir /usr/local/squid/var/cache
# YouTube options.
quick_abort_min -1 KB
# This is meant to keep other streaming media (anything with cgi-bin or ? in
# the URL) from being cached. Maybe we don't want this, but using it for now.
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
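For anyone checking which refresh_pattern line a given URL ends up on: Squid walks the patterns in the order they appear and uses the first regex that matches. Below is a rough Python sketch of that lookup. It assumes Python's `re` behaves closely enough to Squid's regex library for these particular patterns, and the sample URLs are made up for illustration (apart from the lax-v290 one, which comes from this thread). `first_matching_pattern` is a hypothetical helper name, not anything Squid provides.

```python
import re

# (pattern, case_insensitive), in the same order as the refresh_pattern lines.
# Only the "\.flv$" line carries -i in the config.
REFRESH_PATTERNS = [
    (r"^ftp:", False),
    (r"^gopher:", False),
    (r"\.flv$", True),
    (r"^http://sjl-v[0-9]+\.sjl\.youtube\.com", False),
    (r"get_video\?video_id", False),
    (r"youtube\.com/get_video\?", False),
    (r".", False),
]


def first_matching_pattern(url):
    """Return the first pattern that matches url (first match wins)."""
    for pattern, ignore_case in REFRESH_PATTERNS:
        flags = re.IGNORECASE if ignore_case else 0
        if re.search(pattern, url, flags):
            return pattern
    return None


if __name__ == "__main__":
    samples = (
        "http://lax-v290.lax.youtube.com/get_video?video_id=jqx1ZmzX0k0",
        "http://sjl-v12.sjl.youtube.com/get_video?video_id=abc",
        "http://example.com/clip.FLV",
        "http://example.com/index.html",
    )
    for url in samples:
        print(url, "->", first_matching_pattern(url))
```

Because the `get_video\?video_id` line sits above `youtube\.com/get_video\?`, the latter should only be reached for get_video URLs whose first parameter is something other than video_id.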
And here is the store_url_rewrite script. I added some logging:
#!/usr/bin/perl
# Store-URL rewrite helper: reads one request per line on STDIN (URL first,
# then space-separated fields) and prints the URL to use as the cache key.
use strict;
use warnings;
use IO::File;

$| = 1;    # unbuffered STDOUT, so each reply line goes out immediately

my $fh = IO::File->new("/tmp/debug.log", "a")
    or die "Cannot open /tmp/debug.log: $!\n";
$fh->print("Hello!\n");
$fh->flush();

while (<>) {
    chomp;
    $fh->print("Orig URL: " . $_ . "\n");
    if (m/kh(.*?)\.google\.com(.*?)\/(.*?) /) {
        # kh, kh0..kh3 all map to one keyhole key
        print "http://keyhole-srv.google.com" . $2 . ".SQUIDINTERNAL/" . $3 . "\n";
        # print STDERR "KEYHOLE\n";
    } elsif (m/mt(.*?)\.google\.com(.*?)\/(.*?) /) {
        # mt, mt0..mt3 all map to one map-tile key
        print "http://map-srv.google.com" . $2 . ".SQUIDINTERNAL/" . $3 . "\n";
        # print STDERR "MAPSRV\n";
    } elsif (m/^http:\/\/([A-Za-z]*?)-(.*?)\.(.*)\.youtube\.com\/get_video\?video_id=([^&\s]+).* /) {
        # Key on video_id only. [^&\s]+ stops at '&' or whitespace, so the
        # fields that follow the URL are never captured.
        print "http://video-srv.youtube.com.SQUIDINTERNAL/get_video?video_id=" . $4 . "\n";
        $fh->print("http://video-srv.youtube.com.SQUIDINTERNAL/get_video?video_id=" . $4 . "\n");
        $fh->flush();
    } elsif (m/^http:\/\/([A-Za-z]*?)-(.*?)\.(.*)\.youtube\.com\/get_video\?video_id=(\S*) /) {
        # Looser fallback for the same URL shape, e.g.
        # http://lax-v290.lax.youtube.com/get_video?video_id=jqx1ZmzX0k0
        print "http://video-srv.youtube.com.SQUIDINTERNAL/get_video?video_id=" . $4 . "\n";
    } else {
        # Anything else is passed through unchanged.
        print $_ . "\n";
    }
}
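To sanity-check the mapping without a running Squid, here is a small Python sketch of the same three rewrites (keyhole, map tiles, YouTube). It is only an approximation of the Perl helper, not a replacement for it. The input line format (`URL client/fqdn ident method`) and the sample addresses are assumptions for illustration, and `store_url` is a hypothetical function name. The id capture here is `[^&\s]+`, which stops at `&` or whitespace.

```python
import re

# Regexes in the same order the helper tries them.
KEYHOLE = re.compile(r"kh(.*?)\.google\.com(.*?)/(.*?) ")
MAPSRV = re.compile(r"mt(.*?)\.google\.com(.*?)/(.*?) ")
YOUTUBE = re.compile(
    r"^http://([A-Za-z]*?)-(.*?)\.(.*)\.youtube\.com"
    r"/get_video\?video_id=([^&\s]+).* "
)


def store_url(line):
    """Return the cache key for one helper input line."""
    m = KEYHOLE.search(line)
    if m:
        return ("http://keyhole-srv.google.com" + m.group(2)
                + ".SQUIDINTERNAL/" + m.group(3))
    m = MAPSRV.search(line)
    if m:
        return ("http://map-srv.google.com" + m.group(2)
                + ".SQUIDINTERNAL/" + m.group(3))
    m = YOUTUBE.search(line)
    if m:
        return ("http://video-srv.youtube.com.SQUIDINTERNAL"
                "/get_video?video_id=" + m.group(4))
    return line  # no rule applies: pass through unchanged


if __name__ == "__main__":
    # Assumed trailing fields: client address, ident, method.
    tail = " 10.1.2.3/- - GET"
    for url in (
        "http://lax-v290.lax.youtube.com/get_video?video_id=jqx1ZmzX0k0",
        "http://sjl-v12.sjl.youtube.com/get_video?video_id=jqx1ZmzX0k0&t=abc",
        "http://kh1.google.com/kh?v=3&t=trtq",
        "http://kh3.google.com/kh?v=3&t=trtq",
    ):
        print(url, "->", store_url(url + tail))
```

The point of the exercise is that the two YouTube lines should print the same key, and likewise the two kh lines, which is what lets Squid score a hit regardless of which CDN host served the object.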
The last elsif block could likely be removed at this point, since the
one before it now catches those URLs. But this is working great! There
is probably some tuning yet to be done. Maybe someone could update the
wiki with the new regexp syntax.
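One detail in the video_id capture that may be worth a look before it goes on the wiki: the patterns all end in a space, so the helper evidently gets more fields after the URL. With a greedy `[^&]+`, a URL that has no `&` after the id can have those trailing fields swallowed into the capture. The Python sketch below compares `[^&]+` with `[^&\s]+` on two sample lines. The line format and addresses are assumptions, and `captured_id` is a hypothetical helper.

```python
import re

PREFIX = (r"^http://([A-Za-z]*?)-(.*?)\.(.*)\.youtube\.com"
          r"/get_video\?video_id=")

# Same pattern twice; only the character class for the id differs.
GREEDY = re.compile(PREFIX + r"([^&]+).* ")
BOUNDED = re.compile(PREFIX + r"([^&\s]+).* ")


def captured_id(regex, line):
    """Return what the regex captures as the video id, or None."""
    m = regex.search(line)
    return m.group(4) if m else None


if __name__ == "__main__":
    base = "http://lax-v290.lax.youtube.com/get_video?video_id=jqx1ZmzX0k0"
    tail = " 10.1.2.3/- - GET"  # assumed extra helper fields
    for line in (base + "&t=abc" + tail, base + tail):
        print(repr(line))
        print("  [^&]+    ->", repr(captured_id(GREEDY, line)))
        print("  [^&\\s]+  ->", repr(captured_id(BOUNDED, line)))
```

When an `&` follows the id, both classes capture the same thing, which would explain why the difference is easy to miss in everyday traffic.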
Ray
Received on Tue Apr 22 2008 - 16:11:58 MDT