Hi,
I hope that someone on this group can give me some pointers. I have a squid proxy setup running version 2.6 stable 17 of squid. I recently upgraded from a very old version of squid, 2.4 something. The proxy sits in front of a search appliance and all search requests goes through the proxy.
One of my requirements is to have all search requests for cache:SOMEURL go to a URL rewrite program that compares the requested URL to a list of URLs that have been blacklisted. These URLs are one per line in a text file. Any line that starts with # or is blank is discarded by the url_rewrite_program. This Perl program seemed to work fine in the old version but now it doesn't work at all.
Here is the relevant portion of my Squid conf file:
-------------------------------------------------------------------------------
http_port 80 defaultsite=linsquid1o.myhost.com accel
url_rewrite_program /webroot/squid/imo/redir.pl
url_rewrite_children 10
cache_peer searchapp3o.myhost.com parent 80 0 no-query originserver name=searchapp proxy-only
cache_peer linsquid1o.myhost.com parent 9000 0 no-query originserver name=searchproxy proxy-only
acl bin urlpath_regex ^/cgi-bin/
cache_peer_access searchproxy allow bin
cache_peer_access searchapp deny bin
Here is the Perl program
-------------------------------------------------------------------------------
#!/usr/bin/perl
$| = 1;
my $CACHE_DENIED_URL = "http://www.mysite.com/mypage/pageDenied.intel";
my $PATTERNS_FILE = "/webroot/squid/blocked.txt";
my $UPDATE_FREQ_SECONDS = 60;
my $last_update = 0;
my $last_modified = 0;
my $match_function;
my $url, $remote_host, $ident, $method, $urlgroup;
my $cache_url;
my @patterns;
while (<>) {
chomp;
($url, $remote_host, $ident, $method, $urlgroup) = split;
&update_patterns();
$cache_url = &cache_url($url);
if ($cache_url) {
&update_patterns();
if (&$match_function($cache_url)) {
$cache_url = &url_encode($cache_url);
print "302:$CACHE_DENIED_URL?URL=$cache_url\n";
next;
}
}
print "\n";
}
sub update_patterns {
my $now = time();
if ($now > $last_update + $UPDATE_FREQ_SECONDS) {
my @a = stat($PATTERNS_FILE);
my $mtime = $a[9];
if ($mtime != $last_modified) {
@patterns = &get_patterns();
$match_function = build_match_function(@patterns);
$last_modified = $mtime;
}
}
}
sub get_patterns {
my @p = ();
my $p = "";
open PATTERNS, "< $PATTERNS_FILE" or die "Unable to open patterns file. $!";
while (<PATTERNS>) {
chomp;
if (!/^\s*#/ && !/^\s*$/) { # disregard comments and empty lines.
$p = $_;
$p =~ s#\/#\\/#g;
$p =~ s/^\s+//g;
$p =~ s/\s+$//g;
if (&is_valid_pattern($p)) {
push(@p, $p);
}
}
}
close PATTERNS;
return @p;
}
sub is_valid_pattern {
my $pat = shift;
return eval { "" =~ m|$pat|; 1 } || 0;
}
sub build_match_function {
my @p = @_;
my $expr = join(' || ', map { "\$_[0] =~ m/$p[$_]/io" } (0..$#p));
my $mf = eval "sub { $expr }";
die "Failed to build match function: $@" if $@;
return $mf;
}
sub cache_url {
my $url = @_[0];
(my $script, $qs) = split(/\?/, $url);
if ($qs) {
my $param, $name, $value;
my @params = split(/&/, $qs);
foreach $param (@params) {
($name, $value) = split(/=/, $param);
$value =~ tr/+/ /;
$value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
if ($value =~ /cache:([A-z0-9]{7,20}:)?([A-z]+:\/\/)?([^ ]+)/) {
if ($2) {
return $2 . $3;
} else {
# return "http://" . $3;
return $3;
}
}
}
}
return "";
}
sub url_encode {
my $str = @_[0];
$str =~ tr/ /+/;
$str =~ s/([\?&=:\/#])/sprintf("%%%02x", ord($1))/eg;
return $str;
}
Below is a sample of the blocked URLs file
################################################################################
#
# URL Patterns to be Blocked
#---------------------------
# This file contains URL patterns which should be blocked
# in requests to the Google cache.
#
# The URL patterns should be entered one per line.
# Blank lines and lines that begin with a hash mark (#)
# are ignored.
#
# Anything that will work inside a Perl regular expression
# should work.
#
# Examples:
# http://www.bad.host/bad_directory/
# ^ftp:
# bad_file.html$
################################################################################
# Enter URLs below this line
################################################################################
www.badsite.com/
So my question, is there a better way of doing this?
Does someone see anything wrong that is keeping this from working in 2.6?
Thanks,
Martin C. Jacobson (Jake)
Received on Thu Jul 03 2008 - 18:35:10 MDT
This archive was generated by hypermail 2.2.0 : Fri Jul 04 2008 - 12:00:02 MDT