Re: [squid-users] Extracting selected data from logfile

From: Frog <frog_at_rsf1.net>
Date: Fri, 20 Mar 2009 10:48:19 +0000 (GMT)

Hello all,

Thank you Chris for the suggestion. It helped enormously. I have extracted the data I was looking for by using the following:

tail -n 5000 access.log | grep "403" | awk '{print $1}' | uniq -d > file.txt
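
One caveat with the pipeline above: uniq -d prints only lines that repeat on adjacent lines, so an address that was served a single 403, or whose 403s are separated by other clients' requests, may be left out. A plain grep "403" can also match a byte count or URL that happens to contain 403. A minimal sketch that lists each address once, assuming the "combined" logformat quoted further down (the function name list_403_ips is made up for illustration):

```shell
#!/bin/sh
# list_403_ips [LOGFILE]
# Print every client IP that was served a 403 in the last 5000 lines
# of LOGFILE, one address per line, without duplicates.
# Assumes the request is logged as "METHOD URL HTTP/x.y" followed by
# the status code, as in the logformat quoted in this thread.
list_403_ips() {
    tail -n 5000 "${1:-access.log}" |
        grep 'HTTP/[^"]*" 403 ' |   # status field only, not any "403"
        awk '{print $1}' |          # first column is the client address
        sort -u                     # sort, then keep one copy of each
}
```

It could be called as: list_403_ips /var/log/squid/access.log > file.txt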

Best regards
Frog.

----- Original Message -----
From: "Chris Robertson" <crobertson_at_xxxxxx>
To: squid-users_at_squid-cache.org
Sent: Thursday, 19 March, 2009 21:37:25 GMT +00:00 GMT Britain, Ireland, Portugal
Subject: Re: [squid-users] Extracting selected data from logfile

Frog wrote:
> Hello All,
>
> Hopefully someone may be able to assist me.
>
> I have Squid setup here as a reverse proxy. I have logging configured using the following settings in squid.conf:
>
> logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
> access_log /var/log/squid/access.log combined
>
> To block certain bots and bad user agents I have the following:
>
> acl badbrowsers browser "/etc/squid/badbrowsers.conf"
> http_access deny badbrowsers
>
> The http_access deny returns a 403 to a visitor that meets the criteria in badbrowsers.conf and this works perfectly. But I would like to take this one step further. I would like to build a blacklist in real time if possible of IP addresses that have been served a 403 error.
>
> Unfortunately my knowledge of most of the popular scripting languages is non-existent so I was wondering if something like a redirector could be configured to meet my needs?
>
> I have looked at fail2ban however it doesn't seem to parse my log files even if I change the squid log format to common.
>
> Basically I am wondering if there is a way to parse the logfile to append to a new file any IP address that was served a 403.
>

Something like...

tail -n 5000 /path/to/access.log | grep 'HTTP/[^"]*" 403' | awk '{print $1}'

...run from the command line should (on my GNU/Linux machine) search the
last 5000 lines (tail -n 5000) of the file at /path/to/access.log for
the string "HTTP/" followed by any number of characters that are NOT a
double quote, followed by a double quote, a space, and the string "403"
(grep 'HTTP...). The first column from any lines with a matching
pattern will be printed (awk '{print $1}').

This is in no way tested, and obviously does not append to a file or run
automatically.
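
For the appending part, something along the following lines might serve as a starting point. It is only a sketch: the function name update_blacklist and both file paths are placeholders chosen for illustration, and it assumes the same "combined" log layout as above. Addresses already present in the blacklist file are skipped, so running it repeatedly should not create duplicates.

```shell
#!/bin/sh
# update_blacklist LOGFILE BLACKLIST
# Append to BLACKLIST each client IP that was served a 403 in the last
# 5000 lines of LOGFILE and is not already listed in BLACKLIST.
update_blacklist() {
    log=$1
    blacklist=$2
    touch "$blacklist"              # make sure the file exists
    tail -n 5000 "$log" |
        grep 'HTTP/[^"]*" 403 ' |
        awk -v bl="$blacklist" '
            # Load the addresses that are already blacklisted.
            BEGIN {
                while ((getline line < bl) > 0) seen[line] = 1
                close(bl)
            }
            # Print an address only the first time it is seen.
            !seen[$1]++ { print $1 }
        ' >> "$blacklist"
}
```

Saved in a script and called with the real paths, this could be run periodically from cron to approximate the "real time" behaviour asked about.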

> Thank you in advance for any pointers.
>
> Frog..
>

Chris
Received on Fri Mar 20 2009 - 10:45:21 MDT
