Erm, sorry, I did something wrong, which is why it ran for so long.
Actually, the script Kirk gave is great, and fast too. :p
Sorry for any inconvenience.
(It was checking some wrong fields. I am not sure why that made it run for so long,
but after I corrected it,
it finished the 60 MB file in about 10-20 seconds.)
"Endre Szekely-Bencedi"
<Endre.Szekely-Bencedi@hu-tcs.com>
To: Kirk Schneider <kschneider@raytheon.com>
cc: squid-users@squid-cache.org
Subject: Re: [squid-users] Calamaris
03/02/2004 10:41 AM
Great script and advice indeed.
Now the other problem: running this 'cleaning' script now takes an
estimated 40 minutes on my proxy machine (60 MB of logs, the result of
2 very active days; normally that much takes a week).
Should I perhaps rotate the Squid logs, then, and run this script daily
from crontab on the freshly rotated log only? I think this would be a
solution. Any other ideas?
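If the logs are rotated daily as suggested, a cron-driven job along these lines could handle it. This is a sketch under assumptions, not a tested setup: the install paths (`/usr/local/squid/sbin/squid`, `/usr/local/squid/var/logs`), the output directory, the script name `squid-report.sh` and the helpers `report_path` and `nightly_report` are all hypothetical. `squid -k rotate` is Squid's standard way to ask the running daemon to rotate its logs, and with `logfile_rotate` set above 0 in squid.conf the newest rotated file is `access.log.0`. The Calamaris flags (`-f squid`, `-F html`) assume that once Kirk's awk filter has stripped the SmartFilter fields the log is plain native Squid format; check them against your Calamaris version.

```shell
#!/bin/sh
# Sketch of a nightly job: rotate Squid's logs, then build a Calamaris report
# from the log that was just rotated out. All paths below are ASSUMPTIONS for
# a source build under /usr/local/squid; adjust them to your layout.
# Requires logfile_rotate > 0 in squid.conf so that access.log.0 exists.

SQUID=${SQUID:-/usr/local/squid/sbin/squid}
CALAMARIS=${CALAMARIS:-/usr/local/squid/bin/calamaris}
LOGDIR=${LOGDIR:-/usr/local/squid/var/logs}
OUTDIR=${OUTDIR:-/var/www/html}

# Hypothetical helper: report file name for a given date string (YYYYMMDD).
report_path() {
    echo "$OUTDIR/calamaris-$1.html"
}

# Hypothetical helper: rotate, then report on the freshly rotated log only.
nightly_report() {
    "$SQUID" -k rotate || return 1
    # squid -k rotate only signals the running daemon, so pause briefly
    # as a precaution before reading the rotated file.
    sleep 30
    # Same filter as Kirk's: keep the ten native Squid fields.
    awk '{ print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }' "$LOGDIR/access.log.0" \
        | "$CALAMARIS" -f squid -F html > "$(report_path "$(date +%Y%m%d)")"
}

# Only do the real work when invoked as: squid-report.sh run
if [ "${1:-}" = "run" ]; then
    nightly_report
fi
```

A crontab line such as `5 0 * * * /usr/local/bin/squid-report.sh run` would run it at 00:05 every day; the `run` argument guard means the helpers can be sourced or tried out without rotating anything.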
Thanks,
Endre.
Kirk Schneider <kschneider@raytheon.com>
To: Endre Szekely-Bencedi <Endre.Szekely-Bencedi@hu-tcs.com>
cc: squid-users@squid-cache.org
Subject: Re: [squid-users] Calamaris
03/01/2004 07:11 PM
Endre,
I have contacted the Calamaris author about this before, and he
suggested filtering out the extra fields that SmartFilter adds at
the end of each line.
I now run this on all my logs before piping them to Calamaris:
awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' access.log |calamaris
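Kirk's filter works because awk, with its default field separator, splits each line on runs of blanks, so `$1` to `$10` are the ten native Squid fields (timestamp through content type) and everything SmartFilter appends after them, such as the second content type, the ALLOW verdict and the category words, is dropped. The sketch below wraps the same one-liner in a function and adds a small field-count check. The function names `squid_trim` and `field_histogram` are hypothetical, and the assumption that the first ten fields are always the native ones comes from the sample lines in this thread, not from SmartFilter documentation.

```shell
#!/bin/sh
# Sketch: reduce SmartFilter-extended Squid access.log lines to the ten
# native Squid fields before handing them to Calamaris.
# ASSUMPTION: fields 1-10 are the native ones (timestamp .. content type),
# as in the sample lines quoted in this thread.

# Keep fields 1-10. awk's default splitting treats runs of blanks as one
# separator, which matters because Squid pads the elapsed-time column.
# Reads the files given as arguments, or stdin if none are given.
squid_trim() {
    awk '{ print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }' "$@"
}

# Diagnostic: count how many lines have each number of fields.
# After squid_trim, every line should report 10.
field_histogram() {
    awk '{ print NF }' "$@" | sort -n | uniq -c
}
```

For example, `field_histogram access.log` shows which field counts occur (the sample lines in the original message have 12 or 14), and `squid_trim access.log | calamaris -f squid -F html > report.html` feeds the trimmed log to Calamaris, again assuming `-f squid` is the right input type for your version.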
-- 
Kirk Schneider                   972-952-4645 (work)
Raytheon Corporate IT Security   214-912-8679 (cell)
kschneider@raytheon.com          888-431-7621 (pager)
"If you think the problem is bad now just wait until we've solved it."

-------- Original Message --------
Subject: [squid-users] Calamaris
Date: Mon, 1 Mar 2004 17:43:52 +0100
From: Endre Szekely-Bencedi <Endre.Szekely-Bencedi@hu-tcs.com>
To: squid-users@squid-cache.org

Hello List,

I have a problem with Calamaris (v2.58). I am using Squid 2.5.STABLE3, compiled from source, with the SmartFilter plugin. As far as I know, I have to use the squid-extended input type for this, but that gives some errors:

[root@localhost logs]# date;cat test.log | /usr/local/squid/bin/calamaris -f squid-extended -F html > /var/www/html/calamaris2.html;date
Mon Mar 1 17:44:08 CET 2004
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Split loop at (eval 1) line 20, <> line 369578.
Mon Mar 1 17:48:05 CET 2004
[root@localhost logs]#

The generated HTML file contains:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1"></HEAD>
<BODY></BODY></HTML>

which is an empty page.

A sample from the log file:

1077780471.441 93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.466 64 3.227.65.74 TCP_MISS/200 1722 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.479 72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.508 59 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.699 73 3.227.65.74 TCP_MISS/200 1585 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.713 83 3.227.65.74 TCP_MISS/200 1607 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.726 86 3.227.65.74 TCP_MISS/200 1589 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.885 256 3.227.65.74 TCP_MISS/200 726 GET http://as.fotexnet.hu/adserver.ads/153/0///937480 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.212 229 3.227.65.74 TCP_MISS/200 23713 GET http://index.hu/ad/lipton/banner1_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.298 72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.388 279 3.227.65.74 TCP_MISS/200 17697 GET http://index.hu/ad/microsoft_wss.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.439 106 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.458 47 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.480 368 3.227.65.74 TCP_MISS/200 4292 GET http://as.fotexnet.hu/adserver.ads/196/0///27236 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.643 162 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.646 144 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.673 487 3.227.65.74 TCP_MISS/200 10319 GET http://as.fotexnet.hu/adserver.ads/200/0///378158 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.799 280 3.227.65.74 TCP_MISS/200 26216 GET http://index.hu/ad/teluzoallo_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.819 122 3.227.65.74 TCP_MISS/200 216 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.824 124 3.227.65.74 TCP_MISS/200 355 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.842 136 3.227.65.74 TCP_MISS/200 1603 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.846 47 3.227.65.74 TCP_MISS/200 353 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites

Am I doing something wrong?

Thanks,
Endre.

"THIS E-MAIL MESSAGE ALONG WITH ANY ATTACHMENTS IS INTENDED ONLY FOR THE ADDRESSEE and may contain confidential and privileged information. If the reader of this message is not the intended recipient, you are notified that any dissemination, distribution or copy of this communication is strictly prohibited. If you have received this message by error, please notify us immediately, return the original mail to the sender and delete the message from your system."

Received on Tue Mar 02 2004 - 04:29:05 MST