Hi All:
Today I read the last three years of the squid mailing list archive on the Apache combined log format support issue, including the 2.5 STABLE1 patch, squid2combined.pl, etc. But I think most end users are administrators, not coders. Some sysadmins might think a combined log can be produced with GNU textutils, for example by using pr to merge the separate log files.
A combined-like log can be produced as follows:
1. Enable the referer log at compile time and in the config as follows:
emulate_httpd_log on
referer_log /usr/local/squid/var/logs/referer.log
2. Build the combined log with 'pr' in merge mode and use awk to adjust the output fields:
%pr -mJt access.log referer.log | awk '{print $1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9" "$10" \x22"$14"\x22 \x22"$11"\x22"}'
-m  merge the files, taking one line from each
-J  join full lines
-t  omit headers and footers
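The field shuffle in the awk step can be tried on its own. Below is a minimal sketch, assuming a merged line carries the httpd-style access fields in $1 to $10, the Squid result code in $11, and the referer in $14, as the field numbers in the command above imply. The function name combine and the referer.log sample line (including its timestamp) are made up for illustration, and the quote character is passed in with -v rather than written as \x22, which is my own substitution.

```shell
#!/bin/sh
# Rearrange one merged line (access.log fields followed by referer.log
# fields) into a combined-like line.
# Assumed layout: $1-$10 access fields, $11 result code, $14 referer.
combine() {
    awk -v q='"' '{
        print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, q $14 q, q $11 q
    }'
}

# Hypothetical sample: one access.log line and one referer.log line,
# joined by a tab the way a line-by-line merge would pair them.
printf '%s\t%s\n' \
    '192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/projects.html HTTP/1.1" 304 208 TCP_IMS_HIT:NONE' \
    '1052586842.000 192.168.0.10 http://ant.chedong.com/index.html http://ant.chedong.com/projects.html' |
    combine
```

With real logs the same function could sit at the end of the pipeline, e.g. pr -mJt access.log referer.log | combine.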
The reason for not using useragent.log is that it contains many user-agent strings without escaping, and we can use the result code (e.g. "TCP_IMS_HIT:NONE") in place of the user agent for cache hit ratio statistics.
The output is as follows:
...
192.168.0.10 - - [11/May/2003:01:13:21 +0800] "GET http://ant.chedong.com/images/jw_ec_logo_winner2002.gif HTTP/1.1" 304 206 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/projects.html HTTP/1.1" 304 208 "http://ant.chedong.com/projects.html" TCP_IMS_HIT:NONE
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/jakarta-logo.gif HTTP/1.1" 304 207 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/sdm_productivity_award.gif HTTP/1.1" 304 207 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/jw_ec_logo_winner2002.gif HTTP/1.1" 304 206 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/ant_logo_large.gif HTTP/1.1" 304 207 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:03 +0800] "GET http://ant.chedong.com/projects.html HTTP/1.1" 304 208 "" TCP_IMS_HIT:NONE
192.168.0.10 - - [11/May/2003:01:14:03 +0800] "GET http://ant.chedong.com/images/jakarta-logo.gif HTTP/1.1" 304 207 "" TCP_MISS:DIRECT
...
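Since the result code sits in the last field of each line, a hit ratio can be tallied with a few lines of awk. This is only a rough sketch: the name hit_ratio is made up, and counting every result code that contains "HIT" as a cache hit is my assumption, not a Squid definition.

```shell
#!/bin/sh
# Count lines whose last field contains "HIT" and report hits/total.
# Assumption: any result code containing "HIT" counts as a cache hit.
hit_ratio() {
    awk '{
        total++
        if ($NF ~ /HIT/) hits++
    }
    END {
        if (total > 0)
            printf "%d/%d %.1f%%\n", hits, total, 100 * hits / total
    }'
}

# Shortened, made-up lines; only the last field matters here.
printf '%s\n' \
    'req1 TCP_MISS:DIRECT' \
    'req2 TCP_IMS_HIT:NONE' \
    'req3 TCP_MISS:DIRECT' \
    'req4 TCP_MISS:DIRECT' |
    hit_ratio
```

On a real file it would be run as hit_ratio < combined.log, where combined.log is a placeholder name for the merged output.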
PLEASE NOTICE that the last few lines have lost their referer: the referer log omits direct accesses instead of logging "-". I checked, and the line counts of access.log and useragent.log are not equal to that of referer.log:
%wc -l access.log useragent.log referer.log
44 access.log
44 useragent.log
38 referer.log <== lost direct access
126 total
I think that if the referer-log module logged direct accesses as "-", the above problem would be corrected.
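Until the logs line up, a line-by-line merge pairs requests with the wrong referers, so it may be worth guarding the merge with a line-count check. A minimal sketch follows; the function name logs_aligned is made up, and equal line counts are only a necessary condition, not proof that every line matches its partner.

```shell
#!/bin/sh
# Succeed only when both files have the same number of lines, so that a
# line-by-line merge has a partner line for every request.
logs_aligned() {
    a=$(wc -l < "$1")
    b=$(wc -l < "$2")
    # $((...)) strips any padding some wc implementations add.
    [ "$((a))" -eq "$((b))" ]
}

# Example use with the log names from this post, if both files exist.
if [ -f access.log ] && [ -f referer.log ]; then
    if logs_aligned access.log referer.log; then
        echo "line counts match"
    else
        echo "line counts differ, merged lines would be misaligned"
    fi
fi
```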
I checked the source code in client_side.c and found that httpHeaderGetStr may return NULL, so if the browser accesses a page directly, without a referer, nothing is written to the referer log.
The patch adds a default case that logs "-":
@@ -980,11 +980,16 @@
 #if USE_USERAGENT_LOG
     if ((str = httpHeaderGetStr(req_hdr, HDR_USER_AGENT)))
         logUserAgent(fqdnFromAddr(http->conn->log_addr), str);
+    else
+        logUserAgent(fqdnFromAddr(http->conn->log_addr), "-");
 #endif
 #if USE_REFERER_LOG
     if ((str = httpHeaderGetStr(req_hdr, HDR_REFERER)))
         logReferer(fqdnFromAddr(http->conn->log_addr), str,
             http->log_uri);
+    else
+        logReferer(fqdnFromAddr(http->conn->log_addr), "-",
+            http->log_uri);
 #endif
 #if FORW_VIA_DB
     if (httpHeaderHas(req_hdr, HDR_X_FORWARDED_FOR)) {
It recompiled OK, and the line count of referer.log now equals that of access.log:
%wc -l access.log referer.log
5 access.log
5 referer.log
10 total
If Dan Reif's combined log patch can't be merged into the 2.5 release, please consider this referer-log patch and add a tip to the documentation on producing a combined log.
Regards
Che, Dong
http://www.chedong.com
Received on Sat May 10 2003 - 14:22:03 MDT