This is where Perl shines.
Assuming you or someone in your org speaks Perl, build a hash keyed on
the URL (GIF or not) and increment the value every time you see that URL,
i.e.:
Get a line
parse the line
get the URL
$Counter{$URL}++
At the end, when the dust settles, you have a hash containing every URL
and how many times it was requested. Then, in your analysis phase, you can
ignore URLs that are of non-HTML type or whatever.
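
The steps above might be sketched roughly as follows. This is only a minimal sketch, not a finished log analyzer. It assumes Squid's native access.log layout, with whitespace-separated fields and the URL in the seventh field; adjust the index if your log format differs. The names count_urls and report and the image-extension filter are my own illustrations, not anything Squid ships.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collection phase: count how often each URL appears.
# ASSUMPTION: whitespace-separated fields with the URL in field 7
# (index 6), as in Squid's native access.log format.
sub count_urls {
    my ($fh) = @_;
    my %Counter;
    while (my $line = <$fh>) {
        my @field = split ' ', $line;      # parse the line
        next unless @field >= 7;           # skip short or blank lines
        my $URL = $field[6];               # get the URL
        $Counter{$URL}++;
    }
    return \%Counter;
}

# Analysis phase: most-requested first, ignoring URLs that look like images.
# ASSUMPTION: filtering on the file extension is good enough for a first pass.
sub report {
    my ($counter) = @_;
    my @lines;
    for my $url (sort { $counter->{$b} <=> $counter->{$a} || $a cmp $b }
                 keys %$counter) {
        next if $url =~ /\.(?:gif|jpe?g|png)(?:\?|$)/i;
        push @lines, sprintf("%6d %s", $counter->{$url}, $url);
    }
    return @lines;
}

# Run only when a log file is named on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for report(count_urls($fh));
    close $fh;
}
```

Save it under any name you like and run it with the path to your access.log as the only argument. Everything is counted first and filtered only at report time, so you can change your mind about what to ignore without re-reading the log.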
Andrea Lain wrote:
> Hello all,
>
> Thanks for the help received in analyzing certain entries in the access.log
> file. Now I have one more question along these same lines:
>
> When filtering the access.log file with a script, we would like to be able
> to differentiate between unique user requests and those generated by the
> system. For example, when we hit a site such as Microsoft.com, the
> access.log file shows a GET for the initial page, then subsequent CONNECT
> requests (sometimes also GET requests) for items such as images.
>
> We have built a custom script that filters this information to display only
> requests for text/html and application data by URL. However, we are still
> being presented with quite a bit of data and some of it is redundant. So we
> are wondering if there's a more efficient way to filter the access.log to
> return only requests initiated by the user and not include any subsequent
> requests (like graphic accesses) that the system will initiate. Is there a
> different field we can filter by that will narrow down this criteria?
>
> We have looked at the log analysis script files available from the Squid
> site and haven't found anything that accomplishes what we're looking for, so
> at this point we're wondering if it's even possible to refine the data
> output further.
>
> All suggestions are welcome! And if you need more info on this issue,
> please don't hesitate to drop an email to:
>
> alain@dsra.com
>
> CC: ljohnson@dsra.com
>
> Thanks in advance!
>
> _____________
> Andrea Lain
> alain@dsra.com
> Decision Sciences
>
> --
> To unsubscribe, see http://www.squid-cache.org/mailing-lists.html
--
=======================================================================
Medi Montaseri, medi@sc.prepass.com, 408-450-7114
Lockheed Martin IMS (Prepass), IT/Operations, Software Eng.
=======================================================================
Received on Thu Dec 07 2000 - 17:51:11 MST