I welcome the release of the new, improved proxy MIB. Especially since I was
supposed to provide a proposal for the TF-Cache describing such a beast. :-)
This seems much better than the current Squid MIB, and it makes my job
a whole lot easier, because criticising is easier than working. :-) :-)
Most of my remarks are based on the stuff I measure with my mrtg-for-squid
scripts, and on the conclusions I drew from such measurements. Since I regard
them as the next best thing to SNMP monitoring (derived as they are from the
wonderful selection of cachemgr available data and realtime logfile analysis),
I think they are relevant to the MIB discussion.
Here are some comments I accumulated while going through your MIB:
a) Medians over time - your MIB states ".10 gives 10 min medians". It is not
stated how many, or which, of the entries will be available. Since you need to
keep a time-aged queue of requests to calculate this correctly, I find it
hard to believe the user can just supply an arbitrary number - such queues
can grow pretty big even on a moderately busy cache. I know because people
complained about the size of the mrtg-anal process on their caches.
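To illustrate why the queue grows with request rate, here is a minimal sketch (my own illustration, not anything from the MIB draft) of a time-aged median: every sample within the window must be retained, so memory scales with requests per window.

```python
import time
from collections import deque

class WindowedMedian:
    """Keep (timestamp, value) samples and report the median over the
    last `window` seconds.  Memory grows with the request rate, which
    is why such queues get big on a busy cache."""

    def __init__(self, window=600):
        self.window = window
        self.samples = deque()          # (timestamp, value), oldest first

    def add(self, value, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, value))
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] < now - self.window:
            self.samples.popleft()

    def median(self):
        values = sorted(v for _, v in self.samples)
        if not values:
            return None
        mid = len(values) // 2
        if len(values) % 2:
            return values[mid]
        return (values[mid - 1] + values[mid]) / 2.0
```

An arbitrary window length means an arbitrarily large queue, which is exactly the concern above.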
b) IMO it would be useful if information on peers included data on how many
times a peer was queried vs. how many times it was selected. (That would
give you the object hit rate for the peer.) Also interesting to see would be
a cumulative counter of traffic from the peer (which would enable you to
calculate the volume hit rate for the peer). Of course this means you'd have to
have "number of all objects requested from peers" and "size of all objects
received from peers" global counters. I guess that is cacheIcpKbRecv
and cacheIcpPktsSent if I interpret them correctly? I guess it would not be
feasible to monitor the size of things requested from the peers as you don't
know the size until the request is answered...
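For concreteness, this is the arithmetic I have in mind - a sketch with made-up parameter names (the actual MIB objects would differ), taking the per-peer counters and the global counter proposed above:

```python
def peer_hit_rates(queried, selected, kb_received, kb_from_peers_total):
    """Hypothetical per-peer counters:
       queried             - times an ICP query was sent to the peer
       selected            - times the peer was chosen to serve the object
       kb_received         - cumulative KB received from this peer
       kb_from_peers_total - cumulative KB received from all peers (global)
    Returns (object_hit_rate, volume_share) as fractions."""
    object_hit_rate = selected / queried if queried else 0.0
    volume_share = kb_received / kb_from_peers_total if kb_from_peers_total else 0.0
    return object_hit_rate, volume_share
```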
c) As far as I can see, CPU is monitored only cumulatively - as cacheCpuUsage,
which is in seconds, so it has to be converted into ticks (the unit for
uptime) and divided to get lifetime CPU usage. IMHO this isn't enough. Some
form of "what is the CPU usage NOW" query should be possible, and is IMHO
needed. As an example, my proxy's lifetime CPU usage (as reported by cachemgr) is
currently 7% - but my realtime CPU monitoring scripts report peaks in the
low twenties. Being aware of peaks is important when you are planning
upgrades. (Cisco routers report 1 sec, 5 sec, and 1 minute CPU load, I
think.)
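A monitoring script can of course approximate "CPU usage now" itself by polling the cumulative counter twice and taking the delta - a sketch of that workaround (my own, assuming both readings have been converted to seconds):

```python
def cpu_usage_now(prev, curr):
    """Each sample is (uptime_seconds, cpu_seconds), e.g. sysUpTime
    converted from ticks plus the cumulative cacheCpuUsage reading.
    Returns the CPU percentage between the two polls, which tracks
    peaks far better than the lifetime average does."""
    d_uptime = curr[0] - prev[0]
    d_cpu = curr[1] - prev[1]
    if d_uptime <= 0:
        return 0.0
    return 100.0 * d_cpu / d_uptime
```

But the poll interval limits the resolution, which is why a native short-interval load value (as on the Ciscos) would be nicer.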
d) For hit rates it should be clearly stated how hits are measured: by object
count or by object size. In my experience, the two hit rates differ dramatically
(at least in Squid 1.1), with the object hit rate being quite stable and the
volume hit rate swinging all over the place. Also, neither hit rate appears in
the CacheMedian group (the one where medians over time seem to be kept).
I think it is very important to see how the hit rate varies over smaller
units of time ("Do I get high hit rates when traffic is high or when traffic
is low?").
(If the MIB doesn't clearly state how hits should be measured, different
caches might measure it differently, which would make it useless for
comparison purposes).
e) I'd ask for a count of auxiliary processes (like ftpget, dns servers,
redirectors, etc.), but I guess that might be interpreted as too Squid-specific.
Another thing worth mentioning: when being monitored by cachemgr (and thus
my mrtg-for-squid scripts :-), squid shows how much of its store log it has
processed by reporting the amount of space accounted for as the storage space
amount. I find this a very convenient feature. If the MIB doesn't cast this in
stone, other caches might just report the "df" of the disk, making that
statistic much less useful.
Thank you again for making Squid so easy to monitor.
Best regards,
Matija Grabnar
-- "My name is Not Important. Not to friends. But you can call me mr. Important" - Not J. Important Matija.Grabnar@arnes.siReceived on Wed Sep 09 1998 - 01:59:05 MDT