On 3/07/2013 3:51 p.m., Hussam Al-Tayeb wrote:
> this is squid 3.3.6
>
> [root_at_LARS squid]# tail -n 30 /var/log/squid/cache.log
> 2013/07/03 06:34:59| Target number of buckets: 40392
> 2013/07/03 06:34:59| Using 65536 Store buckets
> 2013/07/03 06:34:59| Max Mem size: 262144 KB
> 2013/07/03 06:34:59| Max Swap size: 10240000 KB
> 2013/07/03 06:34:59| Rebuilding storage in /home/squid (clean log)
> 2013/07/03 06:34:59| Using Least Load store dir selection
> 2013/07/03 06:34:59| Set Current Directory to /home/squid
> 2013/07/03 06:34:59| Loaded Icons.
> 2013/07/03 06:34:59| HTCP Disabled.
> 2013/07/03 06:34:59| Squid plugin modules loaded: 0
> 2013/07/03 06:34:59| Accepting NAT intercepted HTTP Socket connections at
> local=0.0.0.0:3128 remote=[::] FD 16 flags=41
> 2013/07/03 06:34:59| Accepting HTTP Socket connections at local=127.0.0.1:3129
> remote=[::] FD 17 flags=9
> 2013/07/03 06:34:59| Store rebuilding is 1.33% complete
> 2013/07/03 06:35:00| Done reading /home/squid swaplog (299709 entries)
> 2013/07/03 06:35:00| Finished rebuilding storage from disk.
> 2013/07/03 06:35:00| 299709 Entries scanned
> 2013/07/03 06:35:00| 0 Invalid entries.
> 2013/07/03 06:35:00| 0 With invalid flags.
> 2013/07/03 06:35:00| 299709 Objects loaded.
> 2013/07/03 06:35:00| 0 Objects expired.
> 2013/07/03 06:35:00| 0 Objects cancelled.
> 2013/07/03 06:35:00| 0 Duplicate URLs purged.
> 2013/07/03 06:35:00| 0 Swapfile clashes avoided.
> 2013/07/03 06:35:00| Took 1.36 seconds (220842.74 objects/sec).
> 2013/07/03 06:35:00| Beginning Validation Procedure
> 2013/07/03 06:35:00| 262144 Entries Validated so far.
> 2013/07/03 06:35:00| Completed Validation Procedure
> 2013/07/03 06:35:00| Validated 299707 Entries
> 2013/07/03 06:35:00| store_swap_size = 8047432.00 KB
> 2013/07/03 06:35:01| storeLateRelease: released 0 objects
> [root_at_LARS squid]# find . -type f | wc -l
> 299706
>
> this means there are 299706 files in /home/squid including the swap.state file
> so a total of 299705 objects on disk
> but swap.state thinks there are 299709 files.
Interesting. If you scan swap.state and pull out the "fileno" field it
should map to the directory "L1/L2/file" structure. Can you identify
which objects the overlap is about and what swap.state is saying about them?
I suspect this is related to the MD5 mismatches via shutdown_timeout
side effects. ie, that they got recorded to swap.state as SWAPOUT, but
the I/O sending them to disk got aborted along with Squid before the
file was written by the OS.
Or that could be way off base and it be something very weird instead...
> another thing I found.
> cat /var/log/squid/cache.log | grep WARNING | grep swapin
>
> 2013/06/08 23:35:45| WARNING: 1 swapin MD5 mismatches
> 2013/06/09 00:28:59| WARNING: 10 swapin MD5 mismatches
> 2013/06/09 12:20:56| WARNING: 1 swapin MD5 mismatches
> 2013/06/09 12:25:46| WARNING: 10 swapin MD5 mismatches
> 2013/06/09 14:40:18| WARNING: 1 swapin MD5 mismatches
> 2013/06/10 02:31:02| WARNING: 1 swapin MD5 mismatches
> 2013/06/10 13:00:37| WARNING: 1 swapin MD5 mismatches
> 2013/06/10 22:59:53| WARNING: 1 swapin MD5 mismatches
> 2013/06/12 14:41:23| WARNING: 1 swapin MD5 mismatches
> 2013/06/15 04:37:03| WARNING: 1 swapin MD5 mismatches
> 2013/06/28 07:07:33| WARNING: 1 swapin MD5 mismatches
> 2013/06/28 21:51:19| WARNING: 1 swapin MD5 mismatches
> 2013/06/29 23:41:49| WARNING: 1 swapin MD5 mismatches
> 2013/06/30 04:14:24| WARNING: 10 swapin MD5 mismatches
> 2013/06/30 19:10:49| WARNING: 1 swapin MD5 mismatches
> 2013/06/30 19:10:49| WARNING: 10 swapin MD5 mismatches
> 2013/07/02 13:44:11| WARNING: 1 swapin MD5 mismatches
> 2013/07/02 13:48:35| WARNING: 10 swapin MD5 mismatches
> 2013/07/02 23:27:32| WARNING: 100 swapin MD5 mismatches
> 2013/07/03 06:02:18| WARNING: 1 swapin MD5 mismatches
> 2013/07/03 06:20:43| WARNING: 1 swapin MD5 mismatches
> 2013/07/03 06:26:59| WARNING: 10 swapin MD5 mismatches
>
>
> 1) Any of the above is something to worry about?
MD5 mismatches are. It means corrupt cache content. The usual causes (in
order of commonality) are:
* shutdown_timeout set too short - this timeout is designed to allow
client traffic to complete and close cleanly. Aborting Squid earlier
corrupts any incomplete I/O. This is bad as it affects clients live
traffic, not just cache files (which can recover easily as seen above).
* disk failure - the usual bit-loss as disks die slowly. The large count
of failures showing up is a worry if that is the cause.
* somebody playing with the cache content directly.
* I/O errors when saving content to cache. Rare during normal operation
but they do occur (IME usually when RAID is slowing the disk I/O down!)
> 2) Does squid resolve the file mismatch eventually as I reach the max size of
> cache dir?
If Squid thinks there are more files on disk than actually there, then I
would not expect there to be any problem. If anything Squid would be
*under* capacity by the size of the phantom files. The cache size
maintenance should still work normally. There may be some form of
self-correction if the phantom files get evicted from the index.
> 3) the swapin MD5 mismatches problem. Is it something I can fix? If so, how?
see above for causes. Anything you can do to determine which cause is
happening and resolve it.
> Any other information I can post to help detect where the problem is?
Unfortunately the debug which shows what files are failing the checksum
has been disabled for unknown reasons. So not really.
Amos
Received on Thu Jul 04 2013 - 05:55:49 MDT
This archive was generated by hypermail 2.2.0 : Thu Jul 04 2013 - 12:00:06 MDT