G'day folks,
Recently, we installed three new Squid (2.4.STABLE3) servers for a client,
all running Solaris 8 on Ultra 10s. Each machine has 1GB of RAM with the
cache_dirs sitting on external SCSI drives.
One of the machines has started complaining about one of the filesystems
being full. Here's a snippet from cache.log:
> 2002/02/09 00:17:48| Store rebuilding is 17.7% complete
> 2002/02/09 00:17:54| comm_accept: FD 15: (130) Software caused connection
> abort
> 2002/02/09 00:17:54| httpAccept: FD 15: accept failure: (130) Software
> caused connection abort
> 2002/02/09 00:18:42| Store rebuilding is 18.0% complete
> 2002/02/09 00:19:34| diskHandleWrite: FD 13: disk write error: (28) No
> space left on device
> FATAL: Write failure -- check your disk space and cache.log
> Squid Cache (Version 2.4.STABLE3): Terminated abnormally.
> CPU Usage: 138.000 seconds = 49.400 user + 88.600 sys
> Maximum Resident Size: 0 KB
> Page faults with physical i/o: 205567
Similarly, here's the relevant log entry from /var/adm/messages:
> Feb 9 21:32:09 <HOST> ufs: [ID 845546 kern.notice] NOTICE: alloc:
> /var/spool/cache3: file system full
> Feb 9 21:32:09 <HOST> squid[20504]: [ID 702911 user.alert] Write failure
> -- check your disk space and cache.log
But a df on the relevant file systems shows usage at no more than about 85%:
Filesystem            kbytes     used    avail capacity  Mounted on
/dev/dsk/c1t1d0s2   15683444   315964 15210646     3%    /var/spool/cache1
/dev/dsk/c1t2d0s2   15683444 12037584  3489026    78%    /var/spool/cache2
/dev/dsk/c1t3d0s2   15683444 13022361  2494129    84%    /var/spool/cache3
Actually, the above df output was taken after restarting squid with a "clean"
swap.state file to keep it going. That keeps the machine running for about a
week (or until another cache_dir fills). The swap.state file on a "full"
cache_dir sits at about 52MB.
Whenever the failure occurs, the cache_dir in question (it could be any of
them) is at about 84%-85% capacity. I don't know whether that has any
relevance.
The cache_mem setting is 256MB and the three cache_dirs are all defined as:
cache_dir ufs /var/spool/cache1 14000 128 256
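Written out in full (only the cache1 line is quoted above; I'm assuming the
other two lines are identical apart from the path), the relevant squid.conf
entries are:

  cache_mem 256 MB
  cache_dir ufs /var/spool/cache1 14000 128 256
  cache_dir ufs /var/spool/cache2 14000 128 256
  cache_dir ufs /var/spool/cache3 14000 128 256

For what it's worth, 14000 MB works out to 14,336,000 KB, which is a little
over 91% of the 15,683,444 KB df reports for each slice - before swap.state
and directory overhead are counted.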
I don't believe it's an inode limitation either, as a 'df -e' shows plenty of
free inodes. And if it were a problem with the "spread" counters (128 and 256)
we've used in defining the cache_dirs, I'd expect squid to complain, not the
kernel.
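Just to put rough numbers on those counters (the mean object size is an
assumption here, not something measured on these boxes):

  # second-level directories per cache_dir
  128 * 256           = 32768
  # objects per cache_dir, assuming a ~13 KB average object size
  14000 * 1024 / 13   ~= 1,100,000
  # objects per leaf directory, roughly
  1,100,000 / 32768   ~= 34

A few dozen files per leaf directory is nothing UFS should struggle with,
which is another reason I doubt the spread counters are to blame.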
We were using aufs - but I made the change to ufs last night in a last-ditch
effort to keep things running until I could track the problem.
The above problem is only happening on one of the three systems.
Unfortunately, it happens to be the most heavily loaded of the three - and it's
running Solaris 8 7/01 (the other two are running 10/01).
So, has anybody seen the above problem? Better yet, can anybody point me at
a solution? Any help would be much appreciated.
By the way, does anybody know how to reduce the incidence of the "httpAccept"
and "comm_accept" pair of errors? Not critical - just annoying.
As always, thanks for your time. Ciao.
--
-------------------------------------------------------+---------------------
Daniel Baldoni BAppSc, PGradDipCompSci                  | Technical Director
require 'std/disclaimer.pl'                             | LcdS Pty. Ltd.
-------------------------------------------------------+ 856B Canning Hwy
Phone/FAX: +61-8-9364-8171                              | Applecross
Mobile: 041-888-9794                                    | WA 6153
URL: http://www.lcds.com.au/                            | Australia
-------------------------------------------------------+---------------------
"Any time there's something so ridiculous that no rational systems
programmer would even consider trying it, they send for me.";
paraphrased from "King Of The Murgos" by David Eddings.
(I'm not good, just crazy)