Re: [squid-users] RAID is good

From: Amos Jeffries <squid3@dont-contact.us>
Date: Wed, 26 Mar 2008 10:59:39 +1300

Matus UHLAR - fantomas wrote:
> On 25.03.08 10:23, Marcus Kool wrote:
>> I wish that the wiki for RAID is rewritten.
>
> I think that (nearly) anyone can rewrite it, but...
>
>> Companies depend on internet access and a working Squid proxy
>> and therefore the advocated "no problem if a single disk fails"
>> is not from today's reality.
>
> That's a different problem similar to one that was raised already: squid
> should be able to work if one of configured cache_dirs is unavailable.
> Ability to remove the cache_dir if it fails is just enhancing of this
> functionality.
>
> If there is no bugreport for this, it's time to create one...

Run-time recovery from HDD errors is on the RoadMap wishlist, awaiting
someone's interest.

Which, as you say, is a separate problem from the non-existent one the
wiki refers to: recovery of data which is simply a redundant mirror of
data easily accessible elsewhere.

>
>> One should also consider the difference between
>> simple RAID and extremely advanced RAID disk systems
>> (i.e. EMC and other arrays).
>> The external disk arrays like EMC with internal RAID5 are simply faster
>> than a JBOD of internal disks.

How many write cycles does EMC use to back up data after one system-used
write cycle?
How many CPU cycles does EMC spend figuring out which disk the file-slice
is located on, _after_ squid has already hashed the file location to
figure out which disk the file is located on?

Regardless of speed, unless you can provide a RAID system which has less
than one hardware disk-io read/write per system disk-io read/write you
hit these theoretical limits.
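
As a rough illustration of that limit, here is a minimal sketch of the
physical disk operations behind one small logical write. The per-level
figures are the usual textbook values (they ignore controller caches and
full-stripe writes), and the names PHYSICAL_IO_PER_WRITE and physical_ios
are made up for this example; nothing here is measured on a real array.

```python
# Physical disk operations needed for ONE small logical write.
# Textbook values only; an assumption, not a measurement of any array.
PHYSICAL_IO_PER_WRITE = {
    "jbod": 1,   # one write to one disk
    "raid0": 1,  # striping only, no redundancy
    "raid1": 2,  # the same block goes to both mirrors
    "raid5": 4,  # read old data + old parity, write new data + new parity
}


def physical_ios(level, logical_writes):
    """Return the physical disk operations for a number of logical writes."""
    return PHYSICAL_IO_PER_WRITE[level] * logical_writes


if __name__ == "__main__":
    for level in ("jbod", "raid0", "raid1", "raid5"):
        print(level, physical_ios(level, 1000))
```

None of the redundant levels gets below one physical operation per
logical write, which is the point being made above.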

>
> in such case RAID1 of such disks would be even faster, if you need
> reliability (for now) or raid0 or maybe JBOD using the EMC...
>

True, RAID is becoming faster overall, or at least the servers it runs
on are.

But it's not so much a problem of human-noticeable absolute time. The
underlying problem of duplicated disk-io cycles, processor-io cycles
and processor delays remains.

For now the CPU half of the problem gets masked by the
single-threadedness of squid (never thought you'd see that being a major
benefit, eh?). If squid begins using all the CPU threads, the OS will
lose its spare CPU cycles on dual-core machines and RAID may
become a noticeable problem there.

Halving the lifetime of HDD for no benefit is not a good idea, even in
wealthy large setups. And the guys running squid in high-performance
situations would agree that any speed reduction is not good.
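
The "halving" figure can be sketched with simple arithmetic. This is only
a back-of-envelope model under two stated assumptions: squid spreads its
writes evenly over the cache_dirs, and disk wear is proportional to the
number of writes a disk receives. The function name writes_per_disk is
invented for the example.

```python
def writes_per_disk(layout, disks, logical_writes):
    """Writes each physical disk receives (assumes an even spread)."""
    if layout == "jbod":
        # Each cache_dir disk only stores its own share of the objects.
        return logical_writes / disks
    if layout == "raid1":
        # Every mirror member receives every write.
        return logical_writes
    raise ValueError("unknown layout: " + layout)


if __name__ == "__main__":
    jbod = writes_per_disk("jbod", 2, 1000)
    mirror = writes_per_disk("raid1", 2, 1000)
    print("JBOD per disk: ", jbod)
    print("RAID1 per disk:", mirror)
    print("ratio:", mirror / jbod)
```

With two disks, each mirrored disk takes twice the writes of a plain
cache_dir disk, so under the proportional-wear assumption it wears out in
about half the time.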

For background:

Before I wrote that wiki page I had tested Squid on a 2.6GHz single-CPU
box with RAID-mirrored drives. It ran noticeably slower (and louder)
than an equivalent 1.2GHz box without the RAID.

That was followed last year by numerous performance help requests here in
squid-users from people who tried squid with RAID and found that removing
it gave a large immediate performance boost.

What I have laid out in the text is the theory behind squid+RAID. If you
are going to obsolete any of the information there, please provide
hardware specs and run the math before doing so. You might be
unpleasantly surprised.

Amos

-- 
Please use Squid 2.6STABLE17+ or 3.0STABLE1+
There are serious security advisories out on all earlier releases.
Received on Tue Mar 25 2008 - 15:59:43 MDT
