Archive for the ‘Predictive Failure’ Category

Failure rates and MTBFs calculated on 2400 hours usage annually

January 27th, 2009 No comments

“AFR and MTBF specifications are based on the following assumptions for desktop personal computer environments: .. 2400 power-on-hours per year”, reads this screenshot from the Seagate Barracuda 7200.11 product manual.  Read more…
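To put that 2400-hour assumption in perspective, here is a rough Python sketch comparing it with a drive that never powers down. It assumes failures accrue in proportion to power-on hours, which is a simplification, and the function name is purely illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year


def duty_cycle_factor(assumed_poh_per_year, actual_poh_per_year=HOURS_PER_YEAR):
    """Ratio of actual power-on hours to the hours the spec assumes.

    Assumption: exposure to failure scales linearly with power-on hours.
    """
    return actual_poh_per_year / assumed_poh_per_year


# A drive left on 24x7 versus the 2400 hours/year desktop assumption.
factor = duty_cycle_factor(2400)
print(f"An always-on drive logs {factor:.2f}x the assumed power-on hours")
```

Under that linear assumption, a drive that runs around the clock accumulates 3.65 times the hours the specification was written for.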

How to contact Seagate 24×7 critical data recovery services

January 25th, 2009 3 comments

The link to Seagate data recovery services is https://services.seagate.com/online_request_form.aspx.  Fill out the form, and they send you information.  If you drop the disk drive off at a UPS store, they pick up the cost, so shipping is free (unless you want to overnight it).  The two recovery centers in the USA are in Chicago and Santa Clara. The Seagate (US) 24 Hour Critical Response Number is: Read more…

Seagate boot-of-death analysis – nothing but overhyped FUD

January 25th, 2009 29 comments

The nature and scope of the Seagate 7200.11 boot-of-death problem have been blown way out of proportion, and people are making grossly incorrect assumptions.  Seagate recently released a failure analysis report under non-disclosure to some (or all, I don’t know) OEM partners and distributors that describes the issue in great detail. Why under NDA? In my opinion, full knowledge of the problem could potentially create a blueprint for virus writers who want to go beyond just erasing files on targeted machines. So to be safe, full specifics aren’t being disclosed (but you can now find a little more info on Tom’s Hardware if you know where to look).

As part of the manufacturing process, Seagate writes diagnostic information to reserved areas of the disk drives.  These bit patterns work with the test equipment and drive firmware to perform diagnostic actions such as placing the drive in a secure lockdown mode Read more…

Disk failures in the real world: What does MTBF of 1M hrs mean to you?

January 20th, 2009 No comments

From 5th USENIX Conference on File and Storage Technologies

Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. About 100,000 disks are covered by this data, some for an entire lifetime of five years. The data include drives with SCSI and FC, as well as SATA interfaces. The mean time to failure (MTTF) of those drives, as specified in their datasheets, ranges from 1,000,000 to 1,500,000 hours, suggesting a nominal annual failure rate of at most 0.88%. Read more…
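The 0.88% figure in that abstract follows from dividing the hours in a year by the datasheet MTTF. Here is a minimal Python sketch of that arithmetic, assuming round-the-clock operation (8,760 power-on hours per year) and a constant failure rate. The function name is my own, not something from the paper.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assumes the drive is powered on all year


def nominal_afr(mttf_hours, power_on_hours_per_year=HOURS_PER_YEAR):
    """Nominal annual failure rate implied by a datasheet MTTF.

    Uses the simple ratio (hours per year / MTTF), which is a close
    approximation when the MTTF is much larger than one year of use.
    """
    return power_on_hours_per_year / mttf_hours


# The two datasheet extremes quoted in the abstract.
for mttf in (1_000_000, 1_500_000):
    print(f"MTTF {mttf:>9,} h -> nominal AFR {nominal_afr(mttf):.2%}")
```

The 1,000,000-hour MTTF gives roughly 0.88% per year and the 1,500,000-hour MTTF gives roughly 0.58%, which is why the abstract says "at most 0.88%".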

Seagate’s boot-of-death identification software now offline

January 19th, 2009 No comments

Surely this is a temporary thing.  The Seagate boot-of-death disk drive identification site worked just fine up till Sunday evening (update … Still offline as of Wed Jan 21st 10:14 PM)

Using Seagate’s online tool to see if your disk is destined to become a brick.

January 18th, 2009 3 comments

In light of the sev-1 Seagate firmware bug that bricks both consumer and enterprise-class SATA disks, I ran the Seagate online tool to see whether any of my disks are affected by the boot-of-death bug.

Is my drive destined to become a brick?

Seagate 'cuda test results

Yup, this just isn’t my day. Worse, my affected Barracuda drives are running in a Solaris system as part of a ZFS-based software RAID file system, and Read more…

Alert! Seagate Barracuda & DiamondMax drives are dying en masse due to firmware bug. Seagate reacts.

January 18th, 2009 9 comments

Seagate hard drives from the Barracuda 7200.11, DiamondMax 22, Barracuda ES.2 SATA, and SV35 families, as well as Seagate FreeAgent® and Maxtor OneTouch® 4 products, may become inaccessible when the host system is powered on.  In other words, they turn into bricks.  If you are unfortunate enough to have one of these products and have not upgraded the firmware (i.e., you don’t cruise the Seagate support site on a regular basis), then a firmware bug will instruct the disk to turn itself into a brick some day when you power it up.  Do NOT power off any computer that has one of these disk drives until you check the firmware.  Seagate is quietly offering free disaster recovery assistance, firmware updates, and software to determine if you have a disk that is running the evil firmware.  This “boot-of-death” bug rivals the infamous IBM Deathstar, which led to a successful class-action lawsuit. Read more…

Google disk reliability paper

January 5th, 2009 No comments

Google released a study of 100,000 consumer-class ATA disk drives that revealed a wealth of information including S.M.A.R.T. data analysis; drive temperature vs. disk failure rates; annualized failure rates; and survival probabilities. 

Percentage of failed drives with S.M.A.R.T. errors

Read more…

Disk drive temperature coolers may be a waste of money.

January 5th, 2009 No comments

These charts from the Google study of 100,000 consumer-class ATA disk drives show that you are probably throwing money away on disk drive coolers, as disks fail more often at LOWER temperatures. At the very least, have the drive cooling vendors supply data that proves that cooler disk drives last longer. :) Read more…