Archive for the ‘Failure Analysis’ Category

Failure rates and MTBFs calculated on 2400 hours usage annually

January 27th, 2009 No comments

“AFR and MTBF specifications are based on the following assumptions for desktop personal computer environments: ... 2400 power-on-hours per year”, reads this screenshot from the Seagate Barracuda 7200.11 product manual.  Read more…

How to contact Seagate 24×7 critical data recovery services

January 25th, 2009 3 comments

To reach Seagate data recovery services, fill out their online form, and they send you information.  If you drop the disk drive off at a UPS store, they pick up the costs, so shipping is free (unless you want to overnight it).  The two recovery centers in the USA are in Chicago and Santa Clara. The Seagate (US) 24 Hour Critical Response Number is: Read more…

Seagate boot-of-death analysis – nothing but overhyped FUD

January 25th, 2009 29 comments

The nature and scope of the Seagate 7200.11 boot-of-death problem has been blown way out of proportion, and people are making grossly incorrect assumptions.  Seagate recently released a failure analysis report under non-disclosure to some (or all, I don’t know) OEM partners and distributors that describes the issue in great detail. Why under NDA? In my opinion, full knowledge of the problem could potentially create a blueprint for virus writers who want to go beyond just erasing files on targeted machines. So to be safe, full specifics aren’t being disclosed (but you can now find a little more info on Tom’s Hardware if you know where to look). 

As part of the manufacturing process, Seagate writes diagnostic information to reserved areas of the disk drives.  These bit patterns work with the test equipment and drive firmware to perform diagnostic actions such as placing it in a secure lockdown mode Read more…

Disk failures in the real world: What does MTBF of 1M hrs mean to you?

January 20th, 2009 No comments

From 5th USENIX Conference on File and Storage Technologies

Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. About 100,000 disks are covered by this data, some for an entire lifetime of five years. The data include drives with SCSI and FC, as well as SATA interfaces. The mean time to failure (MTTF) of those drives, as specified in their datasheets, ranges from 1,000,000 to 1,500,000 hours, suggesting a nominal annual failure rate of at most 0.88%. Read more…
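The "at most 0.88%" figure in the abstract can be reproduced with a few lines of arithmetic. The sketch below is only an illustration, not code from the paper: it assumes a drive powered on around the clock (8,760 hours per year) and a constant failure rate, and the function names `nominal_afr` and `exponential_afr` are made up for this example.

```python
import math

# Assumption: the drive runs 24x7, so a year is 8,760 power-on hours.
HOURS_PER_YEAR = 24 * 365


def nominal_afr(mttf_hours, hours_per_year=HOURS_PER_YEAR):
    """Linear approximation: power-on hours per year divided by MTTF."""
    return hours_per_year / mttf_hours


def exponential_afr(mttf_hours, hours_per_year=HOURS_PER_YEAR):
    """Chance of a failure within one year, assuming a constant failure rate."""
    return 1.0 - math.exp(-hours_per_year / mttf_hours)


# The two datasheet MTTF values quoted in the abstract.
for mttf in (1_000_000, 1_500_000):
    print(f"MTTF {mttf:,} h: "
          f"nominal AFR {nominal_afr(mttf):.2%}, "
          f"exponential AFR {exponential_afr(mttf):.2%}")
```

Passing a smaller `hours_per_year` (for example the 2400 power-on hours that desktop drive manuals assume) gives a proportionally lower AFR for the same MTTF, which is why the usage assumption matters when comparing datasheets.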

Google disk reliability paper

January 5th, 2009 No comments

Google released a study of 100,000 consumer-class ATA disk drives that revealed a wealth of information including S.M.A.R.T. data analysis; drive temperature vs. disk failure rates; annualized failure rates; and survival probabilities. 

Percentage of failed drives with S.M.A.R.T. errors

Read more…

Disk drive coolers may be a waste of money.

January 5th, 2009 No comments

These charts from the Google study of 100,000 consumer-class ATA disk drives show that you are probably throwing money away on disk drive coolers, as disks fail more often at LOWER temperatures. At the very least, have the drive cooling vendors supply data that proves that cooler disk drives last longer. :) Read more…