Archive

Posts Tagged ‘Failure analysis’

SANtool storage diagnostic device (Pre-announcement)

April 25th, 2009 No comments

Getting close to releasing what I believe will be the new standard in storage diagnostics for sysadmins and service people …  Let me know what you think!!

The SANtool™ is a multi-platform portable storage diagnostic tool. The SANtool enables the administrator/technician to efficiently diagnose, test, tune, break and/or repair storage peripherals. Unlike traditional software diagnostics, there is nothing to license, “install” or remove. Plug the SANtool USB flash stick into a machine running a supported Windows or UNIX/Linux operating system, and start using the software. The diagnostics are performed by the SANtools® command-line program (SMARTMon-UX) and controlled via a web browser over a secure (SSL) connection. All HTML and JavaScript files, images, the embedded web server, and O/S-specific executables are included on the SANtool.  No Java runtime, external DLLs, drivers, or web servers are required.

This is a picture of the SANtool desktop

Read more…

Disk failures in the real world: What does MTBF of 1M hrs mean to you?

January 20th, 2009 No comments

From the 5th USENIX Conference on File and Storage Technologies

Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. About 100,000 disks are covered by this data, some for an entire lifetime of five years. The data include drives with SCSI and FC, as well as SATA interfaces. The mean time to failure (MTTF) of those drives, as specified in their datasheets, ranges from 1,000,000 to 1,500,000 hours, suggesting a nominal annual failure rate of at most 0.88%. Read more…
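For anyone wondering where the 0.88% comes from, here is a quick sketch of the arithmetic. It assumes the usual simple approximation (annual failure rate ≈ hours in a year ÷ MTTF, with a 365-day year); the function name `nominal_afr` is just mine for illustration and is not taken from the paper.

```python
# Rough conversion from a datasheet MTTF to a nominal annual failure rate.
# Assumption: AFR ~= hours per year / MTTF, which is reasonable when the
# MTTF is much longer than one year.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def nominal_afr(mttf_hours: float) -> float:
    """Return the approximate annual failure rate as a fraction (0.01 == 1%)."""
    return HOURS_PER_YEAR / mttf_hours


# The two datasheet MTTF values quoted in the abstract.
for mttf in (1_000_000, 1_500_000):
    print(f"MTTF {mttf:>9,} h -> nominal AFR {nominal_afr(mttf):.2%}")
```

So a 1,000,000-hour MTTF works out to roughly 0.88% per year and a 1,500,000-hour MTTF to roughly 0.58%, which is why the abstract says "at most 0.88%".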

Using Seagate’s online tool to see if your disk is destined to become a brick.

January 18th, 2009 3 comments

In light of the sev-1 Seagate firmware bug that bricks both consumer and enterprise-class SATA disks, I ran the Seagate online tool to find out whether any of my disks are affected by the boot-of-death bug.

Is my drive destined to become a brick?

Seagate 'cuda test results

Yup, this just isn’t my day. Worse, my affected Barracuda drives are running in a Solaris system as part of a ZFS-based software RAID file system, and Read more…

Alert! Seagate Barracuda & DiamondMax drives are dying en masse due to firmware bug. Seagate reacts.

January 18th, 2009 9 comments

Seagate hard drives from the Barracuda 7200.11, DiamondMax 22, Barracuda ES.2 SATA, and SV35 families, Seagate FreeAgent®, and Maxtor OneTouch® 4 may become inaccessible when the host system is powered on.  In other words, they turn into bricks.  If you are unfortunate enough to have one of these products and have not upgraded the firmware (i.e. if you are unfortunate enough to have one of these products and don’t cruise the Seagate support site on a regular basis), then a firmware bug will instruct the disk to turn itself into a brick some day when you power it up.  Do NOT power off any computer that has one of these disk drives until you check the firmware.  Seagate is quietly offering free disaster recovery assistance, firmware updates, and software to determine if you have a disk that is running the evil firmware.  This “boot-of-death” bug rivals the infamous IBM Deathstar, which led to a successful class-action lawsuit. Read more…

How do you diagnose problems with tape drives and/or autochangers?

January 17th, 2009 No comments

TapeAlert is the street name for the ANSI specification that governs hardware diagnostics for tape drives, libraries and autochangers.  It was “invented” by HP, and is well established as an industry spec.  Pretty much everything from half-million-dollar IBM robotic systems to consumer-class entry-level DAT drives from HP supports the spec.  More information on the spec can be found at the TAPEALERT.ORG Read more…

Google disk reliability paper

January 5th, 2009 No comments

Google released a study of 100,000 consumer-class ATA disk drives that revealed a wealth of information including S.M.A.R.T. data analysis; drive temperature vs. disk failure rates; annualized failure rates; and survival probabilities. 

Percentage of failed drives with S.M.A.R.T. errors

Read more…

Disk drive temperature coolers may be a waste of money.

January 5th, 2009 No comments

These charts from the Google study of 100,000 consumer-class ATA disk drives show that you are probably throwing money away on disk drive coolers, as disks fail more often at LOWER temperatures. At the very least, have the drive cooling vendors supply data that proves that cooler disk drives last longer. :) Read more…