Getting close to releasing what I believe will be the new standard in storage diagnostics for sysadmins and service people … Let me know what you think!!
The SANtool™ is a multi-platform portable storage diagnostic tool. The SANtool enables the administrator/technician to efficiently diagnose, test, tune, break and/or repair storage peripherals. Unlike traditional software diagnostics, there is nothing to license, “install” or remove. Plug the SANtool USB flash stick into a machine running a supported Windows or UNIX/Linux operating system, and start using the software. The diagnostics are performed by the SANtools® command-line program (SMARTMon-UX) and controlled via a web browser over a secure (SSL) connection. All HTML and JavaScript files, images, the embedded web server, and O/S-specific executables are included on the SANtool. No Java runtime, external DLLs, drivers, or web servers are required.

This is a picture of the SANtool desktop
Read more…
AdminPosts, Diagnostics, Industry News
Disk hacking, Failure analysis
From 5th USENIX Conference on File and Storage Technologies
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. About 100,000 disks are covered by this data, some for an entire lifetime of five years. The data include drives with SCSI and FC, as well as SATA interfaces. The mean time to failure (MTTF) of those drives, as specified in their datasheets, ranges from 1,000,000 to 1,500,000 hours, suggesting a nominal annual failure rate of at most 0.88%. Read more…
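For anyone who wants to sanity-check the 0.88% figure, here is a minimal sketch of the arithmetic. It assumes the usual back-of-the-envelope conversion (powered-on hours in a year divided by the datasheet MTTF) and a drive that runs 24x7; the function name `nominal_afr` is made up for illustration and does not come from the paper.

```python
# Rough conversion from a datasheet MTTF to a nominal annual failure rate.
# Assumption: the drive is powered on around the clock (8,760 hours per year)
# and AFR is approximated as hours-per-year / MTTF.
HOURS_PER_YEAR = 24 * 365


def nominal_afr(mttf_hours: float) -> float:
    """Return the nominal annual failure rate as a fraction (0.01 == 1%)."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be a positive number of hours")
    return HOURS_PER_YEAR / mttf_hours


if __name__ == "__main__":
    # The two ends of the datasheet range quoted in the abstract.
    for mttf in (1_000_000, 1_500_000):
        print(f"MTTF {mttf:,} h -> nominal AFR {nominal_afr(mttf):.2%}")
```

The 1,000,000-hour end of the range works out to roughly 0.88% per year and the 1,500,000-hour end to roughly 0.58%, which is where the "at most 0.88%" in the abstract comes from.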
Failure Analysis
Failure analysis
In light of the sev-1 Seagate firmware bug that bricks both consumer and enterprise-class SATA disks, I ran the Seagate online tool that tells you whether any of your disks are affected by the boot-of-death bug.

- Is my drive destined to become a brick?

Yup, this just isn’t my day. Worse, my affected Barracuda drives are running in a Solaris system as part of a ZFS-based software RAID file system, and Read more…
Diagnostics, Disk Drives, Predictive Failure
boot-of-death, Failure analysis, Firmware
Seagate hard drives from the Barracuda 7200.11, DiamondMax 22, Barracuda ES.2 SATA, and SV35 families, Seagate FreeAgent®, and Maxtor OneTouch® 4 may become inaccessible when the host system is powered on. In other words, they turn into bricks. If you are unfortunate enough to have one of these products and have not upgraded the firmware (i.e. if you are unfortunate enough to have one of these products and don’t cruise the Seagate support site on a regular basis), then a firmware bug will instruct the disk to turn itself into a brick some day when you power it up. Do NOT power off any computer that has the following disk drives until you check the firmware. Seagate is quietly offering free disaster recovery assistance, firmware updates, and software to determine if you have a disk that is running the evil firmware. This “boot-of-death” bug rivals the infamous IBM Deathstar, which led to a successful class-action lawsuit. Read more…
Diagnostics, Disk Drives, Predictive Failure
boot-of-death, Failure analysis, Firmware
TapeAlert is the street name for the ANSI specification that governs hardware diagnostics for tape drives, libraries and autochangers. It was “invented” by HP and is well established as an industry spec. Pretty much everything from half-million-dollar IBM robotic systems to consumer-class entry-level DAT drives from HP supports the spec. More information on the spec can be found at the TAPEALERT.ORG Read more…
Diagnostics, Tapes/Changers/Libraries
Failure analysis, Predictive Failure, Specifications
Google released a study of 100,000 consumer-class ATA disk drives that revealed a wealth of information including S.M.A.R.T. data analysis; drive temperature vs. disk failure rates; annualized failure rates; and survival probabilities.

Percentage of failed drives with S.M.A.R.T. errors
Read more…
Failure Analysis, Predictive Failure, S.M.A.R.T. Technology
Failure analysis, Predictive Failure
These charts from the Google study of 100,000 consumer-class ATA disk drives show that you are probably throwing money away on disk drive coolers, as disks fail more often at LOWER temperatures. At the very least, have the drive cooling vendors supply data that proves that cooler disk drives last longer.
Read more…
Failure Analysis, Predictive Failure
Failure analysis, Predictive Failure