When a hard drive develops a bad sector, some of the data may be lost, but it’s often possible to reformat the drive and get it working again. The bad sectors are marked as unusable, and the hard drive will work as before, albeit a few megabytes smaller. Current hard drives are able to work around bad sectors and store data only in the good areas of the drive. Although this saves money compared with buying a new drive, is it a good idea?
First, a little background on hard drive crashes. While consumers may see a hard drive crash as a frustrating once-in-a-decade experience, for most businesses with large server farms it’s a common occurrence. The question is not whether there will be a crash. Rather, the questions are: how do we respond to crashes, how quickly can we get back online when the hard drives do crash, and what do we do with the old drives?
The key elements in responding to a hard drive crash are redundant hard drive arrays and a good backup strategy. Redundant arrays can automatically fail over to a parallel hard drive if one of the drives crashes, and running backups on a regular basis will prevent most significant data loss. It’s also important to create a detailed action plan with all the steps needed for bare-metal recovery, in order to take the dread out of more serious hardware problems like controller failure.
What are the key factors, however, that influence hard drive failure? Google tackled this problem methodically in its paper, “Failure Trends in a Large Disk Drive Population.” The goal of the study was to help reduce data center costs and improve efficiency. Google’s novel data center strategy was to use huge numbers of mid-range and low-end servers to power its search engine, as opposed to a few extremely expensive and powerful machines. Since these servers ran on standard hard drives, the network support team regularly had to replace failing drives in the array. Armed with a hard drive monitoring technology called “SMART” (Self-Monitoring, Analysis and Reporting Technology), they were able to track hard drive temperature, disk reads, and a variety of other factors, and compare them to the hard drive failure rate.
The interesting conclusion of the paper was that hard drives are quite resilient, and that most failures appear random. In fact, hard drives can be kept at very high temperatures, up to 115 degrees Fahrenheit, without increasing their failure rate significantly. The only noticeable factor in predicting whether a hard drive will fail soon was whether it had failed previously. The scope of the earlier failure does not matter. If there was a scan error at some point, that hard drive is ten times as likely to crash in the future.
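As a rough illustration, the rule that any prior error marks a drive for replacement could be sketched in Python as follows. The `DriveHealth` record, its field names, and the `should_retire` helper are hypothetical and invented for this example. They do not come from the Google paper or from any particular SMART tool, and real monitoring software reports these counters in its own format.

```python
from dataclasses import dataclass


@dataclass
class DriveHealth:
    """Hypothetical summary of one drive's monitoring counters."""
    serial: str
    scan_errors: int = 0          # surface scan errors logged so far
    reallocated_sectors: int = 0  # bad sectors the drive has worked around
    temperature_f: float = 0.0    # recorded, but not used in the decision


def should_retire(drive: DriveHealth) -> bool:
    """Flag a drive that has logged any error, regardless of its scope.

    Temperature is deliberately ignored, since heat alone was not
    described as a significant predictor of failure.
    """
    return drive.scan_errors > 0 or drive.reallocated_sectors > 0


# Made-up example fleet: one hot but clean drive, two with prior errors.
fleet = [
    DriveHealth("A1", temperature_f=110.0),
    DriveHealth("B2", scan_errors=1),
    DriveHealth("C3", reallocated_sectors=8),
]

to_replace = [d.serial for d in fleet if should_retire(d)]
print(to_replace)  # ['B2', 'C3']
```

The check treats a single error the same as many, which mirrors the point that the scope of the earlier failure does not matter.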
So, the conclusion? Do not reuse a previously crashed hard drive. It will likely break again soon. Invest in a new hard drive and avoid the “gnashing of teeth” that another failure is sure to bring in the near future.
Written by Andrew Palczewski
About the Author
Andrew Palczewski is CEO of apHarmony, a Chicago software development company. He holds a Master's degree in Computer Engineering from the University of Illinois at Urbana-Champaign and has over ten years' experience in managing development of software projects.