Where Did My Data Go?
Companies adopting flash-based SSDs as a cornerstone of their data center storage systems are risking “massive data loss” due to power outages, according to a new study titled “Understanding the Robustness of SSDs Under Power Fault” by researchers from The Ohio State University and HP Labs. After exposing 15 SSDs from five different vendors to power loss, the researchers found that 13 suffered failures such as bit corruption, metadata corruption, and total device failure. The paper did not name the vendors whose drives were used.
Researchers Mai Zheng, Joseph Tucek, Feng Qin, and Mark Lillibridge conducted the study to assess how SSDs behave when power is cut unexpectedly during operation, noting that SSDs are gradually replacing spinning disks in data centers. SSD enthusiasts claim the drives are faster, more affordable, and more reliable than traditional hard drives. Unfortunately, SSDs may be more susceptible to damage from a simple power failure than data center operators realized.
“Although loss of power seems like an easy fault to prevent, recent experience shows that a simple loss of power is still a distressingly frequent occurrence even for sophisticated data center operators like Amazon,” according to the paper.
Researchers subjected the 15 SSDs to more than 3,000 fault injection cycles in all and found that 13, including “supposedly ‘enterprise-class’ devices,” exhibited failure behavior. Every one of those 13 lost some amount of data that the researchers had expected to survive the fault. Two units “became massively corrupted, with one no longer registering on the SAS bus at all,” while another had one-third of its blocks become inaccessible after just eight fault cycles.
Overall, the researchers observed five failure types: bit corruption, shorn writes, unserializable writes, metadata corruption, and dead devices. A shorn write completes only partially, leaving a block that mixes new and old data. An unserializable write leaves the drive in a state that no ordering of the completed writes could produce, such as a later write surviving while an earlier one is lost. “The block-level behavior of SSDs exposed in our experiments has important implications for the design of storage systems,” according to the researchers. “For example, the frequency of both bit corruption and shorn writes make update-in-place to a sole copy of data that needs to survive power failure inadvisable. Because many storage systems like filesystems and databases rely on the correct order of operations to maintain consistency, serialization errors are particularly problematic.”
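Avoiding update-in-place to a sole copy can be illustrated with a minimal Python sketch, assuming a POSIX filesystem. The function names `atomic_write` and `read_verified` and the length-plus-CRC32 header are hypothetical illustrations, not taken from the study. The sketch writes the new version to a separate file, flushes it with `os.fsync`, and swaps it in with `os.replace`. A torn write should therefore damage only the new copy, and the checksum lets a reader detect bit corruption. However, the study found drives losing data they had already reported as durable. This pattern narrows that risk rather than eliminating it.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical on-disk header: payload length and CRC32 of the payload, big-endian.
HEADER = struct.Struct(">II")


def atomic_write(path: str, payload: bytes) -> None:
    """Replace `path` with `payload` without overwriting the old copy in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version to a temporary file in the same directory so the
    # final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS (and the drive) to persist the new copy
        os.replace(tmp_path, path)  # atomically swap the new copy into place
    except BaseException:
        os.unlink(tmp_path)  # discard the half-written temporary file
        raise
    # Persist the directory entry so the rename itself survives a crash (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def read_verified(path: str) -> bytes:
    """Return the stored payload, raising ValueError if it fails its checksum."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    length, crc = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch: possible bit corruption or shorn write")
    return payload
```

A caller would use `atomic_write("state.bin", data)` and later `read_verified("state.bin")`, treating a `ValueError` as a signal to fall back to another replica rather than trusting the bytes.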
The researchers’ conclusion: “SSDs offer the promise of vastly higher performance operation; our results show that they do not provide reliable durability under even the simplest of faults: loss of power.”
They recommend that “system builders either not use SSDs for important information that needs to be durable or that they test their actual SSD models carefully under actual power failures beforehand. Failure to do so risks massive data loss.”
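The testing the researchers recommend can be sketched as a write-then-verify probe. The following Python sketch is a hypothetical harness, not the researchers' tool. It appends numbered 4 KiB records and calls fsync after each one. It reports a sequence number only after fsync returns. Once power has been cut and restored, it checks every acknowledged record. Cutting power requires external hardware and is outside the scope of this code. The record size, file layout, and function names are all assumptions.

```python
import os
import struct
from typing import Callable

RECORD_SIZE = 4096         # assumption: one record per 4 KiB block
SEQ = struct.Struct(">Q")  # 8-byte big-endian sequence number


def make_record(seq: int) -> bytes:
    """Deterministic record content, so the verifier knows what to expect."""
    filler = bytes((seq + i) % 256 for i in range(RECORD_SIZE - SEQ.size))
    return SEQ.pack(seq) + filler


def write_phase(path: str, count: int, ack: Callable[[int], None]) -> None:
    """Append records, calling ack(seq) only after fsync has returned.

    In a real test, ack() should send the sequence number to another machine,
    because a log kept on the drive under test could itself be lost.
    """
    with open(path, "wb") as f:
        for seq in range(count):
            f.write(make_record(seq))
            f.flush()
            os.fsync(f.fileno())
            ack(seq)


def verify_phase(path: str, last_acked: int) -> dict:
    """After power is restored, classify every acknowledged record."""
    with open(path, "rb") as f:
        data = f.read()
    report = {"intact": [], "corrupt": [], "truncated": [], "missing": []}
    for seq in range(last_acked + 1):
        chunk = data[seq * RECORD_SIZE:(seq + 1) * RECORD_SIZE]
        if chunk == make_record(seq):
            report["intact"].append(seq)
        elif not chunk:
            report["missing"].append(seq)    # acknowledged but gone entirely
        elif len(chunk) < RECORD_SIZE:
            report["truncated"].append(seq)  # only part of the record landed
        else:
            report["corrupt"].append(seq)    # bit corruption or shorn write
    return report
```

Any entry in `corrupt`, `truncated`, or `missing` means the drive lost data it had acknowledged as durable. The study ran more than 3,000 fault cycles across its 15 drives, so repeating the cycle many times gives a better picture than a single run.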
This article, “Test your SSDs or risk massive data loss, researchers warn,” was originally published at InfoWorld.com.