A Study of SSD Reliability in Large Scale Enterprise Storage Deployments

Authors: 

Stathis Maneas and Kaveh Mahdaviani, University of Toronto; Tim Emami, NetApp; Bianca Schroeder, University of Toronto
Awarded Best Paper!

Abstract: 

This paper presents the first large-scale field study of NAND-based SSDs in enterprise storage systems (in contrast to drives in distributed data center storage systems). The study is based on a comprehensive set of field data covering 1.4 million SSDs from a major storage vendor (NetApp). The drives span three manufacturers, 18 models, 12 capacities, and all major flash technologies (SLC, cMLC, eMLC, 3D-TLC). The data allows us to study a large number of factors that were not studied in previous work, including the effect of firmware versions, the reliability of TLC NAND, and correlations between drives within a RAID system. This paper presents our analysis, along with a number of practical implications derived from it.

FAST '20 Open Access Sponsored by NetApp

BibTeX
@inproceedings {246174,
author = {Stathis Maneas and Kaveh Mahdaviani and Tim Emami and Bianca Schroeder},
title = {A Study of {SSD} Reliability in Large Scale Enterprise Storage Deployments},
booktitle = {18th USENIX Conference on File and Storage Technologies (FAST 20)},
year = {2020},
isbn = {978-1-939133-12-0},
address = {Santa Clara, CA},
pages = {137--149},
url = {https://www.usenix.org/conference/fast20/presentation/maneas},
publisher = {USENIX Association},
month = feb
}
