Improving storage system availability with D-GRAID

ACM Transactions on Storage (TOS), no. 2 (2005): 133-170

Cited by: 154

Abstract

We present the design, implementation, and evaluation of D-GRAID, a gracefully degrading and quickly recovering RAID storage array. D-GRAID ensures that most files within the file system remain available even when an unexpectedly high number of faults occur. D-GRAID achieves high availability through aggressive replication of semantically…

Introduction
  • This “availability cliff” results from the storage system laying out blocks obliviously to their semantic importance and relationships; most files become corrupted or inaccessible after just one disk failure beyond what the redundancy scheme tolerates, as the back-of-the-envelope sketch below illustrates
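As a rough illustration of the availability cliff, the following back-of-the-envelope sketch (a hypothetical model, not drawn from the paper's evaluation) compares the expected fraction of whole files that survive extra disk failures under semantically oblivious striping versus fault-isolated, per-file placement. The disk count, failure count, and file size below are illustrative assumptions.

```python
def frac_files_available_striped(disks, failed, blocks_per_file):
    """Probability that every block of a file lands on a surviving disk when
    blocks are striped uniformly at random across all disks (illustrative)."""
    return ((disks - failed) / disks) ** blocks_per_file

def frac_files_available_isolated(disks, failed):
    """With fault-isolated, per-file placement, a file is lost only if the single
    disk holding it fails, so availability degrades linearly with failures."""
    return (disks - failed) / disks

if __name__ == "__main__":
    disks, failed, blocks_per_file = 16, 2, 64  # hypothetical numbers
    print(f"striped layout:  {frac_files_available_striped(disks, failed, blocks_per_file):.3%}")
    print(f"isolated layout: {frac_files_available_isolated(disks, failed):.3%}")
    # striped layout:  0.019%  -> nearly every multi-block file is damaged
    # isolated layout: 87.500% -> availability degrades in proportion to failures
```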
Highlights
  • “If a tree falls in the forest and no one hears it, does it make a sound?” George Berkeley
  • Storage systems comprising multiple disks are the backbone of modern computing centers, and when the storage system is down, the entire center can grind to a halt
  • We find that the construction of D-GRAID is feasible; even with imperfect semantic knowledge, powerful functionality can be implemented within a block-based storage array
  • The ext2 file system is an intellectual descendant of the Berkeley Fast File System (FFS) [28]
  • Similar to modern processors that innovate beneath unchanged instruction sets, a semantic disk-level implementation facilitates ease of deployment and interoperability with unchanged client infrastructure, perhaps making it more pragmatic
  • A test run across the data blocks of our file system indicates that only a small fraction of data blocks would pass the test; only those blocks that pass the test and are reallocated from a file data block to an indirect block would be misclassified
  • An alternative approach is to change the interface between file systems and storage, to convey richer information across both layers
Methods
  • D-GRAID Expectations

    The authors discuss the design of D-GRAID, presenting background information on file systems, the data layout strategy required to enable graceful degradation, the important design issues that arise due to the new layout, and the process of fast recovery.

    3.1 File System Background

    Semantic knowledge is system specific; the authors discuss the D-GRAID design and implementation for two widely differing file systems: Linux ext2 [45] and Microsoft VFAT [30].
  • Semantically-related blocks: With fault-isolated data placement, D-GRAID places a logical unit of file system data within a fault-isolated container.
  • With directory-based grouping, D-GRAID ensures that the files of a directory are all placed within the same unit of fault containment.
  • Isolated placement improves availability but introduces the problem of load balancing, which has both space and time components (a toy placement sketch follows this list).
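To make fault-isolated, directory-based placement concrete, here is a minimal sketch of how logical file-system objects might be mapped onto fault-isolated containers. The container abstraction, hash-based assignment, and class names are hypothetical illustrations of the idea, not the paper's Alexander prototype, which also replicates naming metadata and balances load.

```python
import hashlib

class FaultIsolatedPlacer:
    """Toy placement policy: all files under the same directory are assigned to
    one fault container (e.g., one disk), so a failure takes out whole
    directories instead of scattering damage across the entire namespace."""

    def __init__(self, num_containers: int):
        self.num_containers = num_containers

    def container_for(self, directory_path: str) -> int:
        # Deterministically hash the directory name to a container. A real
        # array would also track per-container space and load, and would
        # replicate naming/system metadata across containers.
        digest = hashlib.sha1(directory_path.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.num_containers

    def place_file(self, path: str) -> int:
        directory = path.rsplit("/", 1)[0] or "/"
        return self.container_for(directory)

placer = FaultIsolatedPlacer(num_containers=8)
print(placer.place_file("/home/alice/thesis/ch1.tex"))
print(placer.place_file("/home/alice/thesis/ch2.tex"))  # same container as ch1.tex
```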
Results
  • The space overhead due to popular directory replication is minimal for a reasonably sized file system; for this trace, such directories account for about 143 MB, less than 0.1% of the total file system size.
  • A test run across the data blocks of the file system indicates that only a small fraction of data blocks would pass the test; only those blocks that pass the test and are then reallocated from a file data block to an indirect block would be misclassified (see the content-test sketch after this list).
  • With a scaling factor of 3×, the operation throughput lagged slightly behind, with D-GRAID showing a slowdown of up to 19.2% during the first one-third of the trace execution, after which it caught up due to idle time.
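The misclassification argument above rests on a content test that distinguishes indirect blocks from ordinary data blocks. The sketch below shows one plausible form of such a test, assumed here for illustration rather than copied from the paper: treat the block as an array of 4-byte words and accept it only if every nonzero word looks like a valid on-disk block number. Random file data rarely satisfies this property.

```python
import struct

def looks_like_indirect_block(block: bytes, total_blocks: int, block_size: int = 4096) -> bool:
    """Heuristic content test (illustrative): an ext2-style indirect block is an
    array of 4-byte block pointers, so every nonzero word should be a plausible
    block number. Most ordinary data blocks fail this check."""
    words = struct.unpack(f"<{block_size // 4}I", block)
    nonzero = [w for w in words if w != 0]
    if not nonzero:
        return False  # an all-zero block carries no evidence either way
    return all(0 < w < total_blocks for w in nonzero)

# Example: a block full of ASCII text is overwhelmingly unlikely to pass.
text_block = (b"The quick brown fox jumps over the lazy dog. " * 100)[:4096]
print(looks_like_indirect_block(text_block, total_blocks=1 << 20))  # False
```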
Conclusion
  • The authors first compare the semantic-disk-based approach to alternative methods of implementing D-GRAID, and discuss some possible concerns about the commercial feasibility of such semantic disk systems.

    8.1 Alternative Approaches

    The authors' semantic-disk-based approach is one of a few different ways of implementing D-GRAID, each with its own trade-offs.
  • Similar to modern processors that innovate beneath unchanged instruction sets, a semantic disk-level implementation facilitates ease of deployment and interoperability with unchanged client infrastructure, perhaps making it more pragmatic
  • The cost of this approach is the complexity of rediscovering semantic knowledge and tolerating inaccuracies.
  • The file system could tag each write with a logical fault-container ID, which the storage system can use to implement fault-isolated data placement (a minimal sketch of such an interface follows this list)
  • These techniques, while intrusive on existing infrastructure and software, are conceivably less complex than the authors' semantic-disk approach.
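As a concrete illustration of the richer interface mentioned above, the sketch below shows a write request tagged with a logical fault-container ID and a toy array that directs all blocks sharing a tag to the same disk. The request format, type names, and placement logic are hypothetical; the paper discusses this alternative only at the level of the idea.

```python
from dataclasses import dataclass

@dataclass
class TaggedWrite:
    lba: int              # logical block address, as in an ordinary block write
    data: bytes
    fault_container: int  # extra hint: the logical fault domain this block belongs to

class TaggingArray:
    """Toy array that honors fault-container tags: every block carrying the same
    tag is directed to the same physical disk, giving fault-isolated placement
    without the array reverse-engineering file-system structures."""

    def __init__(self, num_disks: int):
        self.num_disks = num_disks
        self.assignment: dict[int, int] = {}  # fault_container -> disk index

    def write(self, req: TaggedWrite) -> int:
        disk = self.assignment.setdefault(
            req.fault_container, len(self.assignment) % self.num_disks)
        # A real array would also replicate naming/system metadata and balance
        # space across disks; here we only record and reuse the mapping.
        return disk

array = TaggingArray(num_disks=4)
print(array.write(TaggedWrite(lba=100, data=b"...", fault_container=7)))  # disk 0
print(array.write(TaggedWrite(lba=101, data=b"...", fault_container=7)))  # disk 0 again
```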
Tables
  • Table1: Space Overhead of Selective Meta-data Replication. The table shows the space overheads of selective metadata replication as a percentage of total user data, and as the level of naming and system meta-data replication increases. In the leftmost column, the percentage space overhead without any meta-data replication is shown. The next two columns depict the costs of modest (4-way) and paranoid (16-way) schemes. Each row shows the overhead for a particular file system, either ext2 or VFAT, with block size set to 1 KB or 4 KB
  • Table2: Performance on postmark. The table compares the performance of D-GRAID Level 0 with RAID-0 on the Postmark benchmark. Each row marked D-GRAID indicates a specific level of metadata replication. The first column reports the benchmark run-time and the second column shows the number of disk writes incurred. The third column shows the number of disk writes that were to metadata blocks, and the fourth column indicates the number of unique metadata blocks that are written. The experiment was run over 4 disks
  • Table3: Code size for Alexander implementation. The number of lines of code needed to implement Alexander is shown. The first column shows the number of semicolons and the second column shows the total number of lines, including white-spaces and comments
Related work
  • D-GRAID draws on related work from a number of different areas, including distributed file systems and traditional RAID systems. We discuss each in turn. Distributed File Systems: Designers of distributed file systems realized long ago the problems that arise when spreading a directory tree across different machines in a system. For example, Walker et al. discuss the importance of directory namespace replication within the Locus distributed system [35]. The Coda mobile file system also takes explicit care with regard to the directory tree [27]. Specifically, if a file is cached, Coda makes sure to cache every directory up to the root of the directory tree. By doing so, Coda can guarantee that a file remains accessible should a disconnection occur. Perhaps an interesting extension to our work would be to reconsider host-based in-memory caching with availability in mind. Also, Slice [3] tries to route namespace operations for all files in a directory to the same server.
Funding
  • Finally, we thank the Computer Systems Lab for providing a terrific environment for computer science research. This work is sponsored by NSF CCR-0092840, CCR-0133456, CCR-0098274, NGS-0103670, ITR-0086044, ITR-0325267, IBM, EMC, and the Wisconsin Alumni Research Foundation
References
  • A. Acharya, M. Uysal, and J. Saltz. Active Disks: programming model, algorithms and evaluation. In ASPLOS VIII, San Jose, CA, October 1998.
  • G. A. Alvarez, W. A. Burkhard, and F. Cristian. Tolerating multiple failures in RAID architectures with optimal storage and uniform declustering. In ISCA ’97, pages 62–72, 1997.
  • D. Anderson, J. Chase, and A. Vahdat. Interposed Request Routing for Scalable Network Storage. ACM Transactions on Computer Systems, 20(1), February 2002.
  • D. Bitton and J. Gray. Disk shadowing. In VLDB 14, pages 331–338, Los Angeles, CA, August 1988.
  • H. Boehm and M. Weiser. Garbage Collection in an Uncooperative Environment. Software: Practice and Experience, 18(9):807–820, September 1988.
  • W. Burkhard and J. Menon. Disk Array Storage System Reliability. In FTCS-23, pages 432–441, Toulouse, France, June 1993.
  • J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta. Hive: Fault Containment for Shared-Memory Multiprocessors. In SOSP ’95, December 1995.
  • P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. RAID: high-performance, reliable secondary storage. ACM Computing Surveys, 26(2):145–185, June 1994.
  • T. E. Denehy, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Bridging the Information Gap in Storage Protocol Stacks. In USENIX ’02, June 2002.
  • I. Dowse and D. Malone. Recent Filesystem Optimisations on FreeBSD. In FREENIX ’02, Monterey, CA, June 2002.
  • EMC Corporation. Symmetrix Enterprise Information Storage Systems. http://www.emc.com, 2002.
  • R. M. English and A. A. Stepanov. Loge: A Self-Organizing Disk Controller. In USENIX Winter ’92, January 1992.
  • G. R. Ganger. Blurring the Line Between OSes and Storage Devices. Technical Report CMU-CS-01-166, Carnegie Mellon University, December 2001.
  • G. R. Ganger, M. K. McKusick, C. A. Soules, and Y. N. Patt. Soft Updates: A Solution to the Metadata Update Problem in File Systems. ACM TOCS, 18(2), May 2000.
  • G. R. Ganger, B. L. Worthington, R. Y. Hou, and Y. N. Patt. Disk Subsystem Load Balancing: Disk Striping vs. Conventional Data Placement. In HICSS ’93, 1993.
  • G. A. Gibson, D. F. Nagle, K. Amiri, J. Butler, F. W. Chang, H. Gobioff, C. Hardin, E. Riedel, D. Rochberg, and J. Zelenka. A Cost-Effective, High-Bandwidth Storage Architecture. In ASPLOS VIII, October 1998.
  • J. Gray. Why Do Computers Stop and What Can We Do About It? In 6th International Conference on Reliability and Distributed Databases, June 1987.
  • J. Gray, B. Horst, and M. Walker. Parity Striping of Disc Arrays: Low-cost Reliable Storage with Acceptable Throughput. In Proceedings of the 16th International Conference on Very Large Data Bases (VLDB 16), pages 148–159, Brisbane, Australia, August 1990.
  • S. D. Gribble. Robustness in Complex Systems. In HotOS VIII, Schloss Elmau, Germany, May 2001.
  • R. Hagmann. Reimplementing the Cedar File System Using Logging and Group Commit. In SOSP ’87, November 1987.
  • M. Holland, G. Gibson, and D. Siewiorek. Fast, on-line failure recovery in redundant disk arrays. In FTCS-23, France, 1993.
  • H.-I. Hsiao and D. DeWitt. Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines. In 6th International Data Engineering Conference, 1990.
  • IBM. ServeRAID - Recovering from multiple disk failures. http://www.pc.ibm.com/qtechinfo/MIGR-39144.html, 2001.
  • M. Ji, E. Felten, R. Wang, and J. P. Singh. Archipelago: An Island-Based File System For Highly Available And Scalable Internet Services. In 4th USENIX Windows Symposium, August 2000.
  • J. Katcher. PostMark: A New File System Benchmark. Technical Report TR-3022, Network Appliance Inc., October 1997.
  • K. Keeton and J. Wilkes. Automating data dependability. In Proceedings of the 10th ACM-SIGOPS European Workshop, pages 93–100, Saint-Emilion, France, September 2002.
  • J. Kistler and M. Satyanarayanan. Disconnected Operation in the Coda File System. ACM TOCS, 10(1), February 1992.
  • M. K. McKusick, W. N. Joy, S. J. Leffler, and R. S. Fabry. A Fast File System for UNIX. ACM TOCS, 2(3):181–197, August 1984.
  • J. Menon and D. Mattson. Comparison of Sparing Alternatives for Disk Arrays. In ISCA ’92, Gold Coast, Australia, May 1992.
  • Microsoft Corporation. http://www.microsoft.com/hwdev/, December 2000.
  • C. U. Orji and J. A. Solworth. Doubly Distorted Mirrors. In SIGMOD ’93, Washington, DC, May 1993.
  • A. Park and K. Balasubramanian. Providing fault tolerance in parallel secondary storage systems. Technical Report CS-TR-057-86, Princeton, November 1986.
  • D. Patterson, G. Gibson, and R. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In SIGMOD ’88, June 1988.
  • D. A. Patterson. Availability and Maintainability >> Performance: New Focus for a New Century. Keynote at FAST ’02, January 2002.
  • G. Popek, B. Walker, J. Chow, D. Edwards, C. Kline, G. Rudisin, and G. Thiel. LOCUS: A Network Transparent, High Reliability Distributed System. In SOSP ’81, December 1981.
  • A. L. N. Reddy and P. Banerjee. Gracefully Degradable Disk Arrays. In FTCS-21, pages 401–408, Montreal, Canada, June 1991.
  • E. Riedel, G. Gibson, and C. Faloutsos. Active Storage For Large-Scale Data Mining and Multimedia. In Proceedings of the 24th International Conference on Very Large Databases (VLDB 24), New York, New York, August 1998.
  • E. Riedel, M. Kallahalla, and R. Swaminathan. A Framework for Evaluating Storage System Security. In FAST ’02, pages 14–29, Monterey, CA, January 2002.
  • M. Rosenblum and J. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Transactions on Computer Systems, 10(1):26–52, February 1992.
  • A. Rowstron and P. Druschel. Storage Management and Caching in PAST, A Large-scale, Persistent Peer-to-peer Storage Utility. In SOSP ’01, Banff, Canada, October 2001.
  • C. Ruemmler and J. Wilkes. Disk Shuffling. Technical Report HPL-91-156, Hewlett Packard Laboratories, 1991.
  • Y. Saito, C. Karamanolis, M. Karlsson, and M. Mahalingam. Taming aggressive replication in the Pangaea wide-area file system. In OSDI ’02, Boston, MA, December 2002.
  • S. Savage and J. Wilkes. AFRAID — A Frequently Redundant Array of Independent Disks. In USENIX 1996, pages 27–39, San Diego, CA, January 1996.
  • M. Sivathanu, V. Prabhakaran, F. Popovici, T. E. Denehy, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Semantically-Smart Disk Systems. In FAST ’03, San Francisco, CA, March 2003.
  • T. Ts’o and S. Tweedie. Future Directions for the Ext2/3 Filesystem. In FREENIX ’02, Monterey, CA, June 2002.
  • R. Wang, T. E. Anderson, and D. A. Patterson. Virtual Log-Based File Systems for a Programmable Disk. In OSDI ’99, New Orleans, LA, February 1999.
  • J. Wilkes, R. Golding, C. Staelin, and T. Sullivan. The HP AutoRAID Hierarchical Storage System. ACM Transactions on Computer Systems, 14(1):108–136, February 1996.
  • J. L. Wolf. The Placement Optimization Problem: A Practical Solution to the Disk File Assignment Problem. In SIGMETRICS ’89, pages 1–10, Berkeley, CA, May 1989.