Emulating Goliath storage systems with David

ACM Transactions on Storage (TOS), no. 4 (2012): Article No. 12


Abstract

Benchmarking file and storage systems on large file-system images is important, but difficult and often infeasible. Typically, running benchmarks on such large disk setups is a frequent source of frustration for file-system evaluators; the scale alone acts as a strong deterrent against using larger albeit realistic benchmarks. To address …

Introduction
  • File and storage systems are currently difficult to benchmark. Ideally, one would like to use a benchmark workload that is a realistic approximation of a known application.
  • Developing scalable yet practical benchmarks has long been a challenge for the storage systems community [16].
  • Benchmarks such as GraySort [1] and SPECmail2009 [22] are compelling yet currently difficult to set up and use, requiring around 100 TB of storage for GraySort and anywhere from 100 GB to 2 TB for SPECmail2009.
Highlights
  • File and storage systems are currently difficult to benchmark
  • We seek to answer four questions. First, what is the accuracy of the Storage Model? Second, how accurately does David predict benchmark runtime, and what storage space savings does it provide? Third, can David scale to large target devices, including RAID? Fourth, what is the memory and CPU overhead of David?
  • Since our aim is to validate the accuracy of the Storage Model alone, we run David in a model-only mode in which block classification, metadata remapping, and data squashing are disabled (see the sketch after this list)
  • David makes it practical to experiment with benchmarks that were otherwise infeasible to run on a given system, by transparently scaling down the storage capacity required to run the workload
  • We believe David will be a useful emulator for file and storage system evaluation.
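
    The paper does not include code, but a model-only switch of the kind described above can be pictured with a minimal sketch. All names below (david_config, handle_request, the placeholder timing formula) are hypothetical, not David's actual interfaces:

    /* Hypothetical sketch of a model-only mode: only the Storage Model's
     * timing prediction runs; classification, remapping, and squashing
     * are bypassed so the model can be validated in isolation. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct david_config {
        bool model_only;   /* validate the Storage Model in isolation */
    };

    /* Stub for the Storage Model: returns a predicted service time (ms). */
    static double model_service_time(uint64_t lba, uint64_t len) {
        (void)lba;
        return 0.1 + len / 60e3;   /* placeholder formula, not David's */
    }

    /* Stub for the capacity-emulation path (classify, remap, squash). */
    static void emulate_capacity(uint64_t lba, uint64_t len) {
        (void)lba; (void)len;
    }

    static double handle_request(const struct david_config *cfg,
                                 uint64_t lba, uint64_t len) {
        if (!cfg->model_only)
            emulate_capacity(lba, len);       /* full emulation path */
        return model_service_time(lba, len);  /* timing is always modeled */
    }

    int main(void) {
        struct david_config cfg = { .model_only = true };
        printf("predicted: %.3f ms\n", handle_request(&cfg, 0, 4096));
        return 0;
    }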
Methods
  • Design Goals for David

    Scalability: Emulating a large device requires David to maintain additional data structures and mimic several operations; the goal is to ensure that it works well as the underlying storage capacity scales.

    Model accuracy: An important goal is to model a storage device and accurately predict performance.
  • For the rest of the paper, the authors use the terms target to denote the hypothetical larger storage device, and available to denote the physically available system on which David is running, as shown in Figure 2.
  • A schematic in the paper shows how David makes use of metadata remapping and data squashing to free up a large percentage of the required storage space; a much smaller backing store can then service the requests of the benchmark (a rough sketch of the two mechanisms follows).
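
    As a rough illustration only: the real David sits in the Linux block layer and uses file-system-specific classifiers, whereas everything below (the classifier, the linear remap table, the block addresses in main) is a hypothetical simplification. Metadata writes are remapped into a small backing store; data writes are squashed, with only their timing fed to the model:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE   4096
    #define MAX_REMAPPED 1024          /* capacity of the small backing store */

    static uint8_t  backing_store[MAX_REMAPPED][BLOCK_SIZE];
    static uint64_t remap_key[MAX_REMAPPED]; /* target block -> slot */
    static size_t   remap_used;

    /* Stub classifier: in David this is file-system specific (e.g., ext3
     * metadata layout); here it is faked for illustration. */
    static bool is_metadata(uint64_t target_block) { return target_block % 64 == 0; }

    static int find_slot(uint64_t target_block) {
        for (size_t i = 0; i < remap_used; i++)
            if (remap_key[i] == target_block) return (int)i;
        return -1;
    }

    /* Write to the (large, hypothetical) target device. */
    static void david_write(uint64_t target_block, const uint8_t *buf) {
        if (is_metadata(target_block)) {
            int slot = find_slot(target_block);
            if (slot < 0 && remap_used < MAX_REMAPPED)
                remap_key[slot = (int)remap_used++] = target_block;
            if (slot >= 0)
                memcpy(backing_store[slot], buf, BLOCK_SIZE); /* remapped */
        }
        /* else: data block -> squashed; nothing is stored, only the
         * request's timing would be fed to the Storage Model. */
    }

    /* Read: remapped metadata comes back intact; squashed data blocks
     * return synthetic (zeroed) content. */
    static void david_read(uint64_t target_block, uint8_t *buf) {
        int slot = find_slot(target_block);
        if (slot >= 0) memcpy(buf, backing_store[slot], BLOCK_SIZE);
        else           memset(buf, 0, BLOCK_SIZE);
    }

    int main(void) {
        uint8_t w[BLOCK_SIZE] = { [0] = 42 }, r[BLOCK_SIZE];
        david_write(64, w);            /* metadata block: remapped and kept */
        david_write(65, w);            /* data block: squashed */
        david_read(64, r); printf("metadata byte: %d\n", r[0]);  /* 42 */
        david_read(65, r); printf("data byte:     %d\n", r[0]);  /* 0  */
        return 0;
    }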
Results
  • The authors seek to answer four important questions. First, what is the accuracy of the Storage Model? Second, how accurately does David predict benchmark runtime, and what storage space savings does it provide? Third, can David scale to large target devices, including RAID? Fourth, what is the memory and CPU overhead of David?

    7.1 Experimental Platform

    The authors have developed David for the Linux operating system.
  • The authors validate the accuracy of the Storage Model in predicting the benchmark runtime on the target system (a sketch of the kind of per-request service-time prediction involved follows this list).
  • Similar to David's emulation of scale in a storage system, Gupta et al. from UCSD propose a technique called time dilation for emulating network speeds orders of magnitude faster than those available [11].
  • Time dilation allows one to experiment with unmodified applications running on commodity operating systems by subjecting them to much faster network speeds than are actually available.
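
    David's Storage Model predicts per-request service time from disk parameters like those in Table 1. The sketch below assumes a simple seek + rotational latency + transfer decomposition; David's actual model is far more detailed (disk cache, I/O request queue, head switches), and the parameter values here are illustrative, not the measured ones from Table 1:

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        double rpm;            /* spindle speed */
        double avg_seek_ms;    /* average seek time */
        double bandwidth_mb_s; /* sustained transfer bandwidth (MB/s) */
    } disk_params;

    /* Predict service time (ms) for a request of `bytes` at `lba`,
     * given that the previous request ended at `prev_end_lba`. */
    static double predict_service_ms(const disk_params *d, uint64_t prev_end_lba,
                                     uint64_t lba, uint64_t bytes) {
        double full_rotation_ms = 60000.0 / d->rpm;
        double seek_ms = (lba == prev_end_lba) ? 0.0 : d->avg_seek_ms;
        double rot_latency_ms = (seek_ms > 0.0) ? full_rotation_ms / 2.0 : 0.0;
        double transfer_ms = ((double)bytes / (d->bandwidth_mb_s * 1e6)) * 1000.0;
        return seek_ms + rot_latency_ms + transfer_ms;
    }

    int main(void) {
        disk_params h = { 7200.0, 8.5, 60.0 };   /* illustrative values */
        printf("sequential 4 KB: %.3f ms\n",
               predict_service_ms(&h, 1000, 1000, 4096));
        printf("random 4 KB:     %.3f ms\n",
               predict_service_ms(&h, 1000, 900000, 4096));
        return 0;
    }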
Conclusion
  • David is born out of the frustration in doing large-scale experimentation on realistic storage hardware – a problem many in the storage community face.
  • David makes it practical to experiment with benchmarks that were otherwise infeasible to run on a given system, by transparently scaling down the storage capacity required to run the workload.
  • David ensures accuracy of benchmarking results by using a detailed storage model to predict the runtime.
  • The authors plan to extend David to include support for a number of other useful storage devices and configurations.
  • The authors believe David will be a useful emulator for file and storage system evaluation
Tables
  • Table 1: Storage Model Parameters in David. Lists important parameters obtained to model the Hitachi HDS728080PLA380 (H1) and Hitachi HDS721010KLA330 (H2) disks. ∗ denotes parameters of the I/O request queue (IORQ).
  • Table 2: David Performance and Accuracy. Shows savings in capacity, accuracy of runtime prediction, and the overhead of storage modeling for different workloads. Webserver and varmail are generated using FileBench; the virus scan uses AVG.
  • Table 3: David Software RAID-1 Emulation. Shows IOPS for a software RAID-1 setup using David with memory as the backing store; the workload issues 20,000 read and write requests through concurrent processes, one per disk in the experiment. 1-disk experiments run without RAID-1 (a sketch of such a workload driver follows).
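
    The paper does not publish the Table 3 workload driver; the following is a hypothetical reconstruction in its spirit, with N processes (one per target path) each issuing 20,000 alternating reads and writes. The paths on the command line are placeholders for whatever devices or files David exports:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define REQUESTS 20000
    #define REQ_SIZE 4096

    static void worker(const char *path) {
        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); _exit(1); }
        srand((unsigned)getpid());           /* per-process offsets */
        char buf[REQ_SIZE];
        memset(buf, 0xab, sizeof buf);
        for (int i = 0; i < REQUESTS; i++) {
            off_t off = (off_t)(rand() % 25600) * REQ_SIZE;  /* ~100 MB span */
            if (i % 2 == 0) pread(fd, buf, sizeof buf, off); /* alternate reads */
            else            pwrite(fd, buf, sizeof buf, off);/* and writes */
        }
        close(fd);
        _exit(0);
    }

    int main(int argc, char **argv) {
        /* usage: ./workload PATH [PATH ...] -- one worker per path */
        for (int i = 1; i < argc; i++)
            if (fork() == 0) worker(argv[i]);
        while (wait(NULL) > 0) ;   /* reap all workers */
        return 0;
    }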
Related work
  • Memulator [10] makes a great case for why storage emulation provides the unique ability to explore nonexistent storage components and take end-to-end measurements. Memulator is a “timing-accurate” storage emulator that allows a simulated storage component to be plugged into a real system running real applications. Memulator can use the memory of either a networked machine or the local machine as the storage media of the emulated disk, enabling full-system evaluation of hypothetical storage devices. Although this provides flexibility in device emulation, emulating high-capacity devices requires an equivalent amount of memory; David provides the necessary scalability to emulate such devices. In turn, David can benefit from the networked-emulation capabilities of Memulator in scenarios where either the host machine has limited CPU and memory resources, or the interference from running David on the same machine, competing for the same resources, is unacceptable.

    One alternative to emulation is to simply buy a larger-capacity or newer device and use it to run the benchmarks. This is sometimes feasible, but often not desirable: even if one buys a larger disk, a still larger one will eventually be needed, and David allows one to keep up with this arms race without continually investing in new devices. Note that we chose 1 TB as the upper limit for evaluation in this paper because we could validate our results at that size. A larger disk also does not address the emulation of much faster devices such as SSDs or RAID configurations; David emulates faster devices through an efficient use of memory as the backing store.
Funding
  • The first author thanks the members of the Storage Systems Group at NEC Labs for their comments and feedback. This material is based upon work supported by the National Science Foundation under grants CCF-0621487, CNS-0509474, CNS-0834392, CCF-0811697, and CCF-0937959, as well as by generous donations from NetApp, Sun Microsystems, and Google.
References
  • [1] GraySort Benchmark. http://sortbenchmark.org/FAQ.htm#gray.
  • [2] N. Agrawal, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Generating Realistic Impressions for File-System Benchmarking. In Proceedings of the 7th Conference on File and Storage Technologies (FAST ’09), San Francisco, CA, February 2009.
  • [3] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch. A Five-Year Study of File-System Metadata. In Proceedings of the 5th USENIX Symposium on File and Storage Technologies (FAST ’07), San Jose, California, February 2007.
  • [4] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch. A Five-Year Study of File-System Metadata: Microsoft Longitudinal Dataset. http://iotta.snia.org/traces/list/Static, 2007.
  • [5] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse, and R. Panigrahy. Design Tradeoffs for SSD Performance. In Proceedings of the USENIX Annual Technical Conference (USENIX ’08), Boston, MA, June 2008.
  • [6] E. Anderson. Simple Table-Based Modeling of Storage Devices. Technical Report HPL-SSP-2001-04, HP Laboratories, July 2001.
  • [7] J. S. Bucy and G. R. Ganger. The DiskSim Simulation Environment Version 3.0 Reference Manual. Technical Report CMU-CS-03-102, Carnegie Mellon University, January 2003.
  • [8] P. M. Chen and D. A. Patterson. A New Approach to I/O Performance Evaluation–Self-Scaling I/O Benchmarks, Predicted I/O Performance. In Proceedings of the 1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’93), pages 1–12, Santa Clara, California, May 1993.
  • [9] G. R. Ganger and Y. N. Patt. Using System-Level Models to Evaluate I/O Subsystem Designs. IEEE Transactions on Computers, 47(6):667–678, 1998.
  • [10] J. L. Griffin, J. Schindler, S. W. Schlosser, J. S. Bucy, and G. R. Ganger. Timing-Accurate Storage Emulation. In Proceedings of the 1st USENIX Symposium on File and Storage Technologies (FAST ’02), Monterey, California, January 2002.
  • [11] D. Gupta, K. Yocum, M. McNett, A. C. Snoeren, A. Vahdat, and G. M. Voelker. To Infinity and Beyond: Time-Warped Network Emulation. In Proceedings of the 3rd Conference on Networked Systems Design and Implementation (NSDI ’06), San Jose, CA, 2006.
  • [12] M. F. Kaashoek, D. R. Engler, G. R. Ganger, H. Briceno, R. Hunt, D. Mazieres, T. Pinckney, R. Grimm, J. Jannotti, and K. Mackenzie. Application Performance and Flexibility on Exokernel Systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP ’97), pages 52–65, Saint-Malo, France, October 1997.
  • [13] J. Katcher. PostMark: A New File System Benchmark. Technical Report TR-3022, Network Appliance Inc., October 1997.
  • [14] J. Mayfield, T. Finin, and M. Hall. Using Automatic Memoization as a Software Engineering Tool in Real-World AI Systems. In Conference on Artificial Intelligence for Applications, 1995.
  • [15] FileBench. http://www.solarisinternals.com/si/tools/.
  • [16] E. L. Miller. Towards Scalable Benchmarks for Mass Storage Systems. In 5th NASA Goddard Conference on Mass Storage Systems and Technologies, 1996.
  • [17] E. Riedel, M. Kallahalla, and R. Swaminathan. A Framework for Evaluating Storage System Security. In Proceedings of the 1st USENIX Symposium on File and Storage Technologies (FAST ’02), pages 14–29, Monterey, California, January 2002.
  • [18] M. Rinard, C. Cadar, D. Dumitran, D. M. Roy, T. Leu, and W. S. Beebee, Jr. Enhancing Server Availability and Security Through Failure-Oblivious Computing. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI ’04), San Francisco, CA, December 2004.
  • [19] C. Ruemmler and J. Wilkes. An Introduction to Disk Drive Modeling. IEEE Computer, 27(3):17–28, March 1994.
  • [20] E. Shriver. Performance Modeling for Realistic Storage Devices. PhD thesis, New York University, New York, NY, 1997.
  • [21] M. Sivathanu, V. Prabhakaran, F. I. Popovici, T. E. Denehy, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Semantically-Smart Disk Systems. In Proceedings of the 2nd USENIX Symposium on File and Storage Technologies (FAST ’03), pages 73–88, San Francisco, California, April 2003.
  • [22] Standard Performance Evaluation Corporation. SPECmail2009 Benchmark. http://www.spec.org/mail2009/.
  • [23] A. Sweeney, D. Doucette, W. Hu, C. Anderson, M. Nishimoto, and G. Peck. Scalability in the XFS File System. In Proceedings of the USENIX Annual Technical Conference (USENIX ’96), San Diego, California, January 1996.
  • [24] A. Traeger and E. Zadok. How to Cheat at Benchmarking. In USENIX FAST Birds of a Feather Session, San Francisco, CA, February 2009.
  • [25] S. C. Tweedie. Journaling the Linux ext2fs File System. In The Fourth Annual Linux Expo, Durham, North Carolina, May 1998.
  • [26] Wikipedia. Btrfs. en.wikipedia.org/wiki/Btrfs, 2009.
  • [27] M. Wittle and B. E. Keith. LADDIS: The Next Generation in NFS File Server Benchmarking. In USENIX Summer, pages 111–128, 1993.
  • [28] E. Zadok. File and Storage Systems Benchmarking Workshop. UC Santa Cruz, CA, May 2008.