Generating realistic impressions for file-system benchmarking

TOS, no. 4 (2009): 125-138


Abstract

The performance of file systems and related software depends on characteristics of the underlying file-system image (i.e., file-system metadata and file contents). Unfortunately, rather than benchmarking with realistic file-system images, most system designers and evaluators rely on ad hoc assumptions and (often inaccurate) rules of thumb...

Introduction
  • File system benchmarking is in a state of disarray.
  • In spite of tremendous advances in file system design, the approaches for benchmarking still lag far behind.
  • The two main challenges in achieving this goal are generating representative workloads, and creating realistic filesystem state.
  • While creating representative workloads is not an entirely solved problem, significant steps have been taken towards this goal.
  • Empirical studies of file-system access patterns [4, 19, 33] and file-system activity traces [38, 45] have led to work on synthetic workload generators [2, 14] and methods for trace replay [3, 26].
Highlights
  • File system benchmarking is in a state of disarray
  • We evaluate two desktop search applications: open-source Beagle [5] and Google’s Desktop for Linux (GDL) [16]
  • One key aspect of this problem is generating realistic filesystem state, with due emphasis given to file-system metadata and file content
  • We develop Impressions, a statistical framework to generate realistic and configurable file-system images
  • We find Impressions easy to use and well suited for a number of tasks
Results
  • Each line in the graph represents an independent trial, starting at a y-axis value equal to the sum of its initially sampled file sizes.
  • Note that in this example, the initial sum differs from the desired sum by more than 100% in several cases.
  • Impressions successfully converges the initial sample set to the desired sum with an average oversampling rate α of less than 5%.
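The convergence described above can be illustrated with a minimal sketch. This is not the algorithm Impressions uses; it is a simple greedy variant written only to show how oversampling can pull a sampled set of file sizes toward a desired total. The function name `resolve_sum`, the lognormal parameters `mu` and `sigma`, and the 5% tolerance `beta_max` are all assumptions chosen for illustration.

```python
import random

def resolve_sum(n_files, target_sum, beta_max=0.05, mu=8.0, sigma=1.0,
                max_oversamples=100_000, seed=0):
    """Draw n_files lognormal sizes, then swap in fresh samples until the
    total is within beta_max (relative error) of target_sum.

    Returns (sizes, beta, alpha): the final sizes, the relative error of
    their sum, and the oversampling rate (extra draws / n_files).
    mu and sigma are hypothetical values, not taken from the paper.
    """
    rng = random.Random(seed)
    sizes = [rng.lognormvariate(mu, sigma) for _ in range(n_files)]
    total = sum(sizes)
    extra = 0
    while (abs(total - target_sum) / target_sum > beta_max
           and extra < max_oversamples):
        candidate = rng.lognormvariate(mu, sigma)  # one oversample
        extra += 1
        i = rng.randrange(n_files)
        new_total = total - sizes[i] + candidate
        # Keep the swap only if it moves the sum closer to the target.
        if abs(new_total - target_sum) < abs(total - target_sum):
            sizes[i] = candidate
            total = new_total
    beta = abs(total - target_sum) / target_sum
    alpha = extra / n_files
    return sizes, beta, alpha

if __name__ == "__main__":
    sizes, beta, alpha = resolve_sum(1000, 5_500_000.0)
    print(f"beta = {100 * beta:.2f}%, alpha = {100 * alpha:.2f}%")
```

Because accepted swaps are biased toward the target, a scheme like this can distort the underlying distribution, which is why the generated set would still need to be checked against the desired distribution afterwards.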
Conclusion
  • One key aspect of this problem is generating realistic filesystem state, with due emphasis given to file-system metadata and file content
  • To address this problem, the authors develop Impressions, a statistical framework to generate realistic and configurable file-system images.
  • It enables application developers to tune their systems to the file-system characteristics likely to be found among their target users.
  • Impressions makes it feasible to compare the performance of systems by standardizing and reporting all used parameters, a necessary requirement for benchmarking.
  • Please check http://www.cs.wisc.edu/adsl/Software/Impressions/ to obtain a copy.
Tables
  • Table 1: Choice of file system parameters in prior research
  • Table 2: Parameters and default values in Impressions. List of distributions and their parameter values used in the
  • Table 3: Statistical accuracy of generated images
  • Table 4: Summary of resolving multiple constraints. Shows average rate and accuracy of convergence after resolving multiple constraints for different values of desired file system size. β: % error between the desired and generated sum; α: % of oversamples required; D: the test statistic for the K-S test, representing the maximum difference between generated and desired empirical cumulative distributions. Averages are for 20 trials. Success is the number of trials having final β ≤ 5% and D passing the K-S test
  • Table 5: Accuracy of interpolation and extrapolation
  • Table 6: Performance of Impressions. Shows time taken to create file-system images with a breakdown for individual features. Image1: 4.55 GB, 20000 files, 4000 dirs. Image2: 12.0 GB, 52000 files, 4000 dirs. Other parameters are default. The two entries for additional parameters are shown only for Image1 and represent times in addition to default times
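The two accuracy measures named in the Table 4 caption, β and D, can be computed as in the sketch below. It assumes a two-sample form of the K-S statistic (comparing a generated sample against a sample from the desired distribution); the function names are hypothetical, and the sketch does not compute the critical value needed to decide whether D passes the test.

```python
import bisect

def ks_statistic(generated, desired):
    """Two-sample K-S statistic D: the largest gap between the two
    empirical cumulative distribution functions (ECDFs)."""
    a, b = sorted(generated), sorted(desired)
    d = 0.0
    for x in a + b:  # the ECDFs only change at observed values
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def percent_error(generated_sum, desired_sum):
    """Beta: percentage error between the generated and desired sums."""
    return 100.0 * abs(generated_sum - desired_sum) / desired_sum

if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
    print(percent_error(105.0, 100.0))
```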
Related work
  • We discuss previous research in four areas related to file system benchmarking and usage of file system metadata.

    First, Impressions enables file system measurement studies to be put into practice. Besides the metadata studies on Windows workstations [1, 12], previous work in non-Windows environments includes Satyanarayanan’s study of a Digital PDP-10 [41], Irlam’s and Mullender’s studies of Unix systems [21, 29], and the study of HP-UX systems at Hewlett-Packard [42]. These studies provide valuable data for designers of file systems and related software, and can be easily incorporated in Impressions.

    Second, several models have been proposed to explain observed file-system phenomena. Mitzenmacher proposed a generative model, called the Recursive Forest File model [27], to explain the behavior of file size distributions. The model accounts for the hybrid distribution of file sizes with a lognormal body and Pareto tail. Downey’s Multiplicative File Size model [13] is based on the assumption that new files are created by using older files as templates, e.g., by copying, editing or filtering an old file. The size of the new file in this model is given by the size of the old file multiplied by an independent factor. These models provide an intuitive understanding of the underlying phenomena, and are also easier to simulate on a computer. In the future, Impressions can be enhanced by incorporating more such models.
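The multiplicative idea described above lends itself to a short simulation. The sketch below is an illustration under stated assumptions rather than Downey's published formulation: the single seed file, its size `seed_size`, and the choice of a lognormal multiplier with spread `factor_sigma` are all hypothetical.

```python
import random

def multiplicative_file_sizes(n_files, seed_size=4096.0,
                              factor_sigma=1.0, seed=0):
    """Grow a population of file sizes where each new file is derived
    from a randomly chosen existing file (the template), scaled by an
    independent random factor."""
    rng = random.Random(seed)
    sizes = [seed_size]  # assumed: the system starts with one file
    while len(sizes) < n_files:
        template = rng.choice(sizes)                     # pick an old file
        factor = rng.lognormvariate(0.0, factor_sigma)   # independent multiplier
        sizes.append(template * factor)
    return sizes

if __name__ == "__main__":
    sizes = multiplicative_file_sizes(10_000)
    print(f"min = {min(sizes):.1f}, max = {max(sizes):.1f}")
```

Repeated multiplication by independent factors is additive in log space, which gives an intuition for why a model of this kind tends toward a roughly lognormal body.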
Funding
  • Finally, we would like to thank Valerie Aurora Henson (our shepherd) and the anonymous reviewers for their excellent feedback and comments. This material is based upon work supported by the National Science Foundation under the following grants: CCF-0621487, CNS-0509474, as well as by generous donations from Network Appliance and Sun Microsystems.
Reference
  • [1] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch. A Five-Year Study of File-System Metadata. In FAST ’07, San Jose, CA, February 2007.
  • [2] D. Anderson and J. Chase. Fstress: A flexible network file service benchmark. In TR, Duke University, May 2002.
  • [3] E. Anderson, M. Kallahalla, M. Uysal, and R. Swaminathan. Buttress: A toolkit for flexible and high fidelity I/O benchmarking. In FAST ’04, San Francisco, CA, April 2004.
  • [4] M. Baker, J. Hartman, M. Kupfer, K. Shirriff, and J. Ousterhout. Measurements of a Distributed File System. In SOSP ’91, pages 198–212, Pacific Grove, CA, October 1991.
  • [5] Beagle Project. Beagle Desktop Search. http://www.beagle-project.org/.
  • [6] P. M. Chen and D. A. Patterson. A New Approach to I/O Performance Evaluation–Self-Scaling I/O Benchmarks, Predicted I/O Performance. In SIGMETRICS ’93, pages 1–12, Santa Clara, CA, May 1993.
  • [7] J. Cipar, M. D. Corner, and E. D. Berger. Tfs: a transparent file system for contributory storage. In FAST ’07, pages 28–28, Berkeley, CA, USA, 2007. USENIX Association.
  • [8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press and McGraw-Hill, second edition, 2001. 35.5: The subset-sum problem.
  • [9] L. P. Cox, C. D. Murray, and B. D. Noble. Pastiche: making backup cheap and easy. SIGOPS Oper. Syst. Rev., 36, 2002.
  • [10] L. P. Cox and B. D. Noble. Samsara: honor among thieves in peer-to-peer storage. In SOSP ’03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 120–132, New York, NY, USA, 2003. ACM.
  • [11] M. D. Dahlin, R. Y. Wang, T. E. Anderson, and D. A. Patterson. Cooperative Caching: Using Remote Client Memory to Improve File System Performance. In OSDI ’94, Monterey, CA, November 1994.
  • [12] J. R. Douceur and W. J. Bolosky. A large-scale study of filesystem contents. In Proceedings of the 1999 Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), pages 59–70, Atlanta, GA, May 1999.
  • [13] A. B. Downey. The structural cause of file size distributions. In Ninth MASCOTS’01, Los Alamitos, CA, USA, 2001.
  • [14] M. R. Ebling and M. Satyanarayanan. Synrgen: an extensible file reference generator. In SIGMETRICS ’94: Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, New York, NY, 1994.
  • [15] K. Fu, M. F. Kaashoek, and D. Mazieres. Fast and secure distributed read-only file system. ACM Trans. Comput. Syst., 20(1):1–24, 2002.
  • [16] Google Corp. Google Desktop for Linux. http://desktop.google.com/linux/index.html.
  • [17] B. Gopal and U. Manber. Integrating content-based access mechanisms with hierarchical file systems. In OSDI ’99: Third symposium on Operating Systems Design and Implementation, 1999.
  • [18] GraphApp. GraphApp Toolkit. http://enchantia.com/software/graphapp/.
  • [19] S. D. Gribble, G. S. Manku, D. S. Roselli, E. A. Brewer, T. J. Gibson, and E. L. Miller. Self-similarity in file systems. In Proceedings of the 1998 Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), pages 141–150, Madison, WI, June 1998.
  • [20] N. C. Hutchinson, S. Manley, M. Federwisch, G. Harris, D. Hitz, S. Kleiman, and S. O’Malley. Logical vs. Physical File System Backup. In OSDI ’99, New Orleans, LA, February 1999.
  • [21] G. Irlam. Unix file size survey – 1993. Available at http://www.base.com/gordoni/ufs93.html.
  • [22] John McCutchan and Robert Love. inotify for linux. http://www.linuxjournal.com/article/8478.
  • [23] Jonathan Corbet. LWN Article: SEEK_HOLE or FIEMAP? http://lwn.net/Articles/260795/.
  • [24] J. Katcher. PostMark: A New File System Benchmark. Technical Report TR-3022, Network Appliance Inc., October 1997.
  • [25] A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller. Measurement and Analysis of Large-Scale Network File System Workloads. In Proceedings of the USENIX Annual Technical Conference, Boston, MA, June 2008.
  • [26] M. P. Mesnier, M. Wachs, R. R. Sambasivan, J. Lopez, J. Hendricks, G. R. Ganger, and D. O’Hallaron. //TRACE: parallel trace replay with approximate causal events. In FAST ’07, San Jose, CA, February 2007.
  • [27] M. Mitzenmacher. Dynamic models for file sizes and double pareto distributions. In Internet Mathematics, 2002.
  • [28] Mplayer. The MPlayer movie player. http://www.mplayerhq.hu/.
  • [29] S. J. Mullender and A. S. Tanenbaum. Immediate files. Software—Practice and Experience, 14(4):365–368, April 1984.
  • [30] A. Muthitacharoen, B. Chen, and D. Mazieres. A Low-Bandwidth Network File System. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP-01), pages 174–187, Banff, Canada, October 2001.
  • [31] Myers Carpenter. Id3v2: A command line editor for id3v2 tags. http://id3v2.sourceforge.net/.
  • [32] http://trec.nist.gov/data, 2007.
  • [33] J. K. Ousterhout, H. D. Costa, D. Harrison, J. A. Kunze, M. Kupfer, and J. G. Thompson. A Trace-Driven Analysis of the UNIX 4.2 BSD File System. In SOSP ’85, pages 15–24, Orcas Island, WA, December 1985.
  • [34] Y. Padioleau and O. Ridoux. A logic file system. In USENIX Annual Technical Conference, San Antonio, TX, June 2003.
  • [35] D. Patterson, G. Gibson, and R. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In SIGMOD ’88, pages 109–116, Chicago, IL, June 1988.
  • [36] V. Prabhakaran, L. N. Bairavasundaram, N. Agrawal, H. S. Gunawi, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. IRON File Systems. In SOSP ’05, pages 206–220, Brighton, UK, October 2005.
  • [37] B. Przydatek. A Fast Approximation Algorithm for the Subset-sum Problem. International Transactions in Operational Research, 9(4):437–459, 2002.
  • [38] E. Riedel, M. Kallahalla, and R. Swaminathan. A Framework for Evaluating Storage System Security. In FAST ’02, pages 14–29, Monterey, CA, January 2002.
  • [39] D. Roselli, J. R. Lorch, and T. E. Anderson. A Comparison of File System Workloads. In USENIX ’00, pages 41–54, San Diego, CA, June 2000.
  • [40] A. Rowstron and P. Druschel. Storage Management and Caching in PAST, A Large-scale, Persistent Peer-to-peer Storage Utility. In SOSP ’01, Banff, Canada, October 2001.
  • [41] M. Satyanarayanan. A study of file sizes and functional lifetimes. In Proceedings of the 8th ACM Symposium on Operating Systems Principles (SOSP), pages 96–108, Pacific Grove, CA, December 1981.
  • [42] T. F. Sienknecht, R. J. Friedrich, J. J. Martinka, and P. M. Friedenbach. The implications of distributed data in a commercial environment on the design of hierarchical storage management. Performance Evaluation, 20(1–3):3–25, May 1994.
  • [43] B. Sigurd, M. Eeg-Olofsson, and J. van de Weijer. Word length, sentence length and frequency – Zipf revisited. Studia Linguistica, 58(1):37–52, 2004.
  • [44] K. Smith and M. I. Seltzer. File System Aging. In Proceedings of the 1997 Sigmetrics Conference, Seattle, WA, June 1997.
  • [45] SNIA. Storage network industry association: Iotta repository. http://iotta.snia.org, 2007.
  • [46] S. Sobti, N. Garg, F. Zheng, J. Lai, Y. Shao, C. Zhang, W. Ziskind, and A. Krishnamurthy. Segank: A Distributed Mobile Storage System. In FAST ’04, pages 239–252, San Francisco, CA, April 2004.
  • [47] M. W. Storer, K. M. Greenan, E. L. Miller, and K. Voruganti. Pergamum: replacing tape with energy efficient, reliable, disk-based archival storage. In FAST’08: Proceedings of the 6th USENIX Conference on File and Storage Technologies, pages 1–16, Berkeley, CA, USA, 2008. USENIX Association.
  • [48] C. P. Wright, N. Joukov, D. Kulkarni, Y. Miretskiy, and E. Zadok. Auto-pilot: A platform for system software benchmarking. In Proceedings of the Annual USENIX Technical Conference, FREENIX Track, Anaheim, CA, April 2005.
  • [49] Z. Zhang and K. Ghose. yfs: A journaling file system design for handling large data sets with reduced seeking. In FAST ’03, pages 59–72, Berkeley, CA, USA, 2003. USENIX Association.
  • [50] N. Zhu, J. Chen, and T.-C. Chiueh. Tbbt: scalable and accurate trace replay for file server evaluation. In Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies, pages 24–24, Berkeley, CA, USA, 2005. USENIX Association.