Pond: The OceanStore Prototype

FAST, 2003

Cited by: 664 | Views: 150

Abstract

OceanStore is an Internet-scale, persistent data store designed for incremental scalability, secure sharing, and long-term durability. Pond is the OceanStore prototype; it contains many of the features of a complete system, including location-independent routing, Byzantine update commitment, push-based update of cached copies through an overlay multicast network, and continuous archiving to erasure-coded form.

Introduction
  • One of the dominant costs of storage today is management: maintaining the health and performance characteristics of data over the long term.
  • Disk storage capacity per unit cost has skyrocketed; assuming growth continues according to Moore’s law, a terabyte of EIDE storage will cost $100 US in under three years.
  • These trends present a unique opportunity for file system designers: for the first time, one can imagine providing truly durable, self-maintaining storage to every computer user.
  • OceanStore [14, 26] is an Internet-scale, cooperative file system designed to harness these trends to provide such durable, self-maintaining storage.
Highlights
  • One of the dominant costs of storage today is management: maintaining the health and performance characteristics of data over the long term
  • The rise of the Internet over the last decade has spawned the advent of universal connectivity; the average computer user today is increasingly likely to be connected to the Internet via a high-bandwidth link
  • Disk storage capacity per unit cost has skyrocketed; assuming growth continues according to Moore’s law, a terabyte of EIDE storage will cost $100 US in under three years. These trends present a unique opportunity for file system designers: for the first time, one can imagine providing truly durable, self-maintaining storage to every computer user.
  • Garbage collection pauses do not happen often, but a single pause can add several seconds of delay to a task normally measured in tens of milliseconds. To adjust for these anomalies, we report the median value and the 0th and 95th percentile values for experiments that are severely affected by garbage collection, instead of the more typical mean and standard deviation (see the sketch after this list).
  • We have described and characterized Pond, the OceanStore prototype
  • Threshold signatures have proven far more costly than we anticipated, requiring an order of magnitude more time to compute than regular public key signatures
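
As an illustration of the reporting convention above, here is a minimal sketch of median/percentile summarization for GC-affected trials. This is not the paper's measurement harness; the helper and the sample latencies are hypothetical.

```python
# Minimal sketch: summarize benchmark trials that occasionally hit a
# garbage-collection pause. Hypothetical helper and data, not from Pond.

def percentile(samples, p):
    """Nearest-rank percentile of a list of latencies, p in [0, 100]."""
    ordered = sorted(samples)
    rank = round(p / 100 * (len(ordered) - 1))
    return ordered[rank]

# Latencies in ms; one trial absorbed a multi-second GC pause.
trials = [32.1, 30.8, 33.5, 31.2, 2950.0, 30.9, 32.4]

low, median, high = (percentile(trials, p) for p in (0, 50, 95))
print(f"0th={low} ms  median={median} ms  95th={high} ms")
# The median (~32 ms) reflects typical latency; a mean/stddev summary
# would be dominated by the single outlier.
```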
Methods
  • The authors use two experimental test beds to measure the system.
  • The first test bed consists of a local cluster of forty-two machines at Berkeley.
  • Each machine in the cluster is an IBM xSeries 330 1U rackmount PC with two 1.0 GHz Pentium III CPUs, 1.5 GB ECC PC133 SDRAM, and two 36 GB IBM UltraStar 36LZX hard drives.
  • The archive erasure-codes data with a Cauchy code at rate 1/2, producing 32 fragments per block; benchmarks compare this configuration against runs with no archive across varying object sizes (see the sketch after this list).
  • The operating system on each node is Debian GNU/Linux 3.0, running the Linux 2.4.18 SMP kernel.
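
As a back-of-the-envelope illustration of those archive parameters, the sketch below models only the fragment accounting that a rate-1/2 code with 32 fragments implies; Pond's actual archive uses a Cauchy Reed-Solomon code, which this sketch does not implement.

```python
# Minimal sketch of erasure-code accounting for Pond's archive settings
# (rate 1/2, 32 fragments). Arithmetic only; no actual coding is done.

def archive_accounting(block_size, total_fragments=32, rate=0.5):
    """Return (fragments needed to reconstruct, bytes per fragment,
    raw-storage blowup) for an idealized rate-r erasure code."""
    needed = round(total_fragments * rate)   # any 16 of the 32 suffice
    fragment_size = block_size / needed      # each carries 1/16 of the data
    blowup = 1 / rate                        # 2x the block's raw size
    return needed, fragment_size, blowup

needed, frag, blowup = archive_accounting(8192)   # an assumed 8 KB block
print(f"any {needed}/32 fragments of {frag:.0f} B each; {blowup:.0f}x storage")
# The code tolerates the loss of up to 16 fragments for a constant 2x
# storage cost, versus the 17 whole replicas needed to survive 16 losses.
```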
Results
  • The authors present a detailed performance analysis of Pond. The authors' results demonstrate the performance characteristics of the system and highlight promising areas for further research.
  • As discussed in Section 2, the data object is represented as a B-tree with metadata appended to the top block.
  • When the user data portion of the data object is smaller than the block size, the overhead of the top block dominates the storage overhead.
  • As the user data increases in size, the overhead of the top block and any interior blocks becomes negligible.
  • Figure 4 shows the overhead due to the B-tree for varying data sizes; a sketch of this computation follows this list.
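
The sketch below makes the overhead argument concrete. The block size, top-block cost, and fan-out are assumptions for illustration, not Pond's actual parameters.

```python
# Minimal sketch: storage overhead of a block-based B-tree data object.
# Hypothetical sizes; Pond's real block size and metadata layout differ.

BLOCK = 8192   # assumed data block size, bytes
TOP = 1024     # assumed size of the metadata appended to the top block
FANOUT = 32    # assumed pointers per interior block

def object_overhead(user_bytes):
    """Fraction of stored bytes that is B-tree overhead, not user data."""
    leaves = max(1, -(-user_bytes // BLOCK))   # ceil(user_bytes / BLOCK)
    interior, level = 0, leaves
    while level > 1:                           # pointer blocks up to the root
        level = -(-level // FANOUT)
        interior += level
    total = leaves * BLOCK + interior * BLOCK + TOP
    return (total - user_bytes) / total

for size in (512, 8192, 1 << 20, 1 << 30):
    print(f"{size:>10} B user data -> {object_overhead(size):.1%} overhead")
# A 512 B object pays nearly the whole top block as overhead; at 1 GB the
# top and interior blocks are a vanishing fraction of total storage.
```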
Conclusion
  • The authors have described and characterized Pond, the OceanStore prototype.
  • While many important challenges remain, this prototype is a working subset of the vision presented in the original OceanStore paper [14].
  • Building this prototype has refined the plans for future research.
  • While the latency overhead of Tapestry has been examined before [24], quantifying the additional storage costs it imposes is a topic for future research.
Tables
  • Table 1: Summary of Globally Unique Identifiers (GUIDs); a sketch of self-verifying GUID computation follows this list
  • Table 2: Results of the Latency Microbenchmark in the Local Area. All nodes are hosted on the cluster. Ping latency between nodes in the cluster is 0.2 ms. We run with the archive enabled and disabled while varying the update size and key length
  • Table 3: Latency Breakdown of an Update. The majority of the time in a small update performed on the cluster is spent computing the threshold signature share over the result. With larger updates, the time to apply and archive the update dominates signature time
  • Table 4: Results of the Latency Microbenchmark Run in the Wide Area. All tests were run with the archive enabled using 1024-bit keys. “Avg. Ping” is the average ping time in milliseconds from the client machine to each of the inner ring servers. UCSD is the University of California at San Diego
  • Table 5: Throughput in the Wide Area. The throughput for a distributed ring is limited by the wide-area bandwidth. All tests are run with the archive on and 1024-bit keys
  • Table 6: Results of the Tag Microbenchmark. Each experiment was run at least three times, and the standard deviation across experiments was less than 10% of the mean. All experiments are run using 1024-bit keys and with the archive disabled
  • Table 7: Results of the Andrew Benchmark. All experiments are run with the archive disabled using 512- or 1024-bit keys, as indicated by the column headers. Times are in seconds, and each data point is an average over at least three trials. The standard deviation for all points was less than 7.5% of the mean
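
To illustrate the self-verifying flavor of these identifiers, here is a minimal sketch in which a block's GUID is a secure hash of its contents, as the paper's design describes; the choice of SHA-1 and the helper name are assumptions for illustration.

```python
# Minimal sketch of a self-verifying block GUID (BGUID): the block's
# name is a secure hash of its contents, so any server or client that
# fetches the block can verify it. SHA-1 is an assumption here.
import hashlib

def bguid(block: bytes) -> str:
    """Block GUID: secure hash of the block's contents."""
    return hashlib.sha1(block).hexdigest()

block = b"one block of user data"
name = bguid(block)

# A reader that fetched `block` under `name` can check integrity:
assert bguid(block) == name
print(name)
```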
Related work
  • A number of distributed storage systems have preceded OceanStore; notable examples include [31, 13, 8]. More recently, as the unreliability of hosts in a distributed setting has been studied, Byzantine fault-tolerant services have become popular. FarSite [3] aims to build an enterprise-scale distributed file system, using Byzantine fault-tolerance for directories only. The ITTC project [40] and the COCA project [42] both build certificate authorities (CAs) using threshold signatures; the latter combines this scheme with a quorum-based Byzantine fault-tolerant algorithm. The Fleet [16] persistent object system also uses a quorum-based algorithm.

    Quorum-based Byzantine agreement requires less communication per replica than the state-machine-based agreement used in OceanStore; however, it tolerates proportionally fewer faults. This tradeoff led us to our architecture: we use primary-copy replication [10] to reduce communication costs, but implement the primary replica as a small set of servers using state-machine Byzantine agreement to achieve fault tolerance (see the sketch below).
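
To make the fault-tolerance side of that tradeoff concrete, the sketch below uses standard bounds from the literature (PBFT-style state-machine agreement needs n >= 3f + 1 replicas; masking Byzantine quorum systems need n >= 4f + 1); these bounds and the ring size are assumptions for illustration, not figures from this paper.

```python
# Minimal sketch: how many Byzantine faults f a ring of n replicas can
# tolerate under each agreement style. Standard bounds assumed:
# state machine: n >= 3f + 1; masking quorum: n >= 4f + 1.

def max_faults(n, cost_per_fault):
    """Largest f satisfying n >= cost_per_fault * f + 1."""
    return (n - 1) // cost_per_fault

n = 10  # a hypothetical inner-ring size
print("state-machine agreement:", max_faults(n, 3))  # tolerates f = 3
print("masking quorum system: ", max_faults(n, 4))   # tolerates f = 2
```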
Funding
  • Dennis Geels is supported by the Fannie and John Hertz Foundation
References
  • R. Anderson. The eternity service. In Proc. of Pragocrypt, 1996.
  • J. Bloemer et al. An XOR-based erasure-resilient coding scheme. Technical Report TR-95-048, International Computer Science Institute, Berkeley, CA, 1995.
  • W. Bolosky, J. Douceur, D. Ely, and M. Theimer. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. In Proc. of SIGMETRICS, June 2000.
  • M. Castro and B. Liskov. Proactive recovery in a Byzantine-fault-tolerant system. In Proc. of OSDI, 2000.
  • Y. Chen, R. Katz, and J. Kubiatowicz. SCAN: A dynamic, scalable, and efficient content distribution network. In Proc. of International Conference on Pervasive Computing, 2002.
  • I. Clarke, O. Sandberg, B. Wiley, and T. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Proc. of the Workshop on Design Issues in Anonymity and Unobservability, pages 311–320, Berkeley, CA, July 2000.
  • F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-area cooperative storage with CFS. In Proc. of ACM SOSP, October 2001.
  • A. Demers et al. The Bayou architecture: Support for data sharing among mobile users. In Proc. of IEEE Workshop on Mobile Computing Systems & Applications, 1994.
  • A. Goldberg and P. Yianilos. Towards an archival intermemory. In Proc. of IEEE ADL, pages 147–156, April 1998.
  • J. Gray, P. Helland, P. O’Neil, and D. Shasha. The dangers of replication and a solution. In Proc. of ACM SIGMOD Conf., June 1996.
  • S. Hand and T. Roscoe. Mnemosyne: Peer-to-peer steganographic storage. In Proc. of IPTPS, March 2002.
  • K. Hildrum, J. Kubiatowicz, S. Rao, and B. Zhao. Distributed object location in a dynamic network. In Proc. of ACM SPAA, pages 41–52, August 2002.
  • J. Kistler and M. Satyanarayanan. Disconnected operation in the Coda file system. ACM Transactions on Computer Systems, 10(1):3–25, February 1992.
  • J. Kubiatowicz et al. OceanStore: An architecture for global-scale persistent storage. In Proc. of ASPLOS, 2000.
  • M. Waldman, A. Rubin, and L. Cranor. Publius: A robust, tamper-evident, censorship-resistant web publishing system. In Proc. of the 9th USENIX Security Symposium, 2000.
  • D. Malkhi, M. K. Reiter, D. Tulone, and E. Ziskind. Persistent objects in the Fleet system. In DISCEX II, 2001.
  • D. Malkhi, M. Naor, and D. Ratajczak. Viceroy: A scalable and dynamic emulation of the butterfly. In Proc. of ACM PODC, 2002.
  • P. Maymounkov and D. Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. In Proc. of IPTPS, 2002.
  • D. Mazières. A toolkit for user-level file systems. In Proc. of USENIX Summer Technical Conf., June 2001.
  • R. Merkle. A digital signature based on a conventional encryption function. In Proc. of CRYPTO, pages 369–378. Springer-Verlag, 1988.
  • A. Muthitacharoen, R. Morris, T. Gil, and B. Chen. Ivy: A read/write peer-to-peer file system. In Proc. of OSDI, 2002.
  • T. Rabin. A simplified approach to threshold and proactive RSA. In Proc. of CRYPTO, 1998.
  • S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker. A scalable content-addressable network. In Proc. of ACM SIGCOMM, August 2001.
  • S. Rhea and J. Kubiatowicz. Probabilistic location and routing. In Proc. of IEEE INFOCOM, June 2002.
  • S. Rhea, T. Roscoe, and J. Kubiatowicz. DHTs need application-driven benchmarks. In Proc. of IPTPS, 2003.
  • S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, and J. Kubiatowicz. Maintenance-free global data storage. IEEE Internet Computing, September 2001.
  • A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proc. of IFIP/ACM Middleware, November 2001.
  • A. Rowstron and P. Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Proc. of ACM SOSP, 2001.
  • Y. Saito, C. Karamanolis, M. Karlsson, and M. Mahalingam. Taming aggressive replication in the Pangaea wide-area file system. In Proc. of OSDI, 2002.
  • D. Santry, M. Feeley, N. Hutchinson, A. Veitch, R. Carton, and J. Ofir. Deciding when to forget in the Elephant file system. In Proc. of ACM SOSP, December 1999.
  • M. Satyanarayanan. Scalable, secure, and highly available distributed file access. IEEE Computer, 23(5), May 1990.
  • V. Shoup. Practical threshold signatures. In Proc. of EUROCRYPT, 2000.
  • I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proc. of ACM SIGCOMM, August 2001.
  • M. Stonebraker. The design of the Postgres storage system. In Proc. of Intl. Conf. on VLDB, September 1987.
  • H. Weatherspoon and J. Kubiatowicz. Efficient heartbeats and repair of soft state in decentralized object location and routing systems. In Proc. of SIGOPS European Workshop, 2002.
  • H. Weatherspoon and J. Kubiatowicz. Erasure coding vs. replication: A quantitative comparison. In Proc. of IPTPS, March 2002.
  • H. Weatherspoon, T. Moscovitz, and J. Kubiatowicz. Introspective failure analysis: Avoiding correlated failures in peer-to-peer systems. In Proc. of International Workshop on Reliable Peer-to-Peer Distributed Systems, October 2002.
  • H. Weatherspoon, C. Wells, and J. Kubiatowicz. Naming and integrity: Self-verifying data in peer-to-peer systems. In Proc. of International Workshop on Future Directions of Distributed Systems, 2002.
  • M. Welsh, D. Culler, and E. Brewer. SEDA: An architecture for well-conditioned, scalable Internet services. In Proc. of ACM SOSP, October 2001.
  • T. Wu, M. Malkin, and D. Boneh. Building intrusion-tolerant applications. In Proc. of USENIX Security Symposium, August 1999.
  • J. Wylie, M. Bigrigg, J. Strunk, G. Ganger, H. Kiliccote, and P. Khosla. Survivable information storage systems. IEEE Computer, 33(8):61–68, August 2000.
  • L. Zhou, F. Schneider, and R. van Renesse. COCA: A secure distributed on-line certification authority. Technical Report 2000-1828, Department of Computer Science, Cornell University, Ithaca, NY, 2000.