We have described and characterized Pond, the OceanStore prototype
Pond: The OceanStore Prototype
In Proc. of FAST, 2003
OceanStore is an Internet-scale, persistent data store designed for incremental scalability, secure sharing, and long-term durability. Pond is the OceanStore prototype; it contains many of the features of a complete system, including location-independent routing, Byzantine update commitment, and push-based update of cached copies through an overlay multicast network.
- One of the dominant costs of storage today is management: maintaining the health and performance characteristics of data over the long term.
- Disk storage capacity per unit cost has skyrocketed; assuming growth continues according to Moore’s law, a terabyte of EIDE storage will cost $100 US in under three years.
- These trends present a unique opportunity for file system designers: for the first time, one can imagine providing truly durable, self-maintaining storage to every computer user.
- OceanStore [14, 26] is an Internet-scale, cooperative file system designed to harness these trends to provide just such durable, self-maintaining storage.
- The rise of the Internet over the last decade has spawned the advent of universal connectivity; the average computer user today is increasingly likely to be connected to the Internet via a high-bandwidth link
- Although garbage collection does not happen often, it can add several seconds of delay to a task normally measured in tens of milliseconds. To adjust for these anomalies, the authors report the median value and the 0th and 95th percentile values, instead of the more typical mean and standard deviation, for experiments that are severely affected by garbage collection
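The reporting rule above (median plus 0th and 95th percentiles, rather than mean and standard deviation, to sidestep garbage-collection outliers) can be sketched as follows; the sample values are invented for illustration:

```python
import statistics

def summarize(samples_ms):
    """Report (0th percentile, median, 95th percentile) of latency samples.

    Robust to the occasional multi-second garbage-collection pause that
    would badly distort a mean/standard-deviation summary.
    """
    s = sorted(samples_ms)
    p0 = s[0]                                      # 0th percentile = minimum
    p95 = s[min(len(s) - 1, int(0.95 * len(s)))]   # simple nearest-rank 95th
    return p0, statistics.median(s), p95

# Nineteen fast runs plus one run hit by a multi-second GC pause.
samples = [30.0 + i for i in range(19)] + [4000.0]
lo, med, hi = summarize(samples)
# The median (39.5 ms) reflects typical behavior; the mean (~228 ms) would not.
```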
- Threshold signatures have proven far more costly than we anticipated, requiring an order of magnitude more time to compute than regular public key signatures
- The authors use two experimental test beds to measure the system.
- The first test bed consists of a local cluster of forty-two machines at Berkeley.
- Each machine in the cluster is an IBM xSeries 330 1U rackmount PC with two 1.0 GHz Pentium III CPUs and 1.5 GB of ECC PC133 SDRAM.
- (Figure legend: storage measurements compare the archive disabled against rate-1/2 Cauchy erasure coding into 32 fragments, for varying object sizes.)
- The operating system on each node is Debian GNU/Linux 3.0, running the Linux 2.4.18 SMP kernel.
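The archive's rate-1/2 Cauchy coding into 32 fragments means any 16 fragments suffice to reconstruct an object. A toy calculation (assuming independent fragment failures, probabilities chosen only for illustration) shows why such coding is far more durable than two full replicas at the same storage cost:

```python
from math import comb

def survival(n: int, m: int, p: float) -> float:
    """Probability an object survives when any m of its n fragments
    suffice to reconstruct it and each fragment independently
    survives with probability p (binomial tail P[X >= m])."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

p = 0.9  # assumed per-fragment survival probability

# Rate-1/2 coding into 32 fragments: any 16 reconstruct the object.
coded = survival(32, 16, p)

# Same storage cost as two full replicas: either replica suffices.
replicated = survival(2, 1, p)   # 1 - (1 - p)**2 = 0.99

# coded is astronomically closer to 1 than replicated.
```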
- The authors present a detailed performance analysis of Pond; the results demonstrate the performance characteristics of the system and highlight promising areas for further research.
- As discussed in Section 2, the data object is represented as a B-tree with metadata appended to the top block.
- When the user data portion of the data object is smaller than the block size, the overhead of the top block dominates the storage overhead.
- As the user data increases in size, the overhead of the top block and any interior blocks becomes negligible.
- Figure 4 shows the overhead due to the B-tree for varying data sizes
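The overhead behavior described above can be modeled with a short calculation. The block and top-block sizes here are illustrative assumptions, not the prototype's actual constants, and interior B-tree blocks are ignored for simplicity:

```python
import math

BLOCK = 8 * 1024   # assumed data-block size, bytes
TOP   = 1 * 1024   # assumed size of the metadata-bearing top block, bytes

def overhead(user_bytes: int) -> float:
    """Fraction of stored bytes that is not user data."""
    data_blocks = max(1, math.ceil(user_bytes / BLOCK))
    stored = data_blocks * BLOCK + TOP   # interior blocks omitted
    return (stored - user_bytes) / stored

# For tiny objects the fixed top block dominates; for large objects the
# per-object overhead becomes negligible.
small = overhead(100)      # roughly 99% overhead
large = overhead(10**8)    # well under 0.1% overhead
```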
- Conclusions and Future Work
- The authors have described and characterized Pond, the OceanStore prototype.
- While many important challenges remain, this prototype is a working subset of the vision presented in the original OceanStore paper.
- Building this prototype has refined the plans for future research.
- While the latency overhead of Tapestry has been examined before, quantifying the additional storage costs it imposes is a topic for future research
- Table 1: Summary of Globally Unique Identifiers (GUIDs)
- Table 2: Results of the Latency Microbenchmark in the Local Area. All nodes are hosted on the cluster. Ping latency between nodes in the cluster is 0.2 ms. We run with the archive enabled and disabled while varying the update size and key length
- Table 3: Latency Breakdown of an Update. The majority of the time in a small update performed on the cluster is spent computing the threshold signature share over the result. With larger updates, the time to apply and archive the update dominates signature time
- Table 4: Results of the Latency Microbenchmark Run in the Wide Area. All tests were run with the archive enabled using 1024-bit keys. “Avg. Ping” is the average ping time in milliseconds from the client machine to each of the inner ring servers. UCSD is the University of California at San Diego
- Table 5: Throughput in the Wide Area. The throughput for a distributed ring is limited by the wide-area bandwidth. All tests are run with the archive on and 1024-bit keys
- Table 6: Results of the Tag Microbenchmark. Each experiment was run at least three times, and the standard deviation across experiments was less than 10% of the mean. All experiments are run using 1024-bit keys and with the archive disabled
- Table 7: Results of the Andrew Benchmark. All experiments are run with the archive disabled using 512- or 1024-bit keys, as indicated by the column headers. Times are in seconds, and each data point is an average over at least three trials. The standard deviation for all points was less than 7.5% of the mean
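The GUIDs summarized in Table 1 rest on self-verifying naming: a read-only block's GUID is a secure hash of its contents, so any node can check a block fetched from an untrusted peer without trusting the sender. A minimal sketch (SHA-1 is used here as in systems of that era; any collision-resistant hash gives the same property):

```python
import hashlib

def block_guid(block: bytes) -> str:
    # A block's GUID is the secure hash of its contents.
    return hashlib.sha1(block).hexdigest()

def verify(block: bytes, guid: str) -> bool:
    # Recompute the hash and compare: no trusted third party needed.
    return block_guid(block) == guid

data = b"some archival fragment"
g = block_guid(data)
ok = verify(data, g)                    # True: contents match the name
tampered = verify(data + b"x", g)       # False: any change breaks the GUID
```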
- A number of distributed storage systems have preceded OceanStore; notable examples include [31, 13, 8]. More recently, as the unreliability of hosts in a distributed setting has been studied, Byzantine fault-tolerant services have become popular. FarSite aims to build an enterprise-scale distributed file system, using Byzantine fault tolerance for directories only. The ITTC project and the COCA project both build certificate authorities (CAs) using threshold signatures; the latter combines this scheme with a quorum-based Byzantine fault-tolerant algorithm. The Fleet persistent object system also uses a quorum-based algorithm.
Quorum-based Byzantine agreement requires less communication per replica than the state-machine-based agreement used in OceanStore; however, it tolerates proportionally fewer faults. This tradeoff led us to our architecture: we use primary-copy replication to reduce communication costs, but implement the primary replica as a small set of servers that run state-machine Byzantine agreement to achieve fault tolerance.
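The fault-tolerance side of this tradeoff follows from standard replica-count bounds: state-machine Byzantine agreement (a la PBFT) needs n ≥ 3f + 1 replicas to tolerate f faults, while masking quorum systems need n ≥ 4f + 1. A small sketch of what that means for an inner ring of a given size:

```python
def min_replicas_state_machine(f: int) -> int:
    # PBFT-style state-machine agreement: n >= 3f + 1.
    return 3 * f + 1

def min_replicas_masking_quorum(f: int) -> int:
    # Byzantine masking quorum systems: n >= 4f + 1.
    return 4 * f + 1

def max_faults(n: int, min_replicas) -> int:
    """Largest f such that n replicas satisfy the protocol's bound."""
    f = 0
    while min_replicas(f + 1) <= n:
        f += 1
    return f

# For a seven-server inner ring: state-machine agreement tolerates
# two Byzantine servers, a masking quorum only one.
sm = max_faults(7, min_replicas_state_machine)   # 2
mq = max_faults(7, min_replicas_masking_quorum)  # 1
```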
- Dennis Geels is supported by the Fannie and John Hertz Foundation
- R. Anderson. The eternity service. In Proceedings of Pragocrypt, 1996.
- J. Bloemer et al. An XOR-based erasure-resilient coding scheme. Technical Report TR-95-048, The International Computer Science Institute, Berkeley, CA, 1995.
- W. Bolosky, J. Douceur, D. Ely, and M. Theimer. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. In Proc. of Sigmetrics, June 2000.
- M. Castro and B. Liskov. Proactive recovery in a Byzantine-fault-tolerant system. In Proc. of OSDI, 2000.
- Y. Chen, R. Katz, and J. Kubiatowicz. SCAN: A dynamic, scalable, and efficient content distribution network. In Proc. of International Conference on Pervasive Computing, 2002.
- I. Clark, O. Sandberg, B. Wiley, and T. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Proc. of the Workshop on Design Issues in Anonymity and Unobservability, pages 311–320, Berkeley, CA, July 2000.
- F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-area cooperative storage with CFS. In Proc. of ACM SOSP, October 2001.
- A. Demers et al. The Bayou architecture: Support for data sharing among mobile users. In Proc. of IEEE Workshop on Mobile Computing Systems & Applications, 1994.
- A. Goldberg and P. Yianilos. Towards an archival intermemory. In Proc. of IEEE ADL, pages 147–156, April 1998.
- J. Gray, P. Helland, P. O’Neil, and D. Shasha. The dangers of replication and a solution. In Proc. of ACM SIGMOD Conf., June 1996.
- S. Hand and T. Roscoe. Mnemosyne: Peer-to-peer steganographic storage. In Proc. of IPTPS, March 2002.
- K. Hildrum, J. Kubiatowicz, S. Rao, and B. Zhao. Distributed object location in a dynamic network. In Proc. of ACM SPAA, pages 41–52, August 2002.
- J. Kistler and M. Satyanarayanan. Disconnected operation in the Coda file system. ACM Transactions on Computer Systems, 10(1):3–25, February 1992.
- J. Kubiatowicz et al. OceanStore: An architecture for global-scale persistent storage. In Proc. of ASPLOS, 2000.
- M. Waldman, A. Rubin, and L. Cranor. Publius: A robust, tamper-evident, censorship-resistant, web publishing system. In Proc. of 9th USENIX Security Symposium, 2000.
- D. Malkhi, M. K. Reiter, D. Tulone, and E. Ziskind. Persistent objects in the fleet system. In DISCEX II, 2001.
- D. Malkhi, M. Naor, and D. Ratajczak. Viceroy: A scalable and dynamic emulation of the butterfly. In Proc. of ACM PODC Symp., 2002.
- P. Maymounkov and D. Mazieres. Kademlia: A peer-to-peer information system based on the XOR metric. In Proc. of IPTPS, 2002.
- D. Mazieres. A toolkit for user-level file systems. In Proc. of USENIX Summer Technical Conf., June 2001.
- R. Merkle. A digital signature based on a conventional encryption function. In Proc. of CRYPTO, pages 369–378. Springer-Verlag, 1988.
- A. Muthitacharoen, R. Morris, T. Gil, and B. Chen. Ivy: A read/write peer-to-peer file system. In Proc. of OSDI, 2002.
- T. Rabin. A simplified approach to threshold and proactive RSA. In Proceedings of Crypto, 1998.
- S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker. A scalable content-addressable network. In Proceedings of SIGCOMM. ACM, August 2001.
- S. Rhea and J. Kubiatowicz. Probabilistic location and routing. In Proc. of INFOCOM. IEEE, June 2002.
- S. Rhea, T. Roscoe, and J. Kubiatowicz. DHTs need application-driven benchmarks. In Proc. of IPTPS, 2003.
- S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, and J. Kubiatowicz. Maintenance-free global storage in OceanStore. IEEE Internet Computing, September 2001.
- A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large scale peer-to-peer systems. In Proc. of IFIP/ACM Middleware, November 2001.
- A. Rowstron and P. Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Proc. of ACM SOSP, 2001.
- Y. Saito, C. Karamanolis, M. Karlsson, and M. Mahalingam. Taming aggressive replication in the Pangaea wide-area file system. In Proc. of OSDI, 2002.
- D. Santry, M. Feeley, N. Hutchinson, A. Veitch, R. Carton, and J. Ofir. Deciding when to forget in the Elephant file system. In Proc. of ACM SOSP, December 1999.
- M. Satyanarayanan. Scalable, secure, and highly available distributed file access. IEEE Computer, 23(5), May 1990.
- V. Shoup. Practical threshold signatures. In Proc. of EUROCRYPT, 2000.
- I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of SIGCOMM. ACM, August 2001.
- M. Stonebraker. The design of the Postgres storage system. In Proc. of Intl. Conf. on VLDB, September 1987.
- H. Weatherspoon and J. Kubiatowicz. Efficient heartbeats and repair of soft state in decentralized object location and routing systems. In Proc. of SIGOPS European Workshop, 2002.
- H. Weatherspoon and J. Kubiatowicz. Erasure coding vs. replication: A quantitative comparison. In Proc. of IPTPS, March 2002.
- H. Weatherspoon, T. Moscovitz, and J. Kubiatowicz. Introspective failure analysis: Avoiding correlated failures in peer-to-peer systems. In Proc. of International Workshop on Reliable Peer-to-Peer Distributed Systems, October 2002.
- H. Weatherspoon, C. Wells, and J. Kubiatowicz. Naming and integrity: Self-verifying data in peer-to-peer systems. In Proc. of International Workshop on Future Directions of Distributed Systems, 2002.
- M. Welsh, D. Culler, and E. Brewer. SEDA: An architecture for well-conditioned, scalable internet services. In Proc. of ACM SOSP, October 2001.
- T. Wu, M. Malkin, and D. Boneh. Building intrusion-tolerant applications. In Proc. of USENIX Security Symp., August 1999.
- J. Wylie, M. Bigrigg, J. Strunk, G. Ganger, H. Kiliccote, and P. Khosla. Survivable information storage systems. IEEE Computer, 33(8):61–68, August 2000.
- L. Zhou, F. Schneider, and R. van Renesse. Coca: A secure distributed on-line certification authority. Technical Report 2000-1828, Department of Computer Science, Cornell University, Ithaca, NY USA, 2000.