The Hadoop Distributed File System

MSST '10: Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010

Abstract
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
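
The client-side view of HDFS described above can be made concrete with a short sketch against the public Hadoop FileSystem API (org.apache.hadoop.fs). This is illustrative only and not from the paper; the NameNode address and file path below are hypothetical placeholders.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: write a small file to HDFS, then stream it back.
public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; in practice this comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/example/hello.txt"); // hypothetical path

        // Write: the client streams data through a pipeline of DataNodes.
        try (FSDataOutputStream out = fs.create(path)) {
            out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read: the client streams blocks directly from the DataNodes that hold them.
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```

Because writes are pipelined through a chain of DataNodes and reads fetch blocks directly from the DataNodes that store them, aggregate I/O bandwidth scales with the number of servers, which is what lets the resource grow with demand.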
Keywords
large data,large cluster,user application,enterprise data,file system,high bandwidth,user application task,pipelines,layout,clustering algorithms,writing,data storage,servers,distributed databases,concurrent computing,distributed storage,computer architecture,bandwidth,file servers,protocols,distributed file system,distributed computing,internet