Hadoop Distributed File System – HDFS in Hadoop

HDFS – Hadoop Distributed File System is the primary storage system used by Hadoop application. HDFS is a Distributed File System that provides high-performance access to data across on Hadoop Clusters. When HDFS takes in data, it breaks the information into smaller parts called blocks.  Allowing for parallel processing.




HDFS is built to support the application with large data sets, including individual files.

It uses Master/Slave architecture

 

Features of HDFS:

A)Fault Tolerance

In HDFS refers to the working strength of a system in unfavorable conditions and how that system can handle such related situations. HDFS is highly faulted tolerant, in HDFS that data is divided into blocks and multiple copies of blocks are created on different machines in the cluster.

B)High Availability

HDFS is a high availability file system, data gets replicated among the nodes in the HDFS cluster by creating a replica of the blocks on the other slaves present in HDFS cluster. When your node failure, the user can access their data from other nodes. Because duplicate copies of blocks which contain user data are created on the other nodes present in the HDFS cluster.

C)Replication

In HDFS Data Replication is the most important and unique feature of HDFS. In HDFS replication of data is done to solve the problem of data loss in unfavorable conditions like crashing of a node or hardware failure etc. Data replicated across a number of machines in the cluster by creating blocks.




D)Reliability

HDFS is a distributed file system which provides reliable data storage. HDFS can store data in the range of 100s of petabytes. It stores data reliably on the cluster on nodes. HDFS divides the data into blocks these blocks are stored on nodes present in HDFS cluster. It stores data reliable by creating a replica of each and every block present on the nodes present in the cluster and hence provide fault tolerance.

E)Distributed Storage

In HDFS all the features are achieved by via distributed storage and replication. In HDFS data in stored in distributed wisely across the nodes in HDFS Cluster.