Hadoop Components and Definitions

What are the different Hadoop components in the Hadoop ecosystem?

HDFS – filesystem of Hadoop (Hadoop Distributed File System)

MapReduce – processing of large datasets

Hive – data warehouse for summarization, query, and analysis

Pig – high-level scripting language for data analysis

HBase – database (Hadoop + dataBase)

Apache Oozie – workflow scheduler

Apache Mahout – machine learning and data mining

Apache Hue – Hadoop user interface: a file browser for HDFS and HBase, query editors for Hive, etc.

Flume – ingests data from other sources into HDFS

Sqoop – exports/imports data from an RDBMS to HDFS and from HDFS back to an RDBMS

What is HDFS?

HDFS (Hadoop Distributed File System) is a filesystem that can store very large data sets by scaling out across a cluster of hosts.
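To make the scaling-out idea concrete, here is a toy Python sketch (not the real HDFS implementation) of how a file is chopped into fixed-size blocks; in recent Hadoop versions the default block size is 128 MB, and each block is replicated across several hosts in the cluster. The function name is hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int = 128 * 1024 * 1024):
    # HDFS splits each file into fixed-size blocks (128 MB by default)
    # and stores replicated copies of each block on different hosts.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# Toy demonstration with a 10-byte "file" and a 4-byte block size.
blocks = split_into_blocks(b"0123456789", block_size=4)
print(blocks)  # [b'0123', b'4567', b'89']
```

Because a file is just a sequence of independent blocks, the cluster can grow simply by adding hosts to hold more blocks.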

What is MapReduce?

MapReduce is a programming model and implementation for processing and generating large data sets. The user specifies a map function that processes a (key, value) pair to generate a set of intermediate (key, value) pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
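The model can be sketched in plain Python as a toy, single-process imitation of what the framework runs in parallel across the cluster. The classic example is word count; the names `map_fn`, `reduce_fn`, and `run_mapreduce` below are hypothetical, not Hadoop API calls.

```python
from collections import defaultdict

def map_fn(_, line):
    # Map: emit an intermediate (word, 1) pair for every word in the line.
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce: merge all intermediate values for one key by summing them.
    yield (word, sum(counts))

def run_mapreduce(records):
    # Shuffle: group intermediate pairs by key, as the framework would
    # do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    results = {}
    for k, vs in groups.items():
        for out_k, out_v in reduce_fn(k, vs):
            results[out_k] = out_v
    return results

lines = [(0, "hadoop stores data"), (1, "hadoop processes data")]
print(run_mapreduce(lines))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop the map tasks run on the hosts that already hold the input blocks, and the shuffle moves intermediate pairs over the network to the reduce tasks.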

What is Hive?

Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

What is Pig?

Pig is a platform for analyzing large data sets that consists of a high-level scripting language (Pig Latin) for expressing data analysis programs.

What is Flume?

Flume is a distributed, reliable service for collecting and moving large amounts of streaming data (such as log data) from its sources into HDFS, where applications on top of Hadoop can process it.

What is Sqoop?

Apache Sqoop is a tool designed for transferring bulk data between Hadoop and structured data stores such as relational databases; that is, it imports data from an RDBMS into HDFS and exports it back from HDFS to the RDBMS.
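Sqoop itself generates parallel map tasks that read table splits over JDBC and write files into HDFS. The toy Python sketch below only illustrates the import direction of that data flow, with SQLite standing in for the RDBMS and a CSV string standing in for the HDFS file; the function and table names are made up for the example.

```python
import csv
import io
import sqlite3

def export_table_to_csv(conn, table):
    # Illustrative stand-in for Sqoop's import: read every row of a
    # relational table and serialize it as CSV text (Sqoop would write
    # such records into files under an HDFS directory).
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

# Build a tiny in-memory "RDBMS" to export from.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ana"), (2, "bo")])
print(export_table_to_csv(conn, "users"))
```

The export direction is the mirror image: rows are read back out of files and inserted into the database.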

What is HBase?

HBase (Hadoop + dataBase) is a column-oriented database layered on top of HDFS.
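As a rough mental model (a toy sketch, not the real HBase API), an HBase table can be thought of as a map from (row key, column family:qualifier) to cell values; the class and method names below are hypothetical.

```python
class ToyColumnStore:
    """Minimal sketch of HBase's data model: cells are addressed by a
    row key plus a column name of the form 'family:qualifier'."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, value):
        # Store one cell; rows are sparse, so only written cells exist.
        self.cells[(row, column)] = value

    def get(self, row, column):
        # Missing cells simply return None instead of an empty string.
        return self.cells.get((row, column))

store = ToyColumnStore()
store.put("user1", "info:name", "ana")
store.put("user1", "info:age", "34")
print(store.get("user1", "info:name"))  # ana
```

Real HBase keeps rows sorted by row key and persists column families in separate files on HDFS, which is what makes it column-oriented.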

What is NoSQL database?

NoSQL ("Not Only SQL") refers to databases that store and retrieve data using models other than the tabular relations of a traditional relational database management system (RDBMS).