What are the different components of the Hadoop ecosystem?
HDFS – Filesystem of Hadoop (Hadoop Distributed File System)
MapReduce – Processing of Large Datasets
HBase – NoSQL database on Hadoop (Hadoop database)
Apache Oozie – Workflow Scheduler
Apache Mahout – Machine learning and Data mining
Apache Hue – Hadoop user interface, Browser for HDFS, HBase, Query editors for Hive, etc.
Flume – Ingests data from other sources (such as log files) into HDFS
Sqoop – Imports data from an RDBMS into HDFS and exports it back
What is HDFS?
HDFS (Hadoop Distributed File System) is a filesystem that stores very large data sets by splitting files into large blocks and replicating those blocks across a cluster of hosts, so storage scales out as hosts are added.
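The idea of splitting a file into replicated blocks can be sketched in a few lines. This is an illustrative toy, not the Hadoop API; the function names (`split_into_blocks`, `place_blocks`) and the tiny block size are made up for the example, though the defaults noted in the comments (128 MB blocks, replication factor 3) are real HDFS defaults.

```python
# Toy sketch of HDFS-style storage: split a file into fixed-size blocks
# and replicate each block on several distinct hosts.
BLOCK_SIZE = 4    # HDFS default is 128 MB; tiny here for illustration
REPLICATION = 3   # HDFS default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, hosts, replication=REPLICATION):
    """Assign each block to `replication` distinct hosts, round-robin."""
    placement = {}
    for i, _block in enumerate(blocks):
        placement[i] = [hosts[(i + r) % len(hosts)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hadoop!")
placement = place_blocks(blocks, ["host1", "host2", "host3", "host4"])
print(blocks)        # [b'hell', b'o ha', b'doop', b'!']
print(placement[0])  # ['host1', 'host2', 'host3']
```

Because every block lives on several hosts, the loss of one host does not lose data, and reads can be served by whichever replica is closest.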
What is MapReduce?
MapReduce is a programming model for processing and generating large data sets. The user specifies a map function that processes a (key, value) pair to generate a set of intermediate (key, value) pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
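The model can be shown with the classic word-count example. This is a minimal in-memory sketch of the map → shuffle → reduce flow, assuming nothing from the real Hadoop API (the names `map_fn`, `reduce_fn`, and `run_mapreduce` are invented for the illustration):

```python
from collections import defaultdict

def map_fn(_, line):
    # Map: emit an intermediate (word, 1) pair for every word in the line.
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce: merge all intermediate values for one key.
    yield (word, sum(counts))

def run_mapreduce(records, map_fn, reduce_fn):
    # Shuffle: group intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    # Reduce each group of values into a final result.
    result = {}
    for k, vs in groups.items():
        for rk, rv in reduce_fn(k, vs):
            result[rk] = rv
    return result

lines = [(0, "big data big cluster"), (1, "big data")]
print(run_mapreduce(lines, map_fn, reduce_fn))
# {'big': 3, 'data': 2, 'cluster': 1}
```

In Hadoop the same map and reduce steps run in parallel across the cluster, with the framework handling the shuffle between them.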
What is Hive?
Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, querying (via an SQL-like language, HiveQL), and analysis.
What is Pig?
Pig is a platform for analyzing large data sets that consists of a high-level scripting language (Pig Latin) for expressing data analysis programs.
What is Flume?
Flume is a distributed service for collecting, aggregating, and moving large amounts of streaming data (such as log files) from many sources into HDFS.
What is Sqoop?
Apache Sqoop is a tool designed for transferring bulk data between Hadoop and structured data stores: it imports data from an RDBMS into HDFS and exports data from HDFS back to an RDBMS.
What is HBase?
HBase (Hadoop database) is a column-oriented NoSQL database layered on top of HDFS.
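"Column-oriented" here means data is addressed by row key, column family, and column qualifier rather than by fixed table columns. The sketch below illustrates that data model only; it is not the real HBase client API, and the class name `ToyColumnStore` is invented for the example:

```python
from collections import defaultdict

class ToyColumnStore:
    """Toy model of an HBase-style table: rows[row][family][qualifier] = value."""

    def __init__(self, families):
        # Column families are fixed up front, as in HBase table schemas.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][qualifier] = value

    def get(self, row, family, qualifier):
        return self.rows[row][family].get(qualifier)

table = ToyColumnStore(families=["info", "stats"])
table.put("user1", "info", "name", "Alice")
table.put("user1", "stats", "visits", 7)
print(table.get("user1", "info", "name"))  # Alice
```

Unlike a relational table, each row can hold a different set of qualifiers within a family, which suits sparse, wide data.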
What is NoSQL database?
NoSQL ("Not Only SQL") refers to databases that store and retrieve data using models other than the tabular relations of a traditional relational database management system (RDBMS).