1. NoSQL – Database (non-relational database) – only for structured and semi-structured data
2. Hadoop – Implementation – structured, semi-structured and unstructured data
3. Hadoop ecosystem and its components – for everything
Hadoop is a parallel system for storing and processing large volumes of data. It is a solution for Big Data.
For storage, Hadoop uses HDFS (Hadoop Distributed File System).
For processing, Hadoop uses MapReduce.
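To make the processing side more concrete, here is a minimal sketch of the MapReduce idea as a word count in plain Python. It does not use Hadoop at all: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration, and everything runs in a single process rather than across cluster nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in HDFS",
        "Hadoop processes data with MapReduce",
    ]
    print(word_count(sample))
```

In a real Hadoop job, the map and reduce steps would run in parallel on different nodes and the input lines would be read from HDFS; this sketch only shows the flow of data between the three phases.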
In Hadoop, a few key terms are important to learn first.
Hadoop Basic Terminology:
1. Cluster
2. Clustered Node
3. Hadoop Cluster Node
4. Hadoop Cluster
5. Hadoop Cluster Size
1.Cluster:
A cluster is a group of nodes that belong to one common network.
2.Clustered Node:
A clustered node is an individual machine that belongs to a cluster.
3.Hadoop Cluster Node:
A Hadoop Cluster Node is a node that provides the basic storage and processing capacity of a cluster.
For storage, it uses the Hadoop Distributed File System (HDFS).
For processing, it uses MapReduce.
4.Hadoop Cluster:
A Hadoop Cluster is a collection of Hadoop Cluster Nodes in a common network.
5.Hadoop Cluster Size:
The Hadoop cluster size is the total number of nodes in a Hadoop cluster.
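The terms above can be tied together with a small data model. This is only an illustrative sketch: the class names ClusterNode and HadoopCluster, their fields, and the sample hostnames and capacities are all hypothetical and do not correspond to any Hadoop API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterNode:
    """One machine in the cluster (a Hadoop Cluster Node)."""
    hostname: str
    storage_gb: int  # capacity this node contributes to storage (HDFS)
    cores: int       # CPU cores this node contributes to processing


@dataclass
class HadoopCluster:
    """A collection of cluster nodes in one common network."""
    network: str
    nodes: List[ClusterNode] = field(default_factory=list)

    def add_node(self, node):
        self.nodes.append(node)

    @property
    def size(self):
        """Cluster size: the total number of nodes in the cluster."""
        return len(self.nodes)

    @property
    def total_storage_gb(self):
        """Combined storage capacity of all nodes."""
        return sum(node.storage_gb for node in self.nodes)


if __name__ == "__main__":
    cluster = HadoopCluster(network="example-network")  # made-up name
    cluster.add_node(ClusterNode("node1", storage_gb=500, cores=8))
    cluster.add_node(ClusterNode("node2", storage_gb=500, cores=8))
    cluster.add_node(ClusterNode("node3", storage_gb=1000, cores=16))
    print("Cluster size:", cluster.size)
    print("Total storage (GB):", cluster.total_storage_gb)
```

Here the cluster size is simply the length of the node list, which matches the definition of Hadoop Cluster Size given above.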
1. Apache Pig – Processing – Pig Scripting
2. Hive – Processing – HiveQL (Query language like SQL)
3. Sqoop – Integration tool – Import and export data
4. ZooKeeper – Coordination – Distributed coordination service
5. Apache Flume – Streaming – Streaming log data
6. Oozie – Scheduling – Open source job scheduler
7. HBase – Random Access – Hadoop + dataBASE
8. NoSQL – Not Only SQL – MongoDB, Cassandra
9. Apache Kafka – Messaging – Distributed messaging
10. YARN – Resource Manager – Yet Another Resource Negotiator
Note: Apache Spark is not part of Hadoop itself, but it is commonly included alongside it nowadays. It is used for data processing. Spark is reported to run up to 100 times faster than Hadoop MapReduce for in-memory workloads.
Compatible Operating System for Hadoop Installation:
Different Distributions of Hadoop
1. Cloudera Distribution for Hadoop (CDH)