Big Data Solutions:
1. NoSQL – non-relational databases – only for structured and semi-structured data
2. Hadoop – implementation – structured, semi-structured and unstructured data
3. Hadoop ecosystem – the components built around Hadoop for the remaining tasks such as integration, coordination and scheduling
Hadoop:
Hadoop is a distributed, parallel system for storing and processing large volumes of data. It is one solution to the Big Data problem.
For storage, it uses HDFS (Hadoop Distributed File System).
For processing, it uses MapReduce.
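To make the processing side more concrete, here is a minimal sketch of the MapReduce idea in plain Python. It does not use Hadoop at all; it only imitates the three steps (map, shuffle, reduce) in a single process, and the function names map_phase, shuffle_phase, reduce_phase and word_count are made up for this illustration. In a real Hadoop job the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from a file in HDFS.
    sample = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

The split into three small functions mirrors how the work is divided: the map step looks at one line at a time, so it can be spread over many machines, and the reduce step only needs the values for one key.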
A few key terms are important to learn before working with Hadoop.
Hadoop Basic Terminology:
1. Cluster
2. Clustered Node
3. Hadoop Cluster Node
4. Hadoop Cluster
5. Hadoop Cluster Size
1. Cluster:
A cluster is a group of nodes that belong to one common network.
2. Clustered Node:
A clustered node is an individual machine that belongs to a cluster.
3. Hadoop Cluster Node:
A Hadoop cluster node is a clustered node that provides the basic storage and processing for the cluster.
For storage, it uses the Hadoop Distributed File System (HDFS).
For processing, it uses MapReduce.
4. Hadoop Cluster:
A Hadoop cluster is a collection of Hadoop cluster nodes in a common network.
5. Hadoop Cluster Size:
The Hadoop cluster size is the total number of nodes in a Hadoop cluster.
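The terms above can be tied together with a small Python model. This is only a sketch for learning the vocabulary, not how Hadoop represents a cluster internally; the class names HadoopClusterNode and HadoopCluster, the helper build_example_cluster, the host names and the network address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HadoopClusterNode:
    """One machine in the cluster that stores data (HDFS) and processes it (MapReduce)."""
    hostname: str


@dataclass
class HadoopCluster:
    """A collection of Hadoop cluster nodes on one common network."""
    network: str
    nodes: list = field(default_factory=list)

    def add_node(self, hostname):
        """Add one machine to the cluster."""
        self.nodes.append(HadoopClusterNode(hostname))

    @property
    def size(self):
        """Cluster size: the total number of nodes in the cluster."""
        return len(self.nodes)


def build_example_cluster():
    """Build a made-up three-node cluster for illustration."""
    cluster = HadoopCluster(network="10.0.0.0/24")
    for name in ("node1", "node2", "node3"):
        cluster.add_node(name)
    return cluster


if __name__ == "__main__":
    cluster = build_example_cluster()
    print(cluster.size)
```

Here the cluster size is simply the length of the node list, which matches the definition given in point 5.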
Hadoop Ecosystem:
1. Apache Pig – Processing – Pig scripting (Pig Latin)
2. Hive – Processing – HiveQL (a query language similar to SQL)
3. Sqoop – Integration tool – imports and exports data
4. ZooKeeper – Coordination – distributed coordination service
5. Apache Flume – Streaming – collects log data for streaming
6. Oozie – Scheduling – open-source scheduler for Hadoop jobs
7. HBase – Random access – Hadoop + dataBASE
8. NoSQL – Not Only SQL – MongoDB, Cassandra
9. Apache Kafka – Messaging – distributed messaging
10. YARN – Resource manager – Yet Another Resource Negotiator
Note: Apache Spark is not part of Hadoop itself, but it is now commonly used alongside it. It is used for data processing. Spark is claimed to be up to 100 times faster than Hadoop MapReduce, mainly for workloads that fit in memory.
Compatible Operating System for Hadoop Installation:
1. Linux
2. macOS
3. Sun Solaris
4. Windows
Hadoop Versions:
Hadoop 1.x
Hadoop 2.x
Hadoop 3.x
Different Distributions of Hadoop:
1. Cloudera Distribution for Hadoop (CDH)
2. Hortonworks
3. MapR