Bigdata Solutions:

1.NoSQL – database(Non relational database) – Only for structured and semi-structured

2. Hadoop – Implementation – structured,semi-structured and unstructured data

3.Hadoop eco-systems and its components for everything.

Hadoop:

Hadoop is a parallel system for large data storage and processing. It is a solution for Bigdata.

For Storage purpose HDFS -Hadoop Distributed File System

For Processing purpose MapReduce using simply.

In Hadoop, some keywords are very important for learning scope.

Hadoop Basic Terminology:

1.Cluster

2.Clustered Node

3.Hadoop Clustered Node

4.Hadoop cluster

5. Hadoop Cluster Size

1.Cluster:

A cluster is a group of all nodes belongs to one common network is called a cluster.

2.Clustered Node:

A Clustered Node is a grouping of all individual machines is called a clustered node in Hadoop

3.Hadoop Cluster Node:

A Hadoop Cluster Node is basic storage and processing purpose of a cluster is called as Hadoop Cluster Node.

For storage purpose, we are using the Hadoop Distributed File System.

For processing purpose, we are using MapReduce

4.Hadoop Cluster:

A Hadoop Cluster is a collection of “Hadoop Cluster Node” in a common network is called Hadoop Cluster

5.Hadoop Cluster Size:

A Hadoop cluster size is a total no.of node in a Hadoop cluster.

Hadoop Ecosystem:

1. Apache Pig – Processing – Pig Scripting

2. Hive – Processing – HiveQL (Query language like SQL)

3.SQOOP – Integration tool – Import and Export data

4.Zookeeper – Coordination – Distribution coordinator

5.Apache Flume – Streaming – log data for streaming purpose

6.Oozie – Scheduling – Open source scheduling jobs

7.HBase – Random Access – Hadoop+dataBASE

8.NoSQL – NotOnlySql – MongoDB, Cassandra

9.Apache Kafka – Messaging – Distributed messaging

10.YARN – Resource Manager – Yet Another Resource Negotiator

Note: Apache Spark is not a part of Hadoop but including nowadays. It is used for Data Processing purpose. Spark 100 times faster than Hadoop MapReduce.

Compatible Operating System for Hadoop Installation:

1. Linux

2.Mac OS

3.Sun Solaris

4.Windows.

Hadoop Versions:

Hadoop 1.x

Hadoop 2.x

Hadoop 3.x

Different Distributions of Hadoop

1. Cloudera Distribution for Hadoop (CDH)

2.Hortonworks

3.MapR

Tag: oozie

Basics of BigData and Hadoop for beginners

Hadoop:

Hadoop Basic Terminology:

Hadoop Ecosystem:

Hadoop Versions:

Different Distributions of Hadoop