Hadoop Cluster vs. Storm Cluster

Hadoop Cluster:

Hadoop is an open-source framework for the distributed storage and processing of large data sets on clusters of commodity hardware.

1. In Hadoop, we run MapReduce jobs.

2. A MapReduce job starts, processes its input, and eventually ends.

3. The master node runs a daemon called the JobTracker (in Hadoop 1.x; later versions built on YARN replace it with the ResourceManager).

4. A Hadoop MapReduce job runs only two kinds of tasks: map tasks and reduce tasks. This is restrictive, because every job has to be expressed as a mapper stage followed by a reducer stage.
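
To make the two-stage shape of a MapReduce job concrete, here is a minimal sketch in plain Python. It is only a toy model of the idea and does not use the Hadoop API; the names mapper, reducer and run_job are made up for this illustration.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def reducer(word, counts):
    # Reduce phase: combine all values collected for one key.
    return word, sum(counts)

def run_job(lines):
    # Shuffle step: group the mapper output by key before reducing.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    # The job ends once every key has been reduced.
    return dict(reducer(word, counts) for word, counts in grouped.items())

result = run_job(["storm and hadoop", "hadoop runs jobs"])
print(result)
```

Note that run_job returns as soon as the input is exhausted, which mirrors point 2: a batch job works on stored data and then finishes.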

Storm Cluster:

Apache Storm is an open-source distributed real-time computation system. It works on a continuous stream of data rather than on data already stored in a persistent storage system. It also provides a framework for interacting with a running application.

1. In Storm, we run topologies.

2. A topology, once started, is intended to keep processing live data forever (or until it is killed). It keeps receiving that data from sources such as ZeroMQ, Kafka, etc.

3. The master node runs a daemon called Nimbus.

4. Storm, in contrast, runs two kinds of tasks in a topology: spouts and bolts. A spout receives data from external sources and creates the streams that bolts consume for the actual processing. Bolts can be chained serially or in parallel, depending on the kind of processing we want to do.
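
The spout-and-bolt idea can be sketched with Python generators. This is a toy model under stated assumptions, not the Storm API: sentence_spout, split_bolt and count_bolt are hypothetical names, and the spout simply cycles over two hard-coded sentences in place of a real source such as Kafka.

```python
import itertools

def sentence_spout():
    # Spout: stands in for an external source that never runs dry.
    sentences = ["storm processes streams", "hadoop processes batches"]
    for sentence in itertools.cycle(sentences):
        yield sentence

def split_bolt(stream):
    # First bolt: turn each incoming sentence into one tuple per word.
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    # Second bolt: keep a running count and emit an update for every word seen.
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]

# Chain the bolts serially: spout -> split_bolt -> count_bolt.
topology = count_bolt(split_bolt(sentence_spout()))

# The stream is endless, so take only the first seven updates for the demo.
updates = list(itertools.islice(topology, 7))
print(updates)
```

Unlike a batch job, the chain never finishes by itself; the demo has to cut it off with islice, which matches point 2 above.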

Components of Storm:

Topology

A topology is a network of spouts and bolts, that is, a graph of computation. It is analogous to a MapReduce job in Hadoop.

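Spouts act as the data stream sources, and bolts act as the actual processing tasks. Each node in the graph contains some processing logic, and the links between nodes indicate how data is passed from one node to the next. A topology is run by submitting it to a Storm cluster.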
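
One way to picture "nodes with processing logic, connected by links" is as two small dictionaries. The sketch below is an illustration only and is not how Storm represents a topology internally; the node names and the run_once helper are hypothetical.

```python
from collections import deque

# Links of the graph: each node maps to the nodes that receive its output.
links = {
    "reader_spout": ["split_bolt"],
    "split_bolt": ["count_bolt", "log_bolt"],  # fan-out: two bolts in parallel
    "count_bolt": [],
    "log_bolt": [],
}

# Processing logic of each node; every function returns a list of output tuples.
logic = {
    "reader_spout": lambda _: ["a b", "b c"],
    "split_bolt": lambda line: line.split(),
    "count_bolt": lambda word: [],
    "log_bolt": lambda word: [],
}

def run_once(source):
    # Push every tuple along the links and record which node received what.
    received = {name: [] for name in links}
    queue = deque(
        (target, item) for item in logic[source](None) for target in links[source]
    )
    while queue:
        node, item = queue.popleft()
        received[node].append(item)
        for out in logic[node](item):
            for target in links[node]:
                queue.append((target, out))
    return received

received = run_once("reader_spout")
print(received["count_bolt"])
```

Here split_bolt feeds two bolts at once, so both count_bolt and log_bolt see every word: the parallel arrangement mentioned earlier, as opposed to a purely serial chain.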

When a topology is submitted to a Storm cluster, the Nimbus service on the master node consults the supervisor services on the worker nodes and assigns the topology's work to them. Each supervisor creates one or more worker processes, each running in its own separate JVM.
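
The division of labour between Nimbus and the supervisors can be sketched as a simple assignment step. This is a toy model: the round-robin policy, the assign_workers function and the supervisor names are all assumptions made for illustration, and Storm's real scheduler is more involved.

```python
from itertools import cycle

def assign_workers(supervisors, num_workers):
    # Toy "Nimbus": hand out worker processes to supervisors in round-robin order.
    # The round-robin policy is an assumption of this sketch, not Storm's scheduler.
    assignment = {name: [] for name in supervisors}
    for worker_id, name in zip(range(num_workers), cycle(supervisors)):
        # Each worker is modeled as its own process (one JVM each in real Storm).
        assignment[name].append(f"worker-{worker_id}")
    return assignment

plan = assign_workers(["supervisor-a", "supervisor-b"], num_workers=3)
print(plan)
```

With two supervisors and three workers, the first supervisor ends up hosting two worker processes and the second hosts one.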