What is Storm?
Apache Storm is a distributed framework for real-time processing of Big Data, much as Hadoop is a distributed framework for batch processing.
Advantages of Storm:
Fault Tolerance – if worker processes die or a node goes down, the workers are automatically restarted.
Scalability – throughput of up to one million 100-byte messages per second per node can be achieved, and the system is easy to deploy and operate.
Architecture of Storm:
Apache Storm does not have its own state-managing capabilities. Instead, it uses Apache ZooKeeper to manage cluster state: all coordination between Nimbus and the Supervisors, such as message acknowledgments and processing status, is done through a ZooKeeper cluster. The Nimbus daemon and Supervisor daemons are stateless; all state is kept in ZooKeeper or on the local disk.
Storm originally used the ZeroMQ library for inter-process communication between worker processes, but after Storm was adopted as an Apache project, its developers replaced ZeroMQ with Netty.
Explanation of the Components:
Nimbus is the master node of a Storm cluster. All other nodes in the cluster are called worker nodes. The master node is responsible for distributing data among the worker nodes, assigning tasks to them, and monitoring failures.
The nodes that follow instructions given by Nimbus are called Supervisors. A Supervisor runs multiple worker processes and governs them to complete the tasks assigned by Nimbus.
A worker process executes tasks related to a specific topology. It does not run tasks itself; instead, it creates executors and asks them to perform particular tasks. A worker process can have multiple executors.
An executor is a single thread spawned by a worker process, and it runs one or more tasks, but only for a specific spout or bolt.
A task performs the actual data processing; each spout or bolt in the topology executes as one or more tasks.
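The worker/executor/task relationship above can be sketched in plain Java. This is a minimal, hypothetical model for illustration only (the class and interface names are invented, not the Storm API): an executor is a thread that runs one or more tasks for a single spout or bolt, and the tasks do the actual processing of each tuple.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkerSketch {
    // A task performs the actual data processing on each tuple.
    // (Hypothetical interface, standing in for a spout/bolt task.)
    interface Task {
        String process(String tuple);
    }

    // An executor is a single thread that runs one or more tasks,
    // all belonging to the same spout or bolt.
    static class Executor extends Thread {
        private final List<Task> tasks;
        private final List<String> input;
        final List<String> output = new ArrayList<>();

        Executor(List<Task> tasks, List<String> input) {
            this.tasks = tasks;
            this.input = input;
        }

        @Override
        public void run() {
            // Each incoming tuple is handed to every task this executor owns.
            for (String tuple : input) {
                for (Task t : tasks) {
                    output.add(t.process(tuple));
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // One "bolt" task that upper-cases tuples.
        Task upper = tuple -> tuple.toUpperCase();
        // The worker process creates executors and hands them tasks.
        Executor executor = new Executor(List.of(upper), List.of("storm", "nimbus"));
        executor.start();
        executor.join();
        System.out.println(executor.output); // prints [STORM, NIMBUS]
    }
}
```

In real Storm deployments the number of workers, executors, and tasks per component is set through the topology configuration, and a worker typically runs many such executor threads in parallel.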