SparkContext is a class defined in the Spark library and is the main entry point into Spark. The SparkContext runs inside a process called the driver program, which is the main program of a Spark application. Every Spark application must create an instance of the SparkContext class, and an application can have only one active SparkContext at a time. An instance can be created as below:
val sc = new SparkContext()
Here SparkContext reads configuration settings such as the address of the Spark master, the application name, and other settings from system properties. Alternatively, the configuration can be passed in explicitly through a SparkConf:
val config = new SparkConf().setMaster("spark://localhost:7077").setAppName("Spark")
val sc = new SparkContext(config)
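The lines above only create the context. As a minimal sketch of a complete driver program, the following assumes a local Spark installation and uses the `local[*]` master URL (all cores on the local machine) so it can run without a cluster; the object name `SparkContextExample` is illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample {
  def main(args: Array[String]): Unit = {
    // Driver program: create the one active SparkContext for this application.
    val conf = new SparkConf()
      .setMaster("local[*]")            // run locally, using all available cores
      .setAppName("SparkContextExample")
    val sc = new SparkContext(conf)

    // Use the context to distribute a collection and run a computation.
    val rdd = sc.parallelize(1 to 100)
    val sum = rdd.reduce(_ + _)
    println(sum) // prints 5050

    // Stop the context so another one could be created later.
    sc.stop()
  }
}
```

Calling `sc.stop()` at the end matters because, as noted above, only one SparkContext may be active at a time.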
2. Cluster Manager:
The cluster manager allocates resources across the applications running on a cluster. The SparkContext can connect to several types of cluster managers. Through the cluster manager, Spark acquires executors on the nodes of the cluster: processes that run computations and store data for the application. The cluster manager then sends your application code to the executors.
Different Cluster Managers in Spark Architecture:
In Spark Architecture there are 3 types of Cluster Managers:
A) Standalone Mode:
Standalone mode is Spark's default cluster environment and the easiest way to run your Spark applications on a cluster. Here the Spark Master acts as the resource manager and the Spark Workers host the executors. In this mode, Spark allocates resources based on cores.
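Because standalone mode allocates by cores, applications commonly cap how many cores they take so they do not starve other applications. A sketch of such a configuration, assuming a hypothetical standalone master reachable at `master-host` on the default standalone port 7077:

```scala
import org.apache.spark.SparkConf

// Configuration fragment for a standalone cluster (host name is illustrative).
val conf = new SparkConf()
  .setMaster("spark://master-host:7077") // standalone master's default port is 7077
  .setAppName("StandaloneApp")
  .set("spark.cores.max", "4")           // cap total cores claimed across the cluster
```

Without `spark.cores.max`, a standalone application by default grabs all available cores on the cluster.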
B) Apache Mesos:
Apache Mesos is a general-purpose cluster manager that can run both analytics workloads and long-running services on a cluster.
C) Hadoop YARN:
YARN is a cluster manager introduced in Hadoop 2.x that allows diverse data processing frameworks to run on a shared resource pool and is typically installed on the same nodes as HDFS.
Running Spark on YARN in these environments is useful because it lets Spark access HDFS data quickly, on the same nodes where the data is stored. To use Spark on Hadoop YARN, set the master URL to yarn when submitting the application, with the environment pointing at the Hadoop cluster's configuration directory.
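A minimal configuration sketch for YARN follows; it assumes `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`) is set in the environment so Spark can find the ResourceManager, and in practice the master is more often supplied on the command line via `spark-submit --master yarn` than hard-coded:

```scala
import org.apache.spark.SparkConf

// Configuration fragment for running on YARN; requires HADOOP_CONF_DIR or
// YARN_CONF_DIR in the environment to locate the cluster's configuration.
val conf = new SparkConf()
  .setMaster("yarn")      // "yarn" is a valid master URL; cluster details come from Hadoop config
  .setAppName("YarnApp")
```

Keeping the master out of the source entirely and passing it at submit time makes the same application jar runnable in local, standalone, and YARN modes without recompiling.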