Spark Integration with Hive | Spark Integration with NoSQL




Spark integration with Hive in simple steps:

First, here is how to integrate Spark with Hive in a Hadoop cluster, using the simple steps below:
1. Copy hive-site.xml into the $SPARK_HOME/conf directory.
(Once hive-site.xml is in the Spark configuration path, Spark can read the Hive metastore information.)
2. Copy hdfs-site.xml into the $SPARK_HOME/conf directory.
(Spark reads the HDFS replication information from hdfs-site.xml.)
3. Copy core-site.xml into the $SPARK_HOME/conf directory.
(Spark reads the Hadoop NameNode information, used for storage, from core-site.xml.)
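Once these files are in place, the Spark shell picks up the Hive metastore automatically. If you are writing a standalone application instead of using the shell, Hive support has to be enabled explicitly. A minimal sketch (the application name is a placeholder, not from the steps above):

import org.apache.spark.sql.SparkSession

// Build a SparkSession with Hive support so spark.sql() can reach the
// Hive metastore described in hive-site.xml under $SPARK_HOME/conf.
val spark = SparkSession.builder()
  .appName("HiveIntegrationDemo") // placeholder application name
  .enableHiveSupport()
  .getOrCreate()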
First, create a database in Hive from the Spark shell:

scala> spark.sql("create database demo_db")
res14: org.apache.spark.sql.DataFrame = []
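To confirm the database was created, we can list the databases known to the metastore (an optional sanity check, not part of the original steps):

scala> spark.sql("show databases").show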

To create a table in Hive from the Spark shell:

scala> spark.sql("create table demo_db.emp(empid int, ename string, esal double, depnum int) " +
         "row format delimited fields terminated by ',' lines terminated by '\n'")
HiveMetaStore: Location: hdfs://localhost:8020/user/hive/warehouse/demo_db.emp specified for non-external table: emp
res16: org.apache.spark.sql.DataFrame = []
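To verify the table definition recorded in the metastore, it can be described from the same shell (again an optional check):

scala> spark.sql("describe demo_db.emp").show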

To load data into the Hive table:

scala > spark.sql ("load data local inpath 'file:///home/hadoop/inputfiles/emp.avro' into table demo_db.emp")
res 23:org.apache.spark.sql.DataFrame = []
To read data into a DataFrame object from the Hive table:
scala> var hiveDF = spark.sql("select * from demo_db.emp")
scala> hiveDF.show

This displays the employee data from the loaded file in a structured, tabular form.
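We can also print the schema Spark derived from the Hive table, which should match the DDL used above:

scala> hiveDF.printSchema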




How to read Hive partitioned data:

scala> var df2 = spark.sql("select * from demo_db.emp")
scala> df2.show
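If the table is partitioned (say, by depnum — an assumption for illustration, since the DDL above does not declare a partition column), filtering on the partition column lets Spark prune partitions instead of scanning the whole table:

scala> var deptDF = spark.sql("select * from demo_db.emp where depnum = 10")
scala> deptDF.show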

How to create a Hive table and write DataFrame object data into the table in ORC format (dynamically):

scala> df.write.mode("overwrite").format("orc").saveAsTable("demo_db_emp")
scala> spark.table("demo_db_emp").show
(Note: saveAsTable returns Unit, so the written table is read back with spark.table to inspect it.)
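A common variation is to partition the table while writing it in ORC format, so later queries can prune by the partition column (the table name and partition column here are illustrative):

scala> df.write.mode("overwrite").format("orc").partitionBy("depnum").saveAsTable("demo_db_emp_part")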

Spark integration with NoSQL (Cassandra) in the Hadoop cluster:

1. Download the Apache Cassandra bin tarball from the Cassandra archives.
2. Extract the tarball using the command below:
tar -xzvf apache-cassandra-bin.tar.gz
3. Add $CASSANDRA_HOME to the ~/.bashrc file.
4. Set the environment variables:
export CASSANDRA_HOME=/home/hadoop/apache-cassandra
export PATH=$PATH:$CASSANDRA_HOME/bin
Then start the Cassandra server using the command below:
cassandra -f => this terminal should not be closed or killed
5. Get into the Cassandra SQL shell using the command below:
> cqlsh
6. To check the existing KEYSPACES (a KEYSPACE is the Cassandra equivalent of a database):
cqlsh> DESCRIBE KEYSPACES; ==> the equivalent of SHOW DATABASES;
7. To create a new KEYSPACE:
cqlsh> CREATE KEYSPACE DEMO_DB WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
8. To use the created KEYSPACE:
cqlsh> use DEMO_DB;
9. To create a table in Cassandra:
cqlsh> CREATE TABLE EMP(ID INT PRIMARY KEY, NAME TEXT, SAL INT);
10. To show tables:
cqlsh> DESCRIBE TABLES;
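With Cassandra running, Spark can read and write the EMP table through the DataStax Spark Cassandra Connector. A minimal sketch, assuming the connector package is added when launching the shell and Cassandra runs on localhost (the connector version below is an assumption and should match your Spark/Scala build):

$ spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.0 --conf spark.cassandra.connection.host=localhost

scala> // Read the Cassandra table into a DataFrame (Cassandra stores the
scala> // unquoted identifiers DEMO_DB and EMP in lowercase).
scala> val cassandraDF = spark.read.format("org.apache.spark.sql.cassandra").options(Map("keyspace" -> "demo_db", "table" -> "emp")).load()
scala> cassandraDF.show

scala> // A DataFrame whose columns match the table (id, name, sal) can be appended back:
scala> cassandraDF.write.format("org.apache.spark.sql.cassandra").options(Map("keyspace" -> "demo_db", "table" -> "emp")).mode("append").save()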





Summary: Here we showed how to integrate Spark with Hive and Spark with NoSQL (Cassandra) in simple steps for Big Data developers. After installing the Hadoop cluster, the Spark, Hive, and Cassandra services must be installed as well; once that is done, the services can be integrated with one another. We also showed how to create tables in Hive and Cassandra and work with them from Spark.