Sqoop commands for Spark Certification (CCA 175)

Basically, the Spark and Hadoop Developer certification (Cloudera CCA 175) is a purely hands-on test and requires real-time experience. CCA 175 covers Sqoop export/import, data ingestion, and Spark transformations. This post provides the complete set of Sqoop import/export commands.



Mandatory Skills:

Data Ingest, Transform:

  • Import data from a MySQL database into HDFS, and export it back, using Sqoop (see the export example at the end of this post)
  • Ingest real-time streaming data into HDFS (typically with a tool such as Flume rather than Sqoop)
  • Load data from HDFS and store results back to HDFS using Spark

 

Sqoop Commands:

First, create a Sqoop directory in HDFS for data ingestion and validation.

 $ hadoop fs -mkdir /user/cloudera/sqoop_import
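
To confirm the directory was created:

$ hadoop fs -ls /user/cloudera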

How to find out the available Sqoop commands on your Linux box:

$ sqoop help

Available Commands:

codegen 
create-hive-table 
eval
export
help
import
import-all-tables
import-mainframe
job
list-databases
list-tables
merge
metastore
version
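
To see the options for a single tool, pass the tool name to help, for example:

$ sqoop help import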

How to find out the list of databases in Sqoop:

sqoop list-databases \
  --connect jdbc:mysql://localhost:3306 \
  --username sqoopusername \
  --password sqooppassword

How to find out the list of tables in Sqoop:

sqoop list-tables \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword

How to evaluate SQL (MySQL) statements in Sqoop:

sqoop eval \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --query "select * from table_name"
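
eval also runs DML statements, which is handy for seeding test rows before an import. A minimal sketch against the same database (the table name and values are only placeholders):

sqoop eval \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --query "insert into table_name values (1, 'test')"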

List of Sqoop import commands:
How to import all tables as Avro data files into the Hive warehouse directory in Sqoop:

sqoop import-all-tables \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --as-avrodatafile \
  --warehouse-dir /user/hive/warehouse/customer.db

How to import data as text files into a target directory in Sqoop:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table customers \
  --as-textfile \
  --target-dir /user/Sqoop/customer

How to import data as Parquet files into a target directory in Sqoop:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table customers \
  --as-parquetfile \
  --target-dir /user/Sqoop/customer

How to import a single table into a target directory in Sqoop:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table customers \
  --target-dir /user/Sqoop/customer

How to import with a free-form query and a split column in Sqoop:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --query "select * from customers where customer_id = '123' and \$CONDITIONS" \
  --target-dir /user/Sqoop/customer \
  --split-by customer_id \
  --num-mappers 4

A free-form query must include the $CONDITIONS token (Sqoop replaces it with a range predicate for each mapper) and cannot be combined with --table. Note that --num-mappers takes a number; when omitted, it defaults to 4.

How to import a table and append to an existing directory in Sqoop:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table department \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --append \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --num-mappers 4

How to import a table without a primary key using --split-by:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table department \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --append \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --split-by department_id \
  --num-mappers 4

Without a primary key, Sqoop cannot choose a split column on its own, so --split-by names the column used to partition the work across mappers.

How to import a table with an incremental load in Sqoop:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table department \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --check-column department_id \
  --incremental append \
  --last-value 7 \
  --split-by department_id \
  --num-mappers 4

Only rows whose department_id is greater than the --last-value (7 here) are imported. --incremental append already appends new files to the target directory, so a separate --append flag is not needed.

How to create a Sqoop job using the Sqoop import command:

sqoop job --create sqoop_job \
  -- import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table department \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --check-column department_id \
  --incremental append \
  --last-value 0 \
  --split-by department_id \
  --num-mappers 4

The space between -- and the tool name (import) is required. --last-value 0 is only the starting point: for a saved job, Sqoop records the latest value after each run and uses it automatically on the next execution.

How to find out the list of Sqoop jobs:

sqoop job --list

How to show the Sqoop job using Sqoop command:

sqoop job --show sqoop_job

How to execute the Sqoop job using Sqoop command:

sqoop job --exec sqoop_job
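
Once a saved job is no longer needed, remove it from the metastore with the --delete option:

sqoop job --delete sqoop_job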

How to create a Hive table with the Sqoop import command:

sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table department \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --hive-import \
  --hive-table customer_test \
  --create-hive-table \
  --num-mappers 4

--create-hive-table makes the job fail if the customer_test table already exists in Hive.

How to connect to MySQL and create a database for reporting:

-- Connect to MySQL and create a database for reporting
-- user: root, password: root
mysql -u root -p

create database customer_reporting_db;
grant all on customer_reporting_db.* to 'sqoopusername'@'%' identified by 'sqooppassword';
flush privileges;
use customer_reporting_db;
-- "where 5 = 7" is always false, so this copies only the table structure, with no rows
create table customers as select * from dataingest.customers where 5 = 7;
exit;
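
With the empty reporting table in place, the HDFS data can be pushed back into MySQL with sqoop export. A minimal sketch, assuming the customer data was previously imported as comma-delimited text under /user/Sqoop/customer:

sqoop export \
  --connect jdbc:mysql://localhost:3306/customer_reporting_db \
  --username sqoopusername \
  --password sqooppassword \
  --table customers \
  --export-dir /user/Sqoop/customer \
  --input-fields-terminated-by ',' \
  --input-lines-terminated-by '\n' \
  --num-mappers 4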