The Cloudera Spark and Hadoop Developer certification (CCA 175) is a purely hands-on test and requires real-world experience. CCA 175 covers Sqoop import/export, data ingestion, and Spark transformations. This article provides the complete set of Sqoop import/export commands.
Mandatory Skills:
Data Ingest, Transform:
- Import data from a MySQL database into HDFS, and export it back, using Sqoop
- Ingest real-time streaming data into HDFS
- Load data from HDFS into Spark, and store results back to HDFS (see the sketch after this list)
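The rest of this article focuses on Sqoop, but the Spark skill above deserves a minimal illustration. Here is a sketch in the spark-shell, assuming text data was already imported to /user/Sqoop/customer (the target directory used in the import examples below); the uppercase transformation is only a placeholder for whatever the exam task asks for.
$ spark-shell
scala> // load text data from HDFS into an RDD
scala> val customers = sc.textFile("/user/Sqoop/customer")
scala> // apply a trivial transformation and store the result back to HDFS
scala> customers.map(_.toUpperCase).saveAsTextFile("/user/Sqoop/customer_upper")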
Sqoop Commands:
First, create a directory in HDFS for data ingestion and validation:
$ hadoop fs -mkdir /user/cloudera/sqoop_import
How to list the available Sqoop commands on your Linux box:
$ sqoop help
Available Commands:
codegen create-hive-table eval export help import import-all-tables import-mainframe job list-databases list-tables merge metastore version
How to list the databases in Sqoop:
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword
How to list the tables in Sqoop:
sqoop list-tables \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword
How to evaluate SQL (MySQL) statements in Sqoop:
sqoop eval \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --query "select * from table_name"
List of Sqoop import commands:
How to import all tables as Avro data files into the Hive warehouse directory in Sqoop:
sqoop import-all-tables \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --as-avrodatafile \
  --warehouse-dir /user/hive/warehouse/customer.db
How to import data in text file format into a target directory in Sqoop:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table customer \
  --as-textfile \
  --target-dir /user/Sqoop/customer
How to import data in Parquet file format into a target directory in Sqoop:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table customer \
  --as-parquetfile \
  --target-dir /user/Sqoop/customer
How to import a single table into a target directory in Sqoop:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table customer \
  --target-dir /user/Sqoop/customer
How to import with a free-form query and split the work across mappers in Sqoop:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --query "select * from customers where customer_id = '123' and \$CONDITIONS" \
  --target-dir /user/Sqoop/customer \
  --split-by customer_id \
  --num-mappers 4
Note that --query cannot be combined with --table, the WHERE clause must contain the literal $CONDITIONS token so Sqoop can partition the rows, and --split-by is required when more than one mapper is used.
How to import a table and append into an existing target directory in Sqoop:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table departments \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --append \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --num-mappers 4
How to import a table without a primary key using --split-by:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table departments \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --append \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --split-by department_id \
  --num-mappers 4
How to import a table with an incremental load in Sqoop:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table departments \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --append \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --check-column department_id \
  --incremental append \
  --last-value 7 \
  --split-by department_id \
  --num-mappers 4
Only rows whose check column value is greater than the --last-value of 7 are imported.
How to create a Sqoop job using the Sqoop import command:
sqoop job --create sqoop_job \
  -- import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table departments \
  --target-dir /user/hive/warehouse/customer.db/departments \
  --append \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --check-column department_id \
  --incremental append \
  --last-value 0 \
  --split-by department_id \
  --num-mappers 4
Note the bare -- that separates the job options from the tool name.
How to list the saved Sqoop jobs:
sqoop job --list
How to show the details of a saved Sqoop job:
sqoop job --show sqoop_job
How to execute a saved Sqoop job:
sqoop job --exec sqoop_job
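Because the job above was created with --incremental append, each execution records the new high-water mark of the check column and imports only newer rows. A quick way to validate the appended data, assuming the target directory used when the job was created:
$ hadoop fs -ls /user/hive/warehouse/customer.db/departments
$ hadoop fs -cat /user/hive/warehouse/customer.db/departments/part* | head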
How to import a table into a new Hive table using Sqoop:
sqoop import \
  --connect jdbc:mysql://localhost:3306/dataingest \
  --username sqoopusername \
  --password sqooppassword \
  --table departments \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --hive-home /user/hive/warehouse \
  --hive-import \
  --hive-table customer_test \
  --create-hive-table \
  --num-mappers 4
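To verify the Hive import, a quick row count from the gateway node (a sketch; it assumes the Hive CLI is installed there):
$ hive -e "select count(*) from customer_test"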
How to connect to MySQL and create a reporting database (the target for a Sqoop export):
-- connect to MySQL as root (password: root) and create the reporting database
mysql -u root -p
create database customer_reporting_db;
grant all on customer_reporting_db.* to 'sqoopusername'@'%' identified by 'sqooppassword';
flush privileges;
use customer_reporting_db;
create table customers as select * from dataingest.customers where 5=7;
exit;
The always-false predicate (5=7) copies only the table structure, leaving an empty table ready to receive exported rows.
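With the empty reporting table in place, the data previously imported into HDFS can be pushed back out with sqoop export, which requires the target table to already exist. A minimal sketch, assuming the customer data sits in /user/Sqoop/customer as comma-delimited text (as in the import examples above):
sqoop export \
  --connect jdbc:mysql://localhost:3306/customer_reporting_db \
  --username sqoopusername \
  --password sqooppassword \
  --table customers \
  --export-dir /user/Sqoop/customer \
  --input-fields-terminated-by ',' \
  --input-lines-terminated-by '\n' \
  --num-mappers 4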