Apache Sqoop in Hadoop

Apache Sqoop:

Apache Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and relational databases. It is mostly used to import data from an RDBMS into HDFS and to export it back again. Sqoop works with relational databases such as Teradata, Oracle, and MySQL.

Where is Sqoop used?

Developers tend to find transferring data between relational database systems and HDFS uninteresting; the interesting work starts after the data is loaded into HDFS. Without a dedicated tool, they end up writing custom scripts to move data in and out of Hadoop.

When MapReduce programs do this kind of transfer themselves, the database server can come under very high load from the large number of concurrent connections, which causes performance problems while those programs run.

Apache Sqoop makes the transfer possible with a single command line. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
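To make the "single command line" idea concrete, here is a minimal sketch in Python that only assembles and prints a `sqoop import` command; it does not run Sqoop. The JDBC URL, user name, password file, table name and target directory are hypothetical placeholders, and the helper `build_sqoop_import` is an illustrative name, not part of Sqoop.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table,
                       target_dir, num_mappers=4):
    """Return the argument list for a `sqoop import` invocation.

    All values passed in are placeholders chosen for illustration.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string of the source database
        "--username", username,
        "--password-file", password_file,   # file holding the password, kept off the command line
        "--table", table,                   # source table to import
        "--target-dir", target_dir,         # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# Hypothetical connection details, assumed only for this example.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl_user",
    password_file="/user/etl/.db-password",
    table="orders",
    target_dir="/data/sales/orders",
)

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it could be handed to `subprocess.run` unchanged on a machine where Sqoop is installed.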

What does Sqoop do?

1. Import of sequential data sets from mainframes – addresses the growing need to move data from the mainframe to HDFS.

2. Data import – moves certain data from external stores into Hadoop to optimize the cost-effectiveness of combined data storage and processing.

3. Fast data copies – copies data quickly from external systems into Hadoop.

4. Parallel data transfer – gives faster performance and optimal system utilization.

5. Load balancing – offloads excessive storage and processing loads to other systems.

Apache Sqoop latest version:

The latest stable release is 1.4.7.

Sqoop Architecture:

A Sqoop command submitted by the end user is parsed by Sqoop, which launches a map-only Hadoop MapReduce job to import or export the data. A reduce phase would be needed only for aggregations, and Sqoop does not do any aggregation; it just imports and exports the data. The map job launches multiple mappers, and their number depends on the value the user defines on the command line. Each mapper creates a connection to the database using JDBC and fetches the part of the data assigned to it.
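How each mapper ends up with its own part of the data can be sketched as follows. This is a simplified illustration, not Sqoop's actual splitter: it assumes a numeric key column whose minimum and maximum values are already known, and it divides that range into one contiguous slice per mapper. The table name `orders`, the column `id` and the function names are hypothetical.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous
    half-open slices [lo, hi), one slice per mapper."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    total = max_id - min_id + 1
    # Boundary i sits at i/num_mappers of the way through the key range.
    bounds = [min_id + (total * i) // num_mappers
              for i in range(num_mappers + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_mappers)]


def mapper_queries(table, column, min_id, max_id, num_mappers):
    """Build the SELECT statement each mapper would run over its slice."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in split_ranges(min_id, max_id, num_mappers)
    ]


# Hypothetical table with ids from 0 to 1000, imported with 4 mappers.
queries = mapper_queries("orders", "id", 0, 1000, 4)
for number, query in enumerate(queries):
    print(f"mapper {number}: {query}")
```

Because the slices do not overlap and together cover the whole key range, every row is fetched by exactly one mapper, each over its own JDBC connection.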

Sqoop architecture diagram:




Sqoop only imports and exports data.