Apache Pig In Hadoop


Pig is founded by Apache Software Foundation is one of a component of Hadoop built on top of HDFS.

Apache Pig is using Hadoop to focus more on analyzing large data sets with less time complexity having to write mapper and reducer programs. The Apache Pig programming language is designed to handle any kind of data.

Pig is made of two components are Pig Latin and another one is run time environment  Pig Latin programs are executed.

It is analyzing large data sets that consist of a high-level language for expressing data analysis programs.

Pig Latin:

Pig Latin is a high-level programming language provided by Pig. It can be used in any framework including Hadoop and Java is not required but it contains all Data processing features like group by, joins, order by

Pig Execution modes:

1.Local mode

Input: LFS Path

Output: LFS Path

2.HDFS mode

Input: HDFS Path

Output: HDFS Path

Data Types in Pig

Simple Types:

int, long, float, double, Boolean, char array, byte array etc.

Complex Types:

bag, tuple, field, map etc.

When to use MapReduce and Pig in Real-time projects:

In the below use case scenario MapReduce only more recommended Pig

1.Unstructured data processing

2.When we are aiming at high and performance.

3. For some hierarchical and job processing involves.

Running Pig Programs

1.Grunt SHELL:

Is an interactive shell which is the default mode of Pig execution. That is whether the output is success or failure we will come to know the result the there itself.

2.Script Mode:

Instead of writing each command with grunt shell, we can write a bunch of Pig commands in a single file and only executing that script alone.


If we are not achieving desired functionality by using the predefined transformation of Pig, we can generally go head with Pig UDF’s.

Leave a Reply

Your email address will not be published. Required fields are marked *