Pig is founded by Apache Software Foundation is one of a component of Hadoop built on top of HDFS.
Apache Pig is using Hadoop to focus more on analyzing large data sets with less time complexity having to write mapper and reducer programs. The Apache Pig programming language is designed to handle any kind of data.
Pig is made of two components are Pig Latin and another one is run time environment Pig Latin programs are executed.
It is analyzing large data sets that consist of a high-level language for expressing data analysis programs.
Pig Latin is a high-level programming language provided by Pig. It can be used in any framework including Hadoop and Java is not required but it contains all Data processing features like group by, joins, order by
Pig Execution modes:
Input: LFS Path
Output: LFS Path
Input: HDFS Path
Output: HDFS Path
Data Types in Pig
int, long, float, double, Boolean, char array, byte array etc.
bag, tuple, field, map etc.
When to use MapReduce and Pig in Real-time projects:
In the below use case scenario MapReduce only more recommended Pig
1.Unstructured data processing
2.When we are aiming at high and performance.
3. For some hierarchical and job processing involves.
Running Pig Programs
Is an interactive shell which is the default mode of Pig execution. That is whether the output is success or failure we will come to know the result the there itself.
Instead of writing each command with grunt shell, we can write a bunch of Pig commands in a single file and only executing that script alone.
If we are not achieving desired functionality by using the predefined transformation of Pig, we can generally go head with Pig UDF’s.