PIG:
Pig is founded by Apache Software Foundation is one of a component of Hadoop built on top of HDFS.
Apache Pig is using Hadoop to focus more on analyzing large data sets with less time complexity having to write mapper and reducer programs. The Apache Pig programming language is designed to handle any kind of data.
Pig is made of two components are Pig Latin and another one is run time environment Pig Latin programs are executed.
It is analyzing large data sets that consist of a high-level language for expressing data analysis programs.
Pig Latin:
Pig Latin is a high-level programming language provided by Pig. It can be used in any framework including Hadoop and Java is not required but it contains all Data processing features like group by, joins, order by
Pig Execution modes:
1.Local mode
Input: LFS Path
Output: LFS Path
2.HDFS mode
Input: HDFS Path
Output: HDFS Path
Data Types in Pig
Simple Types:
int, long, float, double, Boolean, char array, byte array etc.
Complex Types:
bag, tuple, field, map etc.
When to use MapReduce and Pig in Real-time projects:
In the below use case scenario MapReduce only more recommended Pig
1.Unstructured data processing
2.When we are aiming at high and performance.
3. For some hierarchical and job processing involves.
Running Pig Programs
1.Grunt SHELL:
Is an interactive shell which is the default mode of Pig execution. That is whether the output is success or failure we will come to know the result the there itself.
2.Script Mode:
Instead of writing each command with grunt shell, we can write a bunch of Pig commands in a single file and only executing that script alone.
3.Embedded:
If we are not achieving desired functionality by using the predefined transformation of Pig, we can generally go head with Pig UDF’s.