Most frequently asked HBase Interview Questions and Answers





1. When should you use HBase, and what are the key components of HBase in the Hadoop ecosystem?

In the Hadoop ecosystem, HBase should be used when the big data application has a variable schema, when data is stored in the form of collections, or when the application demands key-based access while retrieving data. The key components are: the Region Server, which serves and monitors the regions; the HBase Master, which is responsible for monitoring the Region Servers; ZooKeeper, which takes care of the coordination and configuration between the HBase Master component and the clients; and the catalog tables, of which there are two: -ROOT- and .META.

2. What are the different operational commands in HBase at the record level and the table level?
Record level – put, get, increment, scan and delete.
Table level – create, describe, list, disable and drop. Both groups are illustrated in the shell sketch below.
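As a minimal illustration, here is an hbase shell session covering both groups; the table name 'emp' and column family 'personal' are hypothetical:

create 'emp', 'personal'                      # table level: create a table with one column family
put 'emp', 'row1', 'personal:name', 'Sree'    # record level: write a cell
get 'emp', 'row1'                             # record level: read one row by its key
incr 'emp', 'row1', 'personal:visits', 1      # record level: atomic counter increment
scan 'emp'                                    # record level: iterate over the rows
delete 'emp', 'row1', 'personal:name'         # record level: mark a cell as deleted
disable 'emp'                                 # table level: take the table offline
drop 'emp'                                    # table level: remove the table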

3. Explain the difference between the RDBMS data model and the HBase data model in a Big Data environment?

A. In a Big Data environment, RDBMS is a schema-based database model.
B. HBase is a schema-less database model.
C. RDBMS has no built-in support for partitioning in data modeling.
D. In HBase there is automated partitioning in data modeling: tables are split into regions automatically as they grow (see the sketch after this list).
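To make the partitioning point concrete, a table can also be pre-split into regions at creation time; the table name 'usertable' and the split keys here are hypothetical:

# hbase shell: create a table pre-split into four regions by row-key range
create 'usertable', 'cf', SPLITS => ['g', 'n', 't']
# as the data grows, HBase keeps splitting regions automatically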




4. What is the difference between HBase and Hive in Hadoop?

HBase and Hive are both Hadoop-based technologies, but they serve different purposes. Hive is a data-warehouse layer for data summarization and SQL-style querying on top of Hadoop, whereas HBase is a NoSQL key-value store that runs on top of HDFS.

HBase supports four primary operations: put, get, scan and delete, whereas Hive lets you write SQL queries that run as MapReduce jobs, as the sketch below contrasts.
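For example, the same lookup looks quite different in the two systems; the table, column and row names here are hypothetical:

# Hive: SQL that is compiled into a MapReduce job (high latency, scans the data)
hive -e "SELECT name FROM employees WHERE id = 'row1';"

# HBase shell: direct key-based lookup (low latency, no MapReduce)
get 'employees', 'row1', 'personal:name'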

5. What are the different types of tombstone markers in HBase for deletion?

In HBase, there are three types of tombstone markers for deletion:

A. Family Delete Marker – marks all the columns of a column family.
B. Version Delete Marker – marks a single version of a single column.
C. Column Delete Marker – marks all the versions of a single column.
6. Explain the process of row deletion in HBase on top of Hadoop?

In HBase, the delete command does not actually remove data from the cells; instead, the cells are made invisible by setting a tombstone marker. The tombstoned cells are physically removed later, during a major compaction, as the sketch below shows.
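A rough illustration in the hbase shell (the table and column names are hypothetical):

delete 'emp', 'row1', 'personal:name'   # writes a tombstone; the cell is hidden, not removed
major_compact 'emp'                     # tombstoned cells are physically dropped during major compaction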

How to Install HBase on the Hadoop Ecosystem

HBASE:  Hadoop dataBASE

Apache HBase runs on top of Hadoop. It is an open-source, distributed, NoSQL database. Hadoop performs batch processing, and its data can be accessed only in a sequential manner, which leads to high latency; HBase, by contrast, internally uses hash tables to provide random access, and stores its data in HDFS files that are indexed by their key for faster lookups, thus providing low latency compared to plain Hadoop HDFS.



HBase Installation on Ubuntu/Linux

Step 1: Download the hbase-1.2.7-bin.tar.gz tarball from the Apache mirrors website:

http://archive.apache.org/dist/hbase/1.2.7/

Step 2: After downloading, place the tarball in the "Hadoop" directory (path: /home/sreekanth/Big_Data).

Step 3: Extract the downloaded tarball using the command below:

tar -xzvf hbase-1.2.7-bin.tar.gz

Step 4: After extracting the tarball we get the hbase-1.2.7 directory.

Step 5: Next, update the HBASE_HOME and PATH variables in the .bashrc file, opened with the command below:

nano ~/.bashrc

Step 6: Give the HBASE_HOME and PATH details like this (note that bash does not allow spaces around the = sign):

export HBASE_HOME=/home/sreekanth/Big_Data/hbase-1.2.7

export PATH=$PATH:$HBASE_HOME/bin

Step 7: Check the .bashrc changes using the command below:

echo $HBASE_HOME

The above command will not show the new value yet, because the running terminal has not re-read .bashrc; open a new terminal (or run source ~/.bashrc).

Step 8: Open a new terminal, then check the updated .bashrc using the command below:

echo $HBASE_HOME

Step 9: To install HBase in clustered (distributed) mode, we have to place the properties below in the conf/hbase-site.xml file, between the <configuration> tags.

Give the HBase properties as name/value pairs (the root directory and so on), along the lines of the sketch below.
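A minimal sketch of the properties, assuming HDFS runs at hdfs://localhost:9000 and ZooKeeper on localhost; adjust the paths and hostnames for your cluster:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>   <!-- where HBase stores its data in HDFS -->
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>                          <!-- run in distributed rather than standalone mode -->
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>                     <!-- comma-separated list of ZooKeeper hosts -->
  </property>
</configuration>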

Step 10: We have to add these properties at the end of the conf/hbase-env.sh file for the region servers:




export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

export HBASE_MANAGES_ZK=true

Step 11: First start all the Hadoop daemons, then start HBase by using the command below:

start-hbase.sh
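To verify that HBase came up, list the running Java daemons with jps; assuming HBase manages its own ZooKeeper (HBASE_MANAGES_ZK=true), processes along these lines should appear next to the Hadoop daemons:

jps
# expected HBase processes:
#   HMaster
#   HRegionServer
#   HQuorumPeer   (only when HBASE_MANAGES_ZK=true)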

Step 12: To access the HMaster Web UI, use the default port 16010:

http://<hostname-of-HMaster>:16010

When not to use HBase

When we need to handle transactions and relational analytics.

When applications need data to be rolled up, aggregated, or analyzed across rows.


Comparison of HBase and RDBMS

Some points of comparison between HBase and an RDBMS:

HBase does not directly support JOINs or aggregations, but it can handle large amounts of data.

An RDBMS supports JOINs and aggregations through SQL, but it can handle only a limited amount of data at a time.

Comparison of HBase and HDFS:

HBase is a distributed, column-oriented database and stores data in key/value pairs, whereas HDFS is a distributed file system and stores data in the form of flat files.

HBase supports random reads and writes, but HDFS offers only sequential file access; random writes are not possible, since HDFS allows a file to be written once and read many times.

HBase is suitable for low-latency operations, providing access to specific row data within a big volume, whereas HDFS is suitable for high-latency operations through batch processing; the sketch below contrasts the two access patterns.
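A quick illustration (the file path, table name and row key are hypothetical):

hdfs dfs -cat /data/events.log               # HDFS: sequential scan through a whole flat file
echo "get 'events', 'row42'" | hbase shell   # HBase: random read of a single row by its key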

Comparison of HBase and NoSQL:

HBase is a NoSQL database and stores data in <key, value> pairs; other NoSQL stores likewise store a value against a key by default.




HBase offers horizontal scalability, and NoSQL databases in general also scale horizontally.

HBase uses MapReduce for processing its data, whereas most NoSQL stores can perform only basic CRUD operations; complex aggregations are tough for them to handle, so they need to be integrated with solutions like Hadoop for complex processing.

HBase uses a master/slave model to address parallel processing.

HBase permits two types of access: random access to rows through their row keys, and offline or batch access through MapReduce queries; in other NoSQL stores, random access to data is possible as well. The batch path is sketched below.
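As an example of the batch path, HBase ships a bundled MapReduce job that counts the rows of a table; the table name 'events' is hypothetical:

# run the bundled RowCounter MapReduce job over an HBase table
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'events'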