What is Heartbeat in Hadoop? How to resolve Heartbeat lost in Cloudera and Hortonworks

Heartbeat in Hadoop:





In the Hadoop ecosystem, a heartbeat is the communication between a DataNode and the NameNode. It is a signal sent by each DataNode to the NameNode at a regular interval. If a DataNode does not send a heartbeat to the NameNode for about 10 minutes (by default), the NameNode considers that DataNode unavailable.

The default heartbeat interval is 3 seconds, set by the dfs.heartbeat.interval property in the hdfs-site.xml file in the Hadoop configuration directory.
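For illustration, the property can be set explicitly in hdfs-site.xml; this is a minimal sketch, and the 3-second value shown is simply the default:

<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
  <description>Heartbeat interval, in seconds, sent from each DataNode to the NameNode.</description>
</property>

The roughly 10-minute dead-node timeout comes from these settings: 2 x dfs.namenode.heartbeat.recheck-interval (300 seconds, i.e. 5 minutes, by default) + 10 x dfs.heartbeat.interval, which works out to about 630 seconds.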

What is Heartbeat lost:

If a DataNode does not send a heartbeat to the NameNode for about 10 minutes (by default), the NameNode marks that DataNode as unavailable. This condition is known as “Heartbeat lost”.

How to resolve Heartbeat lost:

In a Big Data distribution environment, first take Hortonworks (HDP).

In Hortonworks:
1. Check whether the Ambari agent is running on the affected host by using “ambari-agent status”.
2. If it is not running, check the log files for the Ambari server and the Ambari agent in the directories /var/log/ambari-server and /var/log/ambari-agent.

3. Then follow the steps below (a command sketch follows the list):

A) Stop ambari-server
B) Stop ambari-agent service on all nodes
C) Start ambari-agent service on all nodes
D) Start ambari-server
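A minimal command sketch of those four steps (run as root or with sudo; the exact service commands can vary slightly between Ambari versions):

ambari-server stop          # A) stop the Ambari server on the Ambari host
ambari-agent stop           # B) stop the Ambari agent on every node
ambari-agent start          # C) start the Ambari agent on every node
ambari-server start         # D) start the Ambari server again

After the restart, “ambari-agent status” on each node should show the agent running, and the heartbeat should recover in the Ambari UI.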

Cloudera:

1. First check whether the Cloudera SCM agent is running by using “sudo service cloudera-scm-agent status”.





2. Check the agent log files in the directory /var/log/cloudera-scm-agent/.

3. Then run the below commands as the root user:

sudo service cloudera-scm-agent status
sudo service cloudera-scm-agent stop
sudo service cloudera-scm-agent start

Summary: Hadoop follows a master/slave architecture: the master node stores the metadata and the slave nodes store the actual data. The periodic signal sent from a DataNode to the NameNode is called a “Heartbeat”; if it stops arriving, the condition is called “Heartbeat lost”, meaning the DataNode is unavailable. The sections above give step-by-step resolution procedures for Big Data distributions such as Hortonworks (HDP) and Cloudera (CDH).

Hadoop job (YARN staging) error while executing a simple job

In a Hadoop ecosystem, a number of jobs run at any given time. I was executing a Hive job for data validation against the Hive server on a production cluster, and on the Hive command line I got the following error.



at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
22:33:33 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging//.staging/job_1562044010976_0003
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging//.staging/job_1562044010976_0003/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

The above error points to a DataNode connection problem: the DataNode was not running properly at the time of execution. The resolution is below.

Restart all services (stop, then start):

stop-all.sh
start-all.sh

This restarts all services, including the NameNode, Secondary NameNode, DataNodes, and the remaining services such as Hive, Spark, etc.

If the error still appears, start the distributed file system on its own.

start-dfs.sh

Check all the Hadoop daemons, such as the NameNode, Secondary NameNode, DataNode, ResourceManager and NodeManager, by using the command below.

jps

Then check all node information by using “hadoop dfsadmin -report” to see whether each DataNode is running fine or not.
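As a quick health-check sketch: on a healthy pseudo-distributed setup, jps should list daemons such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself), and the dfsadmin report should show at least one live DataNode.

jps                       # list the running Hadoop Java daemons
hdfs dfsadmin -report     # newer form of the deprecated "hadoop dfsadmin -report"; shows live/dead DataNodes and capacity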

The above steps apply only to local (standalone) and pseudo-distributed modes of the Hadoop ecosystem.

For Cloudera, Hortonworks and MapR distributions, simply restart the DataNodes and services such as Hive, Spark, etc. from the management console.




Summary: In a Big Data environment we execute many Hadoop/Spark/Hive jobs, and sometimes they fail with the error shown above. When that happens, the simple solution above usually gets things moving again.

Most frequently Asked Hadoop Admin interview Questions for Experienced

Hadoop Admin interview Questions for Experienced

1. Difference between missing and corrupt blocks in Hadoop 2.0 and how to handle them?

Missing block: Missing block means that there are blocks with no replicas anywhere in the cluster.

Corrupt block: A corrupt block means that HDFS cannot find a single healthy replica of the block; all of its available replicas are corrupted.




How to handle:
Use the commands below to find out which file is corrupted and to remove it (an alternative sketch follows the list):

A) hdfs fsck /
B) hdfs fsck / | egrep -v '^\.+$' | grep -v eplica
C) hdfs fsck /path/to/corrupt/file -locations -blocks -files
D) hdfs dfs -rm /path/to/file
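If your Hadoop version supports these fsck options, a shorter way to locate and clean up corrupt files is the following sketch:

hdfs fsck / -list-corruptfileblocks              # list only the files/blocks reported as corrupt
hdfs fsck /path/to/corrupt/file -delete          # delete the corrupt file once you have confirmed it cannot be recovered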

2. What is the reason behind using an odd number of ZooKeeper nodes?

Because ZooKeeper elects a leader based on the agreement of more than half of the nodes in the ensemble (a quorum). With an even number of ZooKeeper nodes it is harder to form that majority, so the ZooKeeper count should be an odd number.
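For example, with a quorum of floor(n/2) + 1 nodes: a 5-node ensemble needs 3 votes and can tolerate 2 failed nodes, while a 6-node ensemble needs 4 votes and still tolerates only 2 failures, so the extra even-numbered node adds cost without adding fault tolerance.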

3. Why is ZooKeeper required for Kafka?

Apache Kafka uses ZooKeeper, so you need to start the ZooKeeper server first. ZooKeeper is used to elect the Kafka controller and to store topic configuration.
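For example, with the scripts that ship in the Kafka distribution (a sketch; run from the Kafka installation directory, ZooKeeper first, then the broker):

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties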

4. What is the retention period of Kafka logs?

When a message is sent to a Kafka cluster, it is appended to the end of a log. The message remains on the topic for a configurable period of time; this window, during which Kafka keeps the log segments, is called the retention period. It is defined by log.retention.hours.
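For example, in the broker's server.properties (a sketch; 168 hours, i.e. 7 days, is the usual default):

log.retention.hours=168
# log.retention.minutes or log.retention.ms can be used instead; if set, the finer-grained property takes precedence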

5. What is the block size in your cluster, and why is a 54 MB block size not recommended?

It depends on your cluster: the Hadoop 1.x default is 64 MB and the Hadoop 2.x default is 128 MB. An arbitrary small size such as 54 MB is not recommended because smaller blocks mean more blocks per file, which increases NameNode metadata and task scheduling overhead.

6. Suppose a file is 270 MB and the block size on your cluster is 128 MB. How many blocks are created? If there are 3 blocks of 128 MB + 128 MB + 14 MB, is the remaining space in the third 14 MB block wasted, or can other data be appended?
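Worked out: 270 MB with a 128 MB block size gives 3 blocks of 128 MB, 128 MB and 14 MB. In HDFS the last block only occupies its actual 14 MB on disk, so the remaining 114 MB of the block is not wasted, and data appended to the file later continues filling that last block.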

7. What are the FS image and Edit logs?

FSImage: In a Hadoop cluster, the entire file system namespace, file system properties and the block map of files are stored in one image file called the FSImage (File System Image). Edit logs: every change (transaction) made to the file system metadata since the last FSImage was written is recorded in the edit logs.
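If you want to inspect these files, Hadoop ships offline viewers; this is a sketch, and the fsimage/edits file names are placeholders for whatever sits in your dfs.namenode.name.dir:

hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml     # Offline Image Viewer
hdfs oev -p XML -i edits_0000000000000000001-0000000000000000042 -o edits.xml     # Offline Edits Viewer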

8. What is your action plan if PostgreSQL or MySQL is down on your cluster?

First check the log file, identify the error, and then work out the solution from there.

For example, if PostgreSQL reports a “connection bad” error:
Solution: First check the PostgreSQL service status

sudo systemctl status postgresql

Then stop the PostgreSQL service

sudo systemctl stop postgresql

Then enable the service (with the right user and permissions) so it comes up on boot, and start it again

sudo systemctl enable postgresql
sudo systemctl start postgresql

9. If both NameNodes are in standby state, will running jobs keep running or fail?

10. What is the Ambari port number?

By default, the Ambari port number is 8080, used to access the Ambari web UI and the REST API.
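If 8080 conflicts with another service, the port can be changed and the server restarted; this is a sketch, assuming the client.api.port property supported by recent Ambari versions. Add the following line to /etc/ambari-server/conf/ambari.properties:

client.api.port=8081

and then restart the server:

ambari-server restart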




11. In your Kerberized cluster, is it integrated with LDAP or with Active Directory?

It depends on your project; explain whether the integration is with LDAP or with Active Directory.

Hadoop Architecture vs MapR Architecture





Basically, in a Big Data environment Hadoop plays a major role in storage and processing, while MapR is a distribution that provides those ecosystem services. The Hadoop architecture and the MapR architecture differ at the storage level and in naming conventions.

For example, in Hadoop the basic storage unit is called a block, but in MapR it is called a container.

Hadoop VS MapR

Coming to the architecture, the main differences between the two are:
The Hadoop architecture is based on the master node (NameNode) and slave node (DataNode) concept. It uses HDFS for storage and MapReduce for processing.




The MapR architecture takes a native approach: it can use SAN, NAS or direct-attached disks to store data and metadata, accessing the disks directly without going through a JVM-based file system layer. Even if a hypervisor or virtual machine crashes, data is written straight to the hard disk, and if a server goes down the cluster re-syncs that node's data. MapR has its own file system, MapR-FS, for storage, and uses MapReduce for processing in the background.

There is no NameNode concept in the MapR architecture; it relies entirely on the CLDB (Container Location DataBase). The CLDB holds a lot of information about the cluster and is installed on one or more nodes for high availability.

This is very useful for the failover mechanism, bringing recovery time down to just a few seconds.

In the Hadoop architecture, cluster capacity is described in terms of master and slave nodes, whereas in MapR the default container size in a cluster is 32 GB.




 

In Hadoop Architecture:

NameNode
Blocksize
Replication

 

In MapR Architecture:

Container Location DataBase
Containers
Mirrors

Summary: The MapR architecture builds on the same core ideas as Apache Hadoop and ships all of the core ecosystem components. The Big Data environment has different distributions such as Cloudera and Hortonworks, but MapR is an enterprise edition; it is a stable distribution compared to the rest and provides security for all services by default.

Hadoop Admin Vs Hadoop Developer

Basically, in the Hadoop environment, Hadoop Admin and Hadoop Developer are the major roles. According to current IT market surveys, admins carry more responsibilities and earn higher salaries than Hadoop developers. The two roles can be differentiated by the points below:



Hadoop Developer:

1. In a Big Data environment Hadoop plays a major role, and Hadoop developers are central to it. A developer is primarily responsible for coding, developing with tools such as:

A) Apache Spark – Scala, Python, Java, etc.

B) MapReduce – Java

C) Apache Hive – HiveQL (a SQL-like query language)

D) Apache Pig – Pig scripting language, etc.

2. Familiarity with ETL workflows and data loading/ingestion tools like:

A) Flume

B) Sqoop

3. A bit of knowledge of the Hadoop admin side as well, such as the Linux environment and the basic commands used while developing and executing jobs.

4. Nowadays, Spark and Hive developers with solid experience are most in demand and draw large pay packages.

Hadoop Administration:

1. Hadoop administration is a good and respectable job in the IT industry. The admin is responsible for performing the operational tasks that keep the infrastructure and running jobs healthy.

2. Strong knowledge of the Linux environment, setting up clusters, configuring security authentication such as Kerberos, and testing the HDFS environment.

3. Providing new users with access to Hive, Spark, etc.; cluster maintenance such as adding (commissioning) and removing (decommissioning) nodes; and resolving errors such as memory issues and user access issues.

4. Must have knowledge of Big Data platforms like:




A) Cloudera Manager

B) Hortonworks Data Platform

C) MapR

D) Pseudo-distributed and Single node cluster setup etc.

5. Reviewing and managing log files and setting up XML configuration files.

6. Currently a trending job with good career growth.

7. Compared to Hadoop developers, Hadoop admins are getting higher salary packages in the present market.

Summary: In the Big Data environment, Hadoop offers valuable and trending jobs with large pay packages for both Hadoop developers and Hadoop administrators. Which path to prefer depends on your skill set and future growth plans.

Big Data Spark Multiple Choice Questions

Spark Multiple Choice Questions and Answers:

1)Point out the incorrect  statement in the context of Cassandra:

A) Cassandra is a centralized key-value store

B) Cassandra was originally designed at Facebook

C) Cassandra is designed to handle a large amount of data across many commodity servers, providing high availability with no single point of failure

D) Cassandra uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing

Ans: A (Cassandra is a distributed, not a centralized, key-value store)

2. Which of the following is considered the simplest type of NoSQL database in a Big Data environment?

A) Document                                    B) Key-Value Pair

C) Wide – Column                        D) All of the above mentioned 

Ans: B) Key-Value Pair (key-value stores are generally considered the simplest NoSQL databases)

3) Which of the following is not a NoSQL database?

A) Cassandra                          B) MongoDB

C) SQL Server                           D) HBase

Ans: SQL Server

4) Which of the following is a distributed graph processing framework on top of Spark?

A) Spark Streaming                   B)MLlib

C)GraphX                                          D) All of the above

Ans: GraphX

5) Which of the following leverages Spark Core's fast scheduling capability to perform streaming analytics?

A) Spark Streaming                     B) MLlib

C)GraphX                                       D) RDDs

Ans: Spark Streaming

6) The newer machine learning API for Spark (spark.ml) is based on which of the following?

A) RDD                                 B) Dataset

C)DataFrame          D) All of the above

Ans: DataFrame

7) The Spark optimizer is built on the constructs of which functional programming language?

A) Python                         B) R

C) Java                                   D)Scala

Ans: Scala (a functional programming language)

8) Which of the following is a basic abstraction of Spark Streaming?

A)Shared variable                 B)RDD

C) DStream                                  D) All of the above

Ans: DStream

9) Which of the following cluster managers are supported by Spark?

A) MESOS                                B)YARN

C) Standalone Cluster manager   D) Pseudo Cluster manager

E) All of the above

Ans: All of the above

10) Which of the following is the reason Spark is faster than MapReduce in execution time?

A) It supports different programming languages like Scala, Python, R, and Java.

B)RDDs

C)DAG execution engine and in-memory computation (RAM based)

D) All of the above

Ans: DAG execution engine and in-memory computation (RAM based)

BigData and Spark Multiple Choice Questions – I

1. In Spark, a —————– is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.

A) Resilient Distributed Dataset (RDD)                  C)Driver

B)Spark Streaming                                                          D) Flat Map

Ans: Resilient Distributed Dataset (RDD)

2. Consider the following statements in the context of Apache Spark:

Statement 1: Spark allows you to choose whether you want to persist Resilient Distributed Dataset (RDD) onto the disk or not.

Statement 2: Spark also gives you control over how you can partition your Resilient Distributed Datasets (RDDs).

A)Only statement 1 is true                 C)Both statements are true

B)Only statement 2 is true                  D)Both statements are false

Ans: Both statements are true

3) Given the following definition about the join transformation in Apache Spark:

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

Where join operation is used for joining two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

Output the result of joinrdd, when the following code is run.

val rdd1 = sc.parallelize(Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54)))
val rdd2 = sc.parallelize(Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect
A) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
B) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
D)None of the mentioned.

Ans: Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))

4) Consider whether the following statements are correct:

Statement 1: Scale out means incrementally growing your cluster capacity by adding more COTS (Commercial Off-The-Shelf) machines.

Statement 2: Scale up means growing your cluster capacity by replacing nodes with more powerful machines.

A) Only statement 1 is true               C) Both statements are true

B) Only statement 2 is true              D) Both statements are false

Ans: Both statements are true

Complete mapR Installation on Linux machine

After completing the prerequisite setup, we go directly into the actual MapR installation steps on a Linux machine.

Actual steps for MapR installation:

Step 1:  fdisk -l




A powerful and popular command used to list the disk partition tables.

Step 2: cat /etc/yum.repos.d/mapr_ecosystem.repo

Verify the MapR ecosystem repo file.

Step 3:  cat /etc/yum.repos.d/mapr_installer.repo

Verify the MapR installer repo file.

Step 4:  cat /etc/yum

Check the yum configuration.

Step 5:cat /etc/yum.repos.d/mapr_core.repo

Verify the MapR core repo file.

Step 6: yum clean all

Clean up cached yum metadata and packages.

Step 7: yum update

Update the installed packages with yum.

Step 8: yum list | grep mapr

Check the yum package list for MapR packages using grep.

Step 9: rpm --import http://package.mapr.com/releases/pub/maprgpg.key

Import the MapR public GPG key.

Step 10: yum install mapr-cldb mapr-fileserver mapr-webserver mapr-resourcemanager mapr-nodemanager mapr-nfs mapr-gateway mapr-historyserver

Install the MapR CLDB, FileServer, WebServer, ResourceManager, NodeManager, NFS, Gateway and HistoryServer with the above single command.

Step 11: yum install mapr-zookeeper

Install MapR ZooKeeper for cluster coordination.

Step 12:  ls -l /opt/mapr/roles

Check the installed MapR roles.

Step 13: rpm -qa | grep mapr

List the installed MapR RPM packages.

Step 14: id mapr

Check the ID of the mapr user.

Step 15: hostname -i

Check the host's IP address (use hostname -f for the Fully Qualified Domain Name).

Step 16: /opt/mapr/server/configure.sh -N training -C 192.0.0.0 -Z  192.0.0.0:5181

Configure the node for the cluster named "training", pointing -C at the CLDB node's IP and -Z at the ZooKeeper node's IP and port.

Step 17: cat /root/maprdisk.txt

Check the disk list file prepared for MapR.
Step 18: /opt/mapr/server/disksetup -F /root/maprdisk.txt

Format the listed disks for MapR-FS.
Step 19: service mapr-zookeeper start

Start the MapR Zookeeper service

Step 20: service mapr-zookeeper status

Status of the MapR Zookeeper service

Step 21: service mapr-warden start

Start the MapR Warden service

Step 22: service mapr-warden status

Status of the MapR Warden service

Step 23: maprcli node cldbmaster

Check which node is the CLDB master.

Step 24: maprcli license showid

Show your mapr license id

Step 25: https://<ipaddress>:8443

Open a web browser at https://<ipaddress>:8443 to check whether the MapR Control System (MCS) is working.

Step 26: hadoop fs -ls /

List the Hadoop file system root.




Summary: The above steps build a single-node MapR cluster on Linux, with an explanation of each command.

CTS Hadoop and Spark Interview Questions

Hadoop and Spark interview questions currently asked in the IT market for experienced candidates.

Round 1:

1. What is the Future class in the Scala programming language?




2. Difference between fold, foldLeft and foldRight in Scala?

3. How does DISTRIBUTE BY work in Hive? Given some data, explain how it will be distributed.

4. dF.filter(Id == 3000) — how do you pass this filter condition on a DataFrame dynamically, based on values known only at runtime?

5. Have you worked on multithreading in Scala? Explain.

7. On what basis will you increase the number of mappers in Apache Sqoop?

8. What last value will you mention when importing for the first time in Sqoop?

9. How do you specify the date for an incremental lastmodified import in Sqoop?

10. Let's say you created a partition for Bengaluru but loaded Hyderabad data into it. What validation do we have to do in this case to make sure there won't be any errors?

11. How many reducers will be launched by DISTRIBUTE BY in Spark?

12. How do you delete a Sqoop job with a simple command?

13. In which location is a Sqoop job's last value stored?

14. What are the default input and output formats in Hive?

15. Can you briefly explain the distributed cache in Spark with an example?

16. Did you use Kafka/Flume in your project? Explain in detail.

17. Difference between Parquet and ORC file formats?

Round 2:

1. Explain your previous project?

2. How do you handle incremental data in Apache Sqoop?

3. Which optimization techniques are used in Sqoop?

4. What are the different parameters you pass to your Spark job?

5. If one task is taking more time than the others, how will you handle it?

6. What are stages and tasks in Spark? Give a real-time scenario.

7. On what basis do you set the number of mappers in Sqoop?

8. How will you export data to Oracle without putting much load on the table?

9. What is a column family in HBase?




10. Can you create a table without mentioning a column family?

11. What is the limit on the number of column families for one table?

12. How did you schedule Spark jobs in your previous project?

13. Explain Spark architecture with a real-time based scenario?

MapR Installation steps on AWS

MapR installation on an Amazon Web Services machine, with simple steps for a Hadoop environment.




Step 1: Log in with AWS credentials and then switch to the root user.

[ec2-user@ip----~]$ sudo su -

Step 2: Stop the iptables service

[root@ip---- ~]# service iptables stop

Step 3: Disable iptables at boot

[root@ip----- ~]# chkconfig iptables off

Step 4: Edit the SELinux configuration

[root@ip----~]# vim /etc/selinux/config

Step 5: In the file, replace SELINUX=enforcing with SELINUX=disabled (save and exit)

SELINUX=disabled

Step 6: Change to the yum repos directory using the command below

[root@ip----~]# cd /etc/yum.repos.d/

Step 7: Edit the MapR ecosystem repo file.

[root@ip----yum.repos.d]# vi mapr_ecosystem.repo

Put the following lines into the above file

[MapR_Ecosystem]
name = MapR Ecosystem Components
baseurl = http://package.mapr.com/releases/MEP/MEP-3.0.4/redhat
gpgcheck = 0
enabled = 1
protected = 1

Step 8: Edit the MapR installer repo file.

[root@ip----yum.repos.d]# vi mapr_installer.repo

Step 9: Edit the MapR core repo file.

[root@ip----yum.repos.d]# vi mapr_core.repo

Put the following lines into the above file

[MapR_Core]
name = MapR Core Components
baseurl = http://archive.mapr.com/releases/v5.0.0/redhat/
gpgcheck = 1
enabled = 1
protected = 1

Step 10: Refresh and list the yum repos

[root@ip----- yum.repos.d]# yum repolist

(here you will see all the enabled repos)
Step 11: Search for MapR package files.

[root@ip------ yum.repos.d]# yum list all | grep mapr

(this displays all packages related to mapr)

Step 12: Import the MapR GPG key for rpm

[root@ip----- yum.repos.d]# rpm --import http://package.mapr.com/releases/pub/maprgpg.key

Step 13: Install the MapR CLDB, FileServer, WebServer, ResourceManager and NodeManager

[root@ip------ yum.repos.d]# yum install mapr-cldb mapr-fileserver mapr-webserver mapr-resourcemanager mapr-nodemanager

Step 14: Install mapr Zookeeper

[root@ip------ yum.repos.d]# yum install mapr-zookeeper

Step 15: List the installed MapR roles

[root@ip----- yum.repos.d]# ls -l /opt/mapr/roles/

Step 16: Search for installed MapR rpm packages using grep.

[root@ip------ yum.repos.d]# rpm -qa | grep mapr

(displays installed packages related to mapr)

Step 17: Add a group for the mapr user

[root@ip------ yum.repos.d]# groupadd -g 5000 mapr

Step 18: Add the mapr user to that group

[root@ip------ yum.repos.d]# useradd -g 5000 -u 5000 mapr

Step 19: Set the password for the mapr user

[root@ip------ yum.repos.d]#passwd mapr

(here you set the password for the mapr user; choose any password you like)

Step 20: Check the mapr user and group IDs

[root@ip------ yum.repos.d]# id mapr

Step 21: Check the Fully Qualified Domain Name using the command below

[root@ip------ yum.repos.d]# hostname -f

Step 22: check disk availability

[root@ip------ yum.repos.d]# fdisk -l

(here you can see the available disks on the machine; select the second disk for MapR)

Step 23: Write the second disk's device name into the MapR disk list file.

[root@ip----- yum.repos.d]# vi /root/maprdisk.txt

(put the second disk's device name here, then save and exit)

Step 24: Run the configuration script, pointing -C at the CLDB host and -Z at the ZooKeeper host and port.

[root@ip----- yum.repos.d]# /opt/mapr/server/configure.sh -N training -C ip--------.ap-southeast-1.compute.internal -Z ip------.ap-southeast-1.compute.internal:5181

Step 25: Verify the disk list file contents

[root@ip------ yum.repos.d]# cat /root/maprdisk.txt

Step 26: Download the EPEL release rpm

[root@ip------ ~]# wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Step 27: Install Extra Packages for Enterprise Linux (EPEL)

[root@ip------ ~]# rpm -Uvh epel-release-6*.rpm

Step 28: Start the ZooKeeper service

[root@ip------ ~]# service mapr-zookeeper start

Step 29: Start the Warden service

[root@ip-1----- ~]# service mapr-warden start

Step 30: Check the CLDB master node with maprcli



[root@ip----- ~]# maprcli node cldbmaster

Then open the MapR Control System (MCS) in a web browser using your machine's IP address, for example: https://192.168.0.0:8443