MapR Architecture:
Before Hadoop was introduced in 2007, no single data platform could provide a scalable architecture for handling fast-growing data under a unified security model.
There are four important pillars of a data platform:
1.Distributed Metadata
2.Variety of Protocols and API support
3.Variety of data persistence models, such as objects, files, tables and event queues
4.Security
Distributed Metadata:
Metadata needs to be distributed because a centralized metadata service leads to a number of restrictions:
1.Creates a single point of failure
2.Creates a hotspot that limits the scalability of the cluster
3.Limits sharing of data artifacts
4.Limits the number of data artifacts that can be stored in the cluster
MapR has built a distributed metadata service from the ground up that removes all these restrictions.
CLDB (Container Location Database) serves as MapR's level-I metadata service and maintains metadata about the volumes, containers and nodes in the entire cluster.
The metadata about data artifacts such as objects, files, tables, topics and directories is level-II metadata, which is stored in the name container.
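The two-level lookup described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class names (Cldb, NameContainer), the resolve function, and the sample volume, path, container IDs and node names are all hypothetical and are not part of any MapR API. It only shows the idea that a client first asks the level-I service where containers live, and then reads artifact metadata from a name container.

```python
class Cldb:
    """Toy level-I metadata: knows about volumes, containers and nodes only."""

    def __init__(self):
        self.volume_to_name_container = {}  # volume name -> name container id
        self.container_to_nodes = {}        # container id -> nodes holding a replica

    def locate_container(self, container_id):
        return self.container_to_nodes[container_id]


class NameContainer:
    """Toy level-II metadata: maps artifact paths to the containers holding their data."""

    def __init__(self, entries):
        self.entries = entries  # path -> list of data container ids


def resolve(cldb, name_containers, volume, path):
    """Return {data container id: nodes} for one artifact in one volume."""
    # Step 1 (level I): find the name container of the volume.
    name_container_id = cldb.volume_to_name_container[volume]
    # Step 2 (level II): look up the artifact inside that name container.
    data_container_ids = name_containers[name_container_id].entries[path]
    # Step 3 (level I again): find the nodes that host each data container.
    return {cid: cldb.locate_container(cid) for cid in data_container_ids}


# Hypothetical sample cluster state.
cldb = Cldb()
cldb.volume_to_name_container["vol1"] = 100
cldb.container_to_nodes.update(
    {100: ["node-a"], 201: ["node-a", "node-b"], 202: ["node-c"]}
)
name_containers = {100: NameContainer({"/logs/app.log": [201, 202]})}

locations = resolve(cldb, name_containers, "vol1", "/logs/app.log")
print(locations)
```

In this sketch the level-I service never stores per-file information, so the number of files does not add to its load. That is the property the restrictions listed above are about.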
Variety of APIs and Protocol Support:
MapR Data Platform makes the same data accessible through different APIs, so different applications can work on it using whichever API suits them:
1.HDFS API
2.S3 API
3.NFS
4.POSIX
5.OJAI API
6.CDC API
Variety of Data persistence:
MapR data container is the unit of storage allocation and management. Each container stores a variety of data elements such as objects, files, tables, and directories.
It supports two types of data elements:
1.File chunks
2.Key-Value stores
These two data elements are the building blocks for everything else in MapR. Files are built by threading file chunks across containers. Directories are built over Key-Value stores. Tables are built on top of files and Key-Value stores, which serve as an index.
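To make the relationship between file chunks, Key-Value stores and directories more concrete, here is a minimal toy model. It is an illustration only: the Container class, the write_file and read_file helpers, the round-robin placement and the tiny chunk size are assumptions made for this sketch and do not reflect MapR's actual on-disk format or placement policy.

```python
CHUNK_SIZE = 4  # deliberately tiny so the example produces several chunks


class Container:
    """Toy container holding the two kinds of data elements."""

    def __init__(self, cid):
        self.cid = cid
        self.chunks = {}  # chunk id -> bytes (file chunks)
        self.kv = {}      # key -> value (Key-Value store)


def write_file(name, data, containers, chunk_size=CHUNK_SIZE):
    """Split data into chunks and spread them over containers round-robin.

    Returns the chunk map: an ordered list of (container id, chunk id).
    """
    chunk_map = []
    for offset in range(0, len(data), chunk_size):
        index = offset // chunk_size
        target = containers[index % len(containers)]
        chunk_id = f"{name}#{index}"
        target.chunks[chunk_id] = data[offset:offset + chunk_size]
        chunk_map.append((target.cid, chunk_id))
    return chunk_map


def read_file(chunk_map, containers_by_id):
    """Reassemble a file by following its chunk map across containers."""
    return b"".join(
        containers_by_id[cid].chunks[chunk_id] for cid, chunk_id in chunk_map
    )


containers = [Container(cid) for cid in (1, 2, 3)]
containers_by_id = {c.cid: c for c in containers}

# A directory is modelled as a Key-Value store: file name -> chunk map.
directory = containers[0].kv
directory["greeting.txt"] = write_file("greeting.txt", b"hello mapr!", containers)

restored = read_file(directory["greeting.txt"], containers_by_id)
print(restored)
```

The directory entry holds only the chunk map, while the bytes themselves live in whichever containers received the chunks.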
MapR Data Platform was architected to solve most data problems for enterprises and to reduce the need for separate data tools.
The heart of the MapR data platform is the Data Container.
The Data Container provides:
1.Different data persistence models, such as files, tables and objects
2.Distributed scale-out storage
3.Data loss prevention
4.Failure resilience and disaster recovery