Mahout in Machine Learning

Mahout:

Mahout is an open-source project of the Apache Software Foundation that provides implementations of many kinds of machine learning techniques, with the goal of creating scalable algorithms that are free to use under the Apache license. To see the algorithms currently implemented in Mahout, type the following commands in the terminal.

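I) export MAHOUT_HOME=/Path   (replace /Path with the Mahout installation directory; the shell does not allow spaces around "=")

II) cd $MAHOUT_HOME

III) bin/mahout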

All of these algorithms can be accessed through the $MAHOUT_HOME/bin/mahout command-line driver. Each one needs certain arguments as input to generate its output.

Mahout algorithms are divided into four sections:

1. Collaborative filtering

2. Categorization

3. Clustering

4. Mahout utilities

1. Collaborative filtering:

Collaborative filtering is a machine learning technique used for generating recommendations. It uses information such as ratings, user preferences, etc.

Collaborative filtering has two basic ways of generating recommendations:

I) User-based: This approach recommends items by finding similar users. For example, if one user purchased a computer and a second user also purchased a computer along with other products, the two are considered similar users, and the items the second user purchased other than the computer are recommended to the first user. Because it relies on user behavior, which changes often, this approach reflects the dynamic nature of users.

II) Item-based: Item-based recommendation calculates the similarity between items and builds a similarity matrix from which recommendations are generated. Mahout provides a set of components for building our own recommendation engine.
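The two approaches above can be sketched in plain Java (the language Mahout itself is written in). This is a minimal, hypothetical illustration, not Mahout's recommender API: the class CfSketch, its methods, and the users alice, bob and carol are made up for the example, purchases are treated as yes/no preferences, and cosine similarity is assumed as the similarity measure.

```java
import java.util.*;

/**
 * Hypothetical sketch of user-based and item-based collaborative filtering.
 * Not Mahout's API: purchases are yes/no preferences and cosine similarity
 * is an assumed similarity measure.
 */
public class CfSketch {

    /** Cosine similarity of two sets viewed as 0/1 vectors. */
    static double cosine(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size() / Math.sqrt((double) a.size() * b.size());
    }

    /** User-based: weight each unseen item by how similar its buyers are to the user. */
    static List<String> userBased(Map<String, Set<String>> purchases, String user) {
        Set<String> mine = purchases.getOrDefault(user, Set.of());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : purchases.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = cosine(mine, other.getValue());
            if (sim == 0.0) continue; // no shared purchases, so not a similar user
            for (String item : other.getValue())
                if (!mine.contains(item)) scores.merge(item, sim, Double::sum);
        }
        return rank(scores);
    }

    /** Item-based: build an item-item similarity matrix, then score unseen items. */
    static List<String> itemBased(Map<String, Set<String>> purchases, String user) {
        // Invert user -> items into item -> set of users who bought it.
        Map<String, Set<String>> buyers = new HashMap<>();
        purchases.forEach((u, items) -> items.forEach(
                item -> buyers.computeIfAbsent(item, k -> new HashSet<>()).add(u)));
        // Similarity matrix: sim[i][j] = cosine of the buyer sets of items i and j.
        Map<String, Map<String, Double>> sim = new HashMap<>();
        for (String i : buyers.keySet())
            for (String j : buyers.keySet())
                if (!i.equals(j))
                    sim.computeIfAbsent(i, k -> new HashMap<>())
                       .put(j, cosine(buyers.get(i), buyers.get(j)));
        // Score each unseen item by summing its similarity to the items the user owns.
        Set<String> mine = purchases.getOrDefault(user, Set.of());
        Map<String, Double> scores = new HashMap<>();
        for (String owned : mine)
            for (Map.Entry<String, Double> e : sim.getOrDefault(owned, Map.of()).entrySet())
                if (!mine.contains(e.getKey()) && e.getValue() > 0)
                    scores.merge(e.getKey(), e.getValue(), Double::sum);
        return rank(scores);
    }

    /** Items by descending score; ties broken alphabetically so output is stable. */
    static List<String> rank(Map<String, Double> scores) {
        List<String> items = new ArrayList<>(scores.keySet());
        items.sort((x, y) -> {
            int c = Double.compare(scores.get(y), scores.get(x));
            return c != 0 ? c : x.compareTo(y);
        });
        return items;
    }

    public static void main(String[] args) {
        // alice bought a computer; bob bought a computer plus other products.
        Map<String, Set<String>> purchases = Map.of(
                "alice", Set.of("computer"),
                "bob", Set.of("computer", "mouse", "keyboard"),
                "carol", Set.of("novel"));
        System.out.println(userBased(purchases, "alice")); // [keyboard, mouse]
        System.out.println(itemBased(purchases, "alice")); // [keyboard, mouse]
    }
}
```

For alice, who bought only a computer, both methods recommend bob's other purchases. The user-based version reaches them through bob's similarity to alice. The item-based version reaches them through the similarity of the keyboard and mouse to the computer. carol shares no purchases with anyone, so the user-based method gives her no recommendations.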

Examples of collaborative filtering algorithms in Mahout:

Distributed Item-based Collaborative Filtering

Non-distributed Recommenders

Collaborative Filtering using Matrix Factorization
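Matrix factorization approximates the sparse user-item rating matrix as the product of two small factor matrices. A missing rating is then predicted as the dot product of a user's latent vector and an item's latent vector. The sketch below is a hypothetical, non-distributed illustration trained with plain stochastic gradient descent. The class MfSketch, the ratings and the hyperparameters (2 factors, learning rate 0.01, regularization 0.02, 5000 epochs) are assumptions chosen for the example. Mahout's own factorization code may use a different solver, such as alternating least squares.

```java
import java.util.Random;

/**
 * Hypothetical matrix-factorization sketch trained with plain SGD.
 * Not Mahout's (distributed) factorization code.
 */
public class MfSketch {
    final double[][] userF, itemF; // latent factor vectors per user and per item

    MfSketch(int users, int items, int k, long seed) {
        Random rnd = new Random(seed); // fixed seed so runs are repeatable
        userF = new double[users][k];
        itemF = new double[items][k];
        for (double[] row : userF) for (int f = 0; f < k; f++) row[f] = 0.1 + 0.4 * rnd.nextDouble();
        for (double[] row : itemF) for (int f = 0; f < k; f++) row[f] = 0.1 + 0.4 * rnd.nextDouble();
    }

    /** Predicted rating = dot product of the user and item factor vectors. */
    double predict(int u, int i) {
        double s = 0;
        for (int f = 0; f < userF[u].length; f++) s += userF[u][f] * itemF[i][f];
        return s;
    }

    /** One SGD pass over the observed {user, item, rating} triples. */
    void epoch(double[][] ratings, double lr, double reg) {
        for (double[] r : ratings) {
            int u = (int) r[0], i = (int) r[1];
            double err = r[2] - predict(u, i);
            for (int f = 0; f < userF[u].length; f++) {
                double pu = userF[u][f], qi = itemF[i][f];
                // Gradient step on squared error with L2 regularization.
                userF[u][f] += lr * (err * qi - reg * pu);
                itemF[i][f] += lr * (err * pu - reg * qi);
            }
        }
    }

    /** Root-mean-square error on the observed ratings only. */
    double rmse(double[][] ratings) {
        double sum = 0;
        for (double[] r : ratings) {
            double e = r[2] - predict((int) r[0], (int) r[1]);
            sum += e * e;
        }
        return Math.sqrt(sum / ratings.length);
    }

    public static void main(String[] args) {
        // Hypothetical ratings {user, item, rating}; pairs not listed are unknown.
        double[][] ratings = {
            {0, 0, 5}, {0, 1, 3}, {0, 3, 1},
            {1, 0, 4}, {1, 3, 1},
            {2, 0, 1}, {2, 1, 1}, {2, 3, 5},
            {3, 1, 1}, {3, 2, 5}, {3, 3, 4}
        };
        MfSketch model = new MfSketch(4, 4, 2, 42L);
        System.out.printf("initial RMSE %.3f%n", model.rmse(ratings));
        for (int e = 0; e < 5000; e++) model.epoch(ratings, 0.01, 0.02);
        System.out.printf("final RMSE %.3f%n", model.rmse(ratings));
        // Prediction for a pair that was never rated (user 1, item 2).
        System.out.printf("user 1, item 2: %.2f%n", model.predict(1, 2));
    }
}
```

Only observed ratings contribute to the error. The unknown entries are filled in afterwards by predict(), which is what makes the factorization usable as a recommender.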

4. Mahout utilities:

Mahout also includes helper programs, called Mahout utilities, that convert content into the formats Mahout algorithms expect.

Mahout utilities fall into three main categories:

Creating Vectors from Text: These utilities produce Mahout Vector representations of text. There are two main utilities for converting a directory of text documents into vectors.

Creating Text from Vectors: These utilities produce text from vectors.

Viewing Results: These utilities are used mainly for viewing the clusters generated by clustering algorithms.
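The first two utility categories can be illustrated with a small hypothetical Java sketch. A dictionary maps each term to an index, each document becomes a sparse term-frequency vector, and the dictionary is used in reverse to print a vector as readable terms. The class TextVectorSketch and its simple tokenizer are assumptions for illustration. Mahout's utilities work on files and may apply other weightings (such as TF-IDF), so this shows the idea rather than their implementation.

```java
import java.util.*;

/**
 * Hypothetical sketch of "text to vectors" and "vectors to text":
 * build a term dictionary, turn documents into sparse term-frequency
 * vectors, and print vectors back as terms. Not Mahout's code.
 */
public class TextVectorSketch {
    final Map<String, Integer> dictionary = new TreeMap<>(); // term -> vector index

    /** Lowercase and split on any run of non-alphanumeric characters. */
    static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String t : doc.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
            if (!t.isEmpty()) tokens.add(t);
        return tokens;
    }

    /** Give each distinct term an index, in alphabetical order. */
    void buildDictionary(List<String> docs) {
        Set<String> terms = new TreeSet<>();
        for (String d : docs) terms.addAll(tokenize(d));
        int idx = 0;
        for (String t : terms) dictionary.put(t, idx++);
    }

    /** Sparse term-frequency vector: term index -> count. */
    SortedMap<Integer, Integer> vectorize(String doc) {
        SortedMap<Integer, Integer> vec = new TreeMap<>();
        for (String t : tokenize(doc)) {
            Integer i = dictionary.get(t);
            if (i != null) vec.merge(i, 1, Integer::sum); // terms not in the dictionary are ignored
        }
        return vec;
    }

    /** Convert a vector back to readable text using the inverted dictionary. */
    String toText(SortedMap<Integer, Integer> vec) {
        String[] byIndex = new String[dictionary.size()];
        dictionary.forEach((t, i) -> byIndex[i] = t);
        StringJoiner out = new StringJoiner(", ", "{", "}");
        vec.forEach((i, c) -> out.add(byIndex[i] + ":" + c));
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Mahout clusters text",
                                    "Text becomes vectors, vectors become clusters");
        TextVectorSketch tv = new TextVectorSketch();
        tv.buildDictionary(docs);
        for (String d : docs) {
            SortedMap<Integer, Integer> v = tv.vectorize(d);
            System.out.println(v + " -> " + tv.toText(v));
        }
    }
}
```

With these two sample documents, the first line printed is {2=1, 3=1, 4=1} -> {clusters:1, mahout:1, text:1}. Without the dictionary a vector is only indices and counts, which is why converting vectors back to text depends on the dictionary.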