Machine Learning Workflow

Machine learning works as a step-by-step process:

First, historical data is collected and used as the input for machine learning. This data usually cannot be used directly; it may need to be cleaned and transformed first, a step known as feature engineering. Common feature-engineering tasks include handling missing values, handling outliers, and creating new features out of existing ones.
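The three feature-engineering tasks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each record is a dict with two hypothetical numeric fields, `income` and `debt`; `None` marks a missing value; missing values are filled with the median; and outliers are clipped into an assumed plausible range. None of these choices come from the text; real projects pick imputation and outlier rules per dataset.

```python
from statistics import median


def engineer_features(rows, low=0.0, high=1_000_000.0):
    """Clean raw records and derive a new feature.

    rows: list of dicts with hypothetical numeric fields 'income' and 'debt';
    None marks a missing value. [low, high] is an assumed plausible range.
    """
    # Handle missing values: compute the median of the observed values per field.
    fill = {}
    for field in ("income", "debt"):
        observed = [r[field] for r in rows if r[field] is not None]
        fill[field] = median(observed)

    result = []
    for r in rows:
        income = r["income"] if r["income"] is not None else fill["income"]
        debt = r["debt"] if r["debt"] is not None else fill["debt"]
        # Handle outliers: clip each value into the range [low, high].
        income = min(max(income, low), high)
        debt = min(max(debt, low), high)
        # Create a new feature out of existing ones.
        ratio = debt / income if income else 0.0
        result.append({"income": income, "debt": debt, "debt_to_income": ratio})
    return result
```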

After feature engineering, the data is split into two parts: training data and test data. The training data is used to train the machine learning model; in other words, the machine learns from the training data. To build the model, a machine learning algorithm is applied to the training data.
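A minimal sketch of the train/test split, assuming the data is an in-memory Python list. The name `train_test_split` echoes scikit-learn's helper of the same name, but this is a hand-written stand-in; the 20% default test fraction and the fixed seed are illustrative assumptions.

```python
import random


def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(data)                   # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Shuffling before splitting matters because data is often stored in some order (for example by date or by label), and an unshuffled split would give the test set a different distribution from the training set.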

Once the model is built, it is validated against the test data. The idea is that the test data stands in for new, real-world data that the model has not seen during training. If the model's performance on both the training and the test data is satisfactory, the model is complete.
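To make the validation rule concrete, here is a toy sketch. The model is an assumed one-dimensional threshold classifier (predict 1 when a value exceeds the midpoint of the two class means), and "satisfactory" is taken to mean an accuracy of at least 0.8 on both sets; both the model and the 0.8 cut-off are illustrative assumptions.

```python
def fit_threshold(train):
    """Toy 1-D model: the midpoint between the mean of class 0 and class 1."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Fraction of (x, y) pairs where the predicted class equals the true class."""
    correct = sum(1 for x, y in data if (1 if x > threshold else 0) == y)
    return correct / len(data)


def validate(train, test, required=0.8):
    """Train on `train`, then accept the model only if BOTH accuracies are high enough."""
    threshold = fit_threshold(train)
    train_acc = accuracy(threshold, train)
    test_acc = accuracy(threshold, test)
    return train_acc, test_acc, train_acc >= required and test_acc >= required
```

Requiring good results on both sets is what catches a model that has memorized the training data but does not generalize to new data.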


Example: Spam classification in email:

Historical data:

Emails received by the user, each labeled according to whether the user marked it as spam or not.

Feature engineering:

Extracting information from past emails, such as the sender's email ID, the sender's IP address, and characteristics of the email itself.
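A sketch of what this extraction step might look like, assuming each email is a dict with hypothetical keys `sender`, `sender_ip`, `subject` and `body`. The particular content features (link count, exclamation marks, a small hand-picked word list) are illustrative stand-ins for "characteristics of the email", not features named by the source.

```python
import re

# Hypothetical word list, chosen only for illustration.
SUSPICIOUS_WORDS = {"free", "winner", "prize", "urgent"}


def extract_features(email):
    """Turn one email (a dict with hypothetical keys) into a feature dict."""
    sender = email["sender"]
    text = email["subject"] + " " + email["body"]
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "sender_domain": sender.split("@")[-1].lower(),
        "sender_ip": email["sender_ip"],
        "num_links": len(re.findall(r"https?://", text)),
        "num_exclamations": text.count("!"),
        "suspicious_word_count": sum(w in SUSPICIOUS_WORDS for w in words),
    }
```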

Train and test data:

Some of the emails are held back for testing; the remaining emails are used to train the machine learning model.

Machine learning algorithm:

A supervised learning algorithm learns the patterns in the training data. These patterns are a general representation of what makes an email spam or not spam.
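The text does not name a specific supervised algorithm. As one concrete, commonly used choice for spam filtering, the sketch below is a multinomial naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python; whitespace tokenization and the tiny training set in the usage lines are simplifying assumptions.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)                 # documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(class) + sum over words of log P(word | class)
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


texts = ["win free money now", "free prize claim now",
         "meeting at noon tomorrow", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesSpamFilter().fit(texts, labels)
print(model.predict("free money"))  # spam
```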

Machine learning model:

The model built from the training data.

Model validation:

The machine learning model is tested against the emails in the test data to check whether it correctly predicts spam emails as spam.
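Checking that "spam emails are predicted as spam" corresponds to the recall of the spam class; precision and overall accuracy are usually reported alongside it. A minimal sketch, assuming the true labels and the model's predictions for the test emails are available as two equal-length lists of strings:

```python
def spam_metrics(actual, predicted, positive="spam"):
    """Compare the true test labels with the model's predictions."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # spam caught
    fp = sum(a != positive and p == positive for a, p in pairs)  # good mail flagged
    fn = sum(a == positive and p != positive for a, p in pairs)  # spam missed
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Of the emails flagged as spam, how many really were spam?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real spam emails, how many were caught?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```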

New data:

A new email sent to the user in real time.

Results:

If the model detects that the new email is spam, the email is moved to the “Spam” folder; otherwise it is kept in the user’s inbox.
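The final routing step, sketched under the assumption that the trained model is exposed as any function that takes the email text and returns `True` for spam. The folder names follow the text; `toy_rule` is a hypothetical stand-in for a real model.

```python
from typing import Callable, Dict, Iterable, List


def deliver(new_emails: Iterable[str], is_spam: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Sort incoming emails into the 'Inbox' and 'Spam' folders."""
    folders: Dict[str, List[str]] = {"Inbox": [], "Spam": []}
    for email in new_emails:
        # Spam is moved to the "Spam" folder; everything else stays in the inbox.
        folders["Spam" if is_spam(email) else "Inbox"].append(email)
    return folders


def toy_rule(text: str) -> bool:
    """Hypothetical stand-in for a trained model's prediction."""
    return "free prize" in text.lower()


print(deliver(["Free prize inside!", "Lunch at noon?"], toy_rule))
# {'Inbox': ['Lunch at noon?'], 'Spam': ['Free prize inside!']}
```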

The World of Machine Learning

When Machine Learning is Used:

When analysis of the given data by a human being has a massive associated cost in time and effort, as in the case of Express.

When human competence is absent. For example, if we want to navigate in space and do not have the expertise available, we can make a machine learn and let it navigate unknown territory without any human help.

When human competence exists but cannot always be explained.

Examples: image processing, self-driving cars, speech recognition.

Amazon Go:

Amazon is currently setting up physical stores that eliminate the need for a checkout register while keeping the ordinary shopping experience. Shoppers can pick items off the aisles as they would in any supermarket, without the inconvenience of having the items billed at a counter. All shoppers need to do is swipe a card when they enter the store and pick up products via Amazon Wallet.

Translate Apps:

Nowadays a lot of applications are being built that help us communicate with people in almost any language. Applications such as Google Translate have eliminated the need to memorize common phrases in different languages and eased the struggle of connecting with people from all over the world.

Self-Driving Cars:

Today there are cars that can take anybody from one place to another with no need for human intervention. These cars navigate independently, sensing any barrier in the path and maneuvering accordingly. By slowing down, stopping, and accelerating on their own, self-driving cars have the potential to reduce accidents. From saving drivers' time to reducing traffic, self-driving cars are changing the whole experience of traveling on roads.

Summary: Machine learning is all around us in today's world. We have seen when machine learning techniques are used and how they are applied in different types of technology applications.

Mahout in Machine Learning


Mahout is an open-source project of the Apache Software Foundation that provides implementations of many kinds of machine learning techniques, with the goal of creating scalable algorithms that are free to use under the Apache license. To see the algorithms currently implemented in Mahout, run the following in a terminal:

I) export MAHOUT_HOME=/Path

II) $MAHOUT_HOME/bin/mahout

All of these can be accessed through the $MAHOUT_HOME/bin/mahout command-line driver. Each of them needs certain arguments as input to generate its output.

Mahout algorithms are divided into four sections:

1. Collaborative filtering

2. Clustering

3. Classification

4. Mahout utilities

1. Collaborative filtering:

Collaborative filtering is a machine learning technique used for generating recommendations. It uses information such as ratings and user preferences.

There are basically two ways of generating recommendations with collaborative filtering:

I) User-based: Items are recommended by finding similar users. For example, if one user purchased a computer and a second user also purchased a computer along with other products, the two are considered similar users, and the items the second user purchased other than the computer are recommended to the first user. This approach reflects the dynamic nature of users, whose preferences change over time.

II) Item-based: Item-based recommendation calculates the similarity between items and creates a similarity matrix from which recommendations are generated. Mahout provides a set of components from which we can build our own recommendation engine.
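Both approaches can be illustrated in plain Python on the computer-purchase example above. This is not Mahout's Java API; it is a small sketch assuming purchases are stored as `{user: {item: rating}}` dicts (1.0 meaning "purchased"), with cosine similarity as the similarity measure. The user-based function copies items from the single most similar user; the item-based function builds an item-item similarity matrix and scores each unseen item by its similarity to the items the user already has.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def user_based(ratings, user):
    """Recommend items of the most similar other user that `user` does not have yet."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine(ratings[user], ratings[u]))
    return sorted(set(ratings[nearest]) - set(ratings[user]))


def item_based(ratings, user):
    """Build an item-item similarity matrix, then rank unseen items for `user`."""
    items = {}  # transpose: item -> {user: rating}
    for u, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[u] = r
    sim = {(i, j): cosine(items[i], items[j]) for i in items for j in items if i != j}
    scores = {}
    for candidate in items:
        if candidate in ratings[user]:
            continue  # never recommend something the user already has
        scores[candidate] = sum(sim[(candidate, owned)] * r
                                for owned, r in ratings[user].items())
    return sorted(scores, key=scores.get, reverse=True)


ratings = {
    "alice": {"computer": 1.0},
    "bob": {"computer": 1.0, "mouse": 1.0, "keyboard": 1.0},
    "carol": {"novel": 1.0, "lamp": 1.0},
}
print(user_based(ratings, "alice"))  # ['keyboard', 'mouse']
```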

Examples of collaborative filtering algorithms:

Distributed Item-Based Collaborative Filtering

Non-Distributed Recommenders

Collaborative Filtering Using Matrix Factorization

4. Mahout utilities:

Mahout includes some helper programs that prepare content in formats Mahout can use; these are called Mahout utilities.

Mahout utilities fall into three main categories:

Creating vectors from text – These utilities produce Mahout Vector representations from text. There are mainly two utilities for converting a directory of text documents into vectors.

Creating text from vectors – These utilities produce text from vectors.

Viewing results – These utilities are used particularly for viewing the clusters generated by clustering algorithms.
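As an illustration of what "creating vectors from text" means, the sketch below turns raw documents into sparse TF-IDF vectors (term frequency weighted by inverse document frequency). It is a plain-Python stand-in, not one of Mahout's utilities, and its lower-case whitespace tokenization and unsmoothed IDF formula are simplifying assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Convert raw text documents into sparse TF-IDF vectors (dicts: term -> weight)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors
```

With this weighting a term that appears in every document gets weight 0, because it does nothing to distinguish one document from another.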

Machine Learning

What is Machine Learning?

Machine learning is a branch of artificial intelligence (AI) that focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. It is concerned with the design and development of algorithms that can take complex input data and make intelligent decisions based on that data.

Types of Machine Learning:

1. Supervised Learning:

This type of machine learning is concerned with assigning an unlabeled data document to one of the predefined labels from the training data, in order to predict the value for any valid input. A common example of supervised learning is classifying e-mail messages as spam or not spam. Essentially all classification algorithms are supervised; the classification algorithms in Mahout are an example.
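One of the simplest supervised learners makes the idea of "assigning unlabeled data to a predefined label" concrete: a 1-nearest-neighbor classifier gives a new point the label of the closest labeled training example. A minimal sketch, assuming examples are numeric feature tuples paired with labels (it needs Python 3.8+ for `math.dist`):

```python
import math


def nearest_neighbor_label(training, point):
    """Assign `point` the label of its closest labeled training example (1-NN)."""
    _, label = min(training, key=lambda example: math.dist(example[0], point))
    return label


training = [((0, 0), "ham"), ((0, 1), "ham"), ((5, 5), "spam"), ((6, 5), "spam")]
print(nearest_neighbor_label(training, (1, 0)))  # ham
```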

2. Unsupervised Learning:

This type of machine learning is concerned with making sense of complex, hard-to-understand data by finding similarities or interesting patterns in it. No labels are associated with the data. Essentially all clustering algorithms, such as those in Mahout, are unsupervised.
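Clustering can be sketched with k-means (Lloyd's algorithm), which groups unlabeled points purely by similarity. This plain-Python version is an illustration, not Mahout's implementation; initializing with the first `k` points is a simplifying assumption (practical implementations usually use random or k-means++ initialization).

```python
import math


def k_means(points, k, iterations=100):
    """Group unlabeled points into k clusters; returns (centroids, clusters)."""
    centroids = [tuple(p) for p in points[:k]]  # simple deterministic initialization
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = k_means([(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)], k=2)
print(clusters)
```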

3. Semi-Supervised Learning:

Semi-supervised learning is concerned with labeling an unlabeled data document when both labeled and unlabeled data are available. It is a combination of supervised and unsupervised learning. The main aim of semi-supervised learning is to show how combining labeled and unlabeled data can change learning behavior.
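Self-training is one simple semi-supervised technique: an unlabeled point is labeled from the labeled data, added to it, and the process repeats. The sketch below, a plain-Python illustration under the assumption that points are numeric tuples, always labels the unlabeled point that is currently closest to any labeled point.

```python
import math


def self_train(labeled, unlabeled):
    """Self-training with 1-nearest-neighbor: repeatedly label the unlabeled point
    closest to any labeled point, then treat it as labeled for the next round."""
    labeled = list(labeled)      # [(point, label), ...]
    remaining = list(unlabeled)  # [point, ...]
    while remaining:
        # Find the (unlabeled point, labeled example) pair with the smallest distance.
        point, (_, label) = min(
            ((p, ex) for p in remaining for ex in labeled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        labeled.append((point, label))
        remaining.remove(point)
    return labeled


result = dict(self_train([((0, 0), "A"), ((10, 0), "B")], [(2, 0), (4, 0), (6, 0)]))
print(result[(6, 0)])  # A
```

Plain nearest-neighbor classification using only the two labeled points would put (6, 0) in class B, since it is closer to (10, 0); the chain of unlabeled points pulls it into class A instead, which shows how unlabeled data can change what is learned.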