Supervised Learning Techniques

In supervised machine learning techniques, we have:

1. A labeled data set. Each example (sample) in the data set has two parts: one or more input variables (X) and an output variable (Y).

2. A mapping function f which, when applied to the input variables, yields the value of the output variable.

This relationship between input variables, output variable and the mapping function f can be denoted as:

Y = f(X)

The aim of a supervised learning technique is to learn an approximation of f from the labeled data such that, when it is supplied with new input data (X), it predicts a value of the output variable (Y) that is very close to the true value.

In later sections, we will discuss in detail how to calculate the deviation of the predicted output value from the true output.
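The idea of learning an approximate f from labeled samples can be sketched in a few lines of Python. This is only an illustration: the data values are made up, and the nearest-neighbor rule used here is just one simple way to approximate f, chosen because it needs no extra libraries. The names training_data and fit_nearest_neighbor are hypothetical.

```python
# Hypothetical labeled data set: each sample pairs an input X with its known output Y.
training_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]


def fit_nearest_neighbor(samples):
    """Return an approximate mapping function f built from labeled samples."""
    def f(x):
        # Predict the output of the training sample whose input is closest to x.
        _, nearest_y = min(samples, key=lambda sample: abs(sample[0] - x))
        return nearest_y
    return f


f = fit_nearest_neighbor(training_data)

# A new input that does not appear in the training data.
prediction = f(2.9)
print(prediction)
```

The new input 2.9 is closest to the training input 3.0, so the sketch returns the output recorded for that sample.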

Major Problems Solved with Supervised Learning

Supervised machine learning techniques are used to solve two kinds of problems: regression and classification.

Regression

When the output variable is continuous, in other words a real value, the problem is a regression problem. In this scenario, the mapping function should, when provided with the input data, predict a continuous value. Such a function is real-valued: its output can be any number in a range rather than one of a fixed set of categories.

For example, suppose we are given a data set from which we need to predict the price of houses of different sizes. Here, the input variable is the size of the house and the output variable is its price. Because the price of a house is a real (continuous) value, this is a regression problem.
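As a rough sketch of the house-price example, the code below fits a straight line to a handful of made-up (size, price) pairs using the ordinary least-squares formulas and then predicts the price for a new size. The numbers, the units, and the function name fit_line are assumptions for illustration only. A straight line is just one possible choice of mapping function.

```python
def fit_line(sizes, prices):
    """Fit price = slope * size + intercept by ordinary least squares."""
    n = len(sizes)
    mean_size = sum(sizes) / n
    mean_price = sum(prices) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_size) * (y - mean_price) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_size) ** 2 for x in sizes)
    slope = sxy / sxx
    intercept = mean_price - slope * mean_size
    return slope, intercept


# Made-up training data: house size (square metres) and price (thousands).
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)

# Predict a continuous value for a house size not seen during training.
predicted_price = slope * 90.0 + intercept
print(predicted_price)
```

The prediction is a real number, which is what makes this a regression problem.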

Classification

Now consider a problem in which the output variable is not a real value but a categorical variable. In such problems, we are trying to assign each input to one of a set of categories (classes). The output is discrete, not continuous as in a regression problem. Problems of this kind are called classification problems.

For instance, suppose we need to categorize every email in our mailbox as spam or not-spam. Here the output is a categorical variable with two possible values, so this is a classification problem.
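The spam example can be sketched with a deliberately naive word-count classifier. The emails, the labels, and the function names train and classify are invented for illustration. The scoring rule is a toy stand-in and is not one of the techniques discussed in later sections.

```python
from collections import Counter


def train(labeled_emails):
    """Count how often each word appears in the emails of each class."""
    word_counts = {"spam": Counter(), "not-spam": Counter()}
    for text, label in labeled_emails:
        word_counts[label].update(text.lower().split())
    return word_counts


def classify(word_counts, text):
    """Return the class whose training emails used the words of `text` most often."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in word_counts.items()
    }
    # On a tie, max() keeps the first label in the dictionary ("spam" here).
    return max(scores, key=scores.get)


# Made-up labeled data set: (email text, class label).
labeled_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("lunch on monday with the team", "not-spam"),
]

model = train(labeled_emails)
label = classify(model, "claim your free prize")
print(label)
```

Unlike the regression case, the output here is one of a fixed set of labels rather than a number.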

Types of Supervised Learning Techniques

There are many supervised learning techniques. Some of the commonly used ones are:

  • Linear Regression
  • Logistic Regression
  • Support Vector Machines
  • Naive Bayes
  • K-Nearest Neighbors
  • Decision Trees
  • Neural Networks

In the next few sections, we will discuss each technique in detail.