In supervised learning, we saw that the model is trained on labeled data: each sample in the data set has input data and a corresponding label.
Now suppose the data set does not have any labels. In such scenarios, unsupervised learning techniques are used.
In this kind of machine learning, the data does not need to be labeled, and the model is trained on unlabeled data. Because no manual labeling is required, the user does not need to supervise the training of the model. The model itself groups the unsorted data and discovers the similarities, differences and patterns among the samples of the training data.
Unsupervised learning techniques offer several benefits:
- They can uncover hidden patterns in data that were not known in advance.
- They help identify features that can be useful for splitting the data into categories.
- Unlabeled data is easier to obtain than labeled data.
Unsupervised learning techniques also have a few shortcomings:
- The algorithms can be computationally more complex.
- The results are often less accurate than those of supervised learning techniques. Because there are no labels, it is also difficult to measure how accurately the model is performing.
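Internal validation measures give a partial answer to this evaluation problem, because they score a clustering using only the data itself. Below is a minimal sketch of one such measure, the silhouette coefficient, assuming Euclidean distance. For each point, `a` is its mean distance to the rest of its own cluster and `b` is its mean distance to the nearest other cluster; the point scores `(b - a) / max(a, b)`. Averages near 1 suggest well-separated clusters, while values near or below 0 suggest overlapping or poorly assigned ones. The function name and the sample points are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (p itself contributes a distance of 0, hence len(own) - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical 2-D points forming two obvious groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible groups
mixed_labels = [0, 1, 0, 1, 0, 1]  # ignores the visible groups

good = silhouette_score(points, good_labels)
mixed = silhouette_score(points, mixed_labels)
```

A high score says the clusters are compact and well separated, not that they match any "true" grouping, so it complements rather than replaces the accuracy measures available in supervised learning.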
Types of Unsupervised Learning:
Unsupervised learning techniques are classified into two categories:
1. Clustering
When the data needs to be split into several groups based on some criteria, clustering techniques can help to solve such problems.
Every business needs to understand its customers. To target them with a better value proposition, a business may want to split its customers into several groups based on criteria such as age, gender or any other attribute. Clustering techniques can segment the data and find the clusters inherent in it.
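The customer-segmentation idea can be sketched with K-means clustering (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is a minimal sketch under several assumptions: each customer is described by two hypothetical features (age and annual spend in thousands), the data is made up, the centroids are seeded by farthest-first traversal rather than the k-means++ seeding most libraries use, and the features are not standardized, which real pipelines usually do when features have different scales.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means: alternate assignment and centroid-update steps."""
    # Farthest-first seeding (an assumption for this sketch): start from the
    # first point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, annual spend in thousands)
customers = [
    (22, 15), (25, 18), (23, 16),   # younger, low spend
    (45, 60), (48, 65), (46, 62),   # middle-aged, high spend
    (67, 30), (70, 28), (65, 32),   # older, moderate spend
]
labels, centroids = kmeans(customers, k=3)
```

Each resulting label can be read as a customer segment, and its centroid summarizes the typical customer in that segment. Note that the number of clusters `k` must be chosen up front, which is one of the main practical decisions when using K-means.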
The major clustering techniques, along with related unsupervised methods, are:
- Hierarchical Clustering
- Density-based Clustering
- Gaussian Mixture Models
- K-means Clustering
- K-nearest neighbors (KNN): usually a supervised classifier; in unsupervised settings only its nearest-neighbor search is used
- Latent Dirichlet Allocation (LDA)
- Dimensionality Reduction Techniques:
  - Principal Component Analysis (PCA)
  - Singular Value Decomposition (SVD)
  - Random Projections
  - Independent Component Analysis (ICA)
- Cluster Validation (evaluating the quality of the clusters found)
- Self-organizing Maps
- Hidden Markov Models
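As a small illustration of the first technique in this list, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering with single linkage: every point starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remains. The function name, the choice of single linkage (other linkages such as complete or average are also common) and the sample points are assumptions. The nested loops make it roughly cubic in the number of points, so it only suits tiny data sets.

```python
import math

def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair of clusters
        del clusters[b]                  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical points: a tight group, a pair, and an outlier
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
groups = agglomerative(points, n_clusters=3)
```

Recording the order of the merges instead of stopping at a fixed count yields the full hierarchy (a dendrogram), which lets the number of clusters be chosen afterwards.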
2. Association Rules
When we want to discover hidden relationships between variables or features in a data set, association rules can be used. These rules find items or attributes that frequently occur together in the data set. The rules are used to describe the data rather than to make predictions, and they provide valuable insights into the patterns in the data set.
Association rules are used foremost in marketing-related problems such as market basket analysis, promotional pricing and assortment decisions. By analyzing the transactions in a supermarket, we can find out which items are frequently bought together. Such insights help in designing the layout of the supermarket.
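Market basket analysis usually scores a rule A -> B with three measures: support (the share of baskets containing both A and B), confidence (the share of baskets with A that also contain B) and lift (confidence divided by the support of B, where values above 1 mean the items appear together more often than they would by chance). The sketch below computes these measures for rules between pairs of single items only; full algorithms such as Apriori or FP-Growth extend the idea to larger itemsets. The function name `pair_rules`, the thresholds and the transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules A -> B between single items as
    (A, B, support, confidence, lift), highest confidence first."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each unordered pair once

    rules = []
    for (x, y), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            confidence = together / item_counts[a]    # share of a-baskets that also hold b
            lift = confidence / (item_counts[b] / n)  # > 1: together more often than by chance
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)


# Hypothetical supermarket transactions
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "jam"},
    {"milk", "cereal"},
    {"milk", "cereal", "bread"},
    {"cereal"},
]
for a, b, support, confidence, lift in pair_rules(transactions):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

In this made-up data, every basket with butter also contains bread, so the rule butter -> bread has a confidence of 1.0. A rule like that is the kind of insight that could inform where the two items are placed in the store.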
Association rules are also used in web usage mining, intrusion detection and bioinformatics.
In the next few sections, we will discuss each technique in detail.