Clustering - Unsupervised Learning

Jai Kushwaha
3 min read · Sep 29, 2020

What clustering is and why we use it, explained through its applications

A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.

Why Clustering?

  • To group similar objects / data points
  • To find homogeneous sets of customers
  • To segment the data into similar groups

Applications:

  • Marketing : Customer Segmentation & Profiling
  • Libraries : Book classification
  • Retail : Store Categorization
  • More examples

(Figure: Use cases of Clustering)

How do we define similar and dissimilar in clustering? Mathematically, via distance: the smaller the distance between two data points, the more similar they are.

Various distance measures

  • Euclidean Distance
  • Chebyshev Distance
  • Manhattan Distance
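
To make the three measures above concrete, here is a minimal sketch in plain Python. The function names and the two sample customers (age, number of products held) are made up for illustration only.

```python
def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Chessboard distance: largest absolute difference on any single feature."""
    return max(abs(x - y) for x, y in zip(a, b))


# Two hypothetical customers described by (age, number of products held).
p = (30, 2)
q = (33, 6)

print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
print(chebyshev(p, q))  # 4
```

The same pair of points gets a different distance under each measure, so the choice of measure changes which points count as "similar".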

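Now the question is: what is the difference between classification and clustering? Classification is supervised, so it learns from records that already carry labels. Clustering is unsupervised, so it groups records that have no labels.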

Types of Clustering Procedures

  • Hierarchical clustering is characterized by a tree-like structure and uses distance as a measure of (dis)similarity
  • Partitioning algorithms start with a set of partitions as clusters and iteratively refine the partitions to form stable clusters

(Figure: Advanced Classification)
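
The two procedure types listed above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: the partitioning example is a k-means style loop with hand-picked starting centres, the hierarchical example is agglomerative with single linkage, and all names and sample points are made up.

```python
def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def partition(points, centers, rounds=10):
    """Partitioning (k-means style): assign, re-centre, repeat until stable."""
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old centre if a group ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # centres stopped moving, so the partitions are stable
        centers = new_centers
    return groups


def linkage(c1, c2):
    """Single linkage: distance between the closest pair across two clusters."""
    return min(dist(a, b) for a in c1 for b in c2)


def hierarchical(points, k):
    """Agglomerative: start with one cluster per point, merge the closest two until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
print(partition(points, centers=[(0, 0), (10, 10)]))
print(hierarchical(points, k=2))
```

On this toy data both approaches recover the same two groups. Recording the order of merges in the hierarchical version is what gives the tree-like structure.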

How to select which clustering technique to use:

Conclusion:

The main objective of clustering is to find structure in a collection of data and to make sense of that data by labeling the groups found.

Clustering groups data with similar characteristics. Which characteristics count as similar is defined by the logic applied to the data. Business use cases illustrate this further.

A small example from banking, which I have seen in practice:

Banking:

  • High Net Worth Customer (HNI)
  • High Risk Customer
  • Medium Risk Customer

Accounts Classification in Banks:

  • Performing Asset or Standard Asset
  • Substandard Asset
  • Non-performing Asset (NPA)
      • Doubtful Asset 1
      • Doubtful Asset 2
      • Doubtful Asset 3
      • Loss Asset

Labeling bank customers with simple logic, such as the amount of money in the account and the annual debits and credits, forms the clusters, and labels such as HNI or NPA make those clusters easy to understand. These labeled groups of customers can then be targeted for more business.
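
The labeling logic described above could look something like the sketch below. Every threshold, the debit-to-credit ratio rule, and the fallback label "Unlabeled" are hypothetical values invented for illustration; a real bank would use its own policy.

```python
# All thresholds below are hypothetical, chosen only for illustration.
HNI_MIN_BALANCE = 5_000_000      # balance at or above this is labeled HNI
HIGH_RISK_DEBIT_RATIO = 1.5      # annual debits / annual credits
MEDIUM_RISK_DEBIT_RATIO = 1.0


def label_customer(balance, annual_debits, annual_credits):
    """Attach a label from the balance and the annual debits and credits."""
    if balance >= HNI_MIN_BALANCE:
        return "HNI"
    # With no credits at all, treat the ratio as infinitely high.
    ratio = annual_debits / annual_credits if annual_credits else float("inf")
    if ratio >= HIGH_RISK_DEBIT_RATIO:
        return "High Risk"
    if ratio >= MEDIUM_RISK_DEBIT_RATIO:
        return "Medium Risk"
    return "Unlabeled"  # fallback label, not part of the list above


# Hypothetical customers: (balance, annual debits, annual credits).
customers = [(7_500_000, 900_000, 1_200_000),
             (40_000, 300_000, 150_000),
             (80_000, 220_000, 200_000)]

for c in customers:
    print(c, "->", label_customer(*c))
```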

What’s next after clustering

Clustering gives you the clusters present in the given dataset.

Clustering does not give you rules for classifying future records.

To be able to classify future records, you can do the following:

Build a classification tree model on the clustered dataset, using the cluster each record belongs to as its label.
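
As a rough sketch of that last step, the code below grows a very small classification tree on records that have already been assigned to clusters, then uses it to classify new records. The tree builder is a simplified, hand-written illustration (Gini-based splits, hypothetical names and data); in practice a library such as scikit-learn's `DecisionTreeClassifier` would normally do this job.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels (0 means all labels agree)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label, used as the prediction at a leaf."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree by repeatedly picking the split with the lowest weighted Gini."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in range(len(rows[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # identical rows with mixed labels: nothing to split on
        return majority(labels)
    _, f, t = best
    lo = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    hi = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [l for _, l in lo], depth + 1, max_depth),
            build_tree([r for r, _ in hi], [l for _, l in hi], depth + 1, max_depth))


def classify(tree, row):
    """Walk from the root to a leaf and return the cluster label found there."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical records and the cluster ids an earlier clustering step gave them.
rows = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
cluster_ids = [0, 0, 0, 1, 1, 1]

tree = build_tree(rows, cluster_ids)
print(tree)                    # (feature index, threshold, left subtree, right subtree)
print(classify(tree, (2, 2)))  # a future record
print(classify(tree, (7, 9)))  # another future record
```

The tree turns the clusters into explicit rules of the form "feature <= threshold", which is exactly what clustering alone does not provide.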

Combining business understanding with the application of clustering helps businesses grow: it informs market targeting, product placement, sales strategies and much more.


Jai Kushwaha

I am a Senior Consultant with 11+ years of experience in analytics and model development, with domain expertise in BFSI.