Clustering - Unsupervised Learning
What Clustering Is and Why We Use It, Explained Through Its Applications
A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects in other clusters.
Why Clustering?
- To group similar objects / data points
- To find homogeneous sets of customers
- To segment the data into similar groups
Applications:
- Marketing : Customer Segmentation & Profiling
- Libraries : Book classification
- Retail : Store Categorization
- More Examples
How do we define similar and dissimilar in clustering? Mathematically, via distance.
Various distance measures:
- Euclidean Distance
- Chebyshev Distance
- Manhattan Distance
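To make the three measures above concrete, here is a minimal sketch in plain Python. The function names and the two sample points are illustrative assumptions, not taken from any particular library.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute differences per coordinate.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute difference along any single coordinate.
    return max(abs(x - y) for x, y in zip(a, b))

# Two made-up points, used only for illustration.
p = (1.0, 2.0)
q = (4.0, 6.0)

print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
print(chebyshev(p, q))  # 4.0
```

The same pair of points gets a different distance under each measure, so the choice of measure can change which objects count as similar.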
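Now the question arises: what is the difference between classification and clustering? Classification is supervised: it learns from records that already carry labels. Clustering is unsupervised: it finds groups in data that has no labels.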
Types of Clustering Procedures
- Hierarchical clustering is characterized by a tree-like structure and uses distance as a measure of (dis)similarity
- Partitioning algorithms start with a set of partitions as clusters and iteratively refine the partitions to form stable clusters
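The two procedures above can be sketched in plain Python. This is only a rough illustration under stated assumptions: the hierarchical version uses single linkage (one of several possible linkage rules), the partitioning version is a basic k-means, and the six points and the starting centroids are made up.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, n_clusters):
    """Hierarchical (bottom-up) clustering with single linkage.

    Every point starts as its own cluster; the two closest clusters are
    merged repeatedly. The merge history is the tree-like structure.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

def kmeans(points, centroids, max_iter=100):
    """Partitioning: start from initial centroids and refine until stable."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster: the partition is stable.
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids

# Made-up data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]

clusters, merges = agglomerative(points, n_clusters=2)
assignment, centroids = kmeans(points, centroids=[points[0], points[3]])

print(clusters)
print(assignment)
```

On these six points both approaches should recover the same two groups. The practical difference is that the hierarchical version also keeps the merge history, while the partitioning version needs the number of clusters and starting centroids up front.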
How to select which clustering technique to use:
Conclusion:
The main objective of clustering is to find structure in a collection of data and to make sense of that data by labeling it.
Clustering groups data with similar characteristics. Which characteristics count as similar is defined by the logic applied to the data. Business use cases illustrate this further.
A small example from banking that I have seen in practice:
Banking:
· High Net Worth Customer (HNI)
· High Risk Customer
· Medium Risk Customer
Account Classification in Banks:
· Performing Asset or Standard Asset
· Substandard Asset
· Non-performing Asset (NPA)
— Doubtful Asset — 1
— Doubtful Asset — 2
— Doubtful Asset — 3
— Loss Asset
The bank labels its customers based on the amount of money in the account and on the annual debits and credits. This simple logic forms the clusters, and labels such as HNI and NPA make sense of the banking customers and accounts in simple terms. These categorized customers can then be targeted for more business.
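The labeling logic described above could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the thresholds, the debit-to-credit ratio rule, the "Unlabeled" fallback, and the four customer records are hypothetical and do not reflect any real bank's rules.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
HNI_MIN_BALANCE = 1_000_000
HIGH_RISK_RATIO = 1.5    # annual debits / annual credits
MEDIUM_RISK_RATIO = 1.1

def label_customer(balance, annual_credits, annual_debits):
    """Assign a label from balance and annual account activity."""
    if balance >= HNI_MIN_BALANCE:
        return "HNI"
    # Avoid division by zero when an account received no credits.
    ratio = annual_debits / annual_credits if annual_credits else float("inf")
    if ratio >= HIGH_RISK_RATIO:
        return "High Risk"
    if ratio >= MEDIUM_RISK_RATIO:
        return "Medium Risk"
    return "Unlabeled"

# Made-up records: (balance, annual credits, annual debits).
customers = {
    "A": (2_500_000, 900_000, 400_000),
    "B": (40_000, 200_000, 380_000),
    "C": (75_000, 300_000, 345_000),
    "D": (120_000, 500_000, 450_000),
}

# Group customer names under the label they receive.
groups = defaultdict(list)
for name, record in customers.items():
    groups[label_customer(*record)].append(name)

print(dict(groups))
```

Each label ends up holding the customers that satisfy the same rule, which is the sense in which simple business logic forms the groups.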
What’s next after clustering?
Clustering provides you with the clusters in the given dataset.
Clustering does not provide you with rules to classify future records.
To be able to classify future records, you may do the following:
Build a classification tree model on the clustered dataset.
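As a rough sketch of that last step, the example below first clusters a handful of made-up one-dimensional values into two groups, then fits the smallest possible classification tree (a single split, often called a decision stump) on the cluster labels. The helper names, the data, and the choice of a depth-one tree are all assumptions made to keep the example short; a real model would normally use more features and a deeper tree.

```python
def two_means_1d(values, max_iter=100):
    """Cluster 1-D values into two groups (a tiny k-means with k = 2)."""
    centers = [min(values), max(values)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

def fit_stump(values, labels):
    """Depth-one classification tree: pick the split with the fewest errors.

    Assumes at least two distinct values, so a candidate split exists.
    """
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for t in candidates:
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if v <= t else right) != lab
                for v, lab in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, threshold, left, right = best

    def classify(v):
        # The learned rule: one threshold, one label on each side.
        return left if v <= threshold else right

    return classify, threshold

# Made-up values, for example account balances in thousands.
values = [10.0, 12.0, 15.0, 11.0, 90.0, 95.0, 100.0, 98.0]

labels = two_means_1d(values)                   # step 1: cluster
classify, threshold = fit_stump(values, labels)  # step 2: learn a rule

print(labels)
print(threshold)
print(classify(20.0), classify(70.0))  # labels for two future records
```

The clustering step only labels the records it was given. The fitted rule is what lets a record that was not in the original dataset be assigned to one of the clusters.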
Combining business understanding with clustering helps businesses grow and sharpen their market targeting, product placement, sales strategies, and much more.