A clustering algorithm looks at data points and automatically finds those that are related or similar to each other. In supervised learning, the dataset includes both the inputs x and the target outputs y; in unsupervised learning, you are given a dataset with only the inputs x and no target labels y.[1] Unsupervised learning algorithms are well suited to complex processing tasks, such as organizing large datasets into clusters. "They are useful for identifying previously undetected patterns in data and can help identify features useful for categorizing data."[2]
"Clustering is a technique for exploring raw, unlabeled data and breaking it down into groups (or clusters) based on similarities or differences. It is used in a variety of applications, including customer segmentation, fraud detection, and image analysis. Clustering algorithms split data into natural groups by finding similar structures or patterns in uncategorized data."[2]
One type of unsupervised learning algorithm is k-means clustering. It is used for exclusive clustering, where each data point can belong to only one cluster. The algorithm starts by randomly selecting initial centroids for the clusters. It then repeats two main steps. First, it assigns points to cluster centroids: for each data point, it determines the nearest centroid. Second, it moves the centroids: for each cluster, it calculates the average location of all points assigned to that cluster and moves the centroid to the new location. In this way, each iteration refines both the cluster assignments and the centroid positions. The process repeats until there are no further changes in either the assignment of points to centroids or the locations of the centroids. At this point, the algorithm has converged.[2]
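The two steps described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a reference implementation. The function name `kmeans`, its parameters (`max_iters`, `seed`), and the toy dataset are all assumptions made for this example. The sketch also assumes that the initial centroids are drawn from the data points themselves and that a cluster left with no points keeps its previous centroid; the text above does not specify either choice.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Illustrative k-means: returns (labels, centroids). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    # Initialization (assumption): use k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iters):
        # Step 1: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 2: move each centroid to the mean of the points assigned to it.
        # Assumption: a cluster with no points keeps its previous centroid.
        new_centroids = np.array([
            X[new_labels == j].mean(axis=0) if np.any(new_labels == j) else centroids[j]
            for j in range(k)
        ])
        # Converged when neither the assignments nor the centroids changed.
        converged = (
            labels is not None
            and np.array_equal(new_labels, labels)
            and np.allclose(new_centroids, centroids)
        )
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids

# Toy dataset (made up for this example): two clearly separated groups of points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

On this toy dataset the first three points end up in one cluster and the last three in the other, although which cluster is numbered 0 depends on the random initialization. On less cleanly separated data, different initial centroids can lead to different final clusters, so it is common to run the algorithm several times with different seeds.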
[1] Andrew Ng, Stanford University & DeepLearning.AI, Machine Learning Specialization, Course 3, Week 1
[2] Google Cloud, "What is unsupervised learning?"