Complete Playlist of Unsupervised Machine Learning https://www.youtube.com/playlist?list=PLfQLfkzgFi7azUjaXuU0jTqg03kD-ZbUz
Let's take a look at what the K-means clustering algorithm does. Let me start with an example. Here I've plotted a data set with 30 unlabeled training examples, so there are 30 points, and what we'd like to do is run K-means on this data set. In this example I'm going to ask it to find two clusters. Later this week we'll talk about how you might decide how many clusters to find. The very first thing K-means does is take a random guess at where the centers of the two clusters might be. The centers of the clusters are called cluster centroids. Here it randomly picks two points, which I've shown as a red cross and a blue cross, as guesses for the centers of two different clusters. This is just a random initial guess, and they're not particularly good guesses, but it's a start. One thing I hope you take away from this video is that K-means repeatedly does two different things: the first is assign points to cluster centroids, and the second is move cluster centroids. Let's take a look at what this means. After it has made an initial guess at where the cluster centroids are, K-means goes through all of the examples, x(1) through x(30), my 30 data points. For each of them it checks whether it is closer to the red cluster centroid, shown by the red cross, or to the blue cluster centroid, shown by the blue cross, and it assigns each point to whichever cluster centroid it is closer to. I'm going to illustrate that by painting each of these examples, each of these little round dots, either red or blue, depending on whether that example is closer to the red or to the blue cluster centroid.
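The random initial guess and the assignment step described above can be sketched in plain Python as follows. This is a minimal illustration, not the course's code: the function names `init_centroids` and `assign_points`, the use of Euclidean distance via `math.dist`, picking K training examples as the random initial guess, and the toy data are all assumptions made for illustration.

```python
import math
import random

def init_centroids(points, k, seed=None):
    """Random initial guess: pick k distinct training examples as centroids.

    Assumption: choosing actual data points is one common way to make the
    random guess; the lecture only says the initial guess is random.
    """
    rng = random.Random(seed)
    return [tuple(p) for p in rng.sample(points, k)]

def assign_points(points, centroids):
    """Assignment step: for each point, return the index of its closest centroid."""
    assignments = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        # Ties go to the lower-indexed centroid because index() finds the first match.
        assignments.append(distances.index(min(distances)))
    return assignments

# Hypothetical toy data: two loose groups in 2-D (not the 30 points from the video).
points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]  # index 0 plays "red", index 1 plays "blue"
print(assign_points(points, centroids))  # → [0, 0, 0, 1, 1, 1]
```

Each returned index plays the role of the color in the lecture: 0 for red, 1 for blue.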
So this point up here is closer to the red centroid, which is why it's painted red, whereas this point down there is closer to the blue cluster centroid, which is why I've painted it blue. So that was the first of the two things K-means does over and over, which is assign points to cluster centroids. All that means is that it associates every point with one of the cluster centroids, which I'm illustrating with the color. The second of the two steps is that K-means looks at all of the red points and takes an average of them, and it moves the red cross to the average location of the red dots, which turns out to be here. So the red cross, that is, the red cluster centroid, moves here. Then we do the same thing for the blue dots: look at all the blue dots, take their average, and move the blue cross over there. So you now have a new location for the blue cluster centroid as well. In the next video we'll look at the mathematical formulas for how to do both of these steps. But now that you have these new and hopefully slightly improved guesses for the locations of the cluster centroids, we'll look through all 30 training examples again and check, for every one of them, whether it's closer to the red or the blue cluster centroid at their new locations. Then we again associate every point with its closer cluster centroid, indicated by the color. If you do that, you see that a few points change color. For example, this point was colored red because it was closer to the red cluster centroid previously, but if we look again, it's now actually closer to the blue cluster centroid, because the blue and red cluster centroids have moved. So if we go through and associate each point with its closer cluster centroid, you end up with this. Then we just repeat the second part of K-means again, which is to look at all of the red dots and compute their average.
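The second step, moving each cluster centroid to the average location of the points assigned to it, might look like this in plain Python. The function name `move_centroids` is hypothetical, and keeping a centroid where it is when no points are assigned to it is an assumed convention that this lecture does not cover.

```python
def move_centroids(points, assignments, old_centroids):
    """Move step: set each centroid to the mean of the points assigned to it."""
    k = len(old_centroids)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Accumulate a per-cluster coordinate sum and a per-cluster point count.
    for p, a in zip(points, assignments):
        counts[a] += 1
        for d in range(dim):
            sums[a][d] += p[d]
    new_centroids = []
    for j in range(k):
        if counts[j] == 0:
            # Assumption: an empty cluster keeps its previous centroid location.
            new_centroids.append(tuple(old_centroids[j]))
        else:
            new_centroids.append(tuple(s / counts[j] for s in sums[j]))
    return new_centroids

# Hypothetical toy data with assignments from an earlier assignment step.
points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
assignments = [0, 0, 0, 1, 1, 1]
new_centroids = move_centroids(points, assignments, [(0.0, 0.0), (10.0, 10.0)])
# Centroid 0 moves to (1.0, 1.5); centroid 1 moves to roughly (8.5, 8.33).
print(new_centroids)
```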
And also look at all of the blue dots and compute the average location of all of the blue dots. It turns out that you end up moving the red cross over there and the blue cross over here. And we repeat: look at all of the points again and color them either red or blue, depending on which cluster centroid each one is closer to. So you end up with this. Then again, look at all of the red dots and take their average location, look at all the blue dots and take their average location, and move the cluster centroids to the new locations. And it turns out that if you keep repeating these two steps, that is, look at each point and assign it to the nearest cluster centroid, and then move each cluster centroid to the mean of all the points with the same color, you eventually reach a point where neither the colors of the points nor the locations of the cluster centroids change any more. At that point, K-means has converged.
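Putting the two steps together, a compact sketch of the whole loop might look like this. It is an illustration under stated assumptions, not the course's implementation: it picks K training examples as the random initial guess, uses Euclidean distance, adds a hypothetical `max_iters` safety cap, keeps an empty cluster's centroid in place, and stops once no point changes its assignment (no dot changes color).

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=None):
    """Repeat the assign and move steps until no assignment changes."""
    rng = random.Random(seed)
    # Random initial guess (assumption: k distinct training examples).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 1: assign each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed "color", so the algorithm has converged
        assignments = new_assignments
        # Step 2: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, 2, seed=0)
print(centroids, labels)
```

The cluster labels 0 and 1 are arbitrary, so which group ends up as cluster 0 depends on the random initial guess.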