Complete Playlist of Unsupervised Machine Learning https://www.youtube.com/playlist?list=PLfQLfkzgFi7azUjaXuU0jTqg03kD-ZbUz

In the earlier courses, courses one and two of the specialization, you saw a lot of supervised learning algorithms as taking a training set, posing a cost function, and then using gradient descent or some other algorithm to optimize that cost function. It turns out that the K-means algorithm that you saw in the last video is also optimizing a specific cost function, although the optimization algorithm it uses to optimize it is not gradient descent; it's actually the algorithm you already saw in the last video. Let's take a look at what all this means.

Let's take a look at what the cost function for K-means is. To get started, as a reminder, this is the notation we've been using, where c(i) is the index of the cluster, so c(i) is some number from 1 to K, the index of the cluster to which training example x(i) is currently assigned, and mu_k is the location of cluster centroid k. Let me introduce one more piece of notation, which is mu subscript c(i), that is, mu_k when lowercase k equals c(i). So mu subscript c(i) is the cluster centroid of the cluster to which example x(i) has been assigned. So for example, if I were to look at some training example, say training example 10, and I were to ask, what's the location of the cluster centroid to which the 10th training example has been assigned? Well, I would then look up c(10). This will give me a number from 1 to K that tells me whether example 10 was assigned to the red or the blue or some other cluster centroid, and then mu subscript c(10) is the location of the cluster centroid to which x(10) has been assigned. So armed with this notation, let me now write out the cost function that K-means turns out to be minimizing. The cost function J, which is a function of c(1) through c(m), these are all the assignments of points to cluster centroids, as well as mu_1 through mu_K, these are the locations of all the cluster centroids, is defined as this expression on the right.
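The cost function described here, J(c(1), ..., c(m), mu_1, ..., mu_K) = (1/m) * sum over i of ||x(i) - mu_c(i)||^2, can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the course; the function name `distortion` and the variable names `X`, `c`, and `mu` are my own choices for the sketch:

```python
import numpy as np

def distortion(X, c, mu):
    """Cost function J: the average squared distance between each
    training example x(i) and the centroid mu[c[i]] it is assigned to.

    X  : (m, n) array of training examples
    c  : (m,)   array of cluster indices, c[i] in {0, ..., K-1}
    mu : (K, n) array of cluster centroid locations
    """
    # mu[c] looks up the assigned centroid for every example at once.
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```

For instance, two points at distance 1 on either side of a single centroid give J = 1.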
It is the average, so one over m, sum from i equals 1 to m, as i goes from 1 through m, of the squared distance between every training example x(i) and mu subscript c(i), this quantity up here. In other words, the cost function for K-means is the average squared distance between every training example x(i) and the location of the cluster centroid to which that training example x(i) has been assigned. So for this example up here, we'd be measuring the distance between x(10) and mu subscript c(10), the cluster centroid to which x(10) has been assigned, and taking the square of that distance, and that would be one of the terms over here that we're averaging over. And it turns out that what the K-means algorithm is doing is trying to find assignments of points to cluster centroids, as well as locations of cluster centroids, that minimize this average squared distance.

Visually, here's what you saw partway into the run of K-means in the earlier video. At this step, if you were to compute the cost function, you would look at every one of the blue points, measure these distances, and compute their squares, and then also similarly look at every one of the red points, compute these distances, and compute their squares. The average of the squares of all of these distances, for the red and the blue points, is the value of the cost function J at this particular configuration of the parameters for K-means. And what K-means will do on every step is try to update the cluster assignments c(1) through c(30) in this example, or update the positions of the cluster centroids mu_1 and mu_2, in order to keep on reducing this cost function J. By the way, this cost function J also has a name in the literature: it is called the distortion function.
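The two alternating steps described above can be sketched as follows. This is an illustration under my own assumptions (30 random 2-D points, K = 2, names like `Js` and `dists` are invented for the sketch), not code from the course. The key point it demonstrates: the assignment step minimizes J over the c(i) with the centroids held fixed, and the update step minimizes J over the mu_k with the assignments held fixed, so the distortion never increases from one iteration to the next:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))   # 30 training examples, matching c(1)..c(30) above
K = 2
mu = X[rng.choice(30, size=K, replace=False)].copy()  # random initial centroids

def distortion(X, c, mu):
    # J = average squared distance from each point to its assigned centroid
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

Js = []
for step in range(10):
    # Assignment step: give each point its closest centroid
    # (minimizes J over c with mu held fixed, so J cannot increase).
    dists = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)
    c = np.argmin(dists, axis=1)
    # Update step: move each centroid to the mean of its assigned points
    # (minimizes J over mu with c held fixed, so J cannot increase either).
    for k in range(K):
        if np.any(c == k):
            mu[k] = X[c == k].mean(axis=0)
    Js.append(distortion(X, c, mu))
```

Printing `Js` shows a monotonically non-increasing sequence, which is exactly the "keep on reducing J" behavior the lecture describes.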
