Complete Playlist of Unsupervised Machine Learning https://www.youtube.com/playlist?list=PLfQLfkzgFi7azUjaXuU0jTqg03kD-ZbUz
Back in the first course, you have seen how for linear regression, future normalization can help the algorithm run faster. In the case of building a recommended system with numbers wide such as movie ratings from one to five or zero to five stars, it turns out your algorithm will run more efficiently. And also perform a bit better if you first carry out mean normalization. That is if you normalize the movie ratings to have a consistent average value, let's take a look at what that means. So here's the data set that we've been using. And down below is the cost function you used to learn the parameters for the model. In order to explain mean normalization, I am actually going to add fifth user Eve who has not yet rated any movies. And you see in a little bit that adding mean normalization will help the algorithm make better predictions on the user Eve. In fact, if you were to train a collaborative filtering algorithm on this data, then because we are trying to make the parameters w small because of this regularization term. If you were to run the algorithm on this dataset, you actually end up with the parameters w for the fifth user, for the user Eve to be equal to as well as quite likely b(5) = 0. And this is equal to 0 if w and b above equals 0. And so this algorithm will predict that if you have a new user that has not yet rated anything, we think they'll rate all movies with zero stars and that's not particularly hopeful. So in this video, we'll see that mean normalization will help this algorithm come up with better predictions of the movie ratings for a new user that has not yet rated any movies. In order to describe mean normalization, let me take all of the values here including all the question marks for Eve and put them in a two dimensional matrix like this. Just to write out all the ratings including the question marks in a more sustained and more compact way. To carry out mean normalization, what we're going to do is take all of these ratings and for each movie, compute the average rating that was given. So movie one had two 5s and two 0s and so the average rating is 2.5. Movie two had a 5 and a 0, so that averages out to 2.5. Movie three 4 and 0 averages out to 2. Movie four averages out to 2.25 rating. So this seems more reasonable to think Eve is likely to rate this movie 2.5 rather than think Eve will rate all movie zero stars just because she hasn't rated any movies yet. And in fact the effect of this algorithm is it will cause the initial guesses for the new user Eve to be just equal to the mean of whatever other users have rated these five movies. And that seems more reasonable to take the average rating of the movies rather than to guess that all the ratings by Eve will be zero. It turns out that by normalizing the mean of the different movies ratings to be zero, the optimization algorithm for the recommended system will also run just a little bit faster. But it does make the algorithm behave much better for users who have rated no movies or very small numbers of movies. And the predictions will become more reasonable. In this example, what we did was normalize each of the rows of this matrix to have zero mean and we saw this helps when there's a new user that hasn't rated a lot of movies yet. There's one other alternative that you could use which is to instead normalize the columns of this matrix to have zero mean. And that would be a reasonable thing to do too. But I think in this application, normalizing the rows so that you can give reasonable ratings for a new user seems more important than normalizing the columns. Normalizing the columns would hope if there was a brand new movie that no one has rated yet. But if there's a brand new movie that no one has rated yet, you probably shouldn't show that movie to too many users initially because you don't know that much about that movie. So normalizing columns the hope with the case of a movie with no ratings seems less important to me than normalizing the rules to hope with the case of a new user that's hardly rated any movies yet. And when you are building your own recommended system in this week's practice lab, normalizing just the roles should work fine. So that's mean normalization. It makes the algorithm run a little bit faster. But even more important, it makes the algorithm give much better, much more reasonable predictions when there are users that rated very few movies or even no movies at all. This implementation detail of mean normalization will make your recommended system work much better.
Subscribe to our channel for more computer science related tutorials| https://www.youtube.com/@learnwithcoursera