In this video, we'll look at two further refinements to the reinforcement learning algorithm you've seen. The first idea is called using mini-batches, and this turns out to be an idea that can both speed up your reinforcement learning algorithm and is also applicable to supervised learning. It can help you speed up your supervised learning algorithm as well, such as training a neural network, a linear regression model, or a logistic regression model. The second idea we'll look at is soft updates, which it turns out will help your reinforcement learning algorithm do a better job of converging to a good solution. Let's take a look at mini-batches and soft updates.

To understand mini-batches, let's start with supervised learning. Here's the dataset of housing sizes and prices that you saw way back in the first course of this specialization, on using linear regression to predict housing prices. There we had come up with this cost function for the parameters w and b: it was 1 over 2m times the sum of the squared difference between the prediction and the actual value y. The gradient descent algorithm was to repeatedly update w as w minus the learning rate alpha times the partial derivative with respect to w of the cost J of w, b, and similarly to update b as follows. Let me just take this definition of J of w, b and substitute it in here.

Now, when we looked at this example, way back when we were starting to talk about linear regression and supervised learning, the training set size m was pretty small. I think we had 47 training examples. But what if you have a very large training set, say m equals 100 million? There are many countries, including the United States, with over 100 million housing units, so a national census will give you a dataset of this order of magnitude in size. The problem with this algorithm when your dataset is this big is that every single step of gradient descent requires computing this average over 100 million examples, and this turns out to be very slow. Every step of gradient descent means you would compute this sum, or this average, over 100 million examples. Then you take one tiny gradient descent step, and you go back and have to scan over your entire 100 million example dataset again to compute the derivative for the next step, then take another tiny gradient descent step, and so on. When the training set size is very large, this gradient descent algorithm turns out to be quite slow.

The idea of mini-batch gradient descent is to not use all 100 million training examples on every single iteration through this loop. Instead, we may pick a smaller number, let me call it m prime, equal to say 1,000. On every step, instead of using all 100 million examples, we would pick some subset of 1,000, or m prime, examples. This inner term becomes 1 over 2m prime times the sum over m prime examples. Now each iteration through gradient descent requires looking at only 1,000 rather than 100 million examples, so every step takes much less time and this leads to a more efficient algorithm. What mini-batch gradient descent does is, on the first iteration through the algorithm, maybe it looks at one subset of the data. On the next iteration, maybe it looks at another subset of the data, and so on for the third iteration and onward, so that every iteration is looking at just a subset of the data, and each iteration runs much more quickly. To see why this might be a reasonable algorithm, here's the housing dataset.
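As a concrete illustration of the idea described above, here is a minimal sketch of mini-batch gradient descent for the one-feature linear regression case, written with NumPy. The function name, variable names (alpha, m_prime, num_iters), and default values are illustrative choices, not something specified in the lecture.

```python
import numpy as np

def minibatch_gradient_descent(X, y, alpha=0.01, m_prime=1000, num_iters=10000):
    """Fit y ~ w*x + b with mini-batch gradient descent (illustrative sketch).

    X, y      : 1-D arrays of housing sizes and prices (m examples)
    alpha     : learning rate
    m_prime   : mini-batch size (the m' from the lecture), much smaller than m
    num_iters : number of gradient descent steps
    """
    m = X.shape[0]
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        # Pick m' examples instead of all m on this step.
        idx = np.random.choice(m, size=min(m_prime, m), replace=False)
        X_batch, y_batch = X[idx], y[idx]

        # Prediction error on the mini-batch: f_{w,b}(x) - y
        err = (w * X_batch + b) - y_batch

        # Gradients of J = 1/(2 m') * sum(err^2) with respect to w and b
        dj_dw = np.mean(err * X_batch)
        dj_db = np.mean(err)

        # Simultaneous parameter update
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b
```

Each pass through the loop touches only m prime examples, which is why a single step is so much cheaper than a full batch gradient descent step over all m examples.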
If on the first iteration we were to look at just, say, five examples, this is not the whole dataset, but it's slightly representative of the straight line you might want to fit in the end, and so taking one gradient descent step to make the algorithm better fit these five examples is okay. But then on the next iteration, you take a different five examples, like those shown here. You take one gradient descent step using these five examples, and on the next iteration you use a different five examples, and so on and so forth. You can scan through this list of examples from top to bottom; that would be one way. Another way would be, on every single iteration, to just pick a totally different five examples to use. You might remember with batch gradient descent, if these are the contours of the cost function J, then batch gradient descent would say: start here and take a step, take a step, take a step, take a step, take a step. Every step of gradient descent causes the parameters to reliably get closer to the global minimum of the cost function here in the middle.
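To make the two ways of picking the five examples concrete, here is a small sketch of both selection strategies: scanning the dataset from top to bottom in consecutive chunks, versus picking a totally different random subset on every iteration. The helper names are hypothetical and only meant to illustrate the idea.

```python
import numpy as np

def sequential_minibatches(X, y, batch_size=5):
    """Scan the dataset top to bottom, yielding consecutive mini-batches."""
    m = X.shape[0]
    for start in range(0, m, batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

def random_minibatch(X, y, batch_size=5):
    """Pick a totally different random subset of examples on each call."""
    idx = np.random.choice(X.shape[0], size=batch_size, replace=False)
    return X[idx], y[idx]
```

Either selection strategy can feed the parameter update sketched earlier; the sequential scan is commonly combined with shuffling the data before each full pass.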