Let's formalize how a reinforcement learning algorithm picks actions. In this video, you'll learn what a policy is in a reinforcement learning algorithm. Let's take a look.

As we've seen, there are many different ways that you can take actions in the reinforcement learning problem. For example, we could decide to always go for the nearer reward, so you go left if the leftmost reward is nearer or go right if the rightmost reward is nearer. Another way we could choose actions is to always go for the larger reward. Or we could always go for the smaller reward; that doesn't seem like a good idea, but it is another option. Or you could choose to go left unless you're just one step away from the lesser reward, in which case you go for that one.

In reinforcement learning, our goal is to come up with a function called a policy Pi, whose job is to take as input any state s and map it to some action a that it wants us to take. For example, for this policy here at the bottom, if you're in state 2, it maps us to the left action. If you're in state 3, the policy says go left. If you're in state 4, also go left, and if you're in state 5, go right. Pi applied to state s tells us what action it wants us to take in that state. The goal of reinforcement learning is to find a policy Pi, or Pi of s, that tells you what action to take in every state so as to maximize the return.

By the way, I don't know if policy is the most descriptive term for what Pi is, but it's one of those terms that's become standard in reinforcement learning. Maybe calling Pi a controller rather than a policy would be more natural terminology, but policy is what everyone in reinforcement learning now calls this.

In the last few videos, we've gone through quite a few concepts in reinforcement learning, from states to actions to rewards to returns to policies. Let's do a quick review of them in the next video, and then we'll go on to start developing algorithms for finding good policies. Let's go on to the next video.
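To make the idea of a policy concrete, here is a minimal Python sketch of the example policy described above (left in states 2, 3, and 4, right in state 5). Everything beyond that mapping is an assumption made only for illustration and is not stated in this video: that states 1 and 6 are terminal, the reward values, the discount factor of 0.5, the convention that the return includes the reward of the starting state, and all of the names used (`policy`, `pi`, `step`, `rollout_return`).

```python
LEFT, RIGHT = "left", "right"

# The policy from the example: pi(s) for each state the video mentions.
policy = {2: LEFT, 3: LEFT, 4: LEFT, 5: RIGHT}

# Assumptions for illustration only (not given in this video):
# states 1 and 6 are terminal, rewards sit only at the two ends,
# and future rewards are discounted by GAMMA per step.
TERMINAL_STATES = {1, 6}
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
GAMMA = 0.5


def pi(state):
    """Return the action the policy wants us to take in `state`."""
    return policy[state]


def step(state, action):
    """Move one state to the left or right along the line of states."""
    return state - 1 if action == LEFT else state + 1


def rollout_return(start_state):
    """Follow the policy from `start_state` and add up discounted rewards."""
    state, total, discount = start_state, 0.0, 1.0
    while True:
        total += discount * REWARDS[state]
        if state in TERMINAL_STATES:
            return total
        state = step(state, pi(state))
        discount *= GAMMA


for s in sorted(policy):
    print(f"state {s}: pi(s) = {pi(s)}, return = {rollout_return(s)}")
```

The point of the sketch is that a policy is nothing more than a mapping from states to actions. Finding a good policy means choosing that mapping so the return it produces is as large as possible in every state.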