Complete Playlist of Unsupervised Machine Learning https://www.youtube.com/playlist?list=PLfQLfkzgFi7azUjaXuU0jTqg03kD-ZbUz
In machine learning, reinforcement learning is one of those ideas that, while not yet widely applied in commercial applications, is one of the pillars of machine learning, and it has lots of exciting research backing it up and improving it every single day. So let's start by taking a look at what reinforcement learning is.
Let's start with an example. Here's a picture of an autonomous helicopter. This is actually the Stanford autonomous helicopter; it weighs 32 pounds and it's actually sitting in my office right now. Like many other autonomous helicopters, it's instrumented with an onboard computer, GPS, accelerometers, gyroscopes, and a magnetic compass, so it knows where it is at all times quite accurately. If I were to give you the keys to this helicopter and ask you to write a program to fly it, how would you do so?
Radio-controlled helicopters are controlled with joysticks like these, and so the task is this: ten times per second, you're given the position, orientation, speed, and so on of the helicopter, and you have to decide how to move these two control sticks in order to keep the helicopter balanced in the air. By the way, I've flown radio-controlled helicopters as well as quadrotor drones myself, and radio-controlled helicopters are actually quite a bit harder to fly, quite a bit harder to keep balanced in the air. So how do you write a program to do this automatically?
Let me show you a fun video of something we got the Stanford autonomous helicopter to do. Here's a video of it flying under the control of a reinforcement learning algorithm. Let me play the video. I was actually the cameraman that day, and this is the helicopter flying under computer control. If I zoom out the video, you see the trees planted in the sky. So using reinforcement learning, we actually got this helicopter to learn to fly upside down. We told it to fly upside down.
And so reinforcement learning has been used to get helicopters to fly a wide range of stunts, or what we call aerobatic maneuvers. By the way, if you're interested in seeing other videos, you can also check them out at this URL.
So how do you get a helicopter to fly itself using reinforcement learning? The task is, given the position of the helicopter, to decide how to move the control sticks. In reinforcement learning, we call the position, orientation, speed, and so on of the helicopter the state s. And so the task is to find a function that maps from the state of the helicopter to an action a, meaning how far to push the two control sticks in order to keep the helicopter balanced in the air and flying without crashing.
One way you could attempt this problem is to use supervised learning. It turns out this is not a great approach for autonomous helicopter flying. But you could say, well, if we could get a bunch of observations of states and maybe have an expert human pilot tell us what's the best action y to take, you could then train a neural network using supervised learning to directly learn the mapping from the state s, which I'm calling x here, to an action a, which I'm calling the label y here. But it turns out that when the helicopter is moving through the air, it's actually very ambiguous what the exact one right action to take is. Do you tilt a bit to the left or a lot more to the left, or increase the helicopter thrust a little bit or a lot? It's actually very difficult to get a data set of x and the ideal action y. So that's why, for a lot of tasks of controlling a robot, like a helicopter and other robots, the supervised learning approach doesn't work well and we instead use reinforcement learning.
Now, a key input to a reinforcement learning algorithm is something called the reward, or the reward function, which tells the helicopter when it's doing well and when it's doing poorly. So the way I like to think of the reward function is a bit like training a dog.
When I was growing up, my family had a dog and it was my job to train the dog, or the puppy, to behave. So how do you get a puppy to behave well? Well, you can't demonstrate that much to the puppy. Instead, you let it do its thing, and whenever it does something good, you go, good dog. And whenever it does something bad, you go, bad dog. And then hopefully it learns by itself how to do more of the good dog things and fewer of the bad dog things. So training with a reinforcement learning algorithm is like that. When the helicopter's flying well, you go, good helicopter, and if it does something bad like crash, you go, bad helicopter.
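To make the pieces concrete, here is a minimal Python sketch of the three ideas above: a state s, an action a, and a reward function that says "good helicopter" or "bad helicopter". Everything named here is an assumption for illustration only: the fields of `State` and `Action`, the reward values, and `placeholder_policy`, which is a hand-written stand-in for the state-to-action mapping, not something learned and not the controller used on the Stanford helicopter.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical state s: a tiny subset of position, orientation, and so on."""
    altitude_m: float  # assumed field: height above the ground, in metres
    roll_deg: float    # assumed field: sideways tilt, in degrees
    crashed: bool      # assumed field: whether the helicopter has hit the ground


@dataclass
class Action:
    """Hypothetical action a: how far to push each of the two control sticks."""
    stick_one: float  # assumed range -1.0 to 1.0
    stick_two: float  # assumed range -1.0 to 1.0


def reward(state: State) -> float:
    """Assumed reward values: a small positive number while flying,
    a large negative number for a crash."""
    if state.crashed:
        return -1000.0  # "bad helicopter"
    return 1.0          # "good helicopter"


def placeholder_policy(state: State) -> Action:
    """Hand-written stand-in for the mapping from state s to action a.

    It pushes the first stick against the current tilt and holds the
    second stick fixed. A reinforcement learning algorithm would learn
    this mapping from rewards instead of having it written by hand.
    """
    correction = max(-1.0, min(1.0, -0.1 * state.roll_deg))
    return Action(stick_one=correction, stick_two=0.5)


if __name__ == "__main__":
    flying = State(altitude_m=10.0, roll_deg=5.0, crashed=False)
    print(placeholder_policy(flying), reward(flying))
```

Note that the reward only says how well the helicopter is doing. It never says which stick movement was the right one, which is the difference from the supervised learning setup with a label y for every state x.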