How can we improve the neural network architecture? In the last video, we saw a neural network architecture that takes the state and action as input and attempts to output the Q function, Q(s, a). It turns out that there's a change to this architecture that makes the algorithm much more efficient, and most implementations of DQN actually use the more efficient architecture we'll see in this video. Let's take a look.

This was the neural network architecture we saw previously: it would take 12 numbers as input (the 8 state values plus a one-hot encoding of the 4 actions) and output Q(s, a). Whenever we are in some state s, we would have to carry out inference in the neural network separately four times to compute these four values, so as to pick the action a that gives us the largest Q-value. This is inefficient, because we have to carry out inference four times from every single state.

Instead, it turns out to be more efficient to train a single neural network to output all four of these values simultaneously. Here's what that looks like. In the modified architecture, the input is 8 numbers corresponding to the state of the lunar lander. It then goes through a neural network with 64 units in the first hidden layer and 64 units in the second hidden layer. The output layer now has four units, and the job of the neural network is to have these four units output Q(s, nothing), Q(s, left), Q(s, main), and Q(s, right). In other words, the network computes the Q-values for all four possible actions simultaneously whenever we are in state s. This is more efficient because, given the state s, we can run inference just once, get all four of these values, and then very quickly pick the action a that maximizes Q(s, a). (A minimal code sketch of this architecture appears at the end of this section.)

Notice also that in Bellman's equation there is a step in which we have to compute the max over a' of Q(s', a'); that max is multiplied by gamma, and then R(s) is added: Q(s, a) = R(s) + gamma * max over a' of Q(s', a'). The modified network also makes this much more efficient to compute, because we get Q(s', a') for all actions a' at the same time; we can then just pick the max to compute the value of the right-hand side of Bellman's equation. (A sketch of this batched target computation also appears below.)

This change to the neural network architecture makes the algorithm much more efficient, and so we will be using this architecture in the practice lab.

Next, there's one other idea that will help the algorithm a lot: something called an epsilon-greedy policy, which affects how you choose actions even while you're still learning. Let's take a look in the next video at what that means.
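As a concrete illustration, here is a minimal sketch of the modified architecture. The layer sizes (8 inputs, two hidden layers of 64 units, 4 linear outputs) come from the description above; the framework choice (TensorFlow/Keras), the variable names, and the placeholder state are my own assumptions for illustration, not code from the practice lab.

```python
# Sketch of the modified Q-network: one forward pass yields the Q-values
# for all four actions at once. Layer sizes follow the video; the
# framework choice (TensorFlow/Keras) is an assumption for illustration.
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

STATE_SIZE = 8   # lunar lander state vector
NUM_ACTIONS = 4  # nothing, left, main, right

q_network = Sequential([
    Input(shape=(STATE_SIZE,)),
    Dense(64, activation="relu"),             # first hidden layer, 64 units
    Dense(64, activation="relu"),             # second hidden layer, 64 units
    Dense(NUM_ACTIONS, activation="linear"),  # Q(s,nothing), Q(s,left), Q(s,main), Q(s,right)
])

# A single inference call returns all four Q-values; picking the greedy
# action is then just an argmax over the output vector.
state = np.zeros(STATE_SIZE, dtype=np.float32)  # placeholder state
q_values = q_network(state[np.newaxis])         # shape (1, 4)
greedy_action = int(np.argmax(q_values[0]))
```

This is exactly the efficiency gain described above: one forward pass replaces four separate inference calls.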
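Continuing the sketch above, here is a hedged illustration of how the four-output network speeds up the Bellman step: one batched forward pass on the next states s' gives Q(s', a') for every action a', and the max is taken along the action axis. The function name, batch layout, and gamma value are illustrative assumptions.

```python
# Sketch of computing the Bellman targets
#   y = R(s) + gamma * max over a' of Q(s', a')
# for a whole batch of transitions, with a single batched forward pass.
GAMMA = 0.995  # discount factor; the exact value here is an assumption

def bellman_targets(q_network, rewards, next_states):
    """rewards: shape (batch,); next_states: shape (batch, 8)."""
    q_next = q_network(next_states).numpy()  # shape (batch, 4): all a' at once
    max_q_next = q_next.max(axis=1)          # max over a' for every s'
    # Note: in a full implementation, terminal transitions would drop
    # the bootstrap term; that detail is omitted here for brevity.
    return rewards + GAMMA * max_q_next
```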