Complete Playlist of Unsupervised Machine Learning https://www.youtube.com/playlist?list=PLfQLfkzgFi7azUjaXuU0jTqg03kD-ZbUz
Now that you've seen how the Gaussian, or normal, distribution works for a single number, we're ready to build our anomaly detection algorithm. Let's dive in. You have a training set x1 through xm, where each example x has n features. So each example x is a vector with n numbers. In the case of the airplane engine example, we had two features corresponding to the heat and the vibrations. So each of these x's would be a two-dimensional vector, and n would be equal to 2. But for many practical applications n can be much larger, and you might do this with dozens or even hundreds of features.
Given this training set, what we would like to do is carry out density estimation, and all that means is that we will build a model, or estimate the probability, for p(x). What's the probability of any given feature vector? Our model for p(x) is going to be as follows. x is a feature vector with values x1, x2 and so on down to xn. I'm going to model p(x) as the probability of x1, times the probability of x2, times the probability of x3, and so on up to the probability of xn, for the nth feature in the feature vector. If you've taken an advanced class in probability and statistics before, you may recognize that this equation corresponds to assuming that the features x1, x2 and so on up to xn are statistically independent. But it turns out this algorithm often works fine even if the features are not actually statistically independent. If you don't understand what I just said, don't worry about it. Understanding statistical independence is not needed to fully complete this class and to use the anomaly detection algorithm very effectively. Now, to fill in this equation a little bit more, we are saying that the probability of all the features of this feature vector x is the product of p(x1) and p(x2) and so on up through p(xn).
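Written out as a formula, the model described above looks like this. It is only a restatement of the spoken equation, with subscripts for the feature index:

```latex
p(x) = p(x_1) \cdot p(x_2) \cdot p(x_3) \cdots p(x_n)
```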
In order to model the probability of x1, say the heat feature in this example, we're going to have two parameters, mu 1 and sigma squared 1. What that means is we're going to estimate the mean of the feature x1 and also the variance of feature x1, and those will be mu 1 and sigma squared 1. To model p(x2), where x2 is a totally different feature measuring the vibrations of the airplane engine, we're going to have two different parameters, which I'm going to write as mu 2 and sigma squared 2. It turns out these will correspond to the mean, or the average, of the vibration feature and the variance of the vibration feature. And so on: if you have additional features, you have mu 3 and sigma squared 3 up through mu n and sigma squared n.
In case you're wondering why we multiply probabilities, here's one example that could build intuition. Suppose for an aircraft engine there's a 1/10 chance that it is really hot, unusually hot, and maybe there is a 1 in 20 chance that it vibrates really hard. Then what is the chance that it runs really hot and vibrates really hard? We're saying that the chance of that is 1/10 times 1/20, which is 1/200. So it's really unlikely to get an engine that both runs really hot and vibrates really hard. It's the product of these two probabilities.
A somewhat more compact way to write this equation up here is to say that this is equal to the product from j = 1 through n of p(xj), with parameters mu j and sigma squared j. This symbol here is a lot like the summation symbol, except that whereas the summation symbol corresponds to addition, this symbol corresponds to multiplying these terms for j = 1 through n.
So let's put it all together to see how you can build an anomaly detection system. The first step is to choose features xi that you think might be indicative of anomalous examples.
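As a rough illustration of the product formula described above, here is a minimal Python sketch. It assumes each p(xj) is the single-variable Gaussian density from the earlier video. The function names `gaussian_pdf` and `p_of_x` and all of the numbers for mu and sigma squared are made up for illustration and do not come from the lecture.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of a single-variable Gaussian with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def p_of_x(x, mu, sigma2):
    """p(x) as the product over j = 1..n of p(xj; mu j, sigma squared j)."""
    p = 1.0
    for xj, mu_j, sigma2_j in zip(x, mu, sigma2):
        p *= gaussian_pdf(xj, mu_j, sigma2_j)
    return p

# Hypothetical parameters for two features (heat, vibration).
mu = [5.0, 3.0]
sigma2 = [4.0, 1.0]

p_typical = p_of_x([5.0, 3.0], mu, sigma2)   # example sitting right at both means
p_unusual = p_of_x([11.0, 6.0], mu, sigma2)  # example far from both means

# The intuition from the engine example: 1/10 times 1/20 is 1/200.
p_hot_and_vibrating = (1 / 10) * (1 / 20)
```

An example that is far from the mean on both features gets a much smaller p(x) than one near the means, because two small factors are multiplied together.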
Having come up with the features you want to use, you would then fit the parameters mu 1 through mu n and sigma squared 1 through sigma squared n for the n features in your data set. As you might guess, the parameter mu j will be just the average of xj, the feature j, over all the examples in your training set.
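The fitting step can be sketched in a few lines of Python. This is only an illustration under some assumptions: the training set `X_train` is a tiny made-up list of [heat, vibration] rows, the helper name `fit_parameters` is hypothetical, and the variance is taken to be the average squared deviation from the mean, dividing by m. The text here only spells out the mean, so the variance line is an assumption rather than a quote from the video.

```python
def fit_parameters(X):
    """Estimate the per-feature mean and variance from a list of m examples."""
    m = len(X)       # number of training examples
    n = len(X[0])    # number of features per example
    # mu j: average of feature j over all m examples.
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    # sigma squared j: average squared deviation of feature j from mu j
    # (dividing by m is an assumption made for this sketch).
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2

# Made-up training set: each row is [heat, vibration].
X_train = [
    [1.0, 10.0],
    [2.0, 12.0],
    [3.0, 14.0],
]

mu, sigma2 = fit_parameters(X_train)
```

Each feature gets its own pair of parameters, so with n features you end up with n means and n variances.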