When building an anomaly detection algorithm, I found that choosing a good set of features turns out to be really important. In supervised learning, if you don't have the features quite right, or if you have a few extra features that are not relevant to the problem, that often turns out to be okay, because the algorithm has enough supervised signal, enough labels y, to figure out what features to ignore, or how to rescale features, and to take the best advantage of the features you do give it. But anomaly detection, which runs or learns just from unlabeled data, has a harder time figuring out what features to ignore. So I found that carefully choosing the features is even more important for anomaly detection than for supervised learning approaches. Let's take a look in this video at some practical tips for how to tune the features for anomaly detection, to try to get the best possible performance. One step that can help your anomaly detection algorithm is to try to make sure the features you give it are more or less Gaussian. And if a feature is not Gaussian, sometimes you can transform it to make it a little bit more Gaussian. Let me show you what I mean. If you have a feature x, I will often plot a histogram of the feature, which you can do using the Python command plt.hist (you'll see this in the practice lab as well), in order to look at the histogram of the data. If the distribution looks pretty Gaussian, this would be a good candidate feature, assuming you think it is a feature that helps distinguish between anomalies and normal examples. But quite often when you plot a histogram of a feature, you may find that the feature has a skewed distribution that does not at all look like that symmetric bell-shaped curve. When that is the case, I would consider whether you can take this feature x and transform it in order to make it more Gaussian.
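As a minimal sketch of this histogram check (the skewed feature here is simulated placeholder data, since the transcript doesn't specify a dataset), plotting with plt.hist might look like:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs in a script
import matplotlib.pyplot as plt

# Simulate a skewed, non-Gaussian feature -- placeholder data for illustration
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=1000)

# Plot a histogram to eyeball whether the feature looks Gaussian
plt.hist(x)
plt.xlabel("x")
plt.ylabel("count")
plt.savefig("hist_x.png")
```

A roughly symmetric bell shape in this plot suggests the feature is already a good candidate; a long right tail, as with this gamma-distributed sample, suggests trying a transformation.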
For example, maybe if you were to compute the log of x and plot a histogram of log(x), it would look much more Gaussian. So if this feature was feature x1, then instead of using the original feature x1, you might replace it with log(x1) to get that more Gaussian distribution. Because x1 is then made more Gaussian, when the anomaly detection model fits p(x1) using a Gaussian distribution, it is more likely to be a good fit to the data. Other than the log function, there are other things you might do. Given a different feature x2, you may replace it with log(x2 + 1); this would be a different way of transforming x2. And more generally, log(x2 + c) is one example of a formula you can use to try to make x2 more Gaussian. Or for a different feature x3, you might try taking the square root, that is, x3 raised to the power of one half, and you can change that exponent too. So for a different feature x4, you might use x4 to the power of one third, for example. When I'm building an anomaly detection system, I'll sometimes take a look at my features, and if I see any that are highly non-Gaussian by plotting a histogram, I might choose transformations like these or others in order to make them more Gaussian. It turns out a larger value of c will transform the distribution less. In practice, I just try a bunch of different values of c, and then take a look to pick the one that works better in terms of making the distribution more Gaussian. Now, let me illustrate how I actually do this in a Jupyter notebook. This is what the process of exploring different transformations of the features might look like. When you have a feature x, you can plot a histogram of it as follows. It actually looks like a pretty coarse histogram.
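The transformations mentioned above can be sketched as follows; x1 through x4 are hypothetical skewed features (simulated here), and the skewness helper is one simple way to quantify "more Gaussian", not something from the lecture itself:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical positive, right-skewed features for illustration
x1 = rng.gamma(2.0, 1.0, 1000)
x2 = rng.gamma(2.0, 1.0, 1000)
x3 = rng.gamma(2.0, 1.0, 1000)
x4 = rng.gamma(2.0, 1.0, 1000)

# Candidate transformations to make each feature more Gaussian
x1_t = np.log(x1)          # log(x1): requires x1 > 0
x2_t = np.log(x2 + 1)      # log(x2 + c) with c = 1: works when x2 >= 0
x3_t = x3 ** 0.5           # square root, i.e. x3 to the power 1/2
x4_t = x4 ** (1 / 3)       # cube root, i.e. x4 to the power 1/3

def skewness(v):
    """Sample skewness: closer to 0 means closer to symmetric."""
    m = v.mean()
    return ((v - m) ** 3).mean() / (v.std() ** 3)
```

In practice you would plot histograms of each transformed version (and try several values of c and of the exponent) and keep whichever looks most Gaussian by eye; the skewness number is just a quick sanity check.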
Let me increase the number of bins in my histogram to 50, so bins=50 there; that's what sets the number of histogram bins. And by the way, if you want to change the color, you can also do so. And if you want to try a different transformation, you can try, for example, to plot the square root of x, that is, x to the power of 0.5, again with 50 histogram bins, in which case it might look like this. And this actually looks somewhat more Gaussian.
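The notebook steps just described might look like this sketch (again using a simulated skewed feature in place of the lecture's data):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs in a script
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.gamma(2.0, 1.0, 1000)  # placeholder skewed feature

# Finer-grained histogram: 50 bins instead of matplotlib's default 10
plt.hist(x, bins=50)

# Optionally change the bar color
plt.figure()
plt.hist(x, bins=50, color="green")

# Try the square-root transformation (x to the power 0.5) and re-plot
plt.figure()
plt.hist(x ** 0.5, bins=50)
plt.savefig("hist_sqrt_x.png")
```

More bins give a finer-grained picture of the shape, which makes it easier to judge by eye whether a transformed feature looks Gaussian.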