Complete Playlist of Unsupervised Machine Learning https://www.youtube.com/playlist?list=PLfQLfkzgFi7azUjaXuU0jTqg03kD-ZbUz
When you have a few positive examples with y = 1 and a large number of negative examples with y = 0, when should you use anomaly detection and when should you use supervised learning? The decision is actually quite subtle in some applications, so let me share some thoughts and suggestions for how to pick between these two types of algorithms. An anomaly detection algorithm will typically be the more appropriate choice when you have a very small number of positive examples (0 to 20 positive examples is not uncommon) and a relatively large number of negative examples with which to build a model for p(x). Recall that the parameters of p(x) are learned only from the negative examples; the much smaller set of positive examples is used only in your cross-validation and test sets, for parameter tuning and for evaluation. In contrast, if you have a larger number of both positive and negative examples, then supervised learning might be more applicable. Now, even if you have only 20 positive training examples, it might be okay to apply a supervised learning algorithm. But it turns out that anomaly detection and supervised learning look at the data set quite differently. Here is the main difference: if you think there are many different types of anomalies, that is, many different kinds of positive examples, then anomaly detection might be more appropriate. There are many different ways for an aircraft engine to go wrong, and tomorrow there may be a brand new way for an aircraft engine to have something wrong with it. Your, say, 20 positive examples may not cover all of the ways an aircraft engine could go wrong, which makes it hard for any algorithm to learn from that small set of positive examples what the anomalies, the positive examples, look like. Future anomalies may look nothing like any of the anomalous examples we have seen so far.
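To make the idea concrete, here is a minimal sketch of fitting p(x) only from negative examples, using a per-feature Gaussian density. All data and numbers below are synthetic and illustrative, not from the lecture; in practice the threshold epsilon would be tuned on a labeled cross-validation set.

```python
import numpy as np

# Synthetic data: rows are feature vectors for normal (y = 0) engines only.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(500, 2))

# Fit p(x) as a product of independent Gaussians, one per feature,
# using ONLY the negative (normal) examples.
mu = X_train.mean(axis=0)
var = X_train.var(axis=0)

def p(x):
    """Density of x under the fitted per-feature Gaussian model."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    return np.prod(coef * np.exp(-((x - mu) ** 2) / (2.0 * var)), axis=-1)

epsilon = 1e-4                   # illustrative threshold; tune on a CV set
x_new = np.array([5.1, 9.8])     # looks like the normal examples
x_odd = np.array([12.0, 25.0])   # deviates a lot from normal

print(p(x_new) < epsilon)  # False: not flagged
print(p(x_odd) < epsilon)  # True: flagged as an anomaly
```

Note that no positive example is needed to fit mu and var; the positives come in only when evaluating the detector and choosing epsilon.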
If you believe this to be true for your problem, then I would gravitate toward using an anomaly detection algorithm. What anomaly detection does is look at the normal examples, that is, the y = 0 negative examples, and just try to model what they look like. Anything that deviates a lot from normal it flags as an anomaly, including a brand new way for an aircraft engine to fail that had never been seen before in your data set. In contrast, supervised learning has a different way of looking at the problem. When you are applying supervised learning, you would ideally hope to have enough positive examples for the algorithm to get a sense of what the positive examples are like. And with supervised learning, we tend to assume that future positive examples are likely to be similar to the ones in the training set. Let me illustrate this with one example. Suppose you are using a system to find, say, financial fraud. There are, unfortunately, many different ways that some individuals try to commit financial fraud, and there are new types of financial fraud attempts every few months or every year. Because completely new and unique forms of financial fraud keep popping up, anomaly detection is often used to just look for anything that is different than transactions we have seen in the past. In contrast, if you look at the problem of email spam detection, there are many different types of spam email, but even over many years, spam emails keep trying to sell similar things or get you to go to similar websites, and so on. Spam email that you will get in the next few days is much more likely to be similar to spam emails that you have seen in the past. That is why supervised learning works well for spam: it is trying to detect more of the types of spam emails that you have probably already seen in your training set.
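Since the density model is fit only on normal examples, the handful of labeled anomalies can be reserved entirely for choosing the flagging threshold. Here is a minimal sketch of picking epsilon by maximizing F1 on a small labeled cross-validation set; all data is synthetic and the single-feature setup is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fit p(x) on normal (y = 0) examples only; one feature for brevity.
x_train = rng.normal(0.0, 1.0, size=1000)
mu, var = x_train.mean(), x_train.var()

def p(x):
    """Gaussian density under the fitted model."""
    return np.exp(-((x - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

# Tiny labeled cross-validation set: mostly normal, a few anomalies.
x_cv = np.concatenate([rng.normal(0.0, 1.0, 200), np.array([4.0, -5.0, 6.0])])
y_cv = np.concatenate([np.zeros(200), np.ones(3)])

def f1(eps):
    """F1 score of the rule 'flag x as anomalous when p(x) < eps'."""
    pred = (p(x_cv) < eps).astype(float)
    tp = np.sum((pred == 1) & (y_cv == 1))
    fp = np.sum((pred == 1) & (y_cv == 0))
    fn = np.sum((pred == 0) & (y_cv == 1))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# Sweep candidate thresholds and keep the best one.
candidates = np.logspace(-12, -1, 50)
best_eps = max(candidates, key=f1)
print(best_eps, f1(best_eps))
```

F1 is a reasonable selection criterion here because the cross-validation set is highly imbalanced, so plain accuracy would reward never flagging anything.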
Whereas if you are trying to detect brand new types of fraud that have never been seen before, then anomaly detection may be more applicable. Let's go through a few more examples. We have already seen fraud detection as one use case of anomaly detection, although supervised learning is used to find previously observed forms of fraud. And we have seen email spam classification typically being addressed using supervised learning.