Unsupervised Learning
Last updated
In unsupervised learning, you have only input data (X) and no corresponding output variables.
The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning, there are no correct answers and there is no teacher. The algorithms are left to their own devices to discover and present interesting structure in the data.
Clustering is the task of dividing a population of data points into groups such that points in the same group are more similar to each other than to points in other groups. In simple words, the aim is to find groups with similar traits and assign their members to clusters.
Let’s understand this with an example. Suppose you are the head of a rental store and want to understand your customers' preferences so you can scale up your business. Is it possible to look at the details of each customer and devise a unique business strategy for each one? Definitely not. What you can do is cluster your customers into, say, 10 groups based on their purchasing habits and use a separate strategy for each of those 10 groups. This is what we call clustering.
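The grouping described above can be sketched with k-means, one common clustering algorithm (the text does not name a specific one, so this choice is an assumption). The customer data, the two features, and the helper name `kmeans` are hypothetical, and the sketch uses 2 groups instead of 10 to keep the example small.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical customers: (rentals per month, average spend per rental)
customers = [(1, 5.0), (2, 6.0), (1, 4.5), (10, 20.0), (11, 22.0), (9, 19.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The first three customers (occasional, low spend) should end up sharing one label and the last three (frequent, high spend) the other. Which group gets label 0 depends on the random starting centroids, so only the grouping is meaningful, not the label numbers.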
Blind source separation is the recovery of a set of source signals from a set of mixed signals, with little or no information about the sources or about how they were mixed.
My favorite example of this problem is the cocktail party problem: a number of people are talking simultaneously, and we want to separate each person's speech so we can listen to it on its own. The caveat with this type of approach is that we need at least as many mixtures as we have source signals. In terms of the cocktail party problem, we need at least as many microphones as there are people talking in the room.
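As a rough illustration of the two-speaker, two-microphone case, here is a minimal sketch of one way to unmix signals: whiten the two recordings, then search for the rotation that makes the outputs as non-Gaussian as possible. This is a simplified form of independent component analysis, which is an assumption here since the text does not name an algorithm. The sources (a sine tone and uniform noise standing in for two voices), the mixing weights, and all function names are hypothetical.

```python
import math
import random

def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def whiten(x1, x2):
    """Make two centered signals uncorrelated with unit variance."""
    n = len(x1)
    cxx = sum(a * a for a in x1) / n
    cyy = sum(b * b for b in x2) / n
    cxy = sum(a * b for a, b in zip(x1, x2)) / n
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)  # principal-axis angle
    c, s = math.cos(theta), math.sin(theta)
    p1 = [c * a + s * b for a, b in zip(x1, x2)]
    p2 = [-s * a + c * b for a, b in zip(x1, x2)]
    v1 = sum(a * a for a in p1) / n
    v2 = sum(b * b for b in p2) / n
    return [a / math.sqrt(v1) for a in p1], [b / math.sqrt(v2) for b in p2]

def excess_kurtosis(ys):
    # Assumes zero mean and unit variance, which holds after whitening.
    return sum(y ** 4 for y in ys) / len(ys) - 3.0

def unmix(x1, x2, steps=180):
    """Grid-search the rotation of the whitened data that maximizes
    total non-Gaussianity, measured by absolute excess kurtosis."""
    z1, z2 = whiten(center(x1), center(x2))
    best = None
    for i in range(steps):
        phi = (math.pi / 2) * i / steps
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * a + s * b for a, b in zip(z1, z2)]
        y2 = [-s * a + c * b for a, b in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def corr(a, b):
    a, b = center(a), center(b)
    num = sum(p * q for p, q in zip(a, b))
    return num / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))

# Two hypothetical sources and two hypothetical microphones.
rng = random.Random(0)
n = 2000
s1 = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # "speaker" 1
s2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]             # "speaker" 2
mic1 = [0.6 * a + 0.4 * b for a, b in zip(s1, s2)]
mic2 = [0.3 * a + 0.7 * b for a, b in zip(s1, s2)]

y1, y2 = unmix(mic1, mic2)
print(abs(corr(y1, s1)), abs(corr(y1, s2)))
print(abs(corr(y2, s1)), abs(corr(y2, s2)))
```

Each recovered signal should correlate strongly with exactly one source. The recovered signals come back in arbitrary order, with arbitrary sign and scale, which is why the check uses absolute correlation. The sketch also relies on having two mixtures for two sources, matching the caveat above.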