Three faces of machine learning: Un-, Semi- and Supervised Learning


Please note the blog posts on IreneAldridge.com may contain sponsored or promotional content. To sponsor or promote content on irenealdridge.com, please click here.

You probably knew there were different types of machine learning, but did you know that there is supervised, semi-supervised, and unsupervised learning? This brief note aims to clarify the confusion among the three.

You have probably heard of machine learning as the technology that powers everything from Netflix recommendations to self-driving cars. But did you know that machine learning comes in different flavors? The way algorithms learn can be categorized into three main approaches: supervised, semi-supervised, and unsupervised learning. Understanding these differences is crucial for anyone looking to implement or work with AI systems.

Supervised learning is like having a teacher who provides both questions and answers. Suppose that you have inputs X (can be several vectors, known as features) and outputs y. You can recognize these as standard equation tools: something like y = f(X), where f(.) denotes a function. In data science, vector y is known as labels. In a labeled dataset, there is y for every row of X.

In this approach, algorithms learn from labeled datasets X where the desired output y is already known. For example, a spam filter trained on thousands of emails X marked as either y=”spam” or y=”not spam” uses these labels to learn how to classify new incoming emails. Other common applications include image recognition, where algorithms learn to identify objects after being shown thousands of labeled images, and predictive analytics, where historical data with known outcomes help predict future events. The strength of supervised learning lies in its precision and reliability, though it requires extensive labeled data, which can be expensive and time-consuming to gather or create.

Semi-supervised learning strikes a balance by using both labeled and unlabeled data. In other words, some of the y may be missing (see Figure 1). Think of it as having a teacher who provides answers to some questions but leaves others open for discovery. This approach is particularly valuable when labeled data are scarce or expensive to obtain. For example, in medical image analysis, experts might label a small subset of images, while the algorithm leverages patterns from a larger pool of unlabeled images to improve its performance. Semi-supervised learning can achieve results comparable to fully supervised methods, while requiring significantly less labeled data, making it both cost-effective and practical for many real-world applications.

Unsupervised learning operates without labeled training data. There is no y! Unsupervised algorithms explore data independently to discover hidden patterns or structures. This approach is similar to learning without a teacher, where the algorithm must find its own way to make sense of the information. Clustering algorithms, for example, group similar data points together based on inherent similarities, while dimensionality reduction techniques identify the most important features in complex datasets. Anomaly detection systems use unsupervised learning to spot unusual patterns that might indicate fraud or system failures. The power of unsupervised learning lies in its ability to reveal insights that humans might not have anticipated, though these insights can sometimes be difficult to interpret or validate.

Each learning approach has its place in the AI ecosystem, with the choice depending on data availability, problem complexity, and desired outcomes. Supervised learning excels when you know exactly what you are looking for and have plenty of examples. Semi-supervised learning bridges gaps when labeled data are limited. Unsupervised learning shines when you are exploring uncharted territory or looking for hidden patterns. As machine learning continues to evolve, hybrid approaches that combine elements of all three types are becoming increasingly common, offering the best of all worlds. Understanding these fundamental distinctions empowers developers, businesses, and individuals to make informed decisions about which approach best suits their specific needs and constraints.


Subscribe Now
More Real-Time Research
ADVERTISEMENT