What are supervised learning and unsupervised learning?

Mathew Farmer
AI ethics consultant and policy advisor.

Okay, no problem. This concept is actually very easy to understand with real-life examples.


Supervised Learning: Learning with a "Teacher"

Imagine how you learned to identify fruits when you were a child.

Your mom would pick up an apple and tell you: "Look, this red, round thing is an 'apple' 🍎." Then she'd pick up a banana and tell you: "This yellow, long thing is a 'banana' 🍌."

In this process, your mom is the teacher, giving you both the "objects" (fruits) and the "answers" (names). After plenty of practice, you learned to tell them apart. The next time you saw an apple, even a variety you had never seen before, you could recognize it: "This is an apple!"

That's the essence of supervised learning.

We feed machine learning algorithms a bunch of data that already has "answers" or labels. For example, thousands of images, some labeled "cat," others "dog." The machine learns the relationship between these "images" and "labels" to deduce patterns.

  • Learning Material: Labeled data (e.g., images + "cat"/"dog" labels)
  • Learning Goal: To predict. When a new image (without a label) comes in, the model predicts whether its label is "cat" or "dog," ideally with high accuracy.
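
The "learn from labeled examples, then predict" loop can be sketched in a few lines of Python. This is a minimal illustration, not how real image classifiers are built: it uses a 1-nearest-neighbor classifier (one of the simplest supervised methods), and two made-up features ("redness" and "elongation" scores) stand in for actual image data. The training examples and the `predict_label` function are hypothetical.

```python
import math

# Hypothetical labeled training data: (features, label).
# Features are made-up stand-ins for image data: (redness, elongation) in [0, 1].
TRAINING_DATA = [
    ((0.9, 0.2), "apple"),
    ((0.8, 0.3), "apple"),
    ((0.2, 0.9), "banana"),
    ((0.3, 0.8), "banana"),
]

def predict_label(features, training_data=TRAINING_DATA):
    """Return the label of the closest training example (1-nearest neighbor)."""
    # "Learning" here is just remembering the labeled examples;
    # prediction copies the label of the most similar one.
    _, best_label = min(
        training_data,
        key=lambda example: math.dist(features, example[0]),
    )
    return best_label

# A new, unlabeled fruit: fairly red and fairly round.
print(predict_label((0.85, 0.25)))  # apple
```

Here "training" is simply storing the labeled examples; most real models fit parameters instead, but the input (data plus labels) and the goal (predicting labels for new data) are the same.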

Real-life examples:

  • Spam filtering: When you manually mark an email as "spam," you're providing labeled data to the algorithm.
  • Facial recognition: Training a model with photos you've already tagged with names.
  • House price prediction: Learning from house "features" such as area, location, and year built, paired with their known "prices" (the labels).
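
House price prediction differs from the cat/dog case in that the label is a number rather than a category (usually called regression). As a minimal sketch, assuming a single feature (area) instead of area, location, and year together, an ordinary least-squares line can be fitted to labeled examples. The `HOUSES` data and the `fit_line` helper are invented for illustration; the prices were chosen to lie exactly on a line so the result is easy to check.

```python
# Hypothetical labeled data: (area in m², price in thousands).
# Invented numbers that lie exactly on price = 3 * area + 50.
HOUSES = [(50, 200), (80, 290), (100, 350), (120, 410)]

def fit_line(data):
    """Fit price = slope * area + intercept by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # slope = covariance(area, price) / variance(area);
    # the 1/n factors cancel, so plain sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(HOUSES)
# Predict the (unknown) price of a new 90 m² house.
print(slope * 90 + intercept)  # 320.0
```

Real data never falls exactly on a line; least squares then picks the line with the smallest total squared error over the labeled examples.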

Unsupervised Learning: Discovering "Patterns" on Your Own

Now, let's consider a different scenario.

Suppose you've never seen fruits before, and someone gives you a large basket full of apples, bananas, oranges, and grapes, but no one tells you what each one is called.

What would you do? You might figure it out yourself and categorize them based on certain characteristics.

"Hmm... I'll put these red, round ones in one pile. Those yellow, long ones in another. And these purple, clustered ones in yet another pile."

Even though you don't know their names, you independently discovered patterns and structures within the data, grouping similar items together.

That's the essence of unsupervised learning.

We give the machine only a pile of data, without any "answers" or "labels." It has to discover hidden patterns or groups within the data on its own.

  • Learning Material: Unlabeled data (e.g., a large collection of user purchase records)
  • Learning Goal: To discover structure. For example, automatically segmenting users with similar behaviors into different groups.
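
One common way to "sort the basket" without labels is k-means clustering. The sketch below, in plain Python, assumes hypothetical user records with two features (orders per month, average order value) and asks for k = 2 groups; the data, the `k_means` function, and the choice of k are all illustrative assumptions. The algorithm alternates between assigning each point to its nearest group center and moving each center to the mean of its group.

```python
import math
import random

# Hypothetical unlabeled user records: (orders per month, average order value).
USERS = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]

def k_means(points, k, iterations=100, seed=0):
    """Split unlabeled points into k groups with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its old centroid).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the groups are stable
        centroids = new_centroids
    return clusters

for group in k_means(USERS, k=2):
    print(sorted(group))
```

The output is just two unnamed groups; deciding that one of them means "frequent, big spenders" is left to a person. In practice, features on very different scales (like the two here) are usually standardized first, otherwise the larger-valued feature dominates the distances.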

Real-life examples:

  • Customer segmentation: E-commerce websites automatically group users with similar purchase history and browsing behavior, for targeted marketing. The algorithm only finds the groups; people then give them names such as "high-potential spenders," "price-sensitive," or "discount lovers."
  • News aggregation: News apps automatically group different news articles reporting on the same event.
  • Anomaly detection: Automatically identifying "suspicious transactions" that deviate from normal patterns within large volumes of transaction data.
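
Anomaly detection can also work without labels: instead of learning what "fraud" looks like, it flags values that sit far from the rest. A minimal sketch, assuming hypothetical transaction amounts and a single feature, uses the modified z-score (based on the median and the median absolute deviation) with the commonly cited cutoff of 3.5. This is a simple statistical stand-in, not the method any particular payment system uses; the `flag_anomalies` function and its data are illustrative.

```python
import statistics

def flag_anomalies(amounts, threshold=3.5):
    """Return amounts far from the typical value, using a robust (modified) z-score."""
    median = statistics.median(amounts)
    # Median absolute deviation (MAD): a spread measure that a few extreme
    # values cannot inflate the way they inflate the standard deviation.
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to an ordinary
    # z-score for normally distributed data.
    return [a for a in amounts if abs(0.6745 * (a - median) / mad) > threshold]

# Hypothetical transaction amounts; no transaction is labeled "suspicious".
print(flag_anomalies([25, 30, 27, 22, 31, 28, 26, 5000]))  # [5000]
```

The median-based score is used instead of the plain mean and standard deviation because a single huge transaction inflates the standard deviation enough to partly hide itself, especially in small samples.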

In a nutshell

  • Supervised Learning: Giving the machine a workbook with standard answers (data + labels) to teach it to predict.
  • Unsupervised Learning: Giving the machine a pile of disorganized material (data only) and letting it discover classifications and patterns on its own.