What is a feature? What does it refer to in AI?

陽一 和也

Okay, no problem. Imagine you're describing an apple to a friend who has never seen one. What would you say?

You might say:

  • It's red.
  • It's round in shape.
  • It feels smooth.
  • It's about the size of a fist.

Here, 'color,' 'shape,' 'texture,' and 'size' are the features we use to describe the apple.


What are Features in Artificial Intelligence?

In the field of Artificial Intelligence (especially Machine Learning), a feature means the same thing: a measurable attribute of a data object that helps describe it.

You can think of it this way: computers don't 'understand' an image or a piece of text the way humans do, so we must break this complex information down into quantifiable 'description points' that a computer can process. These 'description points' are features.
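To make the 'description points' idea concrete, here is a minimal Python sketch that turns the apple description into a fixed-order list of numbers. The numeric codes for color, shape, and texture, and the 8 cm diameter standing in for "about the size of a fist," are arbitrary assumptions chosen for illustration.

```python
# Hypothetical encodings, chosen only for this illustration.
COLOR_CODES = {"red": 0, "green": 1, "yellow": 2}
SHAPE_CODES = {"round": 0, "oval": 1}
TEXTURE_CODES = {"smooth": 0, "rough": 1}


def describe_as_features(obj: dict) -> list[float]:
    """Turn a human-style description into a fixed-order list of numbers."""
    return [
        float(COLOR_CODES[obj["color"]]),
        float(SHAPE_CODES[obj["shape"]]),
        float(TEXTURE_CODES[obj["texture"]]),
        float(obj["diameter_cm"]),  # "about the size of a fist" as a measurement (assumed 8 cm)
    ]


apple = {"color": "red", "shape": "round", "texture": "smooth", "diameter_cm": 8}
print(describe_as_features(apple))  # [0.0, 0.0, 0.0, 8.0]
```

Assigning integers to categories like this quietly implies an order (red < green < yellow) that doesn't really exist; in practice, categories are often one-hot encoded instead, with one 0/1 column per possible value.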

Example: Spam Detection

Suppose we want to train an AI model to automatically detect spam emails. AI doesn't 'read' words or understand email content. We need to provide it with 'features' to help it make judgments. For an email, we can extract the following features:

  • Feature 1: Is the sender in your contact list? (Can be 1 for 'yes', 0 for 'no')
  • Feature 2: Does the email subject contain words like 'free,' 'win,' or 'make money'? (Can be 1 for 'yes', 0 for 'no')
  • Feature 3: How many exclamation marks are in the email body? (Can be a specific number, e.g., 5)
  • Feature 4: How many links are in the email? (Can be a number, e.g., 3)
  • Feature 5: Was the email sent at 3 AM? (Can be 1 for 'yes', 0 for 'no')
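The five features above can be extracted with a few lines of code. This is a minimal sketch, assuming each email is a dict with hypothetical `sender`, `subject`, `body`, and `hour_sent` fields and that the contact list is a set of addresses; the keyword list and the link-matching regex are simplifications.

```python
import re

# Hypothetical keyword list for Feature 2.
SPAM_WORDS = ("free", "win", "make money")


def extract_features(email: dict, contacts: set[str]) -> list[int]:
    """Map one email to the five features, in the order listed above."""
    subject = email["subject"].lower()
    body = email["body"]
    return [
        1 if email["sender"] in contacts else 0,             # Feature 1: sender in contacts
        1 if any(w in subject for w in SPAM_WORDS) else 0,    # Feature 2: spammy subject words
        body.count("!"),                                      # Feature 3: exclamation marks in body
        len(re.findall(r"https?://\S+", body)),               # Feature 4: links in body
        1 if email["hour_sent"] == 3 else 0,                  # Feature 5: sent during the 3 AM hour
    ]


email = {
    "sender": "promo@example.com",
    "subject": "WIN a FREE phone!!!",
    "body": "Click now!!! http://example.com/a http://example.com/b",
    "hour_sent": 3,
}
print(extract_features(email, contacts={"alice@example.com"}))  # [0, 1, 3, 2, 1]
```

Plain substring matching is crude ("win" also matches "window"), and real systems tokenize text more carefully, but the result is the same kind of thing: a short list of numbers per email.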

We convert each of thousands of emails into a row of numbers using these features, label each email as spam or normal, and then tell the AI: "Look, emails with these number combinations (feature values) are spam, and those are normal emails."

The AI learns from this large amount of labeled data and finds patterns on its own. For example, it might discover: "If an email's sender is not in the contact list, the subject contains 'free,' and the body has more than 5 exclamation marks, then there's a 99% chance it's spam."
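The learning step can be sketched with logistic regression, which is one common choice among many (the text doesn't name a specific algorithm). It's written in plain Python with batch gradient descent so every step is visible, and the training rows are a tiny hand-made toy set, not real emails. Each learned weight says how strongly a feature pushes the prediction toward "spam," and the output is a probability, like the "99% chance" above.

```python
import math

# Toy, hand-made training data (hypothetical). Columns follow the feature order:
# [in_contacts, has_spam_words, exclamation_count, link_count, sent_at_3am]
X = [
    [0, 1, 6, 3, 1],  # spam
    [0, 1, 8, 2, 0],  # spam
    [0, 0, 7, 4, 1],  # spam
    [0, 1, 5, 1, 1],  # spam
    [1, 0, 0, 1, 0],  # normal
    [1, 0, 1, 0, 0],  # normal
    [0, 0, 0, 0, 0],  # normal
    [1, 1, 0, 1, 0],  # normal
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = normal


def sigmoid(z: float) -> float:
    """Squash any score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def spam_probability(w: list[float], b: float, x: list[float]) -> float:
    """Weighted sum of the features plus a bias, turned into a probability."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit one weight per feature plus a bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = spam_probability(w, b, xi) - yi  # > 0 means we over-predicted spam
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


w, b = train_logistic_regression(X, y)
names = ["in_contacts", "has_spam_words", "exclamation_count", "link_count", "sent_at_3am"]
for name, weight in zip(names, w):
    print(f"{name:>18}: {weight:+.2f}")
print(f"new email spam probability: {spam_probability(w, b, [0, 1, 9, 2, 1]):.3f}")
```

On this toy data, `in_contacts` ends up with a negative weight (it only ever appears on normal emails) and `exclamation_count` with a positive one. In practice you would reach for a library implementation, such as scikit-learn's `LogisticRegression`, and far more data.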

When a new email arrives later, the AI will automatically extract these features and make a judgment based on the patterns it has learned.
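The judgment step for a new email is just the learned pattern applied to that email's feature values. This sketch hard-codes the example rule from the paragraph above as if it had been learned, using Feature 2 as a stand-in for "the subject contains 'free'"; a real model would learn its own thresholds and weights, and the function name is hypothetical.

```python
def judge_email(features: list[int]) -> str:
    """Apply the example rule to one email's feature vector.

    Assumed order: [in_contacts, has_spam_words, exclamation_count,
    link_count, sent_at_3am].
    """
    in_contacts, has_spam_words, exclamations = features[0], features[1], features[2]
    if in_contacts == 0 and has_spam_words == 1 and exclamations > 5:
        return "spam"
    return "normal"


print(judge_email([0, 1, 7, 2, 0]))  # spam
print(judge_email([1, 0, 0, 1, 0]))  # normal
```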

In Summary

  • For humans: Features are the adjectives and attributes we use to describe things.
  • For AI: Features are the numbers that raw data (images, text, sound) is converted into. They capture key information from the raw data and are the only input the AI uses for learning and judgment.

The choice of features directly determines the effectiveness of an AI model. Choosing the right features can yield twice the results with half the effort; choosing the wrong ones might render the model useless. This process of selecting and creating features is known in the industry as "feature engineering," and it is a very important part of machine learning.
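Feature engineering often means deriving new numbers from raw fields instead of using them as-is. This minimal sketch, with hypothetical field and feature names, turns a raw timestamp into Feature 5 plus a broader "overnight" variant, and turns the raw exclamation count into a per-word rate. Whether such variants actually help is something you would check on your own data.

```python
from datetime import datetime


def engineer_features(sent_at: datetime, body: str) -> dict[str, float]:
    """Derive model-ready numbers from raw email fields (hypothetical names)."""
    n_words = max(len(body.split()), 1)  # guard against empty bodies
    return {
        # Feature 5 from the list above, derived from a raw timestamp
        "sent_at_3am": 1.0 if sent_at.hour == 3 else 0.0,
        # A broader variant: sent any time between midnight and 6 AM
        "sent_overnight": 1.0 if sent_at.hour < 6 else 0.0,
        # Raw count scaled by length, so long emails don't score high just for being long
        "exclamations_per_word": body.count("!") / n_words,
    }


print(engineer_features(datetime(2024, 5, 1, 3, 15), "Act now!!! Limited offer!"))
# {'sent_at_3am': 1.0, 'sent_overnight': 1.0, 'exclamations_per_word': 1.0}
```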