3. Understanding Data, Attributes & Data Objects in Machine Learning

In our previous article, we explored the types of machine learning and their basic workflow. However, even a strong algorithm is only as good as the data it is trained on. Garbage in, garbage out: a flawed dataset leads to poor model performance, regardless of how sophisticated the algorithm is. Understanding the data, its attributes, and its data objects is therefore crucial for building effective models. In this article, we'll look at why understanding your data matters and explore the key concepts that will help you work with it more effectively.

Data

In simple words, data is the raw information that we feed into a machine learning model so it can learn patterns.

Data is usually arranged in a tabular format – like a spreadsheet – made up of rows and columns:

  • Rows = different examples or instances of data

  • Columns = different attributes or features that describe the data

Attributes

Attributes are the properties or features of a data object.
Each attribute describes an aspect of the object. In tabular data, these are the columns.

Data Object

A data object is a single record or row in the dataset.
It’s one complete unit of data that contains values for all the attributes.

You might also hear it called a data point, instance, record, or sample. All of these are synonyms for one row in your dataset.

Juice Store Example

Let’s make this real with an example.

Imagine you own a juice shop and are collecting customer data.

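Customer ID | Age | Favorite Juice | Quantity | Purchased on Weekend?
----------- | --- | -------------- | -------- | ---------------------
101         | 25  | Orange         | 2        | Yes
102         | 30  | Mango          | 1        | No
103         | 22  | Apple          | 3        | Yes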

🔍 In this example:

  • The whole table is your data

  • Each row (e.g., Customer ID 101) is a data object

  • Each column (like Age, Favorite Juice, etc.) is an attribute
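
To make the mapping concrete, here is a minimal sketch of the same table in plain Python. The variable name `customers` and the lowercase keys are illustrative choices, not part of any particular library.

```python
# Hypothetical in-memory version of the juice shop table.
# The whole list is the data; each dict is one data object (row).
customers = [
    {"customer_id": 101, "age": 25, "favorite_juice": "Orange", "quantity": 2, "weekend": "Yes"},
    {"customer_id": 102, "age": 30, "favorite_juice": "Mango", "quantity": 1, "weekend": "No"},
    {"customer_id": 103, "age": 22, "favorite_juice": "Apple", "quantity": 3, "weekend": "Yes"},
]

# Attributes are the keys (columns) shared by every data object.
attributes = list(customers[0].keys())

# A data object is one complete row with a value for every attribute.
first_object = customers[0]

# Reading one attribute across all data objects gives a column.
ages = [row["age"] for row in customers]

print(attributes)
print(first_object)
print(ages)  # [25, 30, 22]
```

In practice you would probably load such a table with a data library, but the idea is the same: rows are data objects and columns are attributes.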


Let’s look at these concepts in more detail, starting with the types of attributes.

Types of Attributes in Machine Learning

When you're working with machine learning models, it's not enough to just collect data — you must understand what kind of data you're dealing with. Why? Because the type of attribute (or feature) determines:

  • Which algorithms you can use,

  • How you preprocess the data,

  • And what kind of insights or predictions your model can produce.

Let’s break it down with definitions, detailed explanations, and practical examples.

1- Nominal Attributes: Labels without Order

Nominal attributes are categorical values that represent different names or labels, and there is no inherent order or ranking among them.

📌 Key Characteristics:

  • These are used to classify or group things.

  • Values are mutually exclusive — a data point can only belong to one category.

  • You cannot meaningfully sort them or perform arithmetic on them, such as taking an average.

Example:

Favorite Juice: Apple, Mango, Orange

These are just names of juice flavors. You can't say Apple > Mango — there’s no ranking, just labels.

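In Machine Learning: These are usually encoded with One-Hot Encoding before being used in algorithms. Label Encoding (assigning each category an integer) is also used, but it introduces an arbitrary order that some models may treat as meaningful.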
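
As a rough illustration of one-hot encoding, the sketch below turns each juice name into a row of 0s and 1s. The helper `one_hot_encode` is a hypothetical function written for this article; data libraries offer their own equivalents.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has a single 1 marking its category."""
    # Sorting is only for a stable column order; it does not imply a ranking.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


favorite_juice = ["Orange", "Mango", "Apple"]
categories, encoded = one_hot_encode(favorite_juice)

print(categories)  # ['Apple', 'Mango', 'Orange']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Each category gets its own column, so no juice ends up "greater" than another.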

2- Binary Attributes: Yes or No, On or Off

Binary attributes are a special case of nominal attributes where there are only two possible values.

📌 Key Characteristics:

  • Represent two categories: True/False, Yes/No, 0/1

  • Very common in real-world problems

Example:

Purchased on the Weekend? → Yes or No
Or encoded as: 1 = Yes, 0 = No

In Machine Learning: Binary attributes can usually be used directly (as 0 or 1) in models such as decision trees and logistic regression.
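
A minimal sketch of that 0/1 encoding follows. The names `WEEKEND_CODES` and `encode_binary` are made up for this example.

```python
# Hypothetical mapping for the "Purchased on Weekend?" attribute.
WEEKEND_CODES = {"Yes": 1, "No": 0}


def encode_binary(values, codes=WEEKEND_CODES):
    """Map each label to 0 or 1; an unexpected label raises KeyError."""
    return [codes[value] for value in values]


weekend = ["Yes", "No", "Yes"]
encoded_weekend = encode_binary(weekend)

print(encoded_weekend)  # [1, 0, 1]

# With a 0/1 column, the mean is the share of "Yes" answers.
share_yes = sum(encoded_weekend) / len(encoded_weekend)
print(share_yes)
```

Raising an error on unknown labels is a deliberate choice here: silently mapping a typo such as "yes " to 0 would hide a data quality problem.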

3- Ordinal Attributes: Categories with Order

Ordinal attributes are like nominal attributes, but with an added twist — the categories have a meaningful order or ranking.

📌 Key Characteristics:

  • Categories can be ranked, but the difference between ranks isn't measurable.

  • You can sort the data, but you can’t calculate differences like "Medium is 2 units more than Small."

Example:

Juice Size: Small, Medium, Large
Clearly, there's an order — Small < Medium < Large — but the gap between sizes isn’t numerically defined.

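In Machine Learning: Ordinal data is often encoded using integers (e.g., Small = 1, Medium = 2, Large = 3). This preserves the order, but it also implies equal spacing between categories. Tree-based models, which split on thresholds, depend only on the order. Models that do arithmetic on the values, such as linear regression, will treat the gaps as real distances, so use this encoding with care.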
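
The integer encoding described above can be sketched as follows. `SIZE_ORDER` and `SIZE_CODES` are illustrative names, and the sample sizes are just example values.

```python
# The order is stated explicitly instead of relying on alphabetical sorting,
# which would give Large < Medium < Small.
SIZE_ORDER = ["Small", "Medium", "Large"]
SIZE_CODES = {label: rank for rank, label in enumerate(SIZE_ORDER, start=1)}

juice_sizes = ["Medium", "Large", "Small"]

# Encode each size as its rank.
encoded_sizes = [SIZE_CODES[size] for size in juice_sizes]
print(encoded_sizes)  # [2, 3, 1]

# The ranks make sorting meaningful, even though the gaps are not measured.
sorted_sizes = sorted(juice_sizes, key=SIZE_CODES.get)
print(sorted_sizes)  # ['Small', 'Medium', 'Large']
```

The codes 1, 2, 3 only say which size comes first; they do not claim that Large is three times Small.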

4- Numeric Attributes: Measurable Quantities

Numeric attributes represent quantitative values and can be measured. These are the most straightforward data types when it comes to mathematics.

📌 Subtypes:

There are two kinds of numeric data:

  • Discrete – Countable values (e.g., number of items)

  • Continuous – Measurable values with infinite possibilities (e.g., height, weight)

Example:

  • Age: 25, 30, 40 (continuous, though usually recorded in whole years)

  • Quantity: 1, 2, 3 juices (discrete)

In Machine Learning: Numeric data can be used directly in many algorithms. You might still apply scaling techniques such as normalization or standardization so that attributes with large ranges do not dominate attributes with small ones.
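
Two common scaling techniques are min-max normalization, $x' = (x - \min) / (\max - \min)$, and standardization, $z = (x - \mu) / \sigma$. The sketch below implements both with the standard library; the function names are illustrative, and using the population standard deviation is an assumption made for this example.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


ages = [25, 30, 22]

print(min_max_scale(ages))  # [0.375, 1.0, 0.0]
print(standardize(ages))
```

Which technique suits a dataset depends on the model and on how the values are distributed, so treat this as a starting point.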

🧃 Putting It All Together: Juice Store Dataset Revisited

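Customer ID | Age | Favorite Juice | Quantity | Purchased on Weekend? | Juice Size
----------- | --- | -------------- | -------- | --------------------- | ----------
101         | 25  | Orange         | 2        | Yes                   | Medium
102         | 30  | Mango          | 1        | No                    | Large
103         | 22  | Apple          | 3        | Yes                   | Small

Attribute             | Type
--------------------- | ------------------
Age                   | Numeric
Favorite Juice        | Nominal
Quantity              | Numeric (Discrete)
Purchased on Weekend? | Binary
Juice Size            | Ordinal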

Data Preprocessing and Attribute Transformation

Understanding the type of each attribute is crucial before applying machine learning models, because the type determines how the attribute should be preprocessed:

  • Nominal and Binary attributes: Often require conversion into numerical values to facilitate model training.

  • Ordinal attributes: Need to be properly transformed to reflect their order.

  • Numeric attributes: Are directly used by models but might require scaling or normalization.
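
The three rules above can be combined into one small function that turns a juice shop row into a list of numbers. This is only a sketch under stated assumptions: the function `to_feature_vector`, the key names, and the decision to drop the customer ID (treated here as an identifier, not a feature) are all choices made for illustration.

```python
JUICE_CATEGORIES = ["Apple", "Mango", "Orange"]      # nominal: one-hot
WEEKEND_CODES = {"Yes": 1, "No": 0}                  # binary: 0/1
SIZE_CODES = {"Small": 1, "Medium": 2, "Large": 3}   # ordinal: ranks


def to_feature_vector(row, age_min, age_max):
    """Convert one data object into numbers. Assumes age_max > age_min."""
    # Numeric: min-max scale age, keep the discrete quantity as is.
    age_scaled = (row["age"] - age_min) / (age_max - age_min)
    # Nominal: one column per juice, exactly one of them set to 1.
    juice_one_hot = [1 if row["favorite_juice"] == juice else 0
                     for juice in JUICE_CATEGORIES]
    return [
        age_scaled,
        *juice_one_hot,
        row["quantity"],
        WEEKEND_CODES[row["weekend"]],
        SIZE_CODES[row["juice_size"]],
    ]


customers = [
    {"customer_id": 101, "age": 25, "favorite_juice": "Orange",
     "quantity": 2, "weekend": "Yes", "juice_size": "Medium"},
    {"customer_id": 102, "age": 30, "favorite_juice": "Mango",
     "quantity": 1, "weekend": "No", "juice_size": "Large"},
    {"customer_id": 103, "age": 22, "favorite_juice": "Apple",
     "quantity": 3, "weekend": "Yes", "juice_size": "Small"},
]

ages = [row["age"] for row in customers]
features = [to_feature_vector(row, min(ages), max(ages)) for row in customers]

print(features[0])  # [0.375, 0, 0, 1, 2, 1, 2]
```

Each row now contains only numbers, which is the form most models expect as input.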

Conclusion

Understanding your data is the first real step toward building smart machine learning models. Knowing what attributes are — and how they work — helps you prepare your data correctly, choose the right techniques, and avoid mistakes.

Remember:

Good data = Good model. Bad data = Bad results.

Master these basics, and you’ll build a strong foundation for everything else in your ML journey.

Written by

Muhammad Fahad Bashir