In our previous article, we explored the types of machine learning and their basic workflow. However, an algorithm is only as good as the data it's trained on. Garbage in, garbage out: a flawed dataset leads to poor model performance, no matter how sophisticated the algorithm is. Understanding the data, its attributes, and its data objects is therefore crucial for building effective models. In this article, we'll look at why understanding your data matters and explore the key concepts that will help you work with it more effectively.

Data

In simple words, data is the raw information that we feed into a machine learning model so it can learn patterns.

Data is usually arranged in a tabular format – like a spreadsheet – made up of rows and columns:

Rows = different examples or instances of data
Columns = different attributes or features that describe the data

Attributes

Attributes are the properties or features of a data object.
Each attribute describes an aspect of the object. In tabular data, these are the columns.

Data Object

A data object is a single record or row in the dataset.
It’s one complete unit of data that contains values for all the attributes.

You might also hear it called a data point, instance, record, or sample. All of these are synonyms for one row in your dataset.

Juice Store Example

Let’s make this real with an example.

Imagine you own a juice shop and are collecting customer data.

Customer ID	Age	Favorite Juice	Quantity	Purchased on Weekend?
101	25	Orange	2	Yes
102	30	Mango	1	No
103	22	Apple	3	Yes

🔍 In this example:

The whole table is your data
Each row (e.g., Customer ID 101) is a data object
Each column (like Age, Favorite Juice, etc.) is an attribute
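
As a minimal sketch, the same table can be written in plain Python as a list of dictionaries, where each dictionary is one data object and each key is an attribute. The variable names (`customers`, `attributes`, `ages`) are illustrative assumptions, not part of any library.

```python
# The juice shop table: each dict is one data object (row),
# and each key is an attribute (column).
customers = [
    {"Customer ID": 101, "Age": 25, "Favorite Juice": "Orange",
     "Quantity": 2, "Purchased on Weekend?": "Yes"},
    {"Customer ID": 102, "Age": 30, "Favorite Juice": "Mango",
     "Quantity": 1, "Purchased on Weekend?": "No"},
    {"Customer ID": 103, "Age": 22, "Favorite Juice": "Apple",
     "Quantity": 3, "Purchased on Weekend?": "Yes"},
]

# Attributes are the column names shared by every row.
attributes = list(customers[0].keys())

# One data object is one complete row.
first_customer = customers[0]

# One attribute across all data objects is one column.
ages = [row["Age"] for row in customers]

print(attributes)
print(first_customer["Favorite Juice"])  # Orange
print(ages)                              # [25, 30, 22]
```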

Let’s understand these concepts in more detail by looking at the types of attributes.

Types of Attributes in Machine Learning

When you're working with machine learning models, it's not enough to just collect data — you must understand what kind of data you're dealing with. Why? Because the type of attribute (or feature) determines:

Which algorithms you can use,
How you preprocess the data,
And what kind of insights or predictions your model can produce.

Let’s break it down with definitions, detailed explanations, and practical examples.

1- Nominal Attributes — Labels without Order

Nominal attributes are categorical values that represent different names or labels, and there is no inherent order or ranking among them.

📌 Key Characteristics:

These are used to classify or group things.
Values are mutually exclusive — a data point can only belong to one category.
You cannot meaningfully average or sort the values. Counting how often each category appears (and finding the most common one) is what makes sense.

Example:

Favorite Juice: Apple, Mango, Orange

These are just names of juice flavors. You can't say Apple > Mango — there’s no ranking, just labels.
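
A small sketch of what you can do with nominal values: count them and find the most frequent one. The fourth entry in the list is a hypothetical extra customer, added only so that one flavor is clearly the most common.

```python
from collections import Counter

# Favorite juices from the table, plus one hypothetical extra customer
# (an assumption for illustration, not part of the original dataset).
favorite_juice = ["Orange", "Mango", "Apple", "Mango"]

# Counting is meaningful for nominal data; averaging is not.
counts = Counter(favorite_juice)
most_common_flavor, frequency = counts.most_common(1)[0]

print(counts["Mango"])      # 2
print(most_common_flavor)   # Mango
```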

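In Machine Learning: Nominal attributes are usually encoded with One-Hot Encoding before being used in algorithms. Label Encoding (assigning each category an integer) is also common, but it introduces an artificial order that some algorithms may treat as meaningful.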
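
To make the idea concrete, here is a minimal, dependency-free sketch of one-hot encoding. The helper `one_hot` is a hypothetical function written for this example; the alphabetical column order is an arbitrary choice.

```python
def one_hot(values):
    """Return (categories, rows); each row has a 1 in its category's column."""
    # Fix a column order; alphabetical is arbitrary but reproducible.
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


favorite_juice = ["Orange", "Mango", "Apple"]
categories, encoded = one_hot(favorite_juice)

print(categories)  # ['Apple', 'Mango', 'Orange']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

In practice you would likely reach for a library routine instead, such as `get_dummies` in pandas or `OneHotEncoder` in scikit-learn.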

2- Binary Attributes — Yes or No, On or Off

Binary attributes are a special case of nominal attributes where there are only two possible values.

📌 Key Characteristics:

Represent two categories: True/False, Yes/No, 0/1
Very common in real-world problems

Example:

Purchased on the Weekend? → Yes or No
Or encoded as: 1 = Yes, 0 = No

In Machine Learning: Binary attributes are often used directly (as 0 or 1) and work with most models, including decision trees and logistic regression.
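
A minimal sketch of the Yes/No to 1/0 mapping described above. The names `YES_NO_TO_INT` and `weekend_flags` are assumptions made for this example.

```python
# Answers from the "Purchased on Weekend?" column.
weekend_answers = ["Yes", "No", "Yes"]

# Explicit mapping so the meaning of 1 and 0 is documented in one place.
YES_NO_TO_INT = {"Yes": 1, "No": 0}
weekend_flags = [YES_NO_TO_INT[answer] for answer in weekend_answers]

# The mean of a 0/1 column is the share of "Yes" answers.
weekend_share = sum(weekend_flags) / len(weekend_flags)

print(weekend_flags)  # [1, 0, 1]
```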

3- Ordinal Attributes — Categories with Order

Ordinal attributes are like nominal attributes, but with an added twist — the categories have a meaningful order or ranking.

📌 Key Characteristics:

Categories can be ranked, but the difference between ranks isn't measurable.
You can sort the data, but you can’t calculate differences like "Medium is 2 units more than Small."

Example:

Juice Size: Small, Medium, Large
Clearly, there's an order — Small < Medium < Large — but the gap between sizes isn’t numerically defined.

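In Machine Learning: Ordinal data is often encoded as integers (e.g., Small = 1, Medium = 2, Large = 3). Take care not to treat these codes as true numeric quantities: a linear model would assume the step from Small to Medium equals the step from Medium to Large. Tree-based models are more forgiving, because they split on the order of the values rather than on the size of the gaps.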
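
The integer encoding can be sketched as follows. The order is spelled out explicitly rather than inferred, because alphabetical sorting would put Large before Medium and Small. The names `SIZE_ORDER` and `SIZE_TO_RANK` are assumptions for this example.

```python
# The order must be stated by hand; it cannot be derived from the strings.
SIZE_ORDER = ["Small", "Medium", "Large"]
SIZE_TO_RANK = {size: rank for rank, size in enumerate(SIZE_ORDER, start=1)}

# Juice sizes for three customers.
juice_sizes = ["Medium", "Large", "Small"]

# Encode each size as its rank.
ranks = [SIZE_TO_RANK[size] for size in juice_sizes]

# Sorting by rank is meaningful; subtracting ranks is not.
sorted_sizes = sorted(juice_sizes, key=SIZE_TO_RANK.get)

print(ranks)         # [2, 3, 1]
print(sorted_sizes)  # ['Small', 'Medium', 'Large']
```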

4- Numeric Attributes — Measurable Quantities

Numeric attributes represent quantitative values that can be measured or counted. They are the most straightforward to work with mathematically: differences and averages are meaningful.

📌 Subtypes:

There are two kinds of numeric data:

Discrete – Countable values (e.g., number of items)
Continuous – Values that can fall anywhere within a range (e.g., height, weight)

Example:

Age: 25, 30, 22 (continuous in principle, though recorded here in whole years)
Quantity: 1, 2, 3 juices (discrete)

In ML: Numeric data can be used directly in many algorithms. You might still apply scaling, such as min-max normalization or standardization, so that attributes with large ranges don't dominate those with small ones.
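
Two common scaling techniques can be sketched with the standard library alone. The helper names `min_max_scale` and `standardize` are hypothetical, and both assume the values are not all identical (otherwise the denominator would be zero).

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    return [(value - lowest) / (highest - lowest) for value in values]


def standardize(values):
    """Shift and scale values to mean 0 and (population) standard deviation 1."""
    average, spread = mean(values), pstdev(values)
    return [(value - average) / spread for value in values]


ages = [25, 30, 22]

print(min_max_scale(ages))  # [0.375, 1.0, 0.0]
print(standardize(ages))
```

Libraries such as scikit-learn offer the same transformations as `MinMaxScaler` and `StandardScaler`.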

🧃 Putting It All Together: Juice Store Dataset Revisited

Customer ID	Age	Favorite Juice	Quantity	Purchased on Weekend?	Juice Size
101	25	Orange	2	Yes	Medium
102	30	Mango	1	No	Large
103	22	Apple	3	Yes	Small

Attribute	Type
Customer ID	Nominal (an identifier, not a quantity)
Age	Numeric (Continuous)
Favorite Juice	Nominal
Quantity	Numeric (Discrete)
Purchased on Weekend?	Binary
Juice Size	Ordinal

Data Preprocessing and Attribute Transformation

Understanding the type of attribute is crucial before applying machine learning models. Here’s how each attribute type affects preprocessing:

Nominal and Binary attributes: Often require conversion into numerical values to facilitate model training.
Ordinal attributes: Need an encoding that preserves their order.
Numeric attributes: Are directly used by models but might require scaling or normalization.
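
Putting these rules together, here is a rough sketch that turns one juice shop record into a numeric feature vector. The function `to_features`, the constant names, and the column order are assumptions made for illustration. Customer ID is left out because it only identifies the row, and scaling of the numeric values is omitted to keep the sketch short.

```python
# Fixed category list for one-hot encoding the nominal attribute.
JUICE_CATEGORIES = ["Apple", "Mango", "Orange"]
# Explicit ranks for the ordinal attribute.
SIZE_TO_RANK = {"Small": 1, "Medium": 2, "Large": 3}


def to_features(row):
    """Convert one data object into a list of numbers."""
    # Nominal: one column per category.
    juice_one_hot = [
        1 if row["Favorite Juice"] == juice else 0 for juice in JUICE_CATEGORIES
    ]
    # Binary: Yes becomes 1, anything else becomes 0.
    weekend = 1 if row["Purchased on Weekend?"] == "Yes" else 0
    # Ordinal: integer rank that preserves the order.
    size_rank = SIZE_TO_RANK[row["Juice Size"]]
    # Numeric: used as-is.
    return [row["Age"], row["Quantity"], *juice_one_hot, weekend, size_rank]


customer_101 = {
    "Customer ID": 101, "Age": 25, "Favorite Juice": "Orange",
    "Quantity": 2, "Purchased on Weekend?": "Yes", "Juice Size": "Medium",
}

print(to_features(customer_101))  # [25, 2, 0, 0, 1, 1, 2]
```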

Conclusion

Understanding your data is the first real step toward building smart machine learning models. Knowing what attributes are — and how they work — helps you prepare your data correctly, choose the right techniques, and avoid mistakes.

Remember:

Good data = Good model. Bad data = Bad results.

Master these basics, and you’ll build a strong foundation for everything else in your ML journey.

3. Understanding Data, Attributes & Data Objects in Machine Learning


Muhammad Fahad Bashir
