3. Understanding Data, Attributes & Data Objects in Machine Learning


In our previous article, we explored the types of machine learning and their basic workflow. However, even a strong algorithm is only as good as the data it's trained on. Garbage in, garbage out: a flawed dataset leads to poor model performance, no matter how sophisticated the algorithm is. Understanding the data, its attributes, and its data objects is therefore crucial for building effective models. In this article, we'll look at why understanding your data matters and cover the key concepts that will help you work with it more effectively.
Data
In simple words, data is the raw information that we feed into a machine learning model so it can learn patterns.
Data is usually arranged in a tabular format – like a spreadsheet – made up of rows and columns:
Rows = different examples or instances of data
Columns = different attributes or features that describe the data
Attributes
Attributes are the properties or features of a data object.
Each attribute describes an aspect of the object. In tabular data, these are the columns.
Data Object
A data object is a single record or row in the dataset.
It’s one complete unit of data that contains values for all the attributes.
You might also hear it called a data point, instance, record, or sample. All of these are synonyms for one row in your dataset.
Juice Store Example
Let’s make this real with an example.
Imagine you own a juice shop and are collecting customer data.
Customer ID | Age | Favorite Juice | Quantity | Purchased on Weekend? |
101 | 25 | Orange | 2 | Yes |
102 | 30 | Mango | 1 | No |
103 | 22 | Apple | 3 | Yes |
🔍 In this example:
The whole table is your data
Each row (e.g., Customer ID 101) is a data object
Each column (like Age, Favorite Juice, etc.) is an attribute
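To make the vocabulary concrete, here is a minimal Python sketch of the same table. The variable names (`dataset`, `attributes`, `first_object`, `ages`) are illustrative choices rather than part of any library, and a list of dictionaries is only one plausible way to hold tabular data.

```python
# The juice shop table as a list of dictionaries.
# Each dictionary is one data object (row); each key is one attribute (column).
dataset = [
    {"customer_id": 101, "age": 25, "favorite_juice": "Orange", "quantity": 2, "weekend": "Yes"},
    {"customer_id": 102, "age": 30, "favorite_juice": "Mango", "quantity": 1, "weekend": "No"},
    {"customer_id": 103, "age": 22, "favorite_juice": "Apple", "quantity": 3, "weekend": "Yes"},
]

# Attributes are the column names shared by every data object.
attributes = list(dataset[0].keys())

# A data object is a single row holding a value for every attribute.
first_object = dataset[0]

# Reading one attribute across all data objects gives a column of values.
ages = [row["age"] for row in dataset]

print(attributes)
print(first_object)
print(ages)
```

Reading across a dictionary gives you one data object; reading the same key down the list gives you one attribute.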
Let’s look at these concepts in more detail by going through the types of attributes.
Types of Attributes in Machine Learning
When you're working with machine learning models, it's not enough to just collect data — you must understand what kind of data you're dealing with. Why? Because the type of attribute (or feature) determines:
Which algorithms you can use,
How you preprocess the data,
And what kind of insights or predictions your model can produce.
Let’s break it down with definitions, detailed explanations, and practical examples.
1- Nominal Attributes — Labels without Order
Nominal attributes are categorical values that represent different names or labels, and there is no inherent order or ranking among them.
📌 Key Characteristics:
These are used to classify or group things.
Values are mutually exclusive — a data point can only belong to one category.
You cannot meaningfully sort these values or perform mathematical operations such as averaging on them.
Example:
Favorite Juice: Apple, Mango, Orange
These are just names of juice flavors. You can't say Apple > Mango — there’s no ranking, just labels.
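In Machine Learning: These are usually encoded with One-Hot Encoding or Label Encoding before being fed to an algorithm. Label Encoding assigns an arbitrary integer to each category, so a model that treats inputs as magnitudes can mistake those integers for a ranking; One-Hot Encoding avoids that problem.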
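As a rough illustration of One-Hot Encoding, the sketch below uses only plain Python. The helper `one_hot_encode` is a hypothetical name written for this article, not a library function; in practice you would more likely rely on an existing preprocessing library.

```python
def one_hot_encode(values):
    """Return (categories, encoded_rows) for a list of nominal values."""
    # Sorting only fixes a stable column order; it does not imply a ranking.
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


favorite_juice = ["Orange", "Mango", "Apple"]
categories, encoded = one_hot_encode(favorite_juice)

print(categories)  # ['Apple', 'Mango', 'Orange']
for juice, row in zip(favorite_juice, encoded):
    print(juice, row)
```

Each juice becomes its own 0/1 column, so no flavor ends up numerically "greater" than another.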
2- Binary Attributes — Yes or No, On or Off
Binary attributes are a special case of nominal attributes where there are only two possible values.
📌 Key Characteristics:
Represent two categories: True/False, Yes/No, 0/1
Very common in real-world problems
Example:
Purchased on the Weekend? → Yes or No
Or encoded as: 1 = Yes, 0 = No
In Machine Learning: Binary attributes are often used directly (as 0 or 1) in models and are great for decision trees, logistic regression, etc.
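The Yes/No to 1/0 mapping could look like the following sketch. The function name `encode_binary` and its `positive`/`negative` parameters are assumptions made for this example.

```python
def encode_binary(value, positive="Yes", negative="No"):
    """Map a two-valued attribute to 1 (positive) or 0 (negative)."""
    if value == positive:
        return 1
    if value == negative:
        return 0
    # Fail loudly on unexpected values instead of silently guessing.
    raise ValueError(f"Unexpected binary value: {value!r}")


purchased_on_weekend = ["Yes", "No", "Yes"]
weekend_encoded = [encode_binary(value) for value in purchased_on_weekend]

# With a 0/1 encoding, the mean is the share of "Yes" answers.
weekend_share = sum(weekend_encoded) / len(weekend_encoded)

print(weekend_encoded)  # [1, 0, 1]
print(weekend_share)
```

A side benefit of the 0/1 encoding is that the column average reads directly as the proportion of weekend purchases.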
3- Ordinal Attributes — Categories with Order
Ordinal attributes are like nominal attributes, but with an added twist — the categories have a meaningful order or ranking.
📌 Key Characteristics:
Categories can be ranked, but the difference between ranks isn't measurable.
You can sort the data, but you can’t calculate differences like "Medium is 2 units more than Small."
Example:
Juice Size: Small, Medium, Large
Clearly, there's an order — Small < Medium < Large — but the gap between sizes isn’t numerically defined.
In Machine Learning: Ordinal data is often encoded as integers (e.g., Small = 1, Medium = 2, Large = 3). Take care not to treat these codes as true numeric measurements unless the algorithm relies only on their order, as tree-based models do.
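A minimal sketch of such an integer encoding is shown below. The names `SIZE_ORDER` and `SIZE_RANK` and the sample list `juice_sizes` are illustrative assumptions; the point is that the order is declared explicitly rather than inferred from alphabetical sorting.

```python
# Declare the order explicitly; alphabetical order would give Large < Medium < Small.
SIZE_ORDER = ["Small", "Medium", "Large"]
SIZE_RANK = {size: rank for rank, size in enumerate(SIZE_ORDER, start=1)}

juice_sizes = ["Medium", "Large", "Small"]  # sample values for illustration

# Integer codes preserve the ranking Small < Medium < Large.
encoded_sizes = [SIZE_RANK[size] for size in juice_sizes]

# Sorting by rank is meaningful for ordinal data.
sorted_sizes = sorted(juice_sizes, key=SIZE_RANK.get)

print(encoded_sizes)  # [2, 3, 1]
print(sorted_sizes)   # ['Small', 'Medium', 'Large']
```

The codes 1, 2, 3 only say which size comes first; they do not claim that Large is three times Small.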
4- Numeric Attributes — Measurable Quantities
Numeric attributes represent quantitative values and can be measured. These are the most straightforward data types when it comes to mathematics.
📌 Subtypes:
There are two kinds of numeric data:
Discrete – Countable values (e.g., number of items)
Continuous – Measurable values that can take any value within a range (e.g., height, weight)
Example:
Age: 25, 30, 22 (continuous, even though it is recorded here in whole years)
Quantity: 1, 2, 3 juices (discrete)
In ML: Numeric data can be used directly in many algorithms. You may still apply scaling techniques such as normalization or standardization so that features with different ranges become comparable, which helps many models perform better.
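Two common scaling techniques, min-max normalization and z-score standardization, can be sketched with the standard library alone. The function names `min_max_scale` and `standardize` are hypothetical helpers for this article, and the use of the population standard deviation is an assumption; some tools use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


def standardize(values):
    """Shift and rescale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


ages = [25, 30, 22]
print(min_max_scale(ages))  # [0.375, 1.0, 0.0]
print(standardize(ages))
```

Min-max scaling keeps values inside a fixed range, while standardization centers them around zero; which one suits a model better depends on the algorithm and the data.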
🧃 Putting It All Together: Juice Store Dataset Revisited
Customer ID | Age | Favorite Juice | Quantity | Purchased on Weekend? | Juice Size |
101 | 25 | Orange | 2 | Yes | Medium |
102 | 30 | Mango | 1 | No | Large |
103 | 22 | Apple | 3 | Yes | Small |
Attribute | Type |
Age | Numeric (Continuous) |
Favorite Juice | Nominal |
Quantity | Numeric (Discrete) |
Purchased on Weekend? | Binary |
Juice Size | Ordinal |
Data Preprocessing and Attribute Transformation
Understanding the type of each attribute is crucial before applying machine learning models. Here’s how each attribute type affects preprocessing:
Nominal and Binary attributes: Often require conversion into numerical values to facilitate model training.
Ordinal attributes: Need to be properly transformed to reflect their order.
Numeric attributes: Are directly used by models but might require scaling or normalization.
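Putting these rules together, the sketch below turns each juice store row into a numeric feature vector. It is one possible arrangement under stated assumptions: the helper names (`scale`, `to_feature_vector`) are hypothetical, Customer ID is dropped because it is an identifier rather than a descriptive attribute, and min-max scaling is just one of several reasonable choices for the numeric columns.

```python
dataset = [
    {"customer_id": 101, "age": 25, "favorite_juice": "Orange", "quantity": 2, "weekend": "Yes", "juice_size": "Medium"},
    {"customer_id": 102, "age": 30, "favorite_juice": "Mango", "quantity": 1, "weekend": "No", "juice_size": "Large"},
    {"customer_id": 103, "age": 22, "favorite_juice": "Apple", "quantity": 3, "weekend": "Yes", "juice_size": "Small"},
]

JUICE_CATEGORIES = ["Apple", "Mango", "Orange"]      # nominal: one-hot columns
SIZE_RANK = {"Small": 1, "Medium": 2, "Large": 3}    # ordinal: explicit order


def scale(value, lo, hi):
    """Min-max scale a numeric value into the range [0, 1]."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0


def to_feature_vector(row, age_bounds, quantity_bounds):
    """Convert one data object into a list of numbers, one rule per attribute type."""
    one_hot = [1 if row["favorite_juice"] == juice else 0 for juice in JUICE_CATEGORIES]
    return [
        scale(row["age"], *age_bounds),            # numeric (continuous)
        scale(row["quantity"], *quantity_bounds),  # numeric (discrete)
        *one_hot,                                  # nominal
        1 if row["weekend"] == "Yes" else 0,       # binary
        SIZE_RANK[row["juice_size"]],              # ordinal
    ]


# Bounds are computed from the data itself; customer_id is intentionally ignored.
age_bounds = (min(r["age"] for r in dataset), max(r["age"] for r in dataset))
quantity_bounds = (min(r["quantity"] for r in dataset), max(r["quantity"] for r in dataset))

features = [to_feature_vector(row, age_bounds, quantity_bounds) for row in dataset]
for row, vector in zip(dataset, features):
    print(row["customer_id"], vector)
```

Each attribute type gets its own transformation, and the result is a purely numeric row that a typical model can consume.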
Conclusion
Understanding your data is the first real step toward building smart machine learning models. Knowing what attributes are — and how they work — helps you prepare your data correctly, choose the right techniques, and avoid mistakes.
Remember:
Good data = Good model. Bad data = Bad results.
Master these basics, and you’ll build a strong foundation for everything else in your ML journey.
Written by Muhammad Fahad Bashir
