Handling Categorical Data in R: A Practical Guide


Categorical data is data that takes on a limited, fixed set of values. For example, instead of recording a person’s exact age as a number, we might categorize them as “Child”, “Adult”, or “Senior”. These categories are easier to interpret but come with their own structure and challenges.
Before diving into handling categorical data, it’s important to distinguish between its two main types:
Ordinal Data – Categories that have a natural order (e.g., small < medium < large).
Nominal Data – Categories without any inherent order (e.g., dumbbell, grippers, gloves).
Both forms are common in analytics and machine learning, especially in classification problems, where the output itself is categorical (e.g., churn vs. not churn, profitable vs. not profitable).
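In R, this distinction is usually encoded with factors. The sketch below is a minimal illustration and not part of the walkthrough that follows; the vectors equipment and size are made-up examples. An unordered factor models nominal data, while a factor created with ordered = TRUE and an explicit level order models ordinal data.

```r
# Nominal data: an unordered factor (levels default to alphabetical order)
equipment <- factor(c("gloves", "dumbbell", "grippers", "gloves"))

# Ordinal data: an ordered factor with the level order stated explicitly
size <- factor(c("medium", "small", "large", "small"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)

levels(equipment)    # level labels, sorted alphabetically
is.ordered(size)     # TRUE for the ordered factor
size < "large"       # order comparisons are meaningful only for ordered factors
```

Stating the levels explicitly matters for ordinal data, because the default alphabetical order would otherwise place "large" before "small".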
In this guide, we’ll walk through how to transform, summarize, and analyze categorical data in R using functions from base R and popular packages.
Converting Numerical Data into Categories
Often, numerical data is converted into categories for easier interpretation. For example, instead of using raw Sepal.Length values from the iris dataset, we can group them into bins.
Using cut() and split()
# Load iris dataset
x <- iris
# Split Sepal.Length into 3 ranges
list1 <- split(x, cut(x$Sepal.Length, 3))
summary(list1)
Here, cut() divides the range of Sepal.Length into 3 equal-width intervals, and split() returns a list with one data frame per interval.
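To see what cut() and split() actually produced, it can help to inspect the interval labels and the number of rows in each piece. The following is a small sketch in base R; the hand-picked break points and the labels "short", "medium", and "long" are illustrative assumptions, not values from the original analysis.

```r
x <- iris

# Equal-width binning: cut() derives the break points from the data range
bins <- cut(x$Sepal.Length, breaks = 3)
levels(bins)                      # interval labels for the three bins

# Rows per piece after splitting the data frame by bin
pieces <- split(x, bins)
group_sizes <- sapply(pieces, nrow)
group_sizes

# Hand-picked break points with readable labels (illustrative values)
named_bins <- cut(x$Sepal.Length,
                  breaks = c(4, 5.5, 6.5, 8),
                  labels = c("short", "medium", "long"))
table(named_bins)
```

When break points are supplied by hand, any value that falls outside them becomes NA, so the breaks should cover the full range of the data.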
Using cut2() from Hmisc
library(Hmisc)
# Split into 3 groups with roughly equal counts
list2 <- split(x, cut2(x$Sepal.Length, g = 3))
summary(list2)
Difference:
cut() → equal-width ranges
cut2() → roughly equal number of values per group
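The contrast can also be reproduced without Hmisc. The sketch below swaps in quantile-based break points from base R as a stand-in for cut2(); it is not a description of how cut2() works internally, and the group boundaries it produces may differ from those of cut2().

```r
x <- iris

# Equal-width bins: same interval width, uneven counts
equal_width <- cut(x$Sepal.Length, breaks = 3)

# Quantile-based bins: break points at the 0, 1/3, 2/3, and 1 quantiles,
# so each bin holds roughly a third of the observations
q <- quantile(x$Sepal.Length, probs = seq(0, 1, length.out = 4))
equal_count <- cut(x$Sepal.Length, breaks = q, include.lowest = TRUE)

table(equal_width)
table(equal_count)
```

The argument include.lowest = TRUE keeps the minimum value inside the first bin; without it, the smallest observation would be turned into NA.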
Adding Categories as New Columns
Instead of creating lists, we can add categories directly to the dataset:
x$class <- cut(x$Sepal.Length, 3)
x$class2 <- cut2(x$Sepal.Length, g = 3)
If you prefer numeric labels:
x$class <- as.numeric(x$class)
Now class takes values 1, 2, or 3.
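A point worth keeping in mind: as.numeric() on a factor returns the position of each level, not the original measurement. A minimal sketch follows; the lookup table name bin_key is hypothetical and simply preserves the mapping from code to interval before the labels are discarded.

```r
x <- iris
bins <- cut(x$Sepal.Length, 3)

# Keep a key so the numeric codes can still be traced back to intervals
bin_key <- data.frame(code = seq_along(levels(bins)),
                      interval = levels(bins))

# Codes 1..3, assigned in level order
x$class <- as.numeric(bins)

# cut() can also return the codes directly with labels = FALSE
codes_direct <- cut(x$Sepal.Length, 3, labels = FALSE)

bin_key
head(x[, c("Sepal.Length", "class")])
```

Both routes give the same codes; labels = FALSE just skips the intermediate factor.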
Counting Category Sizes
Using table()
class_length <- table(x$class)
class_length
Output:
1 2 3
59 71 20
To convert the table into a data frame (the category column is named Var1 by default, so we rename it):
class_length_df <- as.data.frame(class_length)
names(class_length_df)[1] <- "group"
class_length_df
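The renaming step can also be folded into the conversion itself. The sketch below relies on two base R features: naming the argument passed to table() sets the name of the category column, and the responseName argument of as.data.frame() sets the name of the count column. The column names group and n are arbitrary choices made for this example.

```r
x <- iris
x$class <- cut(x$Sepal.Length, 3, labels = FALSE)

# Name the table dimension up front and choose the count column name
class_length_df <- as.data.frame(table(group = x$class),
                                 responseName = "n")
class_length_df

# The category column comes back as a factor; convert it if codes are needed
class_length_df$group <- as.integer(as.character(class_length_df$group))
```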
Using count() from plyr (Cleaner)
library(plyr)
class_length2 <- count(x, "class")
class_length2
Output:
class freq
1 1 59
2 2 71
3 3 20
✅ Advantage: count() directly returns a data frame whose category column already carries the variable name, so no renaming is needed.
Comparing table() vs. count()
table() – quick summary, but includes all possible combinations (even those with 0 counts).
count() – skips 0-count combinations, giving a cleaner output.
Example with two variables (class and class2):
# table()
two_way <- as.data.frame(table(x$class, x$class2))
# plyr::count()
two_way_count <- count(x, c("class", "class2"))
👉 count() omits zero-frequency rows, making the output easier to interpret.
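For readers without plyr installed, a similar result can be approximated in base R by filtering the zero rows out of the table() output. This sketch pairs class with Species rather than class2 so that it needs no extra packages; that substitution is an assumption made for the example.

```r
x <- iris
x$class <- cut(x$Sepal.Length, 3, labels = FALSE)

# Full grid: one row per class/Species combination, zeros included
full <- as.data.frame(table(class = x$class, Species = x$Species))

# Keep only the combinations that actually occur in the data
observed <- subset(full, Freq > 0)

nrow(full)       # all 3 x 3 combinations
nrow(observed)   # fewer rows once the empty combinations are dropped
```

The count column is called Freq here, whereas plyr's count() names it freq, so downstream code should not assume the two outputs are interchangeable.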
Cross-Tabulation
If you prefer cross-tabulated outputs:
cross_tab <- xtabs(~ class + class2, x)
cross_tab
The output is an object of class xtabs (which is also a table). For larger N-way tables:
threeway_cross_tab <- xtabs(~ class + class2 + Species, x)
threeway_cross_tab
Downside: readability decreases as dimensions grow.
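One way to ease the readability problem while staying with contingency tables is ftable() from base R, which flattens an N-way table into a single two-dimensional layout. This is a sketch rather than part of the original workflow; width_class is a hypothetical second binning, used here in place of class2 to avoid the Hmisc dependency.

```r
x <- iris
x$class <- cut(x$Sepal.Length, 3, labels = FALSE)
x$width_class <- cut(x$Sepal.Width, 2, labels = FALSE)  # hypothetical second binning

# Three-way table: prints as one 2-D slice per Species
tab3 <- xtabs(~ class + width_class + Species, data = x)

# Flattened view: class and width_class as rows, Species as columns
flat <- ftable(tab3, row.vars = c("class", "width_class"))
flat
```

The flattened table still contains the zero-count combinations; it only changes how they are laid out.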
Cleaner Alternative with count()
threeway_cross_tab_df <- count(x, c("class", "class2", "Species"))
threeway_cross_tab_df
This produces a compact data frame containing only the non-zero counts, which is much easier to work with.
Key Takeaways
Use cut() for equal-width bins, cut2() for equal-size groups.
Add categories as new columns instead of splitting into lists.
table() is good for quick summaries, but requires cleanup.
count() from plyr is more flexible, faster, and cleaner for categorical summaries.
For multi-way frequency tables, count() provides concise results compared to xtabs() or table().
Written by Dipti M
