Handling Categorical Data in R: A Practical Guide

Dipti M

Categorical data is data that takes on a limited, fixed set of values. For example, instead of recording a person’s exact age as a number, we might categorize them as “Child”, “Adult”, or “Senior”. These categories are easier to interpret but come with their own structure and challenges.

Before diving into handling categorical data, it’s important to distinguish between its two main types:

  • Ordinal Data – Categories that have a natural order (e.g., small < medium < large).

  • Nominal Data – Categories without any inherent order (e.g., dumbbell, grippers, gloves).

Both forms are common in analytics and machine learning, especially in classification problems, where the output itself is categorical (e.g., churn vs. not churn, profitable vs. not profitable).
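In R, these two types map onto unordered and ordered factors. A minimal sketch with made-up values (the vectors below are illustrative, not taken from any dataset):

```r
# Nominal: an unordered factor, levels have no ranking
equipment <- factor(c("gloves", "dumbbell", "grippers", "gloves"))

# Ordinal: an ordered factor with the level order stated explicitly
size <- factor(c("medium", "small", "large", "small"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)

is.ordered(equipment)  # FALSE
is.ordered(size)       # TRUE

# Comparisons such as < are meaningful only for ordered factors
size < "large"
```

Stating the levels explicitly matters: without the levels argument, R sorts them alphabetically, which would put "large" before "small".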

In this guide, we’ll walk through how to transform, summarize, and analyze categorical data in R using functions from base R and popular packages.


Converting Numerical Data into Categories

Often, numerical data is converted into categories for easier interpretation. For example, instead of using raw Sepal.Length values from the iris dataset, we can group them into bins.

Using cut() and split()

# Load iris dataset
x <- iris  

# Split Sepal.Length into 3 ranges
list1 <- split(x, cut(x$Sepal.Length, 3))  
summary(list1)

Here, cut() divides the range of Sepal.Length into 3 equal-width intervals.
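If the default interval labels are hard to read, cut() also accepts explicit break points and labels. The break points and label names in this sketch are assumptions chosen for illustration, not values used elsewhere in this guide:

```r
x <- iris

# Hypothetical break points and labels, chosen for illustration only
breaks <- c(4, 5.5, 6.5, 8)
x$length_group <- cut(x$Sepal.Length,
                      breaks = breaks,
                      labels = c("short", "medium", "long"))

# Intervals are right-closed by default: (4, 5.5], (5.5, 6.5], (6.5, 8]
table(x$length_group)
```

Any value outside the outermost breaks becomes NA, so the breaks should cover the full range of the data.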

Using cut2() from Hmisc

library(Hmisc)

# Split into 3 groups with roughly equal counts
list2 <- split(x, cut2(x$Sepal.Length, g = 3))  
summary(list2)

Difference:

  • cut() → equal ranges

  • cut2() → equal number of values per group
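If Hmisc is not available, a similar equal-count grouping can be approximated in base R by passing quantiles to cut(). This is a sketch rather than a drop-in replacement: cut2() chooses its own cut points and interval labels, so the groups may differ slightly.

```r
x <- iris

# Tercile cut points: the 0%, 33.3%, 66.7%, and 100% quantiles of Sepal.Length
q <- quantile(x$Sepal.Length, probs = seq(0, 1, length.out = 4))

# include.lowest = TRUE keeps the minimum value inside the first bin
x$tercile <- cut(x$Sepal.Length, breaks = q, include.lowest = TRUE)

table(x$tercile)
```

Because many flowers share the same Sepal.Length, the group sizes come out roughly equal rather than exactly equal.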


Adding Categories as New Columns

Instead of creating lists, we can add categories directly to the dataset:

x$class  <- cut(x$Sepal.Length, 3)
x$class2 <- cut2(x$Sepal.Length, g = 3)

If you prefer numeric labels:

x$class <- as.numeric(x$class)

Now class holds the level codes 1, 2, or 3 in place of the interval labels.
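Converting to numbers discards the interval labels, so it can help to save them first. A small sketch (bin_levels is a hypothetical helper name):

```r
x <- iris

bins <- cut(x$Sepal.Length, 3)

# Save the interval labels before replacing them with numeric codes
bin_levels <- levels(bins)
x$class <- as.integer(bins)

# Look up the original interval for any numeric code
bin_levels[x$class[1:5]]
```

Indexing bin_levels by the code recovers the interval each row fell into.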


Counting Category Sizes

Using table()

class_length <- table(x$class)
class_length

Output:

1  2  3  
59 71 20

To convert the table into a data frame:

class_length_df <- as.data.frame(class_length)
names(class_length_df)[1] <- "group"
class_length_df
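The renaming step can also be folded into the conversion itself. As a sketch, naming the dimension inside table() and passing responseName to as.data.frame() sets both column names at once (the names "group" and "freq" are just the ones chosen here):

```r
x <- iris
x$class <- as.integer(cut(x$Sepal.Length, 3))

# Name the dimension inside table() and the count column via responseName
class_length_df <- as.data.frame(table(group = x$class),
                                 responseName = "freq")
class_length_df
```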

Using count() from plyr (Cleaner)

library(plyr)

class_length2 <- count(x, "class")
class_length2

Output:

  class freq
1     1   59
2     2   71
3     3   20

✅ Advantage: count() directly returns a clean data frame, skipping the renaming hassle.


Comparing table() vs. count()

  • table() – quick summary, but it keeps every combination of levels, including those with a count of 0.

  • count() – skips 0-count combinations, giving a cleaner output.

Example with two variables (class and class2):

# table()
two_way <- as.data.frame(table(x$class, x$class2))

# plyr::count()
two_way_count <- count(x, c("class", "class2"))

👉 count() omits zero-frequency rows, making the output easier to interpret.
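The difference can be reproduced in base R by filtering the table() result. In this sketch, class2 is built from quantiles as a stand-in for cut2(), which is an assumption made so the example runs without Hmisc or plyr:

```r
x <- iris
x$class <- as.integer(cut(x$Sepal.Length, 3))

# Stand-in for cut2(): quantile-based terciles (an assumption for this sketch)
q <- quantile(x$Sepal.Length, probs = seq(0, 1, length.out = 4))
x$class2 <- cut(x$Sepal.Length, breaks = q, include.lowest = TRUE)

# table() keeps every class x class2 combination, including empty ones
two_way <- as.data.frame(table(class = x$class, class2 = x$class2))

# Dropping the empty combinations mimics what count() returns
two_way_nonzero <- subset(two_way, Freq > 0)

nrow(two_way)          # all combinations
nrow(two_way_nonzero)  # only combinations that occur in the data
```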


Cross-Tabulation

If you prefer cross-tabulated outputs:

cross_tab <- xtabs(~ class + class2, x)
cross_tab

The output is an xtabs object, which is a kind of table. The same formula interface extends to N-way tables:

threeway_cross_tab <- xtabs(~ class + class2 + Species, x)
threeway_cross_tab

Downside: readability decreases as dimensions grow.
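One way to keep an N-way table readable while staying in base R is ftable(), which flattens the result into a single two-dimensional layout. A sketch, again using a quantile-based class2 as an assumed stand-in for cut2():

```r
x <- iris
x$class <- as.integer(cut(x$Sepal.Length, 3))

# Stand-in for cut2(): quantile-based terciles (an assumption for this sketch)
q <- quantile(x$Sepal.Length, probs = seq(0, 1, length.out = 4))
x$class2 <- cut(x$Sepal.Length, breaks = q, include.lowest = TRUE)

three_way <- xtabs(~ class + class2 + Species, data = x)

# Put class and class2 on the rows, leaving Species across the columns
flat <- ftable(three_way, row.vars = c("class", "class2"))
flat
```

The flattened table still shows zero-count combinations, so it improves layout only.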


Cleaner Alternative with count()

threeway_cross_tab_df <- count(x, c("class", "class2", "Species"))
threeway_cross_tab_df

This produces a neat data frame with non-zero counts only, making it much easier to work with.


Key Takeaways

  • Use cut() for equal-width bins, cut2() for equal-size groups.

  • Add categories as new columns instead of splitting into lists.

  • table() is good for quick summaries, but requires cleanup.

  • count() from plyr returns a tidy data frame directly, which is often cleaner for categorical summaries.

  • For multi-way frequency tables, count() provides concise results compared to xtabs() or table().

