Community Scheduling: An Example of TURF Analysis

Chris Chapman

Today I’ll take a brief look at TURF analysis and show a simple example.

At a high level, TURF answers this question: after we do the #1 best thing — whether that is a product, message, placement, etc. — that customers desire most, what should we do next to reach additional customers, beyond those who want the #1 thing? It is not necessarily the #2 thing from our list.

TURF stands for “total unduplicated reach and frequency” (Miaoulis et al., 1990). That means it finds combinations of items that will reach the largest total number of people, while having as few duplications [multiple exposures per person] as possible. Secondarily, it maximizes the frequency — the average number of reaches per person — after reaching the most unique people.

A classic TURF example is to reach as many unique viewers as possible with advertisements. If you ignore the question of pricing, you would first of all place an ad in the most popular channel (TV show, magazine, etc.). That would have the largest audience (“reach”) for a single placement. But after that placement, what is the best second choice? Is it the 2nd most popular channel? Not necessarily, because the second most popular channel may have high overlap with the first. For instance, visitors to the most popular website also tend to visit many of the other most popular websites; they are heavy internet users across many sites.

If you just go by ranked popularity, you would often be advertising to — reaching — the same people again. It may be better to go farther down the list of popularity and find a channel that is smaller but that reaches a unique audience, relative to the #1 channel.
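To make “reach” and “frequency” concrete, here is a minimal sketch of how both could be computed for one candidate set of items, given a binary respondent-by-item matrix. This is illustrative only; the names avail and turf.stats are placeholders, not part of any package:

# illustrative only: reach & frequency for one candidate set of items,
# given a binary respondent x item matrix "avail" (1 = reached / available)
turf.stats <- function(avail, items) {
  hits <- rowSums(avail[ , items, drop = FALSE])   # exposures per person for this set
  c(reach     = mean(hits >= 1),                   # proportion reached at least once
    frequency = mean(hits))                        # average exposures per person
}

TURF then searches over candidate sets of items, looking for the sets with the highest reach (and, secondarily, frequency).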

My example is this: when scheduling practice times for an online Zen group, what is the best set of times that will reach the most people, so at least one time in the set will work for as many people as possible? Respondents took a survey and said when they are available. Given that, I want to find a small number of times that will make practice available to as many unique people as possible. TURF is a way to do that.

In this post, I share a data set — after protecting privacy as noted below — and R code for TURF analysis. In this data set, it would be possible to do TURF by hand. However, it is a great example for R analysis, and would easily scale to larger data sets. As always, I share R code along the way and compile it at the end.


The Data

I fielded a survey that asks respondents what times they are available on weekdays. Each respondent answered for their local time zone, which was later converted to standard time. (BTW, I also asked about weekends, and preferred days of the week. But for this post, we’ll only look at weekday times.) The results are a grid of availability by respondent, adjusted for time zone, mapped to 24 hours of the day.

Unlike many of my surveys, these data are private. However, I have created a “permuted,” simulated data set that I can share. It preserves enough high-level characteristics to give an identical TURF answer.

More detail: in these data, using random permutation of rows and columns, no observation is identical to the original data, and all sets of individual answers — and the relationships among times for every person — are altered randomly. This way, the data set preserves privacy while reproducing the overall counts of availability by row and column. It gives the same TURF answers as my real data. The permuted data set was generated by the R vegan::permatswap() function (Oksanen et al., 2025).

Why don’t I just use fake data? Because this way, I can share results with the Zen community as well as the Quant community, and show exactly how the schedule was decided.
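For the curious, here is an illustrative sketch of that kind of margin-preserving permutation, applied to a small made-up matrix. It is not the exact call or data I used for this post; see ?vegan::permatswap for details:

# illustrative only: permute a binary matrix while preserving row and column totals
library(vegan)
set.seed(98101)
toy      <- matrix(rbinom(40, 1, 0.3), nrow = 8)     # small made-up 0/1 grid
perm.toy <- permatswap(toy, mtype = "prab",          # presence/absence (binary) data
                       method = "quasiswap", times = 1)$perm[[1]]
all(rowSums(perm.toy) == rowSums(toy))               # row margins preserved
all(colSums(perm.toy) == colSums(toy))               # column margins preserved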

The permuted data set is small enough to share in code, with no need to download anything. I start with output obtained from the R dput() function, giving the data in text format. To create our data set hourgrid:

# the following was obtained from R dput() after survey import, data permutation, etc.
hourgrid <- structure(c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 
  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 
  0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
  0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 
  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
  dim = c(39L, 24L), 
  dimnames = list(
    c("1", "2", "3", "4", "5", "6", "7", "8", 
      "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
      "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
      "31", "32", "33", "34", "35", "36", "37", "38", "39"), 
    c("1:00", "2:00", "3:00", "4:00", "5:00", "6:00", "7:00", "8:00", "9:00", 
      "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", 
      "17:00", "18:00", "19:00", "20:00", "21:00", "22:00", "23:00", "0:00")
    )
  )

Next I structure it as a data.frame and clean it up a bit:

# set that up as a nice dataframe and keep colnames as "hours"
tmp.names       <- colnames(hourgrid)   # save the names instead of what data.frame() does
hourgrid        <- data.frame(hourgrid)
names(hourgrid) <- tmp.names     
hourgrid$ID     <- 1:nrow(hourgrid) # add ID variable needed later by TURF
# check its structure
head(hourgrid)

With the head() command, we see that each row shows a person’s availability for each hour of the day, coded as 1 if they said they are available at that time. (As noted, the survey asked for responses in each respondent’s local time; those were converted to Pacific time before this analysis started.)

We can find the most popular times by summing the 24 columns representing each hour:

# count of preferred times (omitting ID column)
colSums(hourgrid[ , 1:24])

We see that 09:00 and 15:00 are the most popular times, followed by 10:00 and 12:00 (output not shown here).
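Sorting those counts makes the ranking easier to see:

# the same counts, sorted from most to least popular
sort(colSums(hourgrid[ , 1:24]), decreasing = TRUE)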

Even better is a data visualization. For this data set, we can do a heatmap of times by respondent. Here’s R code for that:

library(ggplot2)
library(reshape2)
# melt the data for nice ggplot structure
hours.m <- melt(subset(hourgrid, rowSums(hourgrid[ , 1:24]) > 0), 
                id.vars = "ID")
names(hours.m) <- c("Respondent", "Time", "Available")

# plot it
p <- ggplot(data=hours.m, 
            aes(x=Respondent, y=Time, fill=Available)) +
  geom_tile(color = "grey90",
            lwd = 0.5,
            linetype = 1) +
  scale_fill_gradient(low = "white", high = "darkblue") +
  scale_y_discrete(limits = rev(levels(hours.m$Time))) +
  coord_fixed() +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = "none") +
  xlab("Respondent (permuted data)") +
  ylab("Times Selected by Respondent (permuted data)")

p

Two small notes on that code. First, when melt()-ing the data, I keep only respondents with at least 1 time selected (nb, in ggplot() this can lead to blank columns; solve by making ID an ordered factor, or create a new ID without gaps). Second, I reverse the Y axis using scale_y_discrete(limits = rev(levels(…))) so the times are shown in a more natural order. Here’s the plot:

We can see a few characteristics of the data. First of all — remembering that these are permuted data with the same high-level structure but not people’s actual individual responses — most respondents selected only a few times when they are available, commonly 1, 2, or 3 times on a given day. Second, the times are distributed across almost all hours of the day (which reflects the worldwide community of respondents). Thus, the overall grid is relatively sparse.
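We can confirm the first point with a quick tabulation of how many times each respondent selected:

# distribution of the number of times selected, per respondent
table(rowSums(hourgrid[ , 1:24]))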

FWIW, we can calculate the sparseness. Add up all the selected times and divide by the size of the grid:

# how sparse is it?
sum(hourgrid[ , 1:24]) / nrow(hourgrid) / 24

The answer is 0.108, i.e., the “average” respondent selected 10.8% of the available times.

All of this tells us that the single most popular time would be either 09:00 or 15:00 Pacific. But what if we want to schedule 2 or 3 times that maximize availability for the largest number of people, where each of them has at least one time that would work? That’s where TURF comes in.
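As a preview of the next section, here is a minimal brute-force sketch of the TURF idea for pairs of times: compute the reach of every possible pair and keep the best one. (This handles only reach; the turfR package below also reports frequency and ranked alternative solutions.)

# brute force: reach of every possible pair of the 24 times
pair.reach <- combn(24, 2, FUN = function(ix) {
  mean(rowSums(hourgrid[ , ix]) >= 1)     # share of respondents covered by this pair
})
pair.names <- combn(names(hourgrid)[1:24], 2)   # time labels for each pair
best       <- which.max(pair.reach)
pair.names[ , best]                             # best pair of times
pair.reach[best]                                # its reach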

What about days of the week? For purposes here, I’m ignoring that. This grid has responses for weekdays (weekends were asked separately). Among all 7 days of the week, the most popular days were Monday and Tuesday. Of those, only Tuesday works for me to schedule the meetings. Thus, in this post, I assume that we’re talking about times on Tuesdays.

BTW, you might wonder: why not ask about all times, across all days using a 7 day x 24 hour grid? Answer: in that format, respondents tend to select only a small number and data are extremely sparse. However, we could still use TURF with data of that kind.


TURF Code and Results

It is relatively straightforward to write one’s own TURF function, but we don’t need to; there is already an easy-to-use turfR library for R (Horne, 2014). First, we slightly reformat the data to match its expectation:

# TURF Analysis
library(turfR)

# set up the data for TURF package
# add ID variable so respondents are identified to TURF
turf.dat <- data.frame(
  ID = hourgrid[ , "ID"],           # respondent ids
  Weight = rep(1, nrow(hourgrid)),  # the weight / importance for each respondent
  hourgrid[, 1:(ncol(hourgrid)-1)]  # columns of data to use; all except "ID"
)
head(turf.dat)

This format is almost identical to our original format, except that it adds a Weight for each respondent. The weight is the “importance” to give each observation. For example, if you are working with a customer database, you might want to weight customers according to their observed revenue, longevity, retention, segment membership, or the like; or use similar estimates from a regression model; or use responses to a survey item such as their stated level of interest. For our data, I assign an equal weight of 1 to all observations.
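As a purely hypothetical illustration of the weighting idea (not used anywhere in this analysis), unequal weights could be plugged in like this:

# hypothetical sketch: unequal weights standing in for revenue, tenure, model scores, etc.
set.seed(98101)
turf.dat.weighted        <- turf.dat                        # copy; keep the equal-weight version intact
turf.dat.weighted$Weight <- runif(nrow(turf.dat), 0.5, 2)   # invented importance weights
head(turf.dat.weighted[ , 1:4])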

Here’s what the (equal-weight) turf.dat object for turfR looks like:

Although our data has simple values of 1 and 0 for people’s availability, binary data is not required for TURF. TURF can work with continuous values, such as likelihood scores from a database, regression model, or simple Likert responses; estimated preference values (e.g., from a choice model survey); and so forth. In our case, the data come from checkboxes of availability on a survey, so 1/0 is the natural format.

With that, we can now fit a TURF model. I’ll start by looking for the best 2 times (k=2) and ask for the top 10 results (keep=10). We tell it that we have 24 items (times) to consider (n=24). Here’s the code and result:

# run the TURF analysis for best 2 times
(turf.2 <- turf(turf.dat, n=24, k=2, keep=10))$turf

I’ll interpret results in the next section, but will note here an unusual but happy coincidence in these data: the item and column numbers (1-24) correspond exactly to the hours of the day (1:00 through 24:00, where 24:00 is coded as 0:00). That makes the results easy to read, but it is a situation unique to this data set and question. In most cases, you would have to match those numbers to interpretable names of items, features, messages, etc.
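For example, mapping item numbers back to readable labels could be done like this:

# map TURF item numbers back to readable time labels
item.names <- names(hourgrid)[1:24]   # the 24 time labels, in item (column) order
item.names[c(9, 12, 18)]              # labels for items 9, 12, and 18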

For my purposes, I’ll use TURF results for the tentative 3 best times to schedule. Here’s how to get those, asking for the 3 best times (k=3) and keeping the top 15 results (keep=15):

# what if we scheduled 3 times instead of 2? 
(turf.3 <- turf(turf.dat, n=24, k=3, keep=15))$turf

BTW, in this simple case, the algorithm does a full expansion to consider all possible combinations. If we had larger sets to consider, that would become computationally intractable (e.g., choosing the best sets of 20 ads to rotate out of a set of 100 options). In such cases, turfR offers Monte Carlo sampling options.
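To see why, just count the combinations with choose():

# size of the full enumeration in each case
choose(24, 3)    # 2,024 combinations of 3 times from 24 -- easy to enumerate
choose(100, 20)  # roughly 5e20 combinations of 20 ads from 100 -- intractable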


Selecting Among the Results

For my scheduling problem, I will start by scheduling meetings 1 time per week and then soon expand to 2 and eventually perhaps 3 times per week. Given that plan (i.e., “strategic goal”), I’ll use the k=3 results for the optimal 3 meetings per week, and develop the schedule in light of those results.

Following is the k=3 result again (first 6 solutions). Before jumping into the schedule choice, I’ll explain the first 3 columns: combo, rchX, and frqX. The combo is simply the internal solution number that turfR considered. We can ignore that unless we want to dig deeply into its solution design matrix.

The next two columns, rchX and frqX, are the heart of the TURF solution: the reach (rchX) and frequency (frqX) of each solution, sorted in decreasing order of reach. Specifically, rchX is the proportion of respondents who are reached at least once by a given solution, while frqX is how often the average respondent is reached.

Consider the first solution, which is to offer meeting times at 09:00, 12:00, and 18:00 Pacific Time. That set of times is estimated to reach 69% of the respondents — meaning that at least 1 of those times would work for 69% of people who replied to the survey. However, some people are available for multiple times within that set. The frequency estimate tells us that this set of 3 times would offer 0.74 available meeting times per week to the average respondent. I.e., some of the 69% would have multiple options.
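For reference, here is one way to pull just those summary columns from the fitted object, using the column names described above:

# reach and frequency for the top k=3 solutions
head(turf.3$turf[[1]][ , c("combo", "rchX", "frqX")])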

You might wonder, “what about confidence intervals (CIs)? Are there meaningful differences between 69% for the first solution and 67% for the second one?” Like anything involving CIs, it is a complex question. In this case, it is unclear to what extent we should consider these data to be a sample of a population — as a CI would assume — as opposed to being a descriptive census of a community. However, if we assume that it is a random sample, we could find CIs either with a bootstrap operation (best) or using a quick approximation for the binomial proportions. Given N=39 respondents and looking for an 80% confidence interval (±1.28 standard errors), we get a simple estimate of:

# est'd CI for the TURF #1 solution (from the k=3 results)
# get the proportion reached for solution #1
p.turf <- turf.3$turf[[1]]$rchX[1]
# estimate of 80% confidence interval (swag; bootstrap etc would be better)
# set the Z value (standard errors) as appropriate; in this case "80% CI" Z == 1.28
ci     <- 1.28 * sqrt(p.turf * (1-p.turf) / nrow(hourgrid))
c(p.turf - ci, p.turf + ci)

The answer is that our CI is a range of 0.598 to 0.787, meaning that solution #1 is estimated to reach approximately 60-79% of respondents. The other 14 solutions all fall within that range for reach. So, as a first approximation, none of those 14 is dramatically worse than solution #1.
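As noted above, a bootstrap would be a more defensible way to get that interval. Here is a minimal sketch, assuming we treat the 39 respondents as a random sample and resample them with replacement (2,000 resamples is an arbitrary choice):

# bootstrap 80% interval for the reach of solution #1 (times 9:00, 12:00, 18:00)
set.seed(98101)
best.set <- c("9:00", "12:00", "18:00")
boot.rch <- replicate(2000, {
  idx <- sample(nrow(hourgrid), replace = TRUE)    # resample respondents
  mean(rowSums(hourgrid[idx, best.set]) >= 1)      # reach of the set in this resample
})
quantile(boot.rch, c(0.10, 0.90))                  # 80% bootstrap interval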

With all of this in place, let’s look at the first 6 solutions. Then I’ll discuss how I chose among them with respect to overall goals. The solutions are:

Examining the patterns, a few things stand out. First, the 09:00 time appears in all 6 of the top results. Second, solutions 2-5 are all tied with one another for reach, and are only slightly behind solution #1. Third, solutions 2 and 3 — and to a lesser degree solutions 4 and 5 — score somewhat higher on frequency than solution #1. Finally, some of the solutions, including #1 and #5, have times spread out across the day (e.g., 09:00, 12:00, 18:00), whereas others are more compact, such as #4, which has times at 09:00, 10:00, and 12:00.

Here’s how I assessed the possibilities and made a choice for the schedule.

  1. For the first, initially scheduled time, 09:00 is a clear winner. Although it is slightly less popular on its own than 15:00 (N=12 vs. N=13), the 09:00 time appears in all of the top solutions. Thus it is a good choice for a foundational time, while allowing the schedule to evolve later.

  2. Beyond that, I tentatively select 10:00 as the second, expected time slot. 10:00 appears in solutions #2 and #4 and thus is an excellent second choice. Additionally, it keeps the schedule compact (assuming both sessions occur on the same day), which is logistically simple. It also appears in solutions with relatively higher frequency, meaning that it will give people some flexibility.

  3. If and when the community needs to have a third time scheduled, I’d want to see how 09:00 and 10:00 are going and then decide. Tentatively, 12:00 or 15:00 look promising; another option might be 18:00.

This process of selecting 09:00 — while tentatively planning 10:00, and considering a more complete schedule over time — is a good example of a cautious process.

At Amazon (and elsewhere) this is often described as the difference between a “two-way door” and a “one-way door”. A one-way door is a decision that is difficult to reverse. For example, announcing a scheduled time of 09:00 and holding practice then is difficult to reverse because people come to expect it. Similarly, shipping a product, buying a house, and getting married are all one-way doors.

By contrast, a two-way door is relatively easy to reverse. Deciding tentatively on 10:00 as a planned next step is easy to change later. It is similarly “two-way” and reversible to test a product prototype, rent an apartment, or date someone as opposed to getting married.


A Few Applications for TURF

In the worlds of Quant UX and marketing research, TURF analysis has many applications. Here are a few that occur to me somewhat randomly:

  • Constructing a restaurant or other “menu”: given preference data (such as a MaxDiff assessment), what is the smallest set of items that will offer choices appealing to as many customers as possible? (Want to try it? Get the Quant UX Association class preference data and run TURF on the estimates!)

  • Planning conference or workshop locations over time: which set of locations will maximize the appeal — on at least one occasion — for as many people as possible?

  • Selecting marketing messages: if we can feature only a few messages, such as 3 claims on a product package or website, which set of messages will catch attention from as many people as possible?

  • Offering employee perks or gifts: suppose we want a small number of things to offer to employees. Which set will be most likely to have “something for everyone”?

  • Prioritizing customer outreach efforts: suppose our sales team wants to call customers who are highly interested in at least one of our new features. We want to select a limited number to call, prioritized jointly by feature interest [reach] and account size [weight]. Who are they?

  • Pharmaceutical formulary: which combination of prescription drugs will offer the greatest coverage, to as many of our patients and/or physicians as possible?

  • Excursions for a cruise ship: what is the smallest combination of outings and on-ship activities that will interest the largest number of passengers?

  • Similarly: selecting political party platforms or legislative priorities; books for a mobile library; design templates for an app; destinations to fly to in order to be a “complete” airline; and of course, the original goal of optimizing ad placement for reach; among many others.

Remember that TURF can use continuous estimates for “reach” and thus can take advantage of any sort of (continuous) estimates for interest or preference; and it can also weight respondents. The reach estimates may come from respondent choices, as in the data here, but also may come from observed data as in product instrumentation or a CRM system, or from a regression model. With so many options, there are many applications for TURF!


Conclusion

As we saw, TURF analysis provides potential solutions to a very practical question: how to schedule a small number of events (weekly meeting times) for a group of people. Given a few candidate solutions, I was able to consider other factors such as practicality and create an initial schedule.

We discussed how TURF can be extended to larger and more complex cases, including ones where we have continuous (real number) estimates of interest instead of binary indications, and could weight respondents by some kind of importance estimate.

Could I have done this analysis by hand? With this small data set and number of options, yes. OTOH, by doing it in code, I have an answer that is more scalable and will be easy to apply again as new data arrive. Plus I like coding (and also like blogging and explaining code).

Learning More. Although TURF is widely used, I’m not aware of a great single source to learn more about it. The original paper was by Miaoulis et al (1990), listed in “Citations” below. To find more applications, I’d suggest checking that paper’s citations in Google Scholar and branching out from there. The turfR library also has helpful documentation and references.

Try it! If you’d like to try TURF with another, more complex data set, here’s a homework exercise. Get the public MaxDiff data set with individual preferences for classes to be offered by the Quant UX Association. What is the best set of 3, 4, or 5 classes that would reach the most people? (Note: some choice modeling software offers TURF analysis built-in, but by adapting the code here, you can do it on your own in R!)

Best wishes!


Citations

Horne J (2014). turfR: TURF Analysis for R. R package version 0.8-7, https://CRAN.R-project.org/package=turfR.

Miaoulis, G., Parsons, H., & Free, V. (1990). TURF: A New Planning Approach for Product Line Extensions. Marketing Research, 2(1).

Oksanen J, Simpson G, Blanchet F, Kindt R, Legendre P, Minchin P, O'Hara R, Solymos P, Stevens M, Szoecs E, Wagner H, Barbour M, Bedward M, Bolker B, Borcard D, Carvalho G, Chirico M, De Caceres M, Durand S, Evangelista H, FitzJohn R, Friendly M, Furneaux B, Hannigan G, Hill M, Lahti L, McGlinn D, Ouellette M, Ribeiro Cunha E, Smith T, Stier A, Ter Braak C, Weedon J, Borman T (2025). vegan: Community Ecology Package. R package version 2.6-10, https://CRAN.R-project.org/package=vegan.

Wickham H (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.


All the Code

Following is all of the R code from this post, including inline code that creates the small data set.

####
# load our data
# note, data have been permuted from original responses,
# preserving overall characteristics while changed for each respondent (cf. vegan::permatswap)
hourgrid <- structure(c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 
  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 
  0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
  0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 
  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
  1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
  0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
  dim = c(39L, 24L), 
  dimnames = list(
    c("1", "2", "3", "4", "5", "6", "7", "8", 
      "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
      "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
      "31", "32", "33", "34", "35", "36", "37", "38", "39"), 
    c("1:00", "2:00", "3:00", "4:00", "5:00", "6:00", "7:00", "8:00", "9:00", 
      "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", 
      "17:00", "18:00", "19:00", "20:00", "21:00", "22:00", "23:00", "0:00")
    )
  )
# set that up as a nice dataframe and keep colnames as "hours"
tmp.names <- colnames(hourgrid)
hourgrid <- data.frame(hourgrid)
names(hourgrid) <- tmp.names
hourgrid$ID <- 1:nrow(hourgrid) # add ID variable needed later by TURF
# check its structure
head(hourgrid)

# basic stats
# count of preferred times
colSums(hourgrid[ , 1:24])
# number of times chosen, per respondent
rowSums(hourgrid[ , 1:24])
# how sparse is it?
sum(hourgrid[ , 1:24]) / nrow(hourgrid) / 24

# heat map of times
# make heatmap
library(ggplot2)
library(reshape2)
library(car)
# melt the data for nice ggplot structure
hours.m <- melt(subset(hourgrid, rowSums(hourgrid[ , 1:24]) > 0), 
                id.vars = "ID")
names(hours.m) <- c("Respondent", "Time", "Available")
# some(hours.m, 15)

# plot it
p <- ggplot(data=hours.m, 
            aes(x=Respondent, y=Time, fill=Available)) +
  geom_tile(color = "grey90",
            lwd = 0.5,
            linetype = 1) +
  scale_fill_gradient(low = "white", high = "darkblue") +
  scale_y_discrete(limits = rev(levels(hours.m$Time))) +
  coord_fixed() +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = "none") +
  xlab("Respondent (permuted data)") +
  ylab("Times Selected by Respondent (permuted data)")

p

# TURF Analysis
library(turfR)

# set up the data for TURF package
# add ID variable so respondents are identified to TURF
turf.dat <- data.frame(
  ID = hourgrid[ , "ID"],           # respondent ids
  Weight = rep(1, nrow(hourgrid)),  # the weight / importance for each respondent
  hourgrid[, 1:(ncol(hourgrid)-1)]  # columns of data to use; all except "ID"
)
head(turf.dat)

# run the TURF analysis for best 2 times
(turf.2 <- turf(turf.dat, n=24, k=2, keep=10))$turf

# what if we scheduled 3 times instead of 2? 
(turf.3 <- turf(turf.dat, n=24, k=3, keep=15))$turf

# est'd CI for the TURF #1 solution (from the k=3 results)
# get the proportion reached for solution #1
p.turf <- turf.3$turf[[1]]$rchX[1]
# estimate of 80% confidence interval (swag; bootstrap etc would be better)
# set the Z value (standard errors) as appropriate; in this case "80% CI" Z == 1.28
ci     <- 1.28 * sqrt(p.turf * (1-p.turf) / nrow(hourgrid))
c(p.turf - ci, p.turf + ci)
