Making headlines
In my current mammoth work project, I’m generating many plots. The titles are very descriptive (they tell you what the plot is about), but they are not really telling a story.
That's simply because there are so many on the production line.
What we’d like is to analyse the data and extract the salient points.
Better still, we'd want this to adjust dynamically for each plot.
Something along the lines of “some place” is higher/lower than “another place” but not higher than “some council”.
However, I’ve had to park that for now because it seemed like an entire project in itself.
Enter {headliner}
At the weekend I discovered the {headliner} package from Jake Riley.
I have had a play around with it and it’s brilliant: such a clever solution to a potentially cumbersome problem.
You’ll find the repo here (https://github.com/rjake/headliner) and the package website here
I’ve uploaded some trial data relating to the population projections for Inverness (this is data already in the public domain courtesy of the Improvement Service).
I’ll avoid putting too much wrangling code on the blog post, but you’ll find the code on the repo here. It’s a minimal dataset showing the start and end projections, for 2018 and 2030, at various age-bands.
I’ve wrangled this table into a wider format for the particular chart that I want to make:
t1
##    year ageband   pop pop2030 year2
## 1: 2018    0-15 14528   13165  2030
## 2: 2018   16-44 28859   29621  2030
## 3: 2018   45-64 22986   22577  2030
## 4: 2018   65-74  8372   10343  2030
## 5: 2018   75-84  4916    6901  2030
## 6: 2018     85+  1914    2611  2030
Write your own headlines
Now I want to make some text for use in my chart.
I use the add_headline_column function to compare the 2030 values with the 2018 values, and then state whether this is an increase or decrease with the {trend} placeholder. The {delta_p} placeholder returns the change as a percentage. Finally, Jake showed me how to format the actual values nicely, rather than the boring, hard-to-read raw values I had on my first attempt:
library(dplyr)      # provides %>% and, later, if_else()
library(headliner)
chart_text <- t1 %>%
  add_headline_column(x = pop2030,
                      y = pop,
                      headline = "population in the {ageband} ageband will {trend} by {delta_p}% ({f(x)} vs {f(y)})",
                      f = scales::number_format(big.mark = ","))
chart_text$headline
#[1] "population in the 0-15 ageband will decrease by 9.4% (13,165 vs 14,528)"
#[2] "population in the 16-44 ageband will increase by 2.6% (29,621 vs 28,859)"
#[3] "population in the 45-64 ageband will decrease by 1.8% (22,577 vs 22,986)"
#[4] "population in the 65-74 ageband will increase by 23.5% (10,343 vs 8,372)"
#[5] "population in the 75-84 ageband will increase by 40.4% (6,901 vs 4,916)"
#[6] "population in the 85+ ageband will increase by 36.4% (2,611 vs 1,914)"
Hopefully you can already see how useful this is. This is going to save me so many if / if_else statements.
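For comparison, here is roughly what a hand-rolled version looks like in base R. This is a minimal sketch and not how {headliner} works internally: the values are copied from the t1 table above, the rounding to one decimal place is an assumption chosen to match the output shown, and ties (no change) are not handled.

```r
# Values copied from the t1 table above
ageband <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
pop     <- c(14528, 28859, 22986, 8372, 4916, 1914)
pop2030 <- c(13165, 29621, 22577, 10343, 6901, 2611)

# Absolute percentage change relative to the 2018 value
delta_p <- round(abs(pop2030 - pop) / pop * 100, 1)

# Direction of change (a tie would wrongly read as "decrease" here)
trend <- ifelse(pop2030 > pop, "increase", "decrease")

# Assemble the sentence by hand, with comma-formatted values
manual_headline <- sprintf(
  "population in the %s ageband will %s by %s%% (%s vs %s)",
  ageband, trend, delta_p,
  format(pop2030, big.mark = ",", trim = TRUE),
  format(pop, big.mark = ",", trim = TRUE)
)

manual_headline[5]
```

Even in this simple case there are three separate steps to keep in sync, and every extra wording rule would add another branch.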
I’m building these headlines so that I can tailor specific sentences in my plots.
To do that, I decided to add each of the placeholders as a separate column.
There may well be a slicker way of doing this, but I haven’t had a lot of time to delve into it.
# percentage change with a % sign, e.g. "40.4%"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030,
                      y = pop,
                      headline = "{delta_p}%",
                      .name = "headline2")
# percentage change as bare text, e.g. "40.4"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030,
                      y = pop,
                      headline = "{delta_p}",
                      .name = "headline3")
# direction of change: "increase" or "decrease"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030,
                      y = pop,
                      headline = "{trend}",
                      .name = "headline4",
                      f = scales::number_format(big.mark = ","))
# formatted values, e.g. "6,901 vs 4,916"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030,
                      y = pop,
                      headline = "{f(x)} vs {f(y)}",
                      .name = "headline5",
                      f = scales::number_format(big.mark = ","))
Find the story
I decided I wanted to specifically focus on the 75-84 age band, as that has the biggest increase across the patch, in percentage terms at least. (Arguably, the smaller increase in the 16-44 age band is of more interest to planners or public health officials, because that band contains so many people.)
First - let’s figure out the row I need, but instead of just grabbing the row number, I’ll grab the row itself
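# headline3 is text, so convert it to numeric before taking the max
# (compared as text, "9.4" would rank above "40.4")
source_row <- chart_text[
  chart_text[, .I[as.numeric(headline3) == max(as.numeric(headline3))],
             by = headline4]$V1
][headline4 == "increase"]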
Yet another nifty data.table trick!
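If the .I idiom is new to you, here is a minimal sketch of what it does, using a small stand-alone table. The percentages and trends are copied from the headlines above; the column names delta_p and trend are illustrative labels rather than the ones in chart_text, and delta_p is assumed to be numeric.

```r
library(data.table)

# Percentages and trends copied from the headlines above;
# column names are illustrative, not those used in chart_text
dt <- data.table(
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  delta_p = c(9.4, 2.6, 1.8, 23.5, 40.4, 36.4),
  trend   = c("decrease", "increase", "decrease",
              "increase", "increase", "increase")
)

# .I holds row numbers in the full table; within each trend group,
# keep the row number(s) where delta_p equals the group maximum
idx <- dt[, .I[delta_p == max(delta_p)], by = trend]$V1

# Subset to those rows, then keep only the increasing group
biggest_increase <- dt[idx][trend == "increase"]
biggest_increase$ageband
```

The inner call returns one row number per group, so the same pattern also gives you the biggest decrease by changing the final filter.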
Then I create some more variables to drop into my text:
target_age    <- source_row$ageband
target_trend  <- source_row$headline4
target_amount <- source_row$headline2
# HSCPval and year_end are set earlier in my script (not shown);
# judging by the output below they hold "Inverness" and 2030
para_text <- glue::glue(
  "The {HSCPval} population in the {target_age} ageband ",
  "is projected to {target_trend} by {target_amount} by {year_end}"
)
This returns the following text:
## The Inverness population in the 75-84 ageband is projected to increase by 40.4% by 2030
Show the story
This is a style of plot I’ve been wanting to make for ages, inspired by work by my mate Ryo (I think I first saw something like this for the Liverpool squad profiles).
I won’t be able to do this at work, as this kind of thing would not gain mass acceptance, but I like it, so here goes...
First some set-up work:
# year_start_col and year_end_col are colour values set elsewhere (not shown)
t1$percent   <- as.numeric(chart_text$headline3) / 100
t1$direction <- chart_text$headline4
t1$colours   <- if_else(t1$direction == "increase", year_end_col, year_start_col)
# make decreases negative, then tidy the labels for display
t1$percent   <- if_else(t1$direction == "increase", t1$percent, t1$percent * -1)
t1$direction <- if_else(t1$direction == "increase", "Increase", "Decrease")
index <- c(0, 0.25, 0.5, 0.75, 1) # for geom_link
I like the use of geom_link from {ggforce} to give an impression of movement.
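As a rough illustration of the idea, and not the exact plot from this post, a minimal geom_link sketch might look like the following. It assumes ggplot2 3.4 or later and ggforce 0.4 or later (for the linewidth argument). The data frame is rebuilt from the t1 table above, the default colours stand in for my own palette, and the title reuses the sentence generated earlier.

```r
library(ggplot2)
library(ggforce)

# Values copied from the t1 table above
bands <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
plot_data <- data.frame(
  ageband = factor(bands, levels = bands),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611)
)
plot_data$direction <- ifelse(plot_data$pop2030 > plot_data$pop,
                              "Increase", "Decrease")

# geom_link interpolates between (x, y) and (xend, yend); mapping alpha to
# the computed `index` (0 at the start, 1 at the end) fades the line in,
# which gives the impression of movement from 2018 towards 2030
p <- ggplot(plot_data) +
  geom_link(
    aes(x = pop, xend = pop2030, y = ageband, yend = ageband,
        colour = direction, alpha = after_stat(index)),
    linewidth = 3, show.legend = FALSE
  ) +
  geom_point(aes(x = pop2030, y = ageband, colour = direction),
             size = 4, show.legend = FALSE) +
  labs(
    title = "The Inverness population in the 75-84 ageband is projected to increase by 40.4% by 2030",
    x = "Population", y = NULL
  ) +
  theme_minimal()
```

Printing p draws one fading bar per age band, ending in a point at the 2030 value.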
There's more that could be done to improve this, but I find that in the real world, people don't care too much about whether you've followed all the rules of data visualization.
They just want to know what they need to know, and you need to be able to tell them.
This package helps you help them.
This seems like a bit of work for a single plot, but using {targets} or {purrr}, I can add in some more variables and easily cycle through each of my 13 areas and pick out the relevant populations. I could even use pmap to vary whether I am looking for increases or decreases, or min/max values, on a case-by-case basis.
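To make the pmap idea concrete, here is a minimal sketch under a few assumptions. The specs table and the pick_headline() helper are hypothetical names invented for illustration, only the Inverness values from the table above are used, and the end year is hard-coded. It uses plain sprintf rather than {headliner}, so it stands alone.

```r
library(purrr)

# Inverness values copied from the t1 table above
projections <- data.frame(
  area    = "Inverness",
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611)
)

# Hypothetical spec table: one row per headline wanted, giving the area
# and the direction to look for (further areas would be extra rows)
specs <- data.frame(
  area      = c("Inverness", "Inverness"),
  direction = c("increase", "decrease")
)

# Hypothetical helper: find the largest change in the requested direction
pick_headline <- function(area, direction) {
  d <- projections[projections$area == area, ]
  d$change <- (d$pop2030 - d$pop) / d$pop * 100
  keep <- if (direction == "increase") d$change > 0 else d$change < 0
  d <- d[keep, ]
  top <- d[which.max(abs(d$change)), ]
  sprintf(
    "The %s population in the %s ageband is projected to %s by %.1f%% by 2030",
    area, top$ageband, direction, abs(top$change)
  )
}

# pmap_chr() passes each row of specs to pick_headline() by column name
headlines <- pmap_chr(specs, pick_headline)
headlines
```

Because each row of specs carries its own rule, switching an area from "biggest increase" to "biggest decrease" is a one-cell change rather than another if_else branch.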
When you consider I already have 50+ plots (for 13 different areas), each needing a bespoke title ideally, you can hopefully understand how impactful {headliner} could be.
I'm very excited by this package - I believe it’s a real game-changer for deriving insight.
(Well, at least until ChatGPT beats us to it.)
Go and star it and install it. I’m sure you will find it very worthwhile.
Written by
John MacKintosh
Experienced data analyst, blogger and online instructor. Based in the Scottish Highlands, I am an avid R user, and enjoy sharing my learning as I discover new packages, tools and techniques. I am a fellow of the NHS-R community, have developed some R packages, and my blog posts regularly appear in the R-Weekly roundup. I also have a decade of SQL experience and have a popular SQL course on a leading online data science platform