How to find Median in SQL Database

Timothy NduatiTimothy Nduati
2 min read

Since median is harder than min/max/count and not a standard aggregate function, we’ll need to do a little more work to calculate it. Here’s how on several different databases:

Median on Redshift

The Redshift team recently released a median window function, making it one of the easiest syntaxes to find the median with:

select median(price) over () as median
from purchases
limit 1

Note the limit 1: Since median is a window function and not an aggregate function, it’ll return one value for each row in the table.

Median on Postgres

If you like defining your functions in Postgres, the Postgres Wiki has a definition for median. We’ll do it in SQL and get Postgres to help us find the middle value by numbering all the rows with the row_number() window function.

First, a CTE to sort and number all of the rows, with a count that’ll help later on:

with ordered_purchases as (
  select
      price,
      row_number() over (order by price) as row_id,
      (select count(1) from purchases) as ct
  from purchases
)

Then we find the middle one or two rows and average their values:

select avg(price) as median
from ordered_purchases
where row_id between ct/2.0 and ct/2.0 + 1

The where clause ensures that we’ll get the two middle values if there is an even number of values, and the single middle number, if there is an odd number of values because between, is inclusive of its bounds.

Median on MySQL

MySQL might not have window functions, but it does have variables, and we’ll use them to achieve the same result.

First, we’ll set two variables, one for the row count and one to act as the row_id from before:

set @ct := (select count(1) from purchases);
set @row_id := 0;

And just like before, we average the middle one or two values:

select avg(price) as median
from (select * from purchases order by price)
where (select @row_id := @row_id + 1)
between @ct/2.0 and @ct/2.0 + 1

The @row_id := @row_id + 1 syntax simply increments the @row_id counter for each row. Unlike Postgres we don’t need to build up a temporary result set of rows with row_id because variables let us compute the row_id on the fly.

10
Subscribe to my newsletter

Read articles from Timothy Nduati directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Timothy Nduati
Timothy Nduati

I am a student pursing Computer Science. I started coding in 2019 and have grown a keen interest on Data Science and Machine learning. My favorite language is Python and SQL.