How to Get a Non-Aggregate value in an Aggregate Query
Suppose you want to find the maximum salary for each job in the emp
table. This is very easy using using the max
aggregate:
select job, max(sal) max_sal
from emp
group by job
order by job
;
JOB MAX_SAL
--------- -------
ANALYST 3000
CLERK 1300
MANAGER 2975
PRESIDENT 5000
SALESMAN 1600
The next logical question is: Who earns the max salary? A very common solution to this problem is to use an analytical function:
select
job, ename, sal
from (
select
row_number() over (partition by job order by sal desc) rn,
job, ename, sal
from emp
)
where rn = 1
order by job
;
JOB ENAME SAL
--------- ------ ----
ANALYST SCOTT 3000
CLERK MILLER 1300
MANAGER JONES 2975
PRESIDENT KING 5000
SALESMAN ALLEN 1600
-- Explain plan:
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 546 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 14 | 546 | 4 (25)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK | | 14 | 252 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| EMP | 14 | 252 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"=1)
2 - filter(ROW_NUMBER() OVER ( PARTITION BY "JOB" ORDER BY "SAL")<=1)
There is an alternate way to do this using the keep
clause and dense_rank
along with the first/last
commands:
select
job,
-- The initial "min" aggregate could also be "max"
-- The key logic is in the dense_rank first/last and ordering
min(ename) keep (dense_rank last order by sal) ename,
-- Could also write as:
-- min(ename) keep (dense_rank first order by sal desc) ename,
max(sal)
from emp
group by job
order by job
;
JOB ENAME MAX(SAL)
--------- ------ --------
ANALYST FORD 3000
CLERK MILLER 1300
MANAGER JONES 2975
PRESIDENT KING 5000
SALESMAN ALLEN 1600
-- Explain Plan
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 90 | 4 (25)| 00:00:01 |
| 1 | SORT GROUP BY | | 5 | 90 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS STORAGE FULL| EMP | 14 | 252 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Using the dense_rank
command is easier to write and it may result in more performant queries. The different explain plans shows that less bytes were required.
Note: the keen observer may note that for the ANALYST
position there's two different results. That's because both they're two analysts (FORD
and SCOTT
) and they both earn the same amount. If "ties are allowed" the following query can be used:
select job, ename, sal
from (
select
job, ename, sal,
dense_rank() over (partition by job order by sal desc) rn
from emp
)
where rn = 1
order by job
;
-- Note that ANALYST has two rows given that SCOTT and FORD earn the same amount
JOB ENAME SAL
--------- ------ ----
ANALYST SCOTT 3000
ANALYST FORD 3000
CLERK MILLER 1300
MANAGER JONES 2975
PRESIDENT KING 5000
SALESMAN ALLEN 1600
References
Oracle's official documentation on this feature is actually in the
first
commandRob van Wijk has a good article about the
keep
clause- Covers some of the performance gains in detail
Thanks to Jeff Kemp for showing me this feature!
Subscribe to my newsletter
Read articles from Martin Giffy D'Souza directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by