Top-N queries in Oracle SQL

by Rocky Jagtiani
Oracle Certified Associate
I train on Oracle Certification from last 7 years

I decided to write about Top-N queries today because even though there are lots of articles and documentation about the correct way to do it in Oracle out there, I have seen this question asked on different forums, and to my surprise, many times people suggesting incorrect ways to do it without anyone clarifying their mistake. Now, every time I see the question again, I will have a trustworthy link to share with my students, data analyst and data-science-learners !!

Contents [25 mins to read & Execute]

  1. What are Top-N queries?
  2. Using the rownum pseudocolumn
  3. Using analytic functions
  4. Using the row limiting clause


For understanding Top-N Queries , make sure of :
1> Installing Oracle-11g ( steps of installation and Oracle-11g download copy is here , please see      Oracle SQL - Pl/SQL > downloads section ). By the way any version from 9i and upwards would do. In case if you don't want to install Oracle , then use liveSQL from here.
2> Reading each and every step.
3> & finally Executing as you read on.

What are Top-N queries?

Top-N queries are a type of query that is used to get the Top-N records based on some sorting criteria, where N can be any natural number, for example, to get the top 5 earning employees, or the top 10 selling products.

There are several ways to write Top-N queries in Oracle. I’m going to show you the most common ones.

Using the rownum pseudocolumn

One of the options we have is to utilize the rownum pseudocolumn. This is a number that is assigned to a row when it is retrieved from the source table, and something that is key to know, is that if it is used in a query that has an ORDER BY clause, this number is assigned before the rows are ordered, and that is why it cannot be used to create top-n queries directly, which is a mistake some people do.

Take a look at this. If I run this query on my test table:

SELECT employee_id, first_name, last_name, salary
FROM Top_N_Employees
ORDER BY salary DESC;




By the way for the beginners , here is the schema of my test table : 

create table Top_N_Employees (
employee_id int primary key, 
first_name varchar2(20) not null, 
last_name varchar2(20) not null, 
salary float not null,
department_id int
);

// Assume deptno : 100 -> HR and 101 -> Training
// lets add some records to our test table
insert into Top_N_Employees values(1,'Simran', 'Jagtiani', 75000, 100);

insert into Top_N_Employees values(2,'Rocky', 'Jagtiani', 200000, 101);

insert into Top_N_Employees values(9,'Ganesh', 'Gaikwad', 6000, 100);

insert into Top_N_Employees values(10,'Pravin', 'dumb', 6000, 100);

insert into Top_N_Employees values(3,'Lokesh', 'Singh', 100000, 101);

insert into Top_N_Employees values(4,'Niraj', 'Sharma', 70000, 101);

insert into Top_N_Employees values(5,'Prajakta', 'Smart', 23000, 100);

insert into Top_N_Employees values(6,'Neetu', 'Intelligent', 22000, 100);

insert into Top_N_Employees values(7,'Vipin', 'Singh', 20000, 101);

insert into Top_N_Employees values(8,'Anant', 'Surve', 18000, 100);

insert into Top_N_Employees values(11,'Ashish', 'mongoDB', 70000, 101);

insert into Top_N_Employees values(12,'Anna', 'Aroke', 20000, 101);

Now, look what happens if I try to get the top-5 earning employees by using the rownum directly:

SELECT employee_id, first_name, last_name, salary
FROM Top_N_employees
WHERE rownum <= 5
ORDER BY salary DESC;


These are not the real top-5 earners !!

To be able to use the rownum to get the correct result, I need to use a subquery like this:

WITH ordered AS
(
SELECT employee_id,first_name,last_name,salary
FROM top_N_employees
ORDER BY salary DESC
)
SELECT *
FROM ordered
WHERE rownum <= 5;

Which returns these [correct] results:


How does it work? Well, the sub-query orders the employees by salary, and once they are ordered they are retrieved by the main query in that order, and at that point the rownum is assigned. So, essentially, what I did was to invert the order in which things happen. Without the subquery, the rownum is assigned first, but with the sub-query, the ordering is applied first. And as you might be thinking, yes, this could have been an inline view also, and the results would be the same.

Using analytic functions

Another way to create top-n queries is by using analytic functions, which offer some extra flexibility as to how we want to handle certain situations.


There are 3 functions you could use to create top-n queries. Here is an example that shows how these functions work, so that you can decide which one gives you the desired results.

SELECT employee_id,first_name,last_name,salary,
ROW_NUMBER() over (order by salary DESC) row_number,
RANK() over (order by salary DESC) rank,
DENSE_RANK() over (order by salary DESC) dense_rank
FROM top_N_employees
ORDER BY salary DESC;



In this case the ROW_NUMBER function works in a similar manner as the rownum pseudocolumn. 

As you can see, it assigned the row number correctly based on the specified order, but it did because I added an ORDER BY in the function's analytic clause.

But If you look at the results from the RANK function, it assigned a rank of 4 to the forth and fifth rows, why? Because those employees have the same salary. It takes into account the fact that they earn the same salary, so it assigns them the same rank. This might or might not be what you want, but that is the way this function works. Another thing to notice is that since it assigned 2 rows the same rank (number 4), there is no rank number 5. So, when using the RANK function, keep in mind that ranks may not be consecutive numbers.

The DENSE_RANK function works similar to RANK, but it does not skip rank numbers. As you can see it assigned the same rank to the forth and fifth rows too, but the sixth row was assigned the next rank number, without leaving any holes.

Now, the previous example is not a top-n query. It just returned our results in order and assigned a number we could use to filter the results.

We still need to filter the results so that we only get the top 5, but since these analytic functions are calculated after the WHERE clause is evaluated, you cannot use them or reference them in the WHERE clause, which means we need a subquery in this case too.


Here is how the top-5 earners query would look like using the ROW_NUMBER function:

WITH numbered AS
(
SELECT employee_id,first_name,last_name,salary,
ROW_NUMBER() over (order by salary DESC) row_number
FROM top_N_employees
)
SELECT *
FROM numbered
WHERE row_number <= 5;


An additional benefit of using analytic functions for top-n queries, is that they allow you to easily get top-n results for different groups of rows, for example, the top-5 earners per department.

We just need to add a PARTITION clause to the function's analytic clause:


As you can see, the row number is reset for each department.

Using the row limiting clause

The row limiting clause is new in Oracle 12c, and it greatly simplifies writing top-n queries.

See how easy it is to create our top-5 earners query with this clause:

SELECT employee_id,first_name,last_name,salary
FROM top_N_employees
ORDER BY salary DESC
FETCH FIRST 5 ROWS ONLY;

No subquery, no analytic function... Only one additional line of code (the last one).

Because this is new in 12c, you cannot test it if you use an express edition (there is currently no express edition 12c). But if you want to give it a try, Oracle Live SQL is waiting for you

So, there you have it. Now you know how to create Top-N queries in Oracle.

To know more about the training's my team takes and great resourcessee here



Have something to add? Great, share your thoughts in the comments section below.


Comments

Popular posts from this blog

PPC - SomeOne SpenDing - SomeOne EarNing

Web scrapping tools,Sooo Muuch Data - Analysis Needed !

13 years old coding n0w !!