Reading Execution Plans

Urh Srecnik
5 min read

I routinely do a "rehearsal" of my presentations with our DBA team and any other willing participants (they tend to be developers) to receive feedback and to share knowledge. In the last one, I got quite an interesting question, one that requires an understanding of execution plans - specifically, the order in which operations are executed. This post is my attempt to clear up the misconception.

The Query

The query which raised the question doesn't need much explaining. It's a simple example for demonstration purposes:

select /*+ ORDERED */ *
    from tab_first tf
    join tab_second ts on ts.id_first=tf.id_first;

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |       |       |     4 (100)|          |
|*  1 |  HASH JOIN         |            |     6 |   978 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| TAB_FIRST  |     3 |   225 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TAB_SECOND |     6 |   528 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

The Question

Which of the two tables is accessed first?

Is it tab_first because it is explicitly requested to be the leading table, or is it tab_second because it is displayed as the last line of the execution plan output?

Well, the correct answer is tab_first, but what prompted the question was the awareness of the general rule about the order in which execution plans are supposed to be read: from the bottom up, starting with the deepest entries (leaf nodes). If we followed this rule to the letter, tab_second would be accessed first.

This general rule comes from the way the execution plan is executed. Maybe it helps to think of it this way: it all starts with the root operation (Id=0), but to complete, this root operation needs full or partial results from its child operations. The same goes for every child that has children of its own, all the way down to the leaf nodes. That is why we generally read the plan as described above.
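To make this parent/child pull model concrete, here is a minimal Python sketch (an illustration only, not Oracle code) of the first plan above as a tree of operations, where each operation is opened by its parent and in turn asks its own children for rows:

class Operation:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def open(self, depth=0):
        # Execution starts at the root, but the root immediately asks its
        # children for rows, so the leaves are the first to produce anything.
        print("  " * depth + "open " + self.name)
        for child in self.children:
            child.open(depth + 1)

# The plan from above as a tree: Id 0 is the root, Ids 2 and 3 are the leaves.
plan = Operation("SELECT STATEMENT", [
    Operation("HASH JOIN", [
        Operation("TABLE ACCESS FULL TAB_FIRST"),
        Operation("TABLE ACCESS FULL TAB_SECOND"),
    ]),
])

plan.open()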

Let's first understand how the Hash Join and Nested Loops operations work in order to better understand when each table is accessed.

Hash Join

This join method is generally used when joining two larger row sources (two tables in our example) with an equality operator.

The optimizer will use statistics to determine which of the two row sources is smaller and proclaim it to be the build table. We can also force a specific table to be the build table by explicitly specifying the join order with hints (ORDERED or LEADING).

The other row source (generally the bigger one) is considered the probe table.

In our example, tab_first is the build table and tab_second is the probe table.

Hash Join will read the whole build table ("the smaller one") first and calculate a hash on the join columns for each row. The hashes are stored in the PGA, where each hash value has an associated linked list of rows for which the same hash has been calculated.

The probe table (the bigger one) is read only after this hash -> linked list memory structure is built in the PGA. The structure is then probed with each record from the probe table.

So, in our example, tab_first is read first, and only after that is tab_second read.
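Here is a minimal Python sketch of the two phases just described (an illustration only, not Oracle's actual implementation). The table contents are made up for the example, and Python's built-in hash() stands in for Oracle's hash function:

from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    # Build phase: read the whole build table (tab_first) and hash the join
    # column of each row. Each hash value keeps a list of the rows that share
    # it, much like the linked lists kept in the PGA.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[hash(row[key])].append(row)

    # Probe phase: only now is the probe table (tab_second) read; each probe
    # row looks up its hash value in the in-memory structure.
    for probe in probe_rows:
        for build in buckets.get(hash(probe[key]), []):
            if build[key] == probe[key]:   # re-check, hashes can collide
                yield {**build, **probe}

tab_first  = [{"id_first": 1}, {"id_first": 2}, {"id_first": 3}]
tab_second = [{"id_first": 1, "id_second": 10},
              {"id_first": 1, "id_second": 11},
              {"id_first": 3, "id_second": 12}]

print(list(hash_join(tab_first, tab_second, "id_first")))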

Nested Loops

Let's now consider an execution plan with Nested Loops such as this one:

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     6 |   102 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS      |            |     6 |   102 |     6   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| TAB_FIRST  |     3 |    30 |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| TAB_SECOND |     2 |    26 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Nested Loops is probably the easiest join method to understand: for each row in tab_first, loop over all rows in tab_second to find the matching rows. In Nested Loops terminology, tab_first is considered the outer table and tab_second the inner table.

As for when each table is accessed:

  1. first, a batch of rows is read from tab_first
  2. for each row in this batch, the matching rows in tab_second are found

So, tab_first is accessed first, then tab_second is iterated over as many times as there are rows in the first batch, then tab_first is accessed again for the next batch.

To put it differently, the full table scan of tab_first is done once. But by the time it completes, the full table scan of tab_second has already been done as many times as there are rows in tab_first.
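Again, a minimal Python sketch of the behaviour just described (an illustration only, not Oracle's actual implementation); the table contents and the batch size are made up for the example:

def nested_loops(outer_rows, inner_rows, key, batch_size=2):
    # Read the outer table (tab_first) one batch of rows at a time ...
    for start in range(0, len(outer_rows), batch_size):
        batch = outer_rows[start:start + batch_size]
        # ... and for every row in the batch, scan the inner table
        # (tab_second) again, looking for the matching rows.
        for outer in batch:
            for inner in inner_rows:
                if inner[key] == outer[key]:
                    yield {**outer, **inner}

tab_first  = [{"id_first": 1}, {"id_first": 2}, {"id_first": 3}]
tab_second = [{"id_first": 1, "id_second": 10},
              {"id_first": 3, "id_second": 12}]

print(list(nested_loops(tab_first, tab_second, "id_first")))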

Connecting the Dots

In a similar manner, if we had a multi-table join with nested loops, all three tables would be accessed in the alternating manner described above, and the execution plan would look like this:

----------------------------------------------------------------------------------
| Id  | Operation           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |            |     8 |  2008 |    12   (0)| 00:00:01 |
|   1 |  NESTED LOOPS       |            |     8 |  2008 |    12   (0)| 00:00:01 |
|   2 |   NESTED LOOPS      |            |     6 |   978 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TAB_FIRST  |     3 |   225 |     2   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| TAB_SECOND |     2 |   176 |     1   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | TAB_THIRD  |     1 |    88 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------
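As a rough Python sketch of the plan above: the rows coming out of the inner NESTED LOOPS (tab_first joined to tab_second) become the outer row source of the outer NESTED LOOPS, which probes tab_third for each joined row. The tab_third rows and its id_second join column are made up, since the original query does not show them:

tab_first  = [{"id_first": 1}, {"id_first": 2}, {"id_first": 3}]
tab_second = [{"id_first": 1, "id_second": 10},
              {"id_first": 3, "id_second": 12}]
tab_third  = [{"id_second": 10, "note": "a"},
              {"id_second": 12, "note": "b"}]

for tf in tab_first:                          # plan Id 3
    for ts in tab_second:                     # plan Id 4, once per tf row
        if ts["id_first"] == tf["id_first"]:
            for tt in tab_third:              # plan Id 5, once per joined row
                if tt["id_second"] == ts["id_second"]:
                    print({**tf, **ts, **tt})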

Hopefully, I managed to explain that reading the execution plan from the bottom up, starting with the deepest entries (leaf nodes), is correct. But it does not mean that one operation must complete before the next can begin. And the operations displayed as siblings (tab_first and tab_second in our examples) don't necessarily execute in any specific order.

And if you need more food for thought on the subject, consider the execution plans of parallel queries, pipelined functions, and adaptive plans.

Written by

Urh Srecnik

I'm an Oracle DBA and a developer at Abakus Plus d.o.o. My team is responsible for pro-active maintenance of many Oracle Databases and their infrastructure. I am co-author of Abakus's solutions for monitoring Oracle Database performance (APPM, also available as a Free Edition), for backup and recovery (Backup Server), and for quick provisioning of test & development databases (Deja Vu). I am also the author of the open-source DDLFS filesystem, which is available on GitHub. I am an OCP Database Administrator, OCP Java SE Programmer, OCIS Exadata Database Machine, and a few more; check my LinkedIn profile for the complete list.