Improving Oracle SQL Query with Connect By Performance

Anton NielsenAnton Nielsen
2 min read

I recently had a hierarchical query that was slow. I could run the same query, un-nested, quickly, but when I added in the “connect by” things slowed down. I had the same issue using a recursive common table expression (CTE or “with clause”). Either way, the query took 104 or more seconds to run. Below is an example of the query:

-- this is slow
select id,
       parent_id,
       sys_connect_by_path(id,'.') cb_path,
       connect_by_isleaf is_leaf,
       c1
       ...
       c200,
       (select my_function(c1) from dual) my_f,
       ...
  from my_table
  start with parent_id is null  -- ** set parent id column
  connect by nocycle prior id = parent_id
  order siblings by seq_no

Looking at the explain plan and a trace, I found that the database was reading the entire table many times. I guessed that I might be able to get the database to do a single read of the table if I could limit the “connect by” portion to use only an index, and then, once the hierarchy is established, join in the rest of the data in a single pass.

-- this is much faster
-- assumes you have a single compound index on id,parent_id
-- and another unique key index on id
with cb_data as
(select id,
       parent_id,
       sys_connect_by_path(id,'.') cb_path,
       connect_by_isleaf is_leaf
  from my_table
  start with parent_id is null  -- ** set parent id column
  connect by nocycle prior id = parent_id
  order siblings by seq_no
)
select id,
       parent_id,
       cb_path,
       is_leaf,
       c1
       ...
       c200,
       (select my_function(c1) from dual) my_f,
       ...
  from cb_data
  join my_table mt on mt.id = cb_data.id

In the new query, the hierarchical portion, cb_data, contains only 3 columns: id, parent_id, seq_no. I created an index with just these three columns. If I run just the hierarchical portion of the query the database does not read the table at all. It only reads the index. When I run the entire query, the database reads only the index to establish the results of cb_data and then does a single pass of my_table to return the final results. The new query returns the same results but, in my case, runs in 4 seconds. That’s a solid improvement over the original 104 seconds.

The key is to have a single index that contains all of the columns used in the CTE (cb_data above). If you require any additional columns in the CTE to accomplish the connect by, ensure they are in the index. I have not done extensive testing related to the order of the columns. My initial impression is that id, parent_id, seq_no provides good performance.

I hope this helps others as well. Comment to let me know if you try it and have similar results.

1
Subscribe to my newsletter

Read articles from Anton Nielsen directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Anton Nielsen
Anton Nielsen