Improving Oracle SQL Query with Connect By Performance


I recently had a hierarchical query that was slow. I could run the same query, un-nested, quickly, but when I added in the “connect by” things slowed down. I had the same issue using a recursive common table expression (CTE or “with clause”). Either way, the query took 104 or more seconds to run. Below is an example of the query:
-- this is slow
select id,
parent_id,
sys_connect_by_path(id,'.') cb_path,
connect_by_isleaf is_leaf,
c1
...
c200,
(select my_function(c1) from dual) my_f,
...
from my_table
start with parent_id is null -- ** set parent id column
connect by nocycle prior id = parent_id
order siblings by seq_no
Looking at the explain plan and a trace, I found that the database was reading the entire table many times. I guessed that I might be able to get the database to do a single read of the table if I could limit the “connect by” portion to use only an index, and then, once the hierarchy is established, join in the rest of the data in a single pass.
-- this is much faster
-- assumes you have a single compound index on id,parent_id
-- and another unique key index on id
with cb_data as
(select id,
parent_id,
sys_connect_by_path(id,'.') cb_path,
connect_by_isleaf is_leaf
from my_table
start with parent_id is null -- ** set parent id column
connect by nocycle prior id = parent_id
order siblings by seq_no
)
select id,
parent_id,
cb_path,
is_leaf,
c1
...
c200,
(select my_function(c1) from dual) my_f,
...
from cb_data
join my_table mt on mt.id = cb_data.id
In the new query, the hierarchical portion, cb_data, contains only 3 columns: id, parent_id, seq_no. I created an index with just these three columns. If I run just the hierarchical portion of the query the database does not read the table at all. It only reads the index. When I run the entire query, the database reads only the index to establish the results of cb_data and then does a single pass of my_table to return the final results. The new query returns the same results but, in my case, runs in 4 seconds. That’s a solid improvement over the original 104 seconds.
The key is to have a single index that contains all of the columns used in the CTE (cb_data above). If you require any additional columns in the CTE to accomplish the connect by, ensure they are in the index. I have not done extensive testing related to the order of the columns. My initial impression is that id, parent_id, seq_no provides good performance.
I hope this helps others as well. Comment to let me know if you try it and have similar results.
Subscribe to my newsletter
Read articles from Anton Nielsen directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
