Tuning High-Volume SQL

Mr and Mrs Oracle at the Supermarket

I was at the supermarket the other day, waiting my turn at the checkout behind another guy. The checkout-chick (I'm sure there is a more PC term, I just don't know what it is) had just finished scanning his groceries when he asked her to wait until his wife returned with a few last-minute items. I've done it before, so steam didn't start coming out of my ears - yet. Fifteen seconds later his wife ran up - sorry, sorry - and checked out the last few items, so I wasn't really inconvenienced.

As it turned out, their technique was efficient. If they had returned to the aisles with the shopping trolley to get the last few items, they would almost certainly have been behind me at the checkout; 30 seconds of extra items could have cost them minutes at the checkout.

How does this technique scale, though? What if he had walked into the supermarket and reserved a place at the checkout whilst his wife ran back and forth with groceries - one aisle at a time? It is inconceivable that she could fetch groceries as fast as the checkout-chick could check them out, so inevitably there would be considerable waiting time between checking out each armful of groceries. Also, the collection itself would take longer because of the extra ground covered. Although I haven't performed a study, logic tells me the technique would be slower. By lots!

So, it makes sense if you only need to go back for one or two more things, but not more. It's hardly surprising that you don't see people doing it, right?

But here's the thing. I DO see it all the time. Not at the supermarket with groceries; but in Oracle with data. The database equivalent of running back and forth to the checkout is called Nested I/O. Nested I/O is really efficient for small volumes, but it doesn't scale. It is the cause of almost every poorly tuned high-volume SQL I have ever seen.

More is always more

Consider a sub-optimal SQL (SQL1) that takes 10 minutes to run and its optimal tuned equivalent (SQL2) that takes 1 minute to run. They perform the same logic and return the same results, but one is faster than the other. There can only be one reason why SQL1 is slower: it is doing more work, and since the two statements produce the same results, this extra work is clearly redundant.

So the secret to tuning High-Volume SQL (indeed - tuning any SQL) is to eliminate the redundant work, leaving only the minimum effort required to generate the result. In the context of SQL tuning, "More Work" comes in two basic forms:

  1. Full table scans - A Full Table Scan reads all of the rows in a table or table partition. If you don't need every row to satisfy your SQL, then some of them have been read redundantly. The solution is generally to create an index so that only the required rows are read - but that is a characteristic of low-volume SQL. Low-volume SQLs read small proportions of large tables, so full scans that discard heaps of data become very expensive. Conversely, high-volume SQLs tend to read large proportions of large tables; a Full Table Scan discards much less data and becomes pretty efficient. Indexes are not the solution to high-volume SQL performance problems. That is not a mis-print. Generally speaking, a high-volume SQL will produce its optimal performance without using a single index.
  2. Nested I/O - Nesting is when you perform the same operation for every element in a data set: For each A do B. Nested I/O is when the nested operation (B) is an I/O operation. Like at the supermarket, it is much more efficient to do all your I/O in one hit rather than bit-by-bit. There's a whole 'nuther article on that concept alone, but to give you some idea consider the following:

    EXAMPLE

    Consider a nested operation where, for each of 1,000,000 SALES transactions, we look up the customer in the CUSTOMER table, which contains 10,000 rows. For each sales transaction, we must:

    • Read the root block of the index.

    • Navigate to the leaf block of the index, reading intermediate branch blocks along the way. This will vary depending on the size of the indexed column(s) and the size of the table, but an extra 2 blocks is not uncommon.

    • Use the ROWID in the leaf block of the index to look up the customer in the table.

    That's a total of 4 blocks to process a single sales transaction. Over 1,000,000 sales transactions, that's 4,000,000 block reads just to pick up the customer. Consider that the CUSTOMER table may fit perhaps 50 customers in a single block (probably more, but let's err on the conservative side). This means that the entire table occupies no more than 200 blocks in total. Yes, that's right: we read 4 million blocks from a 200-block table!

    By nesting the customer lookup, we have increased the number of blocks read 20,000-fold. In other words, 2 million percent more I/O.

    OK, this is a bit of an over-dramatisation because most of those 4,000,000 block reads will be cached. But make no mistake, even reading cached blocks is not free. This overhead is real and very expensive.

Eliminating Nested I/O is the key to tuning High-Volume SQL.

Sounds simple, but it doesn't really help. If I have a slow SQL, how do I tune it? I never asked it to perform nested I/O, so how do I ask it NOT to? This is where it gets a little complex because there are a number of common problems that cause nested I/O; there is a separate section below for each case.

Nested Loops Joins

Nested Loops Joins are the most common and straightforward type of nesting in Oracle. When joining two tables, for each row in one table Oracle looks up the matching rows in the other table. Consider the following SQL Trace output from TKProf.

SELECT  /*+RULE*/ empno
,       ename
,       dname
FROM emp
JOIN dept USING (deptno)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         26          0          14
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         26          0          14

Misses in library cache during parse: 1
Optimizer mode: RULE
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
     14  NESTED LOOPS  (cr=26 pr=0 pw=0 time=0 us)
     14   NESTED LOOPS  (cr=12 pr=0 pw=0 time=15 us)
     14    TABLE ACCESS FULL EMP (cr=8 pr=0 pw=0 time=5 us)
     14    INDEX UNIQUE SCAN SYS_C009725 (cr=4 pr=0 pw=0 time=0 us)(object id 70296)
     14   TABLE ACCESS BY INDEX ROWID DEPT (cr=14 pr=0 pw=0 time=0 us)

Starting with the inner-most step of the plan, we see a full table scan of EMP returning 14 rows, and a unique index lookup on DEPT for each of those rows. So we see some wastage already; we have retrieved 14 rows from DEPT even though the table contains only 4 rows in total.

So how can we stop this happening? Keen observers will note the /*+RULE*/ hint in the above SQL. The Cost Based Optimiser is not stupid; it knows that nested loops is a wasteful plan for this SQL. The Rule Based Optimiser is stupid however; one of its rules is to use an index if it is available. This problem will fix itself when we remove the hint.

SELECT  empno
,       ename
,       dname
FROM emp
JOIN dept USING (deptno)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         11          0          14
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         11          0          14

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
     14  MERGE JOIN  (cr=11 pr=0 pw=0 time=0 us cost=6 size=364 card=14)
      4   TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=0 pw=0 time=9 us cost=2 size=52 card=4)
      4    INDEX FULL SCAN SYS_C009725 (cr=2 pr=0 pw=0 time=4 us cost=1 size=0 card=4)(object id 70296)
     14   SORT JOIN (cr=7 pr=0 pw=0 time=2 us cost=4 size=182 card=14)
     14    TABLE ACCESS FULL EMP (cr=7 pr=0 pw=0 time=1 us cost=3 size=182 card=14)

This was actually a bit unexpected. Oracle will usually prefer a HASH join over a MERGE for equi-joins. This is almost certainly a result of the very small tables involved. Let's see what Oracle would do in a real case of high-volume SQL:

SELECT  /*+ CARDINALITY(emp, 1000000) CARDINALITY(dept,10000)*/ empno
,       ename
,       dname
FROM emp
JOIN dept USING (deptno)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         15          0          14
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         15          0          14

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
     14  HASH JOIN  (cr=15 pr=0 pw=0 time=0 us cost=16515 size=65000000000 card=2500000000)
      4   TABLE ACCESS FULL DEPT (cr=7 pr=0 pw=0 time=1 us cost=3 size=130000 card=10000)
     14   TABLE ACCESS FULL EMP (cr=8 pr=0 pw=0 time=0 us cost=14 size=13000000 card=1000000)

Now that it thinks there are a million employees and ten thousand departments, it uses a HASH join. Hash joins are the preferred method for large data volumes, because their nested operations are performed in memory, not on disk, and they scale better than MERGE joins. Interestingly, Oracle was right to choose the MERGE join for the small volumes; note that MERGE performed only 11 buffer reads, whilst HASH performed 15, and NESTED LOOPS performed 26 - the worst of all.

Although the optimiser is very reliable on simple queries such as this, the real world contains much more complex queries that are beyond the capability of the CBO to optimise. In some cases the CBO will wrongly believe that a high-volume join is actually low-volume, and wrongly choose a Nested Loops join.

Tuning such queries is a complex process and outside the scope of this article. To find out if there is a problem, take a look at the Explain Plan output for the query and see if there are any NESTED LOOPS or FILTER operations with two child steps (FILTER steps with one child do not nest). Look at the first child step and determine from your knowledge of the data how many rows that step will return; this is the number of times the second step will be executed. If this number is greater than 10% of the total rows in the table of the 2nd child step, then you almost certainly have a nested I/O problem. Even greater than 1% could be a problem, although not so serious.

Oracle generally makes these mistakes because it thinks the first child step will return fewer rows than it really does. Look at the row estimate in the Explain Plan to see if it is accurate - it almost certainly won't be. Use DBMS_STATS.GATHER_TABLE_STATS() if statistics are stale. If that doesn't help, use CARDINALITY hints as shown above to provide a more accurate estimate, or if this fails, use ORDERED, USE_HASH or NO_USE_NL hints to advise the preferred join order and method.
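
For example, a sketch (illustrative only, reusing the EMP/DEPT tables from above) of advising a join order and a hash join with hints:

SELECT  /*+ ORDERED USE_HASH(e) */ e.empno
,       e.ename
,       d.dname
FROM    dept d
JOIN    emp e
ON      e.deptno = d.deptno

ORDERED tells the optimiser to join the tables in FROM-clause order (DEPT becomes the hash table); USE_HASH(e) asks for a hash join when EMP is joined.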

Correlated sub-query expressions in the SELECT clause

Personally, I find sub-query expressions in the SELECT clause insidious and annoying. They serve almost no useful purpose and are the only guaranteed way to kill the performance of a high-volume SQL. There is simply NO WAY to tune them; they must be avoided at all costs.

Consider the following query and TKProf output. The query selects all employees from EMP and picks up the department name from DEPT in a correlated sub-query expression.

SELECT  empno
,       ename
,       (SELECT dname FROM dept WHERE deptno = emp.deptno) AS dname
FROM emp

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         13          0          14
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         13          0          14

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
      3  TABLE ACCESS BY INDEX ROWID DEPT (cr=5 pr=0 pw=0 time=0 us cost=1 size=13 card=1)
      3   INDEX UNIQUE SCAN SYS_C009725 (cr=2 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 70296)
     14  TABLE ACCESS FULL EMP (cr=8 pr=0 pw=0 time=0 us cost=3 size=182 card=14)

But this is extraordinary, n'est-ce pas? Like the Nested Loops plan above, this plan shows that all 14 rows were read from EMP, and that a unique index scan was used to pick up matching rows in DEPT. But look at the row count against DEPT: 3 rows! How is that possible? There were 14 rows in EMP and therefore 14 lookups to perform.

Something very different to a standard Nested Loops join is going on here. It is definitely nesting (for each EMP get DEPT) but the buffer count of 13 tells us that it is doing less work than the Nested Loops join. Nested Loops has no cleverness whatsoever; if it tries to look up a row that it retrieved only a moment earlier, it still reads the blocks. Those blocks may well be in Oracle's buffer-cache (so you don't need to get them from disk), but Oracle must still access the buffer cache and find the desired row amongst the potentially hundreds in the block. This is cheaper than disk I/O, but it's not free. That's why SQL Trace records the buffer cache reads and reports them in TKProf.

What has happened here is that Oracle has performed 14 nested operations to retrieve a DEPT row, but only 3 of those operations have been recorded. What happened to the other 11? What type of operation did Oracle perform? In my database there are 4 rows in DEPT of which only 3 are referenced by EMP.DEPTNO, so this kind of makes sense; the three operations that it recorded were the first lookup of each of the three departments. The other 11 lookups were satisfied by some type of cache other than the buffer-cache. Oracle is a bit cagey about this; I can find no reference to it in the Performance Tuning manual.

So how magical is this other cache? Is it the same as the Hash Area? Hash Area retrievals in hash joins are not recorded in SQL Trace, so it could be the same thing. This would be awesome, because the Hash Area can be made quite large, and hits against it are much cheaper than buffer-cache hits. Let's up the ante and find out; the following script creates BIGEMP and BIGDEPT 10 times larger than EMP and DEPT.

SQL> CREATE TABLE bigemp (
  2          empno primary key, ename, job, mgr, hiredate, sal, comm, deptno
  3  ) AS
  4  SELECT     empno+x AS empno, ename, job, mgr, hiredate, sal, comm, deptno+x AS deptno
  5  FROM       emp
  6  CROSS JOIN    (select level*10000 AS x FROM dual CONNECT BY level <= 10)
  7  /

Table created.

SQL>
SQL> CREATE TABLE bigdept (
  2          deptno PRIMARY KEY, dname, loc
  3  ) AS
  4  SELECT     deptno+x AS deptno, dname, loc
  5  FROM       dept
  6  CROSS JOIN    (select level*10000 AS x FROM dual CONNECT BY level <= 10)
  7  /

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'BIGEMP')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user, 'BIGDEPT')

PL/SQL procedure successfully completed.

SELECT  empno
,       ename
,       (SELECT dname FROM bigdept dept WHERE deptno = emp.deptno) AS dname
FROM bigemp emp

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       11      0.00       0.00          0         54          0         140
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       13      0.00       0.00          0         54          0         140

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
     30  TABLE ACCESS BY INDEX ROWID BIGDEPT (cr=41 pr=0 pw=0 time=0 us cost=1 size=15 card=1)
     30   INDEX UNIQUE SCAN SYS_C0016601 (cr=11 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 84747)
    140  TABLE ACCESS FULL BIGEMP (cr=13 pr=0 pw=0 time=1 us cost=3 size=2240 card=140)

It scales perfectly! Although it has performed 140 nested operations to retrieve department information, only 30 of them were recorded as table lookups. Let's try a 100-times-larger data set:

SELECT  empno
,       ename
,       (SELECT dname FROM bigdept dept WHERE deptno = emp.deptno) AS dname
FROM bigemp emp

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       95      0.00       0.00          0        589          0        1400
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       97      0.00       0.00          0        589          0        1400

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
    388  TABLE ACCESS BY INDEX ROWID BIGDEPT (cr=484 pr=0 pw=0 time=0 us cost=1 size=16 card=1)
    388   INDEX UNIQUE SCAN SYS_C0016605 (cr=96 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 84755)
   1400  TABLE ACCESS FULL BIGEMP (cr=105 pr=0 pw=0 time=10 us cost=5 size=23800 card=1400)

Woops! We were expecting (hoping?) to see 1400 rows retrieved from BIGEMP resulting in only 300 lookups on BIGDEPT, but it has all gone horribly wrong. Instead of 300 lookups there are 388. Our seemingly magical cache has topped out and cannot hold even 300 rows. At around 30 bytes per row, that's around 10Kb at best. The block-size on this database is 8K. Coincidence? Who can tell?

So we don't really know exactly what's going on here. Obviously there is some kind of results-cache in action, but equally obvious is that it is not terribly useful for high-volume SQL (these tests were performed on 11g R1).

So what's the impact? Let's compare to our preferred high-volume join method: HASH:

SELECT  /*+ CARDINALITY(bigemp, 1000000) CARDINALITY(bigdept,10000)*/ empno
,       ename
,       dname
FROM bigemp
JOIN bigdept USING (deptno)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       95      0.00       0.00          0        109          0        1400
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       97      0.00       0.00          0        109          0        1400

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
   1400  HASH JOIN  (cr=109 pr=0 pw=0 time=30 us cost=184 size=825000000 card=25000000)
    400   TABLE ACCESS FULL BIGDEPT (cr=4 pr=0 pw=0 time=4 us cost=3 size=160000 card=10000)
   1400   TABLE ACCESS FULL BIGEMP (cr=105 pr=0 pw=0 time=8 us cost=16 size=17000000 card=1000000)

Look at that! HASH processed seemingly more rows (400 departments instead of 388) but it read only 109 buffers, whereas the subquery expression read 589 - more than a 400% overhead. Remember that we are still dealing with relatively small volumes here; both tables and the index are in the buffer-cache. This effect multiplies even further when the buffer-cache fills and the query starts performing some disk I/O.

Lesson: Never, ever, ever use correlated scalar subquery expressions in high-volume SQL.
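
If the attraction of the subquery expression was its outer-join behaviour (employees with no matching department are still returned, with a NULL DNAME), a left outer join preserves that behaviour and can still use a hash join; a sketch:

SELECT  emp.empno
,       emp.ename
,       dept.dname
FROM    bigemp emp
LEFT OUTER JOIN bigdept dept
ON      dept.deptno = emp.deptno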

One final note: un-correlated subquery expressions do not suffer the same problem. Since they do not reference any columns in the outer query, they can be performed once only, and the results applied to every row.

SELECT  empno
,       ename
,       (SELECT MAX(dname) FROM bigdept dept) AS dname
FROM bigemp emp

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       95      0.00       0.00          0        109          0        1400
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       97      0.00       0.00          0        109          0        1400

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=4 pr=0 pw=0 time=0 us)
    400   TABLE ACCESS FULL BIGDEPT (cr=4 pr=0 pw=0 time=2 us cost=3 size=4000 card=400)
   1400  TABLE ACCESS FULL BIGEMP (cr=105 pr=0 pw=0 time=4 us cost=5 size=15400 card=1400)

UPDATE - subqueries in the SET clause

Consider the following SQL, which gives every employee a raise equal to 4% of their department's average salary:

UPDATE  emp a
SET     sal = (
        SELECT  a.sal + trunc(0.04 * avg(b.sal), 2)
        FROM    emp b
        WHERE   b.deptno = a.deptno
)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0         91         30          14
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0         91         30          14

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  EMP (cr=91 pr=0 pw=0 time=0 us)
     14   TABLE ACCESS FULL EMP (cr=7 pr=0 pw=0 time=2 us cost=3 size=98 card=14)
     12   SORT AGGREGATE (cr=84 pr=0 pw=0 time=0 us)
     59    TABLE ACCESS FULL EMP (cr=84 pr=0 pw=0 time=10 us cost=3 size=35 card=5)

See what's happening? For each of the 14 rows in EMP (line 2 of the plan), Oracle performs a FULL scan of EMP to get the departmental average salary. Look at the 4th line of the plan: 59 rows returned from the sub-query, but that does not tell you how many rows were scanned and filtered out. The real number is much higher!

Even so, that's 59 rows read from a 14-row table. There's a bit of wastage going on there.

I know what you're thinking: why not index EMP.DEPTNO? That'll fix everything.

SQL> create index emp_dept on emp(deptno) ;

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'EMP');

PL/SQL procedure successfully completed.


UPDATE  emp a
SET     sal = (
        SELECT  a.sal + trunc(0.04 * avg(b.sal), 2)
        FROM    emp b
        WHERE   b.deptno = a.deptno
)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0         31         30          14
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0         31         30          14

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  EMP (cr=31 pr=0 pw=0 time=0 us)
     14   TABLE ACCESS FULL EMP (cr=7 pr=0 pw=0 time=2 us cost=3 size=98 card=14)
     12   SORT AGGREGATE (cr=24 pr=0 pw=0 time=0 us)
     59    TABLE ACCESS BY INDEX ROWID EMP (cr=24 pr=0 pw=0 time=25 us cost=2 size=35 card=5)
     59     INDEX RANGE SCAN EMP_DEPT (cr=12 pr=0 pw=0 time=6 us cost=1 size=0 card=5)(object id 84768)

Yes, it reduced the total buffers read from the buffer-cache (91 down to 31), but it still performed way too much redundant I/O: 59 rows from the index and the table.

The problem with correlated SET clause subqueries - like subquery expressions - is that they always nest; they are UNTUNEABLE.

In order to maximise performance of a high-volume update with a correlated SET sub-query, it is necessary to rewrite the statement. This can be done in two ways:

  • Updateable Join Views
  • MERGE statement

Solution 1 - Updateable Join Views

The Oracle UPDATE statement permits updates not only on tables, but also Views and Inline Views. If we can define a view that presents both the current value of a column and the intended new value, then it permits the following syntax:

UPDATE view_name
SET curr_col_val = new_col_val
WHERE ....

Note that this syntax does not contain a sub-query expression; if the view can be defined in a way that avoids nesting, then the update can be performed without nesting. As we shall see, this technique is limited by a condition called Key Preservation. Let's look at it in action:

Here is a SQL that gives us the raw data required for the update, without nesting:

  1  SELECT old.empno, old.deptno, old.sal, old.sal + new.incr AS new_sal
  2  FROM   emp old
  3  JOIN (
  4      SELECT  deptno, trunc(0.04 * avg(sal), 2) AS incr
  5      FROM    emp
  6      GROUP BY deptno
  7  ) new
  8* ON     old.deptno = new.deptno

     EMPNO     DEPTNO        SAL    NEW_SAL
---------- ---------- ---------- ----------
      7900         30        950    1012.66
      7844         30       1500    1562.66
      7654         30       1250    1312.66
      7521         30       1250    1312.66
      7499         30       1600    1662.66
      7698         30       2850    2912.66
      7876         20       1100       1187
      7369         20        800        887
      7902         20       3000       3087
      7788         20       3000       3087
      7566         20       2975       3062
      7934         10       1300    1416.66
      7782         10       2450    2566.66
      7839         10       5000    5116.66

14 rows selected.

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    65 |  4225 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN           |      |    65 |  4225 |     8  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP  |    14 |   546 |     3   (0)| 00:00:01 |
|   3 |   VIEW               |      |    14 |   364 |     4  (25)| 00:00:01 |
|   4 |    HASH GROUP BY     |      |    14 |   364 |     4  (25)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| EMP  |    14 |   364 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

What we want to do is to treat this SQL as a view, set SAL to NEW_SAL, and have the results applied back to the EMP table.

  1  UPDATE (
  2     SELECT old.empno, old.deptno, old.sal, old.sal + new.incr AS new_sal
  3     FROM   emp old
  4     JOIN (
  5         SELECT  deptno, trunc(0.04 * avg(sal), 2) AS incr
  6         FROM    emp
  7         GROUP BY deptno
  8     ) new
  9     ON     old.deptno = new.deptno
 10  )
 11* SET sal = new_sal
SET sal = new_sal
    *
ERROR at line 11:
ORA-01779: cannot modify a column which maps to a non key-preserved table

This is one of the limitations of updateable join views: when updating a table via a join-view, it may only join to other tables using their primary or unique key. This concept is discussed in more detail in the Oracle manuals. The problem here, of course, is that the EMP table is joined to an inline view that cannot carry a unique/primary key, even though the GROUP BY guarantees uniqueness of DEPTNO.

One way around this is to use a Global Temporary Table instead of an inline view, since a GTT can be created with a primary key.

CREATE GLOBAL TEMPORARY TABLE emp_upd (
        deptno PRIMARY KEY
,       incr
) ON COMMIT PRESERVE ROWS
AS
SELECT deptno, trunc(0.04 * avg(sal), 2) AS incr
FROM    emp
GROUP BY deptno

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.01          0          7          9           3
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.01          0          8          9           3


UPDATE (
    SELECT old.empno, old.deptno, old.sal, old.sal + new.incr AS new_sal
    FROM   emp old
    JOIN   emp_upd new
    ON     old.deptno = new.deptno
)
SET sal = new_sal

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          2          0           0
Execute      1      0.00       0.00          0         10         27          14
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0         12         27          14

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  EMP (cr=10 pr=0 pw=0 time=713 us)
     14   HASH JOIN  (cr=10 pr=0 pw=0 time=294 us)
      3    TABLE ACCESS FULL EMP_UPD (cr=3 pr=0 pw=0 time=29 us)
     14    TABLE ACCESS FULL EMP (cr=7 pr=0 pw=0 time=20 us)

There are some additional overheads to this method not shown above. Whilst the creation of the GTT is quite efficient, it must also build an index and sample the results to generate statistics. Overall, the nested subquery expression is more efficient over such small volumes (14 rows). Let's take a look at a sample 100 times larger:

UPDATE  bigemp a
SET     sal = (
        SELECT  a.sal + trunc(0.04 * avg(b.sal), 2)
        FROM    bigemp b
        WHERE   b.deptno = a.deptno
)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          2          0           0
Execute      1      0.09       0.09          4       2505       2919        1400
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.10       0.09          4       2507       2919        1400

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  BIGEMP (cr=2521 pr=4 pw=0 time=90826 us)
   1400   TABLE ACCESS FULL BIGEMP (cr=12 pr=0 pw=0 time=2829 us)
   1200   SORT AGGREGATE (cr=2489 pr=4 pw=0 time=34982 us)
   5900    TABLE ACCESS BY INDEX ROWID BIGEMP (cr=2489 pr=4 pw=0 time=26630 us)
   5900     INDEX RANGE SCAN BIGEMP_DEPTNO (cr=1217 pr=4 pw=0 time=14675 us)(object id 73563)




CREATE GLOBAL TEMPORARY TABLE emp_upd (
        deptno PRIMARY KEY
,       incr
) ON COMMIT PRESERVE ROWS
AS
SELECT deptno, trunc(0.04 * avg(sal), 2) AS incr
FROM    bigemp
GROUP BY deptno
/

UPDATE (
    SELECT old.empno, old.deptno, old.sal, old.sal + new.incr AS new_sal
    FROM   bigemp old
    JOIN   emp_upd new
    ON     old.deptno = new.deptno
)
SET sal = new_sal
/

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          3          0           0
Execute      4      0.03       0.06          0         31       1466        1700
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        7      0.04       0.07          0         34       1466        1700

Misses in library cache during parse: 2


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       29      0.00       0.00          0          0          0           0
Execute     34      0.01       0.01          1         29        106          12
Fetch       47      0.00       0.00          0         82          0          29
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      110      0.02       0.02          1        111        106          41

Remember that 1400 rows is still a relatively small sample; real-life updates are much, much larger. The TKProf snippet above for the Global Temporary Table shows the session totals in order to account for all of the overheads. Even so, we can see that it performs substantially less I/O (note the disk, query, and current columns) than the nested version. This effect is magnified as the sample gets larger.

Solution 2 - MERGE

A brilliant solution for unnesting bulk updates was introduced in Oracle v9i: the MERGE statement. The idea of MERGE is to apply a series of transactions to a table by updating rows if they exist and inserting them if they do not. By careful construction of the statement, you can engineer MERGE to only update.

MERGE
INTO  bigemp old
USING (
    SELECT bigemp.empno, bigemp.sal + incr.incr AS sal
    FROM   bigemp
    JOIN   (
        SELECT deptno, trunc(0.04 * avg(sal), 2) AS incr
        FROM   bigemp
        GROUP BY deptno
    ) incr
    ON     bigemp.deptno = incr.deptno
) new
ON ( old.empno = new.empno)
WHEN MATCHED THEN UPDATE SET sal = new.sal
WHEN NOT MATCHED THEN INSERT (empno) VALUES (1/0)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.05         12        112          0           0
Execute      1      0.03       0.03          0         39       1434        1400
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.06       0.09         12        151       1434        1400

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  MERGE  BIGEMP (cr=39 pr=0 pw=0 time=37806 us)
   1400   VIEW  (cr=39 pr=0 pw=0 time=15085 us)
   1400    HASH JOIN OUTER (cr=39 pr=0 pw=0 time=12280 us)
   1400     HASH JOIN  (cr=26 pr=0 pw=0 time=7658 us)
    300      VIEW  (cr=13 pr=0 pw=0 time=2515 us)
    300       SORT GROUP BY (cr=13 pr=0 pw=0 time=1609 us)
   1400        TABLE ACCESS FULL BIGEMP (cr=13 pr=0 pw=0 time=85 us)
   1400      TABLE ACCESS FULL BIGEMP (cr=13 pr=0 pw=0 time=41 us)
   1400     TABLE ACCESS FULL BIGEMP (cr=13 pr=0 pw=0 time=1431 us)

Note the disk, query, and current columns are all significantly less than the nested example for 1400 rows shown earlier, and are comparable to the Updateable Join View.

The WHEN NOT MATCHED THEN INSERT clause was made optional in 10g, but is mandatory in v9i. The 1/0 value causes a divide-by-zero error that aborts a badly designed statement and prevents unwanted inserts.
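
So on 10g and later, the same update can be written without the dummy insert branch at all; a minimal sketch:

MERGE
INTO  bigemp old
USING (
    SELECT bigemp.empno, bigemp.sal + incr.incr AS sal
    FROM   bigemp
    JOIN   (
        SELECT deptno, trunc(0.04 * avg(sal), 2) AS incr
        FROM   bigemp
        GROUP BY deptno
    ) incr
    ON     bigemp.deptno = incr.deptno
) new
ON ( old.empno = new.empno )
WHEN MATCHED THEN UPDATE SET sal = new.sal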

Functions

Extreme care should be taken using stored functions in high-volume SQL. Functions are often used to encapsulate complex code that can then be re-used in many places. The problem with this approach is that when the function is used in a high volume SQL, the complex code is executed for every row encountered. An otherwise simple-looking SQL that returns 100,000 rows will - behind the scenes - execute 100,000 (or more!) SQLs.

Not all functions are problematic:

  • Functions called in the WHERE clause with no parameters, or with constant values as parameters, will execute only once. The value returned will then be used like a constant or bind variable in the SQL.
  • Functions called in the SELECT clause with no parameters, or with constant values as parameters, will execute only once if they have been defined as DETERMINISTIC (see the sketch after this list). Non-deterministic functions (see the Oracle PL/SQL reference) in the SELECT clause will execute once for every row selected.
  • Functions that contain only PL/SQL code are much more efficient than those that include SQL statements. Such functions are not ideal, but they do not tend to result in nested I/O.
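
To illustrate the DETERMINISTIC point, here is a hypothetical example (the function name and body are invented for illustration):

CREATE OR REPLACE FUNCTION annual_increase_rate
RETURN NUMBER DETERMINISTIC IS
BEGIN
    RETURN 0.04;   -- same output for the same (empty) inputs, so DETERMINISTIC is safe
END;
/

SELECT  empno
,       trunc(sal * annual_increase_rate(), 2) AS incr   -- can be evaluated once, not once per row
FROM    emp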

Importantly, correlated function calls (where a table column value is passed as a parameter) will execute once for each row encountered. This is somewhat mitigated in 11g with its result cache, but is still worthy of caution.

create table bigemp1 AS select * from bigemp;

CREATE OR REPLACE FUNCTION dept_sal_avg (
    p_deptno IN bigemp.deptno%TYPE
) RETURN bigemp.sal%TYPE IS
    ret bigemp.sal%TYPE;
BEGIN
    SELECT avg(sal)
    INTO   ret
    FROM   bigemp
    WHERE  deptno = p_deptno;

    RETURN(ret);
END;
/

UPDATE bigemp1
SET    sal = sal + (0.04 * trunc(dept_sal_avg(deptno), 2))

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.01          0          1          1           0
Execute      1      0.11       0.12          0         12       2867        1400
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.12       0.13          0         13       2868        1400

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  BIGEMP1 (cr=4353 pr=4 pw=0 time=187756 us)
   1400   TABLE ACCESS FULL BIGEMP1 (cr=12 pr=0 pw=0 time=2820 us)

********************************************************************************

SELECT AVG(SAL)
FROM
 BIGEMP WHERE DEPTNO = :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   1400      0.04       0.02          0          2          0           0
Fetch     1400      0.02       0.03          0       4302          0        1400
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2801      0.07       0.05          0       4304          0        1400

Keen observers will notice the use of table bigemp1 as a copy of bigemp to avoid a mutating-table error. I could have demonstrated the principle with a pure SELECT, but this example ties in nicely with the previous section.

Note the 1400 separate calls to the function and the high query buffer I/O. These figures are nearly identical to those of the nested update shown in the previous section.

Notwithstanding the exceptions above, functions containing SQL statements should never be called in high-volume SQL. The only alternative is to re-code the logic of the function into the SQL statement itself, whilst avoiding the other nested I/O pitfalls (especially subquery expressions in the SELECT clause!).

Triggers

Triggers bearing the FOR EACH ROW clause, not surprisingly, execute once for each row inserted, updated, or deleted. The profile of such a statement is exactly the same as that of the correlated function above.
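
The trigger itself is not shown in the trace below, but a row-level trigger of roughly this shape (names illustrative) would produce the recursive statement that follows the update:

CREATE OR REPLACE TRIGGER bigemp1_sal_check
BEFORE UPDATE OF sal ON bigemp1
FOR EACH ROW
DECLARE
    v_expected bigemp.sal%TYPE;
BEGIN
    -- One nested SELECT per updated row; :new.sal and :new.deptno
    -- surface as the binds :B1 and :B2 in the recursive statement.
    SELECT :new.sal + trunc(0.04 * avg(sal), 2)
    INTO   v_expected
    FROM   bigemp
    WHERE  deptno = :new.deptno;
    -- (a real trigger would go on to do something with v_expected)
END;
/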

UPDATE bigemp1
SET    sal = trunc(1.04 * sal, 2)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.17       0.17          0         13       2880        1400
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.17       0.17          0         14       2880        1400

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  BIGEMP1 (cr=4335 pr=4 pw=0 time=254618 us)
   1400   TABLE ACCESS FULL BIGEMP1 (cr=12 pr=0 pw=0 time=2824 us)

********************************************************************************

SELECT :B1 + TRUNC(0.04 * AVG(SAL), 2)
FROM
 BIGEMP WHERE DEPTNO = :B2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   1400      0.04       0.03          0          2          0           0
Fetch     1400      0.03       0.04          0       4302          0        1400
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2801      0.08       0.08          0       4304          0        1400

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 66     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
   1400  SORT AGGREGATE (cr=4302 pr=0 pw=0 time=50346 us)
   7000   TABLE ACCESS BY INDEX ROWID BIGEMP (cr=4302 pr=0 pw=0 time=40308 us)
   7000    INDEX RANGE SCAN BIGEMP_DEPTNO (cr=2818 pr=0 pw=0 time=21291 us)(object id 73554)

As with functions, avoid triggers containing SQL statements on tables subject to bulk DML. Any SQL performed in triggers must be migrated to the application code, taking care to avoid all of the other nested I/O issues described in this article.

Sub-queries

The case of sub-queries and nesting used to be simple: with the exception of uncorrelated IN subqueries, all sub-queries would nest. Period. This was great for low-volume SQL, rubbish for high-volume SQL. Easy: use joins instead for high-volume SQL.
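
For example, the traditional rewrite of a correlated EXISTS as a join (safe here because DEPTNO is unique in DEPT, so the join cannot multiply EMP rows):

-- Nested form
SELECT *
FROM   emp
WHERE  EXISTS (
    SELECT 1
    FROM   dept
    WHERE  dept.deptno = emp.deptno
    AND    dname = 'RESEARCH'
)

-- Join rewrite
SELECT emp.*
FROM   emp
JOIN   dept
ON     dept.deptno = emp.deptno
WHERE  dept.dname = 'RESEARCH'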

Things started to change in v7.3 with Anti Joins that permitted NOT IN to be unnested by the optimiser (effectively converting them into an outer join that discards matched rows). In 8i, Semi-Joins allowed correlated EXISTS sub-queries to be unnested; but more importantly the /*+UNNEST*/ hint internally converted IN subqueries into EXISTS and NOT EXISTS subqueries into NOT IN so that Anti- and Semi-Joins could unnest almost any sub-query.

The revolution was completed in 9i with automated unnesting; the optimiser determines whether unnesting would be beneficial (any high-volume SQL) and applies it automatically. No more hints! As a result, sub-queries are almost never a problem for high-volume SQL any more. But when they are a problem they catch you completely off-guard.

So what's the problem if Oracle does it all for you? There are two:

  1. Oracle will sometimes think your high-volume query is low-volume and won't unnest.
  2. Oracle cannot unnest every subquery. You need to understand the restrictions.

How to tell if your sub-query is nesting

Run your High Volume SQL through Explain Plan. Look for any NESTED LOOPS lines or FILTER lines with two child steps. Nested Loops joins typically relate to join statements, but they can also come from sub-queries re-written by the optimiser. They usually mean that the sub-query is still nesting, even though the optimiser has rewritten (but not unnested) the query. FILTER steps with one child step are usually related to complex queries with views or aggregation but they are relatively harmless. FILTER steps with two child steps come from sub-queries; they mean that the sub-query is nesting.

This is an example of a nested sub-query. Note the FILTER step.

  1  select *
  2  from emp
  3  where deptno = (
  4     select dept.deptno
  5     from dept
  6     where dname = 'RESEARCH'
  7     and deptno = emp.deptno
  8* )

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |    87 |     5   (0)| 00:00:01 |
|*  1 |  FILTER                      |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | EMP         |    14 |  1218 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| DEPT        |     1 |    22 |     2   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C009534 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DEPTNO"= (SELECT /*+ */ "DEPT"."DEPTNO" FROM "DEPT" "DEPT" WHERE
              "DEPTNO"=:B1 AND "DNAME"='RESEARCH'))
   3 - filter("DNAME"='RESEARCH')
   4 - access("DEPTNO"=:B1)

The next example shows a SQL with the same functionality that has been automatically unnested by the optimiser. Note the HASH JOIN SEMI.

  1  select *
  2  from emp
  3  where exists (
  4     select 1
  5     from dept
  6     where dname = 'RESEARCH'
  7     and deptno = emp.deptno
  8* )

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |   545 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN SEMI    |      |     5 |   545 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |  1218 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| DEPT |     1 |    22 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


   1 - access("DEPTNO"="EMP"."DEPTNO")
   3 - filter("DNAME"='RESEARCH')

Unnesting subqueries

The optimizer can unnest almost any uncorrelated sub-query, and even most simple correlated IN, NOT IN, EXISTS, and NOT EXISTS sub-queries.

If the subquery of a high-volume SQL is simple or uncorrelated and is still nesting, try:

  • Gather statistics on all tables and indexes using DBMS_STATS.GATHER_TABLE_STATS()
  • Check you are using the Cost Based Optimizer, not the Rule based Optimizer.
  • Try adding an /*+UNNEST*/ hint to the sub-query (see the sketch after this list).
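
A sketch of the hint placement - the hint goes inside the sub-query block:

SELECT *
FROM   emp
WHERE  deptno IN (
    SELECT /*+ UNNEST */ deptno
    FROM   dept
    WHERE  dname = 'RESEARCH'
)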

If your Explain Plan shows a NESTED LOOPS (SEMI) step then the optimiser is unnesting a sub-query into a Nested Loops semi-join; this may not improve the situation. The /*+UNNEST*/ hint will not help because the sub-query is already unnested; it may be necessary to add a /*+HASH_SJ*/ (for correlated sub-queries with = predicates) or /*+MERGE_SJ*/ (for correlated sub-queries with > or < predicates) hint.
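
Again, the hint belongs inside the sub-query; a sketch of requesting a hash semi-join:

SELECT *
FROM   emp
WHERE  EXISTS (
    SELECT /*+ HASH_SJ */ 1
    FROM   dept
    WHERE  dept.deptno = emp.deptno
    AND    dname = 'RESEARCH'
)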

If your sub-query is complex and correlated, you will have to code it another way. Perhaps materialise some interim results in a Global Temporary Table.

Un-tunable sub-queries

You should never, ever do any of the following in the correlated sub-query of a high-volume SQL:

  • Nest two or more levels deep, and correlate a column not in the immediate parent block
  • Correlate a HAVING clause
  • Use a set operator, analytic function, MODEL clause, or CONNECT BY in the sub-query.
  • Correlate a non-scalar sub-query with a comparison operator (WHERE col > (SELECT col ... correlated sub-query ...)). Even = conditions will nest. Use MAX/MIN/AVG to make the sub-query scalar - even if selecting on a unique key.

These types of correlated sub-queries will always nest.
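
To illustrate the final point, assume a hypothetical EMP_HIST table with a unique key on EMPNO:

-- This form will nest, even though EMPNO is unique in EMP_HIST:
SELECT *
FROM   emp
WHERE  sal > (
    SELECT sal
    FROM   emp_hist
    WHERE  emp_hist.empno = emp.empno
)

-- Making the sub-query scalar gives the optimiser a chance to unnest:
SELECT *
FROM   emp
WHERE  sal > (
    SELECT MAX(sal)
    FROM   emp_hist
    WHERE  emp_hist.empno = emp.empno
)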

Conclusion

All we've discussed here is the perils of nesting. Is tuning high-volume SQL that simple? The answer is: "sort of".

It's not quite as cut and dried as I've made out; not all nesting is bad. If you have a query that joins 500,000 rows from a 1,000,000 row table (high volume) to 500,000 rows from a 100,000,000 row table, then nesting might be the way to go. A full scan of 100 million rows is a pretty expensive way of reading the half-million that you need.

Rather than thinking in terms of high- and low-volume queries, you need to evaluate each table on its merits. How many rows are there? How many do I need?

Another exception - as John Watson (OCM) identifies in the comments below - is a nested loop on an index without a table lookup. If the inner loop of a nested loop join can get all the columns it needs from the index, then even in a high-volume scenario, the nested loop will probably out-perform the alternatives.

The simple don'ts in this article are easy to avoid; functions, scalar sub-queries, triggers, etc. It gets much more complex when you eliminate all of these things but are still left with a nested step in the Explain Plan. Like many things, the first step is to recognise that there is a problem; hopefully this article helps in that regard.

Comments

I was looking for this information. Your descriptions have helped me greatly with some work I must do tomorrow - what a coincidence. Thanks man, Kevin Meade

I'm not sure if one should never use scalar subqueries like:

SELECT empno
, ename
, (SELECT dname FROM bigdept dept WHERE deptno = emp.deptno) AS dname
FROM bigemp emp

Joining may be faster, but what should you choose if you have to outer join?

I think

SELECT empno
, ename
, (SELECT dname FROM bigdept dept WHERE deptno = emp.deptno) AS dname
FROM bigemp emp

is faster than

SELECT empno
, ename
, dname
FROM bigemp emp
, bigdept dept
where emp.deptno = dept.deptno (+)

Try it and find out. Make sure you trace and use TKProf. Ensure that you get a hash join in the outer join option, and post the results here.

Outer joins attract some type of mystical fear and loathing that they are somehow less performant than inner joins. I usually have a reasonable insight into these myths, but I have no idea how that one came about.

The truth is that Oracle supports four join methods: Nested Loops, Sort-Merge, Hash, and Cluster. All of them work equally well as inner joins and outer joins. Correlated Scalar Sub-Query Expressions are the functional equivalent of a Nested Loops join, so their use merely restricts your available toolset.

Never, ever, ever use correlated scalar subqueries!

What about using analytics? (Oracle 10gR2)

UPDATE  scott.emp a
SET     sal = sal + (
        SELECT  trunc(0.04 * avg(b.sal) over (partition by deptno),2)
        FROM    scott.emp b
        WHERE   b.empno = a.empno
)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0         23          4          14
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0         23          4          14

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 62  

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  EMP (cr=23 pr=0 pw=0 time=2247 us)
     14   TABLE ACCESS FULL EMP (cr=7 pr=0 pw=0 time=306 us)
     14   WINDOW BUFFER (cr=16 pr=0 pw=0 time=1330 us)
     14    TABLE ACCESS BY INDEX ROWID EMP (cr=16 pr=0 pw=0 time=664 us)
     14     INDEX UNIQUE SCAN PK_EMP (cr=2 pr=0 pw=0 time=265 us)(object id 51252)

Yep, that's nested too. Steps 3-5 of that plan are executed for each row returned by step 2. Increase the volumes and see how it compares to MERGE.
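
For what it's worth, the analytic itself can be used without nesting by pushing it into the USING clause of a MERGE; a sketch (on 9i, add the dummy WHEN NOT MATCHED clause shown earlier):

MERGE
INTO  scott.emp old
USING (
    SELECT empno
    ,      sal + trunc(0.04 * avg(sal) OVER (PARTITION BY deptno), 2) AS new_sal
    FROM   scott.emp
) new
ON ( old.empno = new.empno )
WHEN MATCHED THEN UPDATE SET sal = new.new_sal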

1) Correlated sub-query expressions in the SELECT clause
Sorry,
but generally

SELECT  empno
,       ename
,       (SELECT dname FROM dept WHERE deptno = emp.deptno) AS dname
FROM emp

Is not equivalent to
SELECT  empno
,       ename
,       dname
FROM emp
JOIN dept USING (deptno)

but rather to
SELECT  empno
,       ename
,       dname
FROM emp
LEFT OUTER JOIN dept USING (deptno)

2) Functions

Can't get this affirmation:

Quote:
Keen observers will notice the use of table bigemp1 as a copy of bigemp to avoid mutating updates.

Could you explain, please ?

3) Conclusion

I think you're rather contradicting yourself. If

Quote:

...
It's not quite as cut and dried as I've made out; not all nesting is bad. If you have a query that joins 500,000 rows from a 1,000,000 row table (high volume) to 500,000 rows from a 100,000,000 row table, then nesting might be the way to go. ...

means that we can use scalar sub-queries (more precisely correlated scalar-subquery) because, as you are saying, nesting is not always bad.

1) Agreed, a left outer join is the more appropriate comparison. However the principle remains the same: Oracle is able to Hash Left Outer Join from both directions (use either the inner or outer table as the hash table), so performance-wise the example stands.

2) If my function had referenced the same table that was being updated it would have raised a mutating table error, so I made the function and UPDATE reference different tables. Obviously this is not a real-life example, but it does what I wanted, which is to demonstrate the nature of function calls.

3) Look at the example again. I thought I was being clear, but obviously not. Full Table Scans are faster row-for-row than index scans. Of course, we often scan more rows in a Full Table Scan. Is a full table scan of 1,000,000 rows faster than an indexed scan of 500,000 rows? Absolutely yes! Is a full table scan of 100,000,000 rows faster than an indexed scan of 500,000 rows? Probably not. This last example is an index scan of 0.5% of the table; whilst this is not "low-volume SQL" by any stretch of the imagination, it is processing only a very small proportion of the table.

The new SQL Result Cache in 11g (this is probably the best article available at this time: http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-caching-pooling.html):
In 11g, Oracle can now cache the results of queries and PL/SQL function calls and reuse them on subsequent executions, which invalidates some older rules of thumb.
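
For example, a minimal sketch of the query flavour (11g syntax):

SELECT  /*+ RESULT_CACHE */ deptno, avg(sal)
FROM    emp
GROUP BY deptno

The first execution populates the cache; repeat executions show a RESULT CACHE step in the plan and do almost no logical I/O, until DML on EMP invalidates the cached result.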

Interesting article, and one I will have to keep in mind as a counter to my usual experience, which is this: when a query takes hours but should take minutes, it is usually because Oracle is doing a hash join, and the fix is to make it do a nested loops join. Obviously, one size does not fit all.

The result cache is interesting - it should narrow the gap for scalar function calls. But even though it results in fewer function calls, the calls that remain are still nested and still produce less efficient I/O over a large data set. I haven't done any benchmarking, but I don't expect to be changing my position on the issue of Functions.
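
For anyone wanting to benchmark it, the function flavour looks something like this (a sketch; in 11.1 you would also declare RELIES_ON (dept) so the cache is invalidated when DEPT changes):

CREATE OR REPLACE FUNCTION get_dname (p_deptno IN NUMBER)
    RETURN VARCHAR2
    RESULT_CACHE
IS
    v_dname dept.dname%TYPE;
BEGIN
    SELECT dname INTO v_dname FROM dept WHERE deptno = p_deptno;
    RETURN v_dname;
END;
/

Even with a 100% hit rate the function is still invoked once per row, so the nesting remains.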

If Nested Loops is out-performing Hash Join / Full Scan, it's because you are dealing with Low Volume SQL, not High Volume SQL (the subject of this article). I usually draw the line anywhere between 1% and 10% of a table. If your SQL requires more than 10% of a large table to produce its results, then I contend Indexed Nested Loops will never out-perform Hash Join / FTS, except in extraordinary cases deliberately designed to confound hash joins. Conversely, if your SQL needs less than 1%, Indexed NL will always out-perform FTS / HJ provided the index is well designed. Between 1% and 10%: benchmark, as sketched below.
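
When benchmarking, hints are a quick way to force each shape so you can compare the TKPROF output (a sketch; DRIVER, BIG_TABLE, and BIG_TABLE_PK are hypothetical names):

-- Indexed Nested Loops version
SELECT  /*+ USE_NL(b) INDEX(b big_table_pk) */ a.id, b.payload
FROM    driver a
JOIN    big_table b ON b.id = a.id

-- Hash Join / Full Table Scan version
SELECT  /*+ USE_HASH(b) FULL(b) */ a.id, b.payload
FROM    driver a
JOIN    big_table b ON b.id = a.id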

Thanks.
"Never, ever, ever use correlated scalar subqueries".

Many thanks for this fantastic article.

Regards,
Vishwa....

Thanks for sharing such nice information.

I came across your article while trying to develop a little thought experiment: "what is the most efficient way to determine which rows are common to two row sets?" You emphasise the importance of avoiding correlated subqueries, as used by the iterative operations (nested loop joins and FILTER operations with two children). Generally, I would agree - but in this case I'm not sure.
My example is the OE demo schema: which products are stocked by both the warehouse in Toronto (warehouse_id=5) and Sydney (warehouse_id=9)? I have five equivalent SQLs, and the results surprised me.
First, the obvious solution, which is to use a compound query:

orclz> set autot trace exp
orclz> select product_id from inventories where warehouse_id=5
  2  intersect
  3  select product_id from inventories where warehouse_id=9;

Execution Plan
----------------------------------------------------------
Plan hash value: 3944618082

------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |   114 |  1694 |     6  (34)| 00:00:01 |
|   1 |  INTERSECTION       |              |       |       |            |          |
|   2 |   SORT UNIQUE NOSORT|              |   114 |   798 |     3  (34)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN | INVENTORY_IX |   114 |   798 |     2   (0)| 00:00:01 |
|   4 |   SORT UNIQUE NOSORT|              |   128 |   896 |     3  (34)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN | INVENTORY_IX |   128 |   896 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("WAREHOUSE_ID"=5)
   5 - access("WAREHOUSE_ID"=9)

orclz>
A cost of 6, which I'll take as my baseline.
Now two versions that use a correlated subquery:
orclz> select product_id from inventories where warehouse_id=5
  2  and product_id in (select product_id from inventories where warehouse_id=9);

Execution Plan
----------------------------------------------------------
Plan hash value: 409421562

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    80 |  1120 |     2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS     |              |    80 |  1120 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| INVENTORY_IX |   114 |   798 |     2   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN| INVENTORY_IX |     1 |     7 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("WAREHOUSE_ID"=5)
   3 - access("WAREHOUSE_ID"=9 AND "PRODUCT_ID"="PRODUCT_ID")

orclz> select product_id from inventories i where warehouse_id=5
  2  and exists (select product_id from inventories j where j.warehouse_id=9 and j.product_id=i.product_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 1721271592

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    80 |  1120 |     2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI|              |    80 |  1120 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| INVENTORY_IX |   114 |   798 |     2   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN| INVENTORY_IX |   128 |   896 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("WAREHOUSE_ID"=5)
   3 - access("J"."WAREHOUSE_ID"=9 AND "J"."PRODUCT_ID"="I"."PRODUCT_ID")

orclz>
Both have been re-written to a join and given the same cost, which is far lower than my baseline. I do not understand why it is so low: the nested subquery, which has to iterate 114 times, is costed at zero. I had expected both of these to be worse than the compound query.
Then I tried two versions of, in effect, manually coding the INTERSECT:
orclz> select distinct product_id from (
  2  (select product_id from inventories where warehouse_id=5)
  3  join
  4  (select product_id from inventories where warehouse_id=9)
  5  using (product_id));

Execution Plan
----------------------------------------------------------
Plan hash value: 49070421

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |    80 |  1120 |     3  (34)| 00:00:01 |
|   1 |  SORT UNIQUE NOSORT|              |    80 |  1120 |     3  (34)| 00:00:01 |
|   2 |   NESTED LOOPS SEMI|              |    80 |  1120 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN| INVENTORY_IX |   114 |   798 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN| INVENTORY_IX |   128 |   896 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("WAREHOUSE_ID"=5)
   4 - access("WAREHOUSE_ID"=9 AND "PRODUCT_ID"="PRODUCT_ID")

orclz> select product_id from
  2  (select product_id from inventories where warehouse_id=5
  3  union all
  4  select product_id from inventories where warehouse_id=9)
  5  group by product_id having count(*) > 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 352515046

-------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |     5 |    20 |     5  (20)| 00:00:01 |
|*  1 |  FILTER              |              |       |       |            |          |
|   2 |   HASH GROUP BY      |              |     5 |    20 |     5  (20)| 00:00:01 |
|   3 |    VIEW              |              |   242 |   968 |     4   (0)| 00:00:01 |
|   4 |     UNION-ALL        |              |       |       |            |          |
|*  5 |      INDEX RANGE SCAN| INVENTORY_IX |   114 |   798 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN| INVENTORY_IX |   128 |   896 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(COUNT(*)>1)
   5 - access("WAREHOUSE_ID"=5)
   6 - access("WAREHOUSE_ID"=9)

orclz>
The first is re-written to the join with its correlated subquery and gets the same cost, with an extra 1 for removing duplicates. Fair enough. The second has done exactly what I asked it to do (no transformation) but it is still slightly cheaper than the INTERSECT.
My conclusion: perhaps nested subqueries are not as bad as one might think. The CBO is pretty good at transformations nowadays.
This was using release 12.1.0.2.
--
John Watson
Oracle Certified Master DBA
http://skillbuilders.com

Hi John. Sorry it's taken me almost a year to reply, but I come here so infrequently now... /excuse

There is an exception that I neglected to mention in this article, although I do make the point in an earlier article: http://www.orafaq.com/node/1403

Quote:
The exception to this rule - there's always one - is when all of the columns referenced in the SQL are contained in the index. If Oracle does not have to access the table then there is no break-even point; it is generally quicker to scan the index even for 100% of the rows.

In your example, the loop is accessing only the index - this is very quick indeed, especially when the entire index is cached. If you conspired to create a high-volume SQL where the inner loop hit the index AND the table, I feel confident that one of your join alternatives would fare better.
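
For example (a sketch against the same OE schema, assuming QUANTITY_ON_HAND is not a column of INVENTORY_IX, so the inner loop must visit the table as well as the index):

SELECT  i.product_id
FROM    inventories i
WHERE   i.warehouse_id = 5
AND     EXISTS (SELECT  NULL
                FROM    inventories j
                WHERE   j.warehouse_id = 9
                AND     j.product_id = i.product_id
                AND     j.quantity_on_hand > 0)

Scaled up, that per-iteration TABLE ACCESS BY INDEX ROWID is where I would expect the nested plan to start losing to the join alternatives.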

For clarity, I'll go back up into the article and note the exception, giving you credit for finding me out.