Annals of Oracle's Improbable Errors: subquery factoring

Showing posts with label subquery factoring. Show all posts

Friday, July 20, 2018

The importance of being aliased

Always check out the original article at http://www.oraclequirks.com for latest comments, fixes and updates.

I was writing a query returning the indexes to be monitored for a list of schemas for those tables having fewer than 5 indexes (not counting LOB indexes) and I was surprised to see that it returned no rows.
After checking what's wrong I realized that I had specified the wrong column for the included subquery, but how could it be that Oracle wasn't complaining about a non existing column?

with excluded as (select /*+ MATERIALIZE */ b.owner, b.index_name
                                      from dba_constraints b
                                     where b.constraint_type in ('P','U')
                                       and b.owner in ('A','B','C')
                                     ),
       included as (select /*+ MATERIALIZE */ owner, table_name
                                from dba_indexes
                               where owner in ('A','B','C') 
                                 and index_type != 'LOB' 
                               group by owner, table_name
                               having count(*) > 5
                            )
select 'ALTER INDEX ' || owner ||'.' || index_name || ' MONITORING USAGE;' as ddl
  from dba_indexes a
 where (a.owner, a.table_name) in (select owner, index_name
                                     from included
                              )
  and (a.owner, a.index_name) not in (select owner, index_name
                                        from excluded 
                                     )
  and a.owner in ('A','B','C')
  and a.index_type != 'LOB';

DDL

===

(no rows)

Soon I realized not only that I had specified the wrong column (I should have written table_name instead) but also that I hadn't specified table aliases and index_name was not a value coming from included, but from the outer view dba_indexes, which is just fine from Oracle's point of view but totally wrong for my purposes.

In order to prevent these mistakes, the SQL above needs to be written using ALWAYS table aliases in the subqueries and applying them to each relevant column:

  with excluded as (select /*+ MATERIALIZE */ b.owner, b.index_name
                                      from dba_constraints b
                                     where b.constraint_type in ('P','U')
                                       and b.owner in ('A','B','C')
                                     ),
       included as (select /*+ MATERIALIZE */ owner, table_name
                                from dba_indexes
                               where owner in ('A','B','C') 
                                 and index_type != 'LOB' 
                               group by owner, table_name
                               having count(*) > 5
                            )
  select 'ALTER INDEX ' || owner ||'.' || index_name || ' MONITORING USAGE;' as ddl
  from dba_indexes a
 where (a.owner, a.table_name) in (select i.owner, i.table_name
                                     from included i
                              )
  and (a.owner, a.index_name) not in (select e.owner, e.index_name
                                        from excluded e
                                     )
  and a.owner in ('A','B','C')
  and a.index_type != 'LOB';

In this case it was easy to guess that something was wrong because the query returned no results and I knew it was nonsense, but in complex queries made up possibly of several subqueries one can easily overlook the problem should some results being returned.

Table aliases must be used at all times!

Thursday, May 03, 2007

Generating rows magically - the many virtues of DUAL

I was reading one of the last issues of Oracle Magazine when i came across the following tip published by the #1 Oracle Guru, Tom Kyte:

with data
as
(select level n
from dual
connect by level <= 5) select * from data;

if you don't like to use the WITH syntax in such a simple subquery, you can rewrite it as follows:

select n
from (select level n
from dual
connect by level <= 5);

Edited on May 14, 2007: be sure to check out the comments section, there is a comment from Knut about this sql query. Re-edited on September 3, 2007: It turns out that in Oracle 9i there is something odd with the simplified version of the query (after removing the inline view, so check out the comments for the latest version). This short piece of SQL is especially useful because i was still using the old-fashioned select * from table(function(n)) that is a technique available since Oracle 8i where, in order to spawn an arbitrary number of records on the fly, you must populate an object table with the desired number of elements. Note that this technique is still useful when the data must be retrieved in a less straightforward fashion, as it happens usually with complex business rules. However, thanks to this smart technique that takes advantage of the dual table and a sort of fake hierarchical query, you can spawn a recordset made up of n-rows and without additional objects, that is using less resources, less memory and in a simpler, faster way. Now, i was wondering how nice it would be to have a sort of parametric view, something like:

select n from nrows;

Well, if you don't mind setting up a simple package:

CREATE PACKAGE pkg_nrows
as
n   integer;
function fn_n  return integer;
end;
/

CREATE PACKAGE BODY pkg_nrows
as
function fn_n   return integer
is
begin
return n;
end;
end;
/

CREATE VIEW NROWS ( N ) AS
select level n
from dual
connect by level <= pkg_nrows.fn_n

You can finally perform the query from view nrows, just after setting the value of the desired n value, that was defined as a packaged global variable.

begin
pkg_nrows.n := 5;
end;
/
select *
from nrows
/

N
-
1
2
3
4
5

And there you go with your freshly made n-tier dual table.

Hope it helps!

PS: as quirk of the day I'd mention the space characters handling inside pre-formatted text of Blogger's HTML editor, i can't make out how it works, it keeps trimming my spaces and line breaks randomly!

Monday, April 16, 2007

better SQL WITH subquery factoring

I beg you pardon if I'm not going to write about an Oracle quirk today.
On the contrary, I'll cover an interesting feature of Oracle that appeared for the first time with Oracle 9i and that only recently became one of my favorite lifesavers.

I am referring to the obscure and perhaps neglected subquery factoring feature, that is that strange syntax beginning with WITH.

If you open What's new in the SQL Reference (for Oracle 9i), i believe you can easily overlook this new query syntax flavor as it is not particularly emphasized, in spite of its virtues.

I mean, every time a new feature is introduced either you try them all soon or you just place a sort of mental bookmark on it while thinking hum, this sounds cool, I'll see where i can use it.
Then you completely forgot it.

Otherwise, if the description isn't too fancy, you may end up thinking that you can live perfectly well without it.

I must admit that probably the latter applied to my case as i didn't realize the existence of the subquery factoring clause until i read about it somewhere two years ago. But even at that time, i failed to see its potential, until i recently hit against a big SQL statement made up of a bunch of nested SQL subqueries containing bind variables.

One of the situations i see most frequently when developing with Oracle Application Express, is the need of splitting a list of string values into a recordset. This usually involves calling a function returning a user defined object type in form of an object table.

select column_value
from table(csv_to_table('firenze,roma,venezia'));

column_value
--------------
firenze
roma
venezia

Where csv_to_table is a function i created that splits up a string separated by commas by default.

Now, imagine that a colon separated list of values, returned by a multiselect list for instance, so typical of Oracle Application Express, serves as a parameter for a query, using a bind variable:

select column_value from table(csv_to_table(:user_supplied_list, ':'));

Suppose you need to perform some kind of operation involving each pair of items in the list, excluding the case where both items are the same.

This means creating a Cartesian product between two exact copies of our input list and perform such calculation:

WITH user_defined_values AS
( SELECT COLUMN_VALUE AS my_value
FROM TABLE(csv_to_table(:user_supplied_list, ':'))
)
SELECT my_function(a.my_value, b.my_value) as result
FROM user_defined_values a,
user_defined_values b
WHERE a.my_value != b.my_value;

The main advantage i see here is in that SQL statements become much easier to read.
This can be even more noticeable with more complex queries, like those taking two or three pages of conditions, GROUP BYs, OLAP functions, you know what i mean, those that you love so much, especially when they are written by someone else in a hurry...

So, this sample query can be seen as a sort of template that you can build upon for developing more complex structures while retaining some code readability.

For instance, this simple template could work well in a situation where you need to calculate the reciprocal distance between each pair of cities specified by a user, supposing you have a routing algorithm that can compute the distance between any two locations.

WITH list_of_locations AS
( SELECT COLUMN_VALUE AS location_name
FROM TABLE(csv_to_table(:user_supplied_list, ':'))
)
SELECT calculate_distance_between(a.location_name, b.location_name) as result
FROM list_of_locations a,
list_of_locations b
WHERE a.location_name != b.location_name;

Of course this is a simplification of a real problem, but let's suppose that locations are specified using a syntax like, "town,province,region", just to shake up things a little bit.

So, supposing we want to further refine the query above, by using a numerical primary key instead of the location name, we get:

WITH list_of_locations AS
(
SELECT location_id
  FROM locations
 WHERE (town,province,region) IN
(
  SELECT get_nth_from_csv(COLUMN_VALUE,1) as town,
         get_nth_from_csv(COLUMN_VALUE,2) as province,
         get_nth_from_csv(COLUMN_VALUE,3) as region
    FROM TABLE(csv_to_table(:user_supplied_list, ':'))
)
)
SELECT calculate_distance_between(a.location_id, b.location_id) as result
 FROM list_of_locations a,
      list_of_locations b
WHERE a.location_id != b.location_id;

Note also that for commutative functions, it's normally useless to compute both fn(A,B) and fn(B,A) as they would return the same result, so you can optimize the non-equijoin to return just a triangular matrix, provided the values can be compared using lesser-than (<) or greater than (>).

WITH list_of_locations AS
(
SELECT location_id
  FROM locations
 WHERE (town,province,region) IN
(
 SELECT get_nth_from_csv(COLUMN_VALUE,1) as town,
        get_nth_from_csv(COLUMN_VALUE,2) as province,
        get_nth_from_csv(COLUMN_VALUE,3) as region
   FROM TABLE(csv_to_table(:user_supplied_list, ':'))
)
)
SELECT calculate_distance_between(a.location_id, b.location_id) as result
FROM list_of_locations a,
list_of_locations b
WHERE a.location_id < b.location_id

Instead of N*(N-1) values, you end up with just N*(N-1)/2 values.

However, this holds only for commutative functions and certainly calculate_distance_between is not the best example of such a function.

I say this because yesterday i was on a trip with my family and we were visiting friends living in a small town near Rome.
We managed to arrive as close as 100 meters from the crossroad where theoretically we'd had to turn left, just to discover that some fanciful city planner recently decided to convert the main road from two-way to one-way.

All right i thought, piece of cake, let's go back and see how we can bypass it.
It was 12.00 PM.

At 12.25 we passed in front of the main town square for the third time in a row and i was yelling insults at the address of the mayor.

Then we decided to take a radical decision and go out of the town and see if our GPS could sort out this mess starting from a different point.
And so it did, by proposing a route of 4.8Km, around the town, a very pleasant journey through lovely hills and green fields, at the end of which we arrived at the destination from the opposite direction.

4.8 km instead of 100 meters, a small journey for a car, but a giant leap for mankind, especially for the heirs of the Romans, world's famous city planners.

When finally we met our friends they were quite surprised to see us right there because usually they have to go out and pick up the lost people along the route!
I never got lost in New York, in London or in Mexico City, but i was almost to the point of giving up the task in Mentana, a town of nearly 16,000 inhabitants or perhaps 20,000 including the sheep.

Moral:
a distance of 100 meters from B to A can easily increase up to 4800 meters when you swap A and B and you are in Mentana.

Happy factoring!

Annals of Oracle's Improbable Errors