Problems with IN, ANY, EXISTS and ALL

Discussion:

(too old to reply)

o***@yahoo.com.ar

2007-10-02 20:40:11 UTC

Hello:

I want to share with you an investigations I made related to the SQL weakness and interesting characteristics of this language on database processing in the following site:

www.dbtheoryinpractice.com

All the examples in this work have been verified with the DB2's SQL language.

I hope you enjoy it.

Knut Stolze

2007-10-04 19:36:03 UTC

Permalink

Post by o***@yahoo.com.ar
I want to share with you an investigations I made related to the SQL
weakness and interesting characteristics of this language on database
www.dbtheoryinpractice.com

I wouldn't call this "problems" but rather "possible pitfalls". Here are a
few comments...

If you have "v comparison NULL", then this will evaluate to "unknown". In
WHERE clauses, this "unknown" is interpreted like "false" (but it is not
the same as "false"). Thus, if you have "a.amount > ALL ( SELECT ... )",
the whole predicate will evaluate to "unknown" (if at least one row in the
subselect is NULL) after you applied the expansion for the ALL quantifier,
which will finally be interpreted like "false" and discard the row. Hence,
the only possible option for any database management system adhering to the
SQL standard is to return the empty set for your example. I would call
this a "user error" or a bad design choice made when developing the SQL
standard. (But the whole treatment of NULLs is rather convoluted in SQL.)

The INTERSECT example is indeed strange at a first glance. But the question
is how "unknown" comparisons are treated in the case of INTERSECT. (I
don't know for sure and I'm too lazy to look it up in SQL:2003 right now.)
If "NULL = xxx" is treated like "true" for set operations (as it is done in
CHECK constraints), then the results are correct and we have a user error -
again - if you expect a different result. Your reformulated query assumes
that "unknown" for set comparisons is treated like "false" as is done in
the WHERE clause. If this assumption is not correct, then the reformulated
query is simply not semantically equivalent.

The 3rd example is mostly a database design issue, as I gathered from the
initial paragraph.

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany

o***@yahoo.com.ar

2007-10-06 15:02:35 UTC

Permalink

Thank you very much indeed for commenting some of my posts, you are the first one who did not comment my posts without sarcasm

Some aspect about your commnets:

In the first part of your comments I agree with you, as you can realise null could have some other interpretation beside from "unknown information", this is precisely the problem I try to point, and this may be a source of misinterpretation in SQL queries.

In the case of operator INTERSECT I can tell you that the problem is that in equivalent queries using INTERSECT and simple joins you get different results and further than the SQL standard, it is something that should be taken into account by the people who build DBMS, because as you can verify it is not a problem in an specific DBMS, I can dare to tell you that it is a problem in most of them.

O. F. Domejean

Post by Knut Stolze
The INTERSECT example is indeed strange at a first
glance. But the question
is how "unknown" comparisons are treated in the case
of INTERSECT. (I
don't know for sure and I'm too lazy to look it up in
SQL:2003 right now.)
If "NULL = xxx" is treated like "true" for set
operations (as it is done in
CHECK constraints), then the results are correct and
we have a user error -
again - if you expect a different result. Your
reformulated query assumes
that "unknown" for set comparisons is treated like
"false" as is done in
the WHERE clause. If this assumption is not correct,
then the reformulated
query is simply not semantically equivalent.
The 3rd example is mostly a database design issue, as
I gathered from the
initial paragraph.
--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany

Knut Stolze

2007-10-08 19:03:23 UTC

Permalink

Post by o***@yahoo.com.ar
In the case of operator INTERSECT I can tell you that the problem is that
in equivalent queries using INTERSECT and simple joins you get different
results and further than the SQL standard, it is something that should be
taken into account by the people who build DBMS, because as you can verify
it is not a problem in an specific DBMS, I can dare to tell you that it is
a problem in most of them.

I spent some time trying to track down the semantics in the standard. I
found that INTERSECT (and EXCEPT and UNION) are defined based on the notion
of "duplicate rows". I'm not 100% sure, but I believe that clause
8.10, "<unique predicate>" (Specify a test for the absence of duplicate
rows.) is applicable to determine when two rows are duplicates. In short,
it says that two rows are duplicates if the corresponding columns are both
either null or have the same value. Thus, two rows with only NULLs in the
columns are considered as duplicates, which explains that the INTERSECT
operator returns those rows.

If a DBMS doesn't do that, it looks to me as if that product would not
conform to the standard in this respect. And we are back that what you
listed is a "user error" (or SQL design problem) but not a problem with DB2
because DB2 adheres to the standard (or my interpretation of it).

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany