HAVING or WHERE

Hs

unread,

Jun 24, 2008, 12:20:51 PM6/24/08

to

I have two Tables:

CREATE TABLE "BUY"
(
"CENTER_ID" INTEGER NOT NULL,
"ID" INTEGER NOT NULL,
"FARMER_ID" INTEGER NOT NULL,
"FACTORY_ID" INTEGER NOT NULL,
"BUY_WEIGHT" INTEGER,
PRIMARY KEY ("CENTER_ID", "BUY_ID")
);

CREATE TABLE "FARMER"
(
"CENTER_ID" INTEGER NOT NULL,
"ID" INTEGER NOT NULL,
"NAME" VARCHAR(25),
PRIMARY KEY ("CENTER_ID", "ID")
);

I executed two queries:
1) SELECT FARMER.ID,SUM(BUY_WEIGHT) FROM FARMER INNER JOIN BUY ON FARMER.ID=BUY.FARMER_ID
WHERE BUY.FACTORY_ID=1 AND BUY.CENTER_ID=1
GROUP BY FARMER.ID

2)SELECT FARMER.ID,SUM(BUY_WEIGHT) FROM FARMER INNER JOIN BUY ON FARMER.ID=BUY.FARMER_ID
WHERE BUY.FACTORY_ID=1
GROUP BY FARMER.ID
HAVING BUY.CENTER_ID=1

for both the result was same but the execution of the first query longs more than 90(s) and of second query less than 1(s) !!
The plan for the first was: PLAN SORT (JOIN (FARMER NATURAL,BUY INDEX (RDB$PRIMARY4)))
and for second was: PLAN SORT (MERGE (SORT (BUY NATURAL),SORT (FARMER NATURAL)))
What is happening? Generaly what is the difference in resuls and in background between calling HAVING or WHERE for non aggregation fields?

Thanks.

Bill Todd [TeamB]

unread,

Jun 24, 2008, 1:18:16 PM6/24/08

to

Hs wrote:

> What is happening? Generaly what is the difference in resuls and in
> background between calling HAVING or WHERE for non aggregation fields?

In case one the optimizer elected to read farmer in natural order and
use the primary index of buy to locate the record in buy that matched
the current record in farmer. This means that every record in buy was
located in the index then read from disk before the WHERE conditions
could be applied.

In case two the optimizer elected to sort farmer, sort buy then merge
the two streams. Although you cannot tell from the plan the WHERE and
HAVING conditions were probably applied to buy as it was read in
natural order so only the matching records had to be sorted and merged.
This resulted in much less I/O than locating all of the buy records one
at a time through the index.

Why did the optimizer elect to use the index in one case and do a sort
merge in the other? Only one (or possibly two) people know and they are
in InterBase R&D.:)

--
Bill Todd (TeamB)

willr

unread,

Jun 26, 2008, 2:17:19 PM6/26/08

to

According to "The Practical SQL Handbook", 3rd ed. the HAVING word
causes grouping to occur first and then eliminates rows -- possibly
cutting down the set to where a sort/merge is faster

--
Will R
PMC Consulting

willr

unread,

Jun 26, 2008, 3:29:22 PM6/26/08

to

This is actually pretty interesting...

See some comments after this quote....

**********************

Just as a WHERE clause reduces the number of rows returned by a SELECT
clause, the HAVING clause can be used to reduce the number of rows
returned by a GROUP BY clause. The syntax of HAVING is:

See page 6-31 of the EmbedSql.pdf file. The Language Ref is not a lot of
help here...

For example, the following cursor declaration returns the average salary
for all employees in each department. The GROUP BY clause assures that
average salaries are calculated and retrieved based on department names.
The HAVING clause restricts retrieval to those groups where the average
salary is greater than 60,000, while the ORDER BY clause arranges
retrieved rows alphabetically by department name.

EXEC SQL
DECLARE SIXTY_THOU CURSOR FOR
SELECT DEPARTMENT, AVG(SALARY)
FROM DEPARTMENT D, EMPLOYEE E
WHERE D.DEPT_NO = E.DEPT_NO
GROUP BY DEPARTMENT
HAVING AVG(SALARY) > 60000
ORDER BY DEPARTMENT;

+++ /*Note*/ HAVING can also be used without GROUP BY. In this case, all
rows retrieved by a SELECT are treated as a single group, and each
column named in the SELECT clause is normally operated on by an
aggregate function.

For more information about search conditions, see “Restricting row
retrieval with WHERE” on page 6-26. For more information about
subqueries, see “Using subqueries” on page 6-51.

****************************

A little digging around found that three of the big 8 were faster on the
where condition -- others faster on a having...

Conditions independent of the aggregate should go inside the WHERE.

Conditions dependent on the aggregate should go in the HAVING clause.
Your second query is interesting because I think it is an implied "GROUP
BUY" -- or else it shouldn't work -- except that the Interbase team
seems to have "fudged" the "GROPUP BY" by saying that anything not
stated directly is in implied groups of one...

Go figger.

I think Bill had it right.

Q: Why?
A: Why Not?

A few SQL Tuning books support his contention.