GROUP BY 'ALL' problem

Sydney Lotterby

unread,

Jan 16, 2004, 11:12:27 AM1/16/04

to

(SQL2K w/sp3a)
Three tables (see end for defns) -
_respData populated fully (like a customer master table - name, address
etc)
_respPOTF a lookup populated fully (like an inventory table containing
all the products, descrips and prices
_respOrders an empty table to contain customer orders linked to the other
two tables

What I want is to see a result set like the one I have plugged below i.e.
with all NULLs for Quantity column because no Orders data exists.
I thought I would get this by using "group by ALL" as in the script below.
But I get an empty result set instead.

Any suggestions?

tia

======================================

select p.description, p.seq, p.price, o.quantity
from _respPOTF p
join _resporders o on o.respPOTFID = p.respPOTFID
join _respdata d on o.respDataID = d.respDataID
where d.respDataID = 2649
group by ALL p.description, p.seq, p.price, o.quantity
order by p.seq

respPOTFID Seq Description
Price Quantity
----------- ----------- -------------------------------------------------- -
-------------------- -------
1 1 PACKAGES
NULL NULL
2 2 Beyond Credibility Package
345.0000 NULL
3 3 Teen Pack 100.0000
NULL
4 5 BOOKS
NULL NULL
5 6 Beyond Credibility Softcover
15.0000 NULL
6 7 Children Demand 11.0000
NULL
7 8 The Deceivers
11.0000 NULL
8 9 In Search of Certainty
10.0000 NULL
9 10 Family Huddle 2
13.0000 NULL
10 11 Youth Huddle 2
13.0000 NULL
11 12 More Than a Story
5.0000 NULL
12 13 New Evidence 25.0000
NULL
13 15 WORKBOOKS
NULL NULL
14 16 Credibility Matters Workbook
11.0000 NULL
15 17 Credibility Matters Leaders
7.0000 NULL
16 18 The Revolt Workbook
10.0000 NULL
17 19 The Revolt Leaders
7.0000 NULL
18 20 True or False Older Kids
6.0000 NULL
19 21 True or False Younger Kids
6.0000 NULL
20 22 True or False Leaders
13.0000 NULL
21 24 AUDIO & VIDEO
NULL NULL
22 25 Credibility Matters Video Series
150.0000 NULL
23 26 The Revolt Video Series
100.0000 NULL
24 27 Beyond Credibility book on tape
18.0000 NULL
25 28 Beyond Credibility book on cd
20.0000 NULL
26 29 Why Credibility Matters 3-Tape Set
15.0000 NULL
27 30 Why Credibility Matters 3-CD Set
18.0000 NULL

(27 row(s) affected)

CREATE TABLE [dbo].[_respData] (
[respDataID] [int] IDENTITY (1, 1) NOT NULL ,
[respBatchesID] [int] NULL ,
[CARDID] [int] NULL ,
[FIRSTNAME] [nvarchar] (50) NULL ,
[MIDDLENAME] [nvarchar] (10) NULL ,
[LASTNAME] [nvarchar] (50) NULL ,
...
[BouncedCode] [varchar] (50) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

CREATE TABLE [dbo].[_respOrders] (
[respOrdersID] [int] IDENTITY (1, 1) NOT NULL ,
[respPOTFID] [int] NULL ,
[respDataID] [int] NULL ,
[Quantity] [int] NULL ,
[Price] [money] NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[_respPOTF] (
[respPOTFID] [int] NOT NULL ,
[Seq] [int] IDENTITY (1, 1) NOT NULL ,
[Description] [varchar] (50) NULL ,
[Price] [money] NULL
) ON [PRIMARY]
GO

Russell Fields

unread,

Jan 16, 2004, 11:28:33 AM1/16/04

to

Sydney,

If there are no orders, then are the joins still going to return a row?
Should you be using LEFT OUTER JOINS in this case?

Russell Fields
"Sydney Lotterby" <syd...@infosearch.com> wrote in message
news:OF0QKuE3...@TK2MSFTNGP11.phx.gbl...

Joe Celko

unread,

Jan 16, 2004, 12:01:15 PM1/16/04

to

You need to start over with the data design and then the schema design.
Everything is NULL-able. You have no keys. Have you ever heard of a
NVARCHAR(50) name? Well, with a column like that you will get one.
Price appears in two places, so we don't know if it is price-per-unit or
price-per-order. What encoding scheme has a NVARCHAR(50) value? This
looks like someoen generated code blindly from ACCESS or some other
non-SQL product without doing any design at all.

Why would you think of using GROUP BY ALL; it is a proprietary syntax
that will nto port and can behave differently from realse to release.
You can do the same thing in Standard SQL with an OUTER JOIN and a GROUP
BY.

Using an underscore as the first character of a data element name
guarantees conflicts with host languages, and makes the code a bitch to
read. I used to see this in assembly language programming, but I cannot
remember what it meant.

Shouldn't your table be related in a schema instead of standing along
like files? I gather that a respondent can order only one item, so the
orders might look like this:

CREATE TABLE RespOrders
(resppotf_id INTEGER NOT NULL
REFERENCES Resppotf (resppotf_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
respdata_id INTEGER NOT NULL
REFERENCES Respdata (respdata_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
quantity INTEGER NOT NULL CHECK(quantity > 0),
PRIMARY KEY (resppotf_id, respdata_id));

Then you can get the price from the inventory table.

We can come up with a kludge using outer joins, but that is not your
real problem.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Sydney Lotterby

unread,

Jan 16, 2004, 12:38:36 PM1/16/04

to

Thanks Russell - for the info and the way you responded!
I know I can use LEFT OUTER JOINS and have now modified it to do so. I was
just interested in how to do this with GROUP BY ALL.

Once again, the abrasive Mr. Celko pompously makes many myopic and
inaccurate statements. In fact the whole reason I wanted to use GROUP BY
ALL is because I saw an example of it in the Guru's Guide to Transact-SQL by
Ken Henderson (pp150-151). Interestingly, Celko wrote the forward to that
book. In it he says "What Ken Henderson wanted to do is to write the best
possible book on rea, practical programming in Transact-SQL available, bar
none. He succeeded... Neat stuff and some tricks I had not seen before!"

After re-reading Henderson on p150 ("GROUP BY ALL generates all possible
groups -- even those that do not meet the query's search criteria.
Aggregate values in gorups that fail the search criteria are returned as
NULL.")
I changed up my query to include the aggregate sum(o.quantity) (see below)
but still got zero results. So, I would still be interested in how to do
this with GROUP BY ALL if anyone has any ideas.

select p.description, p.seq, p.price, sum(o.quantity)

from _respPOTF p
join _resporders o on o.respPOTFID = p.respPOTFID
join _respdata d on o.respDataID = d.respDataID
where d.respDataID = 2649

group by ALL p.description, p.seq, p.price --, o.quantity
order by p.seq

"Russell Fields" <Russel...@NoMailPlease.Com> wrote in message
news:e3pFNGF3...@TK2MSFTNGP10.phx.gbl...

Joe Celko

unread,

Jan 16, 2004, 1:40:37 PM1/16/04

to

So you want to learn how to writer proprietary and not how to write a
schema, or Standard SQL. The GROUP BY ALL is a shorthand for a hidden
outer join that has a GROUP BY on the preserved table. To use the BOL
example:

SELECT type, AVG(price)
FROM titles
WHERE royalty = 10
GROUP BY ALL type;

is really

SELECT T1.type, AVG(T2.price)
FROM titles AS T1
LEFT OUTER JOIN
titles AS T2
ON T2.royalty = 10
AND T1.title_id = T2.title_id
GROUP BY T1.type;

There area few other ways to write this, but this is easiest to
understand.

Sydney Lotterby

unread,

Jan 16, 2004, 1:53:35 PM1/16/04

to

Thank you.

"Joe Celko" <joe....@northface.edu> wrote in message
news:OVl68AG3...@TK2MSFTNGP10.phx.gbl...

Tom Moreau

unread,

Jan 16, 2004, 2:53:04 PM1/16/04

to

I just ran both queries against the pubs DB. GROUP BY ALL has a higher query cost at a ratio of 60.36 : 39.64.

--
Tom

---------------------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com/sql

"Joe Celko" <joe....@northface.edu> wrote in message news:OVl68AG3...@TK2MSFTNGP10.phx.gbl...

Anith Sen

unread,

Jan 16, 2004, 3:08:08 PM1/16/04

to

I think it is that messy index :-) I got a better ratio with INDEX=0 hint.

--
- Anith
( Please reply to newsgroups only )

Delbert Glass

unread,

Jan 16, 2004, 5:53:56 PM1/16/04

to

>There area few other ways to write this, but this is easiest to understand.

Maybe it was before I wrote the following query ;-)

SELECT
type,
AVG(CASE WHEN royalty = 10 THEN price ELSE null END)
FROM titles oo
GROUP BY type
ORDER BY type

Jeepers, even the plan is easier to understand.

Bye,
Delbert Glass

"Joe Celko" <joe....@northface.edu> wrote in message
news:OVl68AG3...@TK2MSFTNGP10.phx.gbl...

Steve Kass

unread,

Jan 16, 2004, 6:38:18 PM1/16/04

to

Joe,

GROUP BY ALL is not quite the same as your outer join, as you can see
here:

SELECT type, COUNT(COALESCE(price,1))

FROM titles
WHERE royalty = 10
GROUP BY ALL type;

SELECT T1.type, COUNT(COALESCE(T2.price,1))

FROM titles AS T1
LEFT OUTER JOIN
titles AS T2
ON T2.royalty = 10
AND T1.title_id = T2.title_id
GROUP BY T1.type;

While the join probably answers likely-to-occur questions, GROUP BY ALL
actually works like this:

SELECT
type,
COUNT(pre_aggregate)
FROM (
select type, COALESCE(price,1) as pre_aggregate
from titles
where royalty = 10
union all
select type, NULL
from titles
) T
group by type

SK

Joe Celko

unread,

Jan 16, 2004, 7:56:37 PM1/16/04

to

I figure the "macro transformation template" to be like this, which does
give the same results:

SELECT T1.type, COUNT(T2.coalesce_price)

FROM titles AS T1
LEFT OUTER JOIN

(SELECT title_id, COALESCE(price,1), royalty
FROM titles
WHERE royalty = 10) AS T2(title_id, coalesce_price, royalty)
ON T1.title_id = T2.title_id
GROUP BY T1.type;

I have to move all of the unpreserved table predicates and aggregates to
the right side (T2), and leave the left side as a pure (SELECT * FROM
<table>) AS T1 expression and put just the join condition in the ON
clause.

The first version should have been:

SELECT T1.type, AVG(T2.price)-- group from T1, aggs from T2
FROM (SELECT * FROM titles) AS T1 (..) -- pure table
LEFT OUTER JOIN
(SELECT <aggregate exp list>
FROM titles
WHERE ON royalty = 10) -- Sarg predicates
AS T2 (..)
ON T1.title_id = T2.title_id -- self-join predicate
GROUP BY T1.type; -- group by list in T1

We can probably do a vanilla group by in T2, then use a SELECT DISTINCT
on T1, then a LEFT OUTER JOIN on them to get the GROUP BY ALL.

Steve Kass

unread,

Jan 16, 2004, 10:37:35 PM1/16/04

to

Joe Celko wrote:

Now the fun question is how to rewrite it if there is no primary key. ;)

>We can probably do a vanilla group by in T2, then use a SELECT DISTINCT
>on T1, then a LEFT OUTER JOIN on them to get the GROUP BY ALL.
>
>

And to take it one step further, avoid DISTINCT with a table of book types.

create table book_types (
type varchar(20) not null primary key
)

insert into book_types
select distinct type from titles
go

SELECT T1.type, COALESCE(T2.aggregate,0) aggregate
FROM book_types as T1
LEFT OUTER JOIN (
SELECT type, COUNT(COALESCE(price,1)) as aggregate

FROM titles
WHERE royalty = 10

GROUP BY type
) T2
ON T1.type = T2.type

SK

Delbert Glass

unread,

Jan 20, 2004, 8:09:46 PM1/20/04

to

Joe asks (Sydney):

>Why would you think of using GROUP BY ALL; it is a proprietary syntax
>that will nto port and can behave differently from realse to release.
>You can do the same thing in Standard SQL with an OUTER JOIN and a GROUP
>BY.

Because, the present value of the cost of GROUP-BY-ALL queries
is very likely lower then
the present value of the cost of NON-GROUP-BY-ALL queries.

Don't forget to include cost of
writting, debugging, testing, reading/examining/studying again and again,
unnecessary debugging due to worrying
it's not doing the right thing
when debugging a problem it's not causing,
and converting/porting [1], etc.

[1] If any of the NON-GROUP-BY-ALL queries,
need to be modify to run reasonable on a new system
with the data present at that future time,
what's the cost of just making sure the verbose query
does exactly GROUP-BY-ALL? What's the cost of assuming
a verbose query does exactly GROUP-BY-ALL, when in fact,
it does something else?

Also, because
the query optimizer is free to pick the best plan
from several different approaches not just
the best plan for the approach writen.
(that's worth reading again)
Furthermore, inaddition to not being too costly (or should I say too lazy)
as compared to a coder in considering which approach to use,
to query optimizer "reappraises" it's decision from time to time
at relitively low cost compared to a coder reappraising a coder's decision
(which likely was more of punt then a decision due to the cost
of a coder making the decision even once (or due to lazyness))
much less frequently.
Additionally, not only is the optimizer more likely to have
(and to apply (and to apply quickly and error free))
greater knowledge of different approaches, the cost of considering
new approaches in the future via the query optimizer rather than the coder
can be greaterly lower. (Provided one is not in great need of an approach
not yet known to the query optimizer.)

Here is an example.

What's the cost to reappraise all of one's verbose non-group-by-all queries
and replace with the approach I posted when appropriate?

vs

What's the cost to reappraise all of one's group-by-all queries
and replace with the approach I posted when appropriate?

vs

What's the cost to reappraise all of one's group-by-all queries
and replace with the approach I posted when appropriate
if the query optimizer became knownledgeable of the approach?

Replace "cost to" with "cost of not" and consider those too.
Also, what's the cost of using non-improved non-group-by-all queries
whenever they would have been improved had they been coded as group-by-all
whenever the query optimizer becomes knownledgeable of a new approach?

-----

(being a little chopping)

Sometimes the cost of not coding a non-group-by-all yourself now is too
high.
In reguard to actual use.

The query I posted will most likely read all the rows, but only once each.
The rest of the queries will most likely read all the rows, plus read
at least some (if not all) of them a second time.

>We can probably do a vanilla group by in T2, then use a SELECT DISTINCT
>on T1, then a LEFT OUTER JOIN on them to get the GROUP BY ALL.

We can, here is the query:

select t1.type, t2.avg_price
from
(
SELECT type, avg(price) as avg_price
FROM titles

WHERE royalty = 10
GROUP BY type

) t2
right join
(SELECT DISTINCT type FROM titles) t1
--(SELECT type FROM titles GROUP BY ALL type) t1
on t1.type = t2.type

Furthermore, provided it's efficient to read just the royalty = 10 rows,
there exists the possibility the result will be returned
without reading all of the rows!

Bye,
Delbert Glass

"Joe Celko" <joe....@northface.edu> wrote in message

news:u79cDTJ3...@TK2MSFTNGP10.phx.gbl...