Attribute must be GROUPed.... ?

Daniele Orlandi

unread,

Apr 30, 2003, 5:22:39 PM4/30/03

to

This is a multi-part message in MIME format.
--------------040702000107030508030401
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

I always had been doubious about the "must" in the error message
"Attribute must be GROUPed or used in an aggregate function".

I certainly agree that there are many situations in which having an
ungrouped attribute in the target list is nonsense (undefined), but I
sometimes find situations in which it would be perfetcly meaningful.

For example, suppose that you join two tables and the field you group by
appears to be the primary key of one of the tables. In this case if I
put any of the fields in that table in the target list, it will always
be well defined and would make certain queries much simpler.

In SQL:

# create table cities (id int primary key,name text);
# create table people (fullname text,city_id int references cities(id));
# select name,count(*) from people,cities where people.city_id=cities.id
group by city_id;

name | count
------------+-------------
New York | 10
Los Angeles | 15
Seveso | 1
....

Here follows a patch that makes this behaviour configurable via a
configuration variable. I apologize in advance if there's something
stupid, it's my first patch on postgresql...

The default is the current behaviour but I would propose to change the
default to be more permissive.

Bye!

--
Daniele Orlandi
Planet Srl

--------------040702000107030508030401
Content-Type: text/plain;
name="pgsql-group.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="pgsql-group.diff"

Index: src/backend/parser/parse_agg.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/parser/parse_agg.c,v
retrieving revision 1.52
diff -u -r1.52 parse_agg.c
--- src/backend/parser/parse_agg.c 2003/04/03 18:04:09 1.52
+++ src/backend/parser/parse_agg.c 2003/04/30 20:39:29
@@ -34,6 +34,8 @@
static bool check_ungrouped_columns_walker(Node *node,
check_ungrouped_columns_context *context);

+bool force_targets_grouped = true;
+
/*
* check_ungrouped_columns -
* Scan the given expression tree for ungrouped variables (variables
@@ -246,13 +248,21 @@
(Node *) groupClauses);

/*
- * Check the targetlist and HAVING clause for ungrouped variables.
+ * Check the targetlist clause for ungrouped variables.
*/
- clause = (Node *) qry->targetList;
- if (hasJoinRTEs)
- clause = flatten_join_alias_vars(qry, clause);
- check_ungrouped_columns(clause, pstate,
- groupClauses, have_non_var_grouping);
+
+ if (force_targets_grouped)
+ {
+ clause = (Node *) qry->targetList;
+ if (hasJoinRTEs)
+ clause = flatten_join_alias_vars(qry, clause);
+ check_ungrouped_columns(clause, pstate,
+ groupClauses, have_non_var_grouping);
+ }
+
+ /*
+ * Check the HAVING clause for ungrouped variables.
+ */

clause = (Node *) qry->havingQual;
if (hasJoinRTEs)
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/misc/guc.c,v
retrieving revision 1.119
diff -u -r1.119 guc.c
--- src/backend/utils/misc/guc.c 2003/04/25 19:45:09 1.119
+++ src/backend/utils/misc/guc.c 2003/04/30 20:39:31
@@ -43,6 +43,7 @@
#include "optimizer/paths.h"
#include "optimizer/prep.h"
#include "parser/parse_expr.h"
+#include "parser/parse_agg.h"
#include "storage/fd.h"
#include "storage/freespace.h"
#include "storage/lock.h"
@@ -533,6 +534,10 @@
{
{"transaction_read_only", PGC_USERSET, GUC_NO_RESET_ALL}, &XactReadOnly,
false, NULL, NULL
+ },
+ {
+ {"force_targets_grouped", PGC_USERSET, GUC_NO_RESET_ALL}, &force_targets_grouped,
+ true, NULL, NULL
},

{
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.77
diff -u -r1.77 postgresql.conf.sample
--- src/backend/utils/misc/postgresql.conf.sample 2003/04/19 00:37:28 1.77
+++ src/backend/utils/misc/postgresql.conf.sample 2003/04/30 20:39:31
@@ -219,3 +219,4 @@
#statement_timeout = 0 # 0 is disabled, in milliseconds
#db_user_namespace = false
#preload_libraries = ''
+#force_targets_grouped = true
Index: src/bin/psql/tab-complete.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/bin/psql/tab-complete.c,v
retrieving revision 1.76
diff -u -r1.76 tab-complete.c
--- src/bin/psql/tab-complete.c 2003/04/03 20:18:16 1.76
+++ src/bin/psql/tab-complete.c 2003/04/30 20:39:31
@@ -522,6 +522,7 @@
"enable_tidscan",
"explain_pretty_print",
"extra_float_digits",
+ "force_targets_grouped",
"from_collapse_limit",
"fsync",
"geqo",
Index: src/include/parser/parse_agg.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/parser/parse_agg.h,v
retrieving revision 1.25
diff -u -r1.25 parse_agg.h
--- src/include/parser/parse_agg.h 2003/01/17 03:25:04 1.25
+++ src/include/parser/parse_agg.h 2003/04/30 20:39:32
@@ -17,4 +17,6 @@

extern void parseCheckAggregates(ParseState *pstate, Query *qry);

+extern bool force_targets_grouped;
+
#endif /* PARSE_AGG_H */

--------------040702000107030508030401
Content-Type: text/plain
Content-Disposition: inline
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majo...@postgresql.org

--------------040702000107030508030401--

Stephan Szabo

unread,

Apr 30, 2003, 5:49:53 PM4/30/03

to

On Wed, 30 Apr 2003, Daniele Orlandi wrote:

> I always had been doubious about the "must" in the error message
> "Attribute must be GROUPed or used in an aggregate function".

AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each
<column reference> in each <value expression> that references a column
of T shall reference a grouping column or be specified within a <set
function specification>."

> For example, suppose that you join two tables and the field you group by
> appears to be the primary key of one of the tables. In this case if I
> put any of the fields in that table in the target list, it will always
> be well defined and would make certain queries much simpler.

Well, it'd mean you didn't have to put the extra columns in the group by
list to make them grouping columns.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majo...@postgresql.org)

Daniele Orlandi

unread,

Apr 30, 2003, 6:03:52 PM4/30/03

to

Stephan Szabo wrote:
>
> AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each
> <column reference> in each <value expression> that references a column
> of T shall reference a grouping column or be specified within a <set
> function specification>."

I see... How should the "shall" term be considered ? I don't have much
knowledge of the SQL specs language.

How other DBMS behave in this case ? I know that mysql doesn't enforce
this requirement but... mysql is not a perfect reference wrt standards
compliance.

> Well, it'd mean you didn't have to put the extra columns in the group by
> list to make them grouping columns.

This is what I currently do as a workaround, but it's not much clean
expecially when you have many ungrouped fields in the target list.

Bye!

--
Daniele Orlandi
Planet Srl

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Stephan Szabo

unread,

Apr 30, 2003, 6:17:37 PM4/30/03

to

On Thu, 1 May 2003, Daniele Orlandi wrote:

> Stephan Szabo wrote:
> >
> > AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each
> > <column reference> in each <value expression> that references a column
> > of T shall reference a grouping column or be specified within a <set
> > function specification>."
>
> I see... How should the "shall" term be considered ? I don't have much
> knowledge of the SQL specs language.

"In the Syntax Rules, the term shall defines conditions that are
required to be true of syntactically conforming SQL language."

I think most people would write "must", although I think "shall" might be
more correct.

> How other DBMS behave in this case ? I know that mysql doesn't enforce
> this requirement but... mysql is not a perfect reference wrt standards
> compliance.

I'm not sure actually, someone else will need to speak to this (well, I
can test Oracle later, but that's it).

> > Well, it'd mean you didn't have to put the extra columns in the group by
> > list to make them grouping columns.
>
> This is what I currently do as a workaround, but it's not much clean
> expecially when you have many ungrouped fields in the target list.

Yeah, that can get to be a problem... In any case, you'll probably get
other comments. Oh yeah, and you'll probably be asked for documentation
comments if it's even considered since you're adding a visible GUC entry.
:)

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majo...@postgresql.org so that your
message can get through to the mailing list cleanly

Tom Lane

unread,

Apr 30, 2003, 8:32:59 PM4/30/03

to

Stephan Szabo <ssz...@megazone23.bigpanda.com> writes:
> Yeah, that can get to be a problem... In any case, you'll probably get
> other comments. Oh yeah, and you'll probably be asked for documentation
> comments if it's even considered since you're adding a visible GUC entry.
> :)

Well, it won't be --- diking out required error checks without providing
a substitute isn't my idea of a useful patch ;)

The SQL99 spec defines an improved version of this behavior which I
think is what Daniele really would like to have. Basically it says that
if column A can be proved functionally dependent on column B then you
only need to GROUP BY column B, and then you can use column A without
having to explicitly mention it in the GROUP BY list. "Functionally
dependent" means there is no possibility of A values being different in
rows with the same B value. The spec has a whole lot of verbiage about
possible ways to deduce functional dependency, but one easy one is where
column B is a primary key and column A is in its table.

If someone wants to implement the SQL99 behavior (or even just a useful
subset of it), that would be cool with me. It looks like a lot of work
though.

regards, tom lane

Andrew Dunstan

unread,

Apr 30, 2003, 8:33:10 PM4/30/03

to

----- Original Message -----
From: "Stephan Szabo" <ssz...@megazone23.bigpanda.com>
> > I see... How should the "shall" term be considered ? I don't have much
> > knowledge of the SQL specs language.
>
> "In the Syntax Rules, the term shall defines conditions that are
> required to be true of syntactically conforming SQL language."
>
> I think most people would write "must", although I think "shall" might be
> more correct.
>

This is common usage in requirements docs - I suspect it has a government,
perhaps a military origin. Even after vetting and helping to author a great
many requirements docs it still makes me cringe, but then I don't split
infinitives either ;-)

andrew

Andrew Sullivan

unread,

May 1, 2003, 8:49:51 AM5/1/03

to

On Wed, Apr 30, 2003 at 08:31:27PM -0400, Andrew Dunstan wrote:

> This is common usage in requirements docs - I suspect it has a government,
> perhaps a military origin. Even after vetting and helping to author a great
> many requirements docs it still makes me cringe, but then I don't split
> infinitives either ;-)

<totally ot>
It shouldn't make you cringe. According to Fowler
(<http://www.bartleby.com/116/213.html>), this version of "shall" is
an example of the "pure system" in the distinction between shall and
will. So it's neither military nor government in origin, but
mediaeval. Anyway, it's not much different from "Thou shalt not
steal."
</totally ot>

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<and...@libertyrms.info> M2P 2A8
+1 416 646 3304 x110

David Walker

unread,

May 1, 2003, 2:22:24 PM5/1/03

to

I use min(fieldname) as fieldname which is a little more than I want to type
but doesn't disturb my groupings.

On Wednesday 30 April 2003 05:02 pm, Daniele Orlandi wrote:
> Stephan Szabo wrote:
> > AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each
> > <column reference> in each <value expression> that references a column
> > of T shall reference a grouping column or be specified within a <set
> > function specification>."
>

> I see... How should the "shall" term be considered ? I don't have much
> knowledge of the SQL specs language.
>

> How other DBMS behave in this case ? I know that mysql doesn't enforce
> this requirement but... mysql is not a perfect reference wrt standards
> compliance.
>

> > Well, it'd mean you didn't have to put the extra columns in the group by
> > list to make them grouping columns.
>
> This is what I currently do as a workaround, but it's not much clean
> expecially when you have many ungrouped fields in the target list.
>

> Bye!

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Thomas Swan

unread,

May 6, 2003, 10:07:53 PM5/6/03

to

David Walker wrote:

>I use min(fieldname) as fieldname which is a little more than I want to type
>but doesn't disturb my groupings.
>

Min( ) and max( ) somehow seem to also reduce to a limit 1 on duplicate
values of which I'm not sure is of much use in the discussion. But if
the groups are large then your doing O(n) passes over each column/field.

I've seen some databases do the equivalent of a unique on the columns
(*A) and where there were different values break the grouping which IMHO
is completely wrong. Others have only displayed the first match and
discarded the rest for the group on a non-aggregated column/field. (*B).

TABLEA
a | b |c
---+---+---
1 | 1 | 3
1 | 2 | 3
3 | 3 | 3
2 | 2 | 3

select a, b from TABLEA group by c would produce

Behavior A
a | b
---+---
1 | 3
2 | 3
3 | 3

OR

Behavior B
a | b
---|---
1 | 3

If I had to pick a behavior I would have picked behavior B because one
could infer that the nongrouped value was unimportant or if you needed
an aggregate operation it would have been specified. Another option
would be to be completely indiscriminate and go with the first tuple it
found and skip to next token/match/group (*C). Even C* would be ok
IMO, it might even be faster.

Behavior C1 |
a | b |
---|--- |
1 | 3 |
|
Behavior C2 |
a | b |
---|--- +----- All are possible outcomes...
3 | 3 |
|
Behavior C3 |
a | b |
---|--- |
2 | 3 |

>
>On Wednesday 30 April 2003 05:02 pm, Daniele Orlandi wrote:
>
>
>>Stephan Szabo wrote:
>>
>>
>>>AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each
>>><column reference> in each <value expression> that references a column
>>>of T shall reference a grouping column or be specified within a <set
>>>function specification>."
>>>
>>>
>>I see... How should the "shall" term be considered ? I don't have much
>>knowledge of the SQL specs language.
>>
>>How other DBMS behave in this case ? I know that mysql doesn't enforce
>>this requirement but... mysql is not a perfect reference wrt standards
>>compliance.
>>
>>
>>
>>>Well, it'd mean you didn't have to put the extra columns in the group by
>>>list to make them grouping columns.
>>>
>>>
>>This is what I currently do as a workaround, but it's not much clean
>>expecially when you have many ungrouped fields in the target list.
>>
>>Bye!
>>
>>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>