create a SQL query with unique column values

bastos...@gmail.com

Sep 22, 2005, 3:21:28 PM
Hi,
I need to create a SQL query where a given column will have unique
values. So far I tried this:

SELECT DISTINCT col1, col2, col3, col4 from SOMETABLE

but the distinct part seems to consider col1, col2, col3 and col4 as
the same entity. I just want to make one column distinct, not all
columns.
Thank you,
Sergio S Bastos

Gene Wirchenko

Sep 22, 2005, 4:46:00 PM

Suppose that col1 is the column that is to be distinct. What
values for col2, col3, and col4 should your query return for a
sometable of:
col1  col2  col3  col4
----  ----  ----  ----
  10     8     5    29
  10     9    17    -2
  10    74   -11    95
  23    19    74    10
  23    -9    -7    83
  23   -23     0    55
? What makes any of the values any better than any other?

Sincerely,

Gene Wirchenko

--CELKO--

Sep 22, 2005, 10:49:49 PM
Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things, but the code has to produce the same
results.

a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The <table expression> AS <correlation name> option allows you
to give a name to this working table, which you then have to use for the
rest of the containing query.

b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.

c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
of those three items.

d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.

e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The
"AS" operator can also give names to expressions in the SELECT
list. These new names come into existence all at once, but after the
WHERE clause, GROUP BY clause and HAVING clause have been executed; you
cannot use them in the SELECT list or the WHERE clause for that reason.


If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY). The unit of work is a *whole*
row.

f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.

g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have expressions in it, or references to other columns, because the
result set has been converted into a sequential file structure and that
is what is being sorted.

As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;

while these two statements return the same data:

SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;
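
A minimal sketch of two of the points above: SELECT DISTINCT works on the *whole* row (with NULLs treated as matching), and WHERE keeps only rows that test TRUE. It runs the SQL through Python's built-in sqlite3 module purely as a convenient runner; SQLite itself and the two-column sample rows are assumptions made for illustration, not something from this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?)",
    [(10, 8), (10, 8), (10, 9), (23, None), (23, None)],  # hypothetical rows
)

# DISTINCT compares whole rows: (10, 8) and (10, 9) differ in col2, so both
# survive even though col1 repeats. The two (23, NULL) rows collapse into one.
distinct_rows = conn.execute(
    "SELECT DISTINCT col1, col2 FROM sometable ORDER BY col1, col2"
).fetchall()

# WHERE keeps only rows that test TRUE. For the NULL rows, col2 <> 8 is
# UNKNOWN, so they are rejected along with the FALSE ones.
where_rows = conn.execute(
    "SELECT DISTINCT col1, col2 FROM sometable WHERE col2 <> 8"
).fetchall()

print(distinct_rows)
print(where_rows)
```

This is why SELECT DISTINCT col1, col2, col3, col4 cannot make col1 alone unique: the duplicate test never looks at a single column in isolation.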

bastos...@gmail.com

Sep 23, 2005, 4:31:13 AM
Well, I want to filter on a unique column and keep the row with the most
recent date...
So from this table:

col1  col2  col3  col4
----  ----  ----  ----------
  10     8     5  2005-09-23
  10     9    17  2005-08-16
  10    74   -11  2005-09-12
  23    19    74  2005-09-23
  23    -9    -7  2005-08-16
  23   -23     0  2005-09-12

I would want:

col1  col2  col3  col4
----  ----  ----  ----------
  10     8     5  2005-09-23
  23    19    74  2005-09-23

Jarl Hermansson

Sep 23, 2005, 9:12:48 AM
"bastos...@gmail.com" <bastos...@gmail.com> wrote in
news:1127464273.6...@z14g2000cwz.googlegroups.com:

Here's a Core SQL-99 compliant solution:

select col1, col2, col3, col4
from sometable t1
where col4 = (select max(col4) from sometable t2
              where t1.col1 = t2.col1)


(If several rows share the most recent date for a given value of col1,
this will actually return all of them.)


HTH,
Jarl
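
The query above can be tried against the sample table from the question. This is only a sketch: it uses Python's built-in sqlite3 module as the runner (an assumption, since no engine is named for this query), stores col4 as ISO-format text so that MAX() orders the dates chronologically, and adds an ORDER BY so the output order is predictable. The extra row (10, 99, 1, ...) is made up to show the tie case mentioned in the parenthetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?, ?)",
    [
        (10, 8, 5, "2005-09-23"),
        (10, 9, 17, "2005-08-16"),
        (10, 74, -11, "2005-09-12"),
        (23, 19, 74, "2005-09-23"),
        (23, -9, -7, "2005-08-16"),
        (23, -23, 0, "2005-09-12"),
    ],
)

# The correlated subquery from the post, plus ORDER BY for stable output.
LATEST_PER_COL1 = """
    SELECT col1, col2, col3, col4
    FROM sometable t1
    WHERE col4 = (SELECT MAX(col4) FROM sometable t2
                  WHERE t1.col1 = t2.col1)
    ORDER BY col1, col2
"""

result = conn.execute(LATEST_PER_COL1).fetchall()
print(result)

# Hypothetical extra row that ties with the latest date for col1 = 10:
# both tied rows now come back, so col1 is no longer unique in the result.
conn.execute("INSERT INTO sometable VALUES (10, 99, 1, '2005-09-23')")
with_tie = conn.execute(LATEST_PER_COL1).fetchall()
print(with_tie)
```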

rl...@interfold.com

Sep 23, 2005, 9:47:25 AM


I'm pretty much a newbie at this, but wouldn't

SELECT col1, col2, col3, MAX(col4) AS mindate FROM tablename GROUP BY
col1 ORDER BY mindate;

work? I've duplicated your example table and it gives me the "correct"
answer, but perhaps the logic is still wrong; I don't yet really
understand the internal hierarchies of SELECT, especially using GROUP.
Try it on a bigger dataset.

hth,

r

Jarl Hermansson

Sep 23, 2005, 10:50:17 AM
rl...@interfold.com wrote in
news:1127483245.849703.173440@o13g2000cwo.googlegroups.com:

>
> SELECT col1, col2, col3, MAX(col4) AS mindate FROM tablename GROUP BY
> col1 ORDER BY mindate;
>

This statement is invalid: you can't specify col2 and col3 in the SELECT
list like this.

Only constants, columns used in the GROUP BY clause, and columns used in
set functions may be included in the SELECT list.


/Jarl

Gene Wirchenko

Sep 23, 2005, 1:36:15 PM
On 23 Sep 2005 01:31:13 -0700, "bastos...@gmail.com"
<bastos...@gmail.com> wrote:

This is still a bit ambiguous. Do you want the latest col4 in
the table or for each different col1? Suppose the first row was
10 8 5 2005-09-22
Would that change what you want to see? I expect yes, but you are the
one deciding.

Sincerely,

Gene Wirchenko
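
The two readings of the requirement give different answers once the first row is changed as suggested. A small sketch of the difference, run through Python's built-in sqlite3 module (the choice of SQLite, the ISO-text dates, and the ORDER BY are assumptions added for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?, ?)",
    [
        (10, 8, 5, "2005-09-22"),  # first row changed as in the post
        (10, 9, 17, "2005-08-16"),
        (10, 74, -11, "2005-09-12"),
        (23, 19, 74, "2005-09-23"),
        (23, -9, -7, "2005-08-16"),
        (23, -23, 0, "2005-09-12"),
    ],
)

# Reading 1: rows carrying the latest col4 in the whole table.
latest_in_table = conn.execute(
    """
    SELECT col1, col2, col3, col4
    FROM sometable
    WHERE col4 = (SELECT MAX(col4) FROM sometable)
    ORDER BY col1
    """
).fetchall()

# Reading 2: rows carrying the latest col4 within each col1 value.
latest_per_col1 = conn.execute(
    """
    SELECT col1, col2, col3, col4
    FROM sometable t1
    WHERE col4 = (SELECT MAX(col4) FROM sometable t2
                  WHERE t1.col1 = t2.col1)
    ORDER BY col1
    """
).fetchall()

print(latest_in_table)
print(latest_per_col1)
```

Under the first reading col1 = 10 disappears from the result entirely; under the second it is represented by its own latest row.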

rl...@interfold.com

Sep 23, 2005, 3:40:24 PM

Well, I did. I'll give you the point because I don't yet know enough to
argue. But why did the client take it? Why didn't it throw up an error?
My remark about the logic concerned whether the GROUP BY clause
would simply pick the first row that satisfied the GROUP BY criterion,
or whether it would honor the MAX() and display the MAX(col4) from
within each group. FWIW I tried this using MySQL-4.1.14, perhaps not
the world's most standards-compliant DB, but it's what I have in front
of me. As I said, I was given an error-free result set. Are you saying
I should have received no result set, or that the result set I did
receive is suspect? It's worth getting this straight in my head now; I
have a three million row table at work and I'm not interested in
visually back-checking results :)

thx,

r

Bill Karwin

Sep 23, 2005, 4:29:16 PM
rl...@interfold.com wrote:

> Jarl Hermansson wrote:
>>>SELECT col1, col2, col3, MAX(col4) AS mindate FROM tablename GROUP BY
>>>col1 ORDER BY mindate;
>>
>>you can't specify col2 and col3 in the SELECT
>>list like this.
>
> Well, I did.

I'm guessing you're using MySQL. MySQL permits this usage, though it is
semantically ambiguous, and gives arbitrary results.

It _should_ generate an error, and in some RDBMS implementations, it
does generate an error.

Regards,
Bill K.
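
One way to keep the GROUP BY idea without putting ungrouped columns in the SELECT list is to group in a derived table and join back to it. The sketch below is not taken from any post in this thread; the aliases t, m and maxdate are made up, and Python's built-in sqlite3 module is used only as a convenient runner, with col4 stored as ISO-format text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?, ?)",
    [
        (10, 8, 5, "2005-09-23"),
        (10, 9, 17, "2005-08-16"),
        (10, 74, -11, "2005-09-12"),
        (23, 19, 74, "2005-09-23"),
        (23, -9, -7, "2005-08-16"),
        (23, -23, 0, "2005-09-12"),
    ],
)

# The grouped query selects only the grouping column and an aggregate,
# so every column in its SELECT list is well defined for each group.
per_group = conn.execute(
    "SELECT col1, MAX(col4) AS maxdate FROM sometable GROUP BY col1 ORDER BY col1"
).fetchall()

# Joining back on (col1, maxdate) recovers col2 and col3 from the row
# that actually holds the latest date, rather than from an arbitrary row.
latest = conn.execute(
    """
    SELECT t.col1, t.col2, t.col3, t.col4
    FROM sometable t
    JOIN (SELECT col1, MAX(col4) AS maxdate
          FROM sometable
          GROUP BY col1) m
      ON t.col1 = m.col1 AND t.col4 = m.maxdate
    ORDER BY t.col1
    """
).fetchall()

print(per_group)
print(latest)
```

As with the correlated subquery earlier in the thread, ties on the latest date within a col1 value would return more than one row for that value.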
