Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Getting the top 5 DISTINCT and making sure you got the highest

0 views
Skip to first unread message

metaperl

unread,
Aug 12, 2009, 3:26:13 PM8/12/09
to
For this schema:
http://github.com/metaperl/DBIx--Shootout/raw/7c090193e788f6d2825108cb8058d1a1e8f157ab/lib/DBIx/Shootout/GMAX/schema.pdf

How would you get the top 5 salaries (and their department at the
time) such that the top 5 lists any employee only once?

My fear with my current query is that the GROUP BY will arbitrarily
reduce multiple instances of an employee to one row, without choosing
the row with the highest salary:

-- top 5 salaries, distinct by employees, and their department at the
time
SELECT
*
FROM
salaries INNER JOIN dept_emp USING (emp_no) INNER JOIN departments
USING (dept_no)
WHERE
salaries.to_date <= dept_emp.to_date
GROUP BY
emp_no
ORDER BY
salary DESC
LIMIT 5 ;

If you want to download the sample database in SQLite format, here's
how to do so:
#!/bin/bash -x

export TGZ=sampledb.tgz

wget http://sampledb.svn.sourceforge.net/viewvc/sampledb.tar.gz?view=tar
mv sampledb.tar.gz* $TGZ
tar xvfz $TGZ

Lennart

unread,
Aug 13, 2009, 1:43:38 AM8/13/09
to
On 12 Aug, 21:26, metaperl <metap...@gmail.com> wrote:
> For this schema:http://github.com/metaperl/DBIx--Shootout/raw/7c090193e788f6d2825108c...

>
> How would you get the top 5 salaries (and their department at the
> time) such that the top 5 lists any employee only once?
>
> My fear with my current query is that the GROUP BY will arbitrarily
> reduce multiple instances of an employee to one row, without choosing
> the row with the highest salary:
>
> -- top 5 salaries, distinct by employees, and their department at the
> time
> SELECT
>   *
> FROM
>   salaries INNER JOIN dept_emp USING (emp_no) INNER JOIN departments
> USING (dept_no)
> WHERE
>   salaries.to_date <= dept_emp.to_date
> GROUP BY
>   emp_no
> ORDER BY
>   salary DESC
> LIMIT 5 ;
>

I'm not sure what GROUP BY emp_no is supposed to mean (in this
context). Assume the following table:

create table T (x int not null, y int not null, primary key (x,y));
insert into T (x,y) values (1,1), (1,2);

What would be the result of:

select x, y from T group by x;

?

Start by figuring out how to get the top 5 salaries from the salary
table


/Lennart

Terrence Brannon

unread,
Aug 13, 2009, 9:34:26 AM8/13/09
to
On Aug 13, 1:43 am, Lennart <Erik.Lennart.Jons...@gmail.com> wrote:

>
> I'm not sure what GROUP BY emp_no is supposed to mean (in this
> context).

An employee can have many salaries in the salaries table. I only care
about one salary per employee, so I did a GROUP BY. But because I care
about the highest salary of an employee, I am not sure which salary
row will be retained in the GROUP BY employee_id

Assume the following table:
>
> create table T (x int not null, y int not null, primary key (x,y));
> insert into T (x,y) values (1,1), (1,2);
>
> What would be the result of:
>
> select x, y from T group by x;

Thats just the point. I'm not sure I have a guarantee on that. And I
know that I want the (1,2) row, not the (1,1) row.


> Start by figuring out how to get the top 5 salaries from the salary
> table

SELECT * FROM salaries ORDER BY salary DESC LIMIT 5;

but this might list the same employee multiple times... take
Microsoft. Bill Gates might have 5 different salaries, but they might
always the highest. I only want 1 salary from any particular employee.
And I want it to be his highest.

Thomas Kellerer

unread,
Aug 13, 2009, 10:08:17 AM8/13/09
to
Terrence Brannon wrote on 13.08.2009 15:34:
> SELECT * FROM salaries ORDER BY salary DESC LIMIT 5;
>
> but this might list the same employee multiple times... take
> Microsoft. Bill Gates might have 5 different salaries, but they might
> always the highest. I only want 1 salary from any particular employee.
> And I want it to be his highest.

SELECT emp_no,
max(salary)
FROM salaries
GROUP BY emp_no
ORDER BY 2 DESC
LIMIT 5

Terrence Brannon

unread,
Aug 13, 2009, 4:14:37 PM8/13/09
to
On Aug 13, 10:08 am, Thomas Kellerer <OTPXDAJCS...@spammotel.com>
wrote:

>
> SELECT emp_no,
>         max(salary)
> FROM salaries
> GROUP BY emp_no
> ORDER BY 2 DESC
> LIMIT 5

Ok, thanks a lot. I'd forgotten about that aspect of GROUP BY :)

0 new messages