Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Finding top x percent of data

234 views
Skip to first unread message

nyc...@yahoo.com

unread,
Dec 5, 2007, 3:18:50 PM12/5/07
to
I am trying to do the following w/ SQL for db2 9...
I want to find the top x percent of rows. For example, if i wanted to
find the average sale price of the top 10 most expensive items in a
store, how could I go about doing that, w/o a stored procedure? I've
been searching, and have come across many solutions for other RDBMS,
but not yet for DB2.

Thanks in advance.

Sanjuro

unread,
Dec 5, 2007, 3:22:43 PM12/5/07
to
Could you give us a solution in other RDBMS that does successfully
what you want to achieve?

Cheers,
Sanjuro

nyc...@yahoo.com

unread,
Dec 5, 2007, 3:41:03 PM12/5/07
to

It looks like SQL server can do something like this:

SELECT TOP 10 PERCENT
OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.Orders
WHERE EmployeeID = 5
ORDER BY OrderDate

Sanjuro

unread,
Dec 5, 2007, 4:58:41 PM12/5/07
to

Use this in DB2:

SELECT OrderID, CustomerID, EmployeeID, OrderDate


FROM dbo.Orders
WHERE EmployeeID = 5
ORDER BY OrderDate

FETCH FIRST 10 ROWS ONLY

Note: This would give you first 10 rows and not first 10 Percent Rows.

Hope this helps.

mirof007

unread,
Dec 5, 2007, 5:59:01 PM12/5/07
to

You can use the count() and rank() OLAP function to rank the rows
first, then select only those whose rank is less than or equal to
total count divided by 10:

WITH TMP AS
(Select OrderID, CustomerID, EmployeeID, OrderDate,
count() over() as total, dense_rank() over(order by OrderDate) as
d_rank
FROM dboOrders
WHERE EmployeeID = 5)
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM TMP
WHERE d_rank <= total / 10;

I haven't tested the above to verify that there are no syntax errors,
but you should get the idea. Hope this helps.

Regards,
Miro
FROM (Select OrderID, CustomerID, EmployeeID, OrderDate,

ChrisC

unread,
Dec 5, 2007, 6:54:08 PM12/5/07
to
mirof007 wrote:

> WITH TMP AS
> (Select OrderID, CustomerID, EmployeeID, OrderDate,
> count() over() as total, dense_rank() over(order by OrderDate) as
> d_rank
> FROM dboOrders
> WHERE EmployeeID = 5)
> SELECT OrderID, CustomerID, EmployeeID, OrderDate
> FROM TMP
> WHERE d_rank <= total / 10;

Nice trick. However, don't use dense_rank() for this purpose - it is
likely that you will end up with more than 10% of the data (or
whatever amount you choose). dense_rank packs in the ranks - while
rank leaves the holes in the list. For instance:

OrderDate rank() dense_rank() row_number()
2007-12-04 1 1 1
2007-12-04 1 1 2
2007-12-05 3 2 3

So, if you wanted the first 2/3rd's of that data, using rank would
work, but dense_rank would return 3 rows. Note that you can also use
row_number in this case as well.

Also syntactically, you need a value in the count() statement above -
you can't just leave it blank. A count(1) would work.

Thanks,
Chris

mirof007

unread,
Dec 5, 2007, 7:00:02 PM12/5/07
to

You're right, depending on whether you care whether all or none
entries with the same order date are included, either rank() or a
simple row_number() would work better here, thanks for the correction.

Regards,
Miro

nyc...@yahoo.com

unread,
Dec 6, 2007, 1:43:58 PM12/6/07
to

The query you wrote makes sense, and I'm sure I can get it working in
my case. Thanks a lot for the assistance.

Regards,
Jason

nyc...@yahoo.com

unread,
Dec 10, 2007, 3:56:22 PM12/10/07
to

Just an update, I was able to write the query using the above model.
Thanks again

0 new messages