one pg specific method that a lot of people overlook for this sort of
problem is custom aggregates.
create or replace function maxfoo(foo, foo) returns foo as
$$
select case when $1.t > $2.t then $1 else $2 end;
$$ language sql immutable;
create aggregate aggfoo(foo)
(
sfunc=maxfoo,
stype=foo
);
create table foo(id int, t timestamptz default now());
insert into foo values (1);
insert into foo values (1);
select (f).* from (select aggfoo(foo) as f from foo group by id) q;
postgres=# select (f).* from (select aggfoo(foo) as f from foo group by id) q;
id | t
----+----------------------------
1 | 2011-02-24 14:01:20.051-06
(1 row)
where this approach can be useful is when you have a very complicated
aggregation condition that can be awkward to express in a join.
merlin
--
Sent via pgsql-performance mailing list (pgsql-pe...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
> Is there a more elegant way to write this, perhaps using PG-specific
> extensions?
SELECT DISTINCT ON (data.id_key)
data.id_key, data.time_stamp, data.value
FROM data
ORDER BY data.id_key, data.time_stamp DESC;
Michael Glaesemann
grzm seespotcode net
hm. not only is it faster, but much more flexible...that's definitely
the way to go.
merlin
But wait, there's more! You haven't tested the Windowing Function
solution. I'll bet it's even faster.
SELECT id_key, time_stamp, value
FROM (
SELECT id_key, time_stamp, value,
row_number()
OVER ( PARTITION BY id_key
ORDER BY time_stamp DESC)
as ranking
FROM thetable
) as filtered_table
WHERE ranking = 1
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
> SELECT id_key, time_stamp, value
> FROM (
> SELECT id_key, time_stamp, value,
> row_number()
> OVER ( PARTITION BY id_key
> ORDER BY time_stamp DESC)
> as ranking
> FROM thetable
> ) as filtered_table
> WHERE ranking = 1
Why did you use row_number instead of rank?
I am now curious how the speed compares though. I still think the
DISTINCT ON will be faster, but it would be a great surprise.
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sth...@peak6.com
______________________________________________
See http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email
Because I assumed he only wanted one row in the event of ties.
Hmmm, although with that schema, there won't be ties. So it's pretty
much arbitrary then.
> I am now curious how the speed compares though. I still think the
> DISTINCT ON will be faster, but it would be a great surprise.
Hopefully we'll find out! The windowing functions are usually much
faster for me. I think in 9.0 or 9.1 someone replumbed DISTINCT ON to
use a bunch of the window function internals, at which point it'll cease
to matter.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
--
Thanks to all .... I had a tickling feeling at the back of my mind that there was a neater answer here. For the record, times (all from in-memory cached data, averaged over a bunch of runs):
Dependent subquery = 117.9 seconds
Join to temp table = 2.7 sec
DISTINCT ON = 2.7 sec
So the DISTINCT ON may not be quicker, but it sure is tidier.
Cheers
Dave
Hi Dave
Yes, 100% the best solution .... I did the same thing a while back, I just have a separate copy of the data in a "latest" table and the Java code just runs a second SQL statement to update it when writing a new record (I've never been a trigger fan).
I found myself looking at the "find the latest" query again though in the process of building a "demo mode" into our application, which will replay a finite set of data on a rolling loop by moving it forward in time, and also has to simulate the continuous updating of the "latest" table so the the business logic will be appropriately fooled.
My next tweak will be to cache the "latest" table in the Java layer ;-)
Cheers
Dave
Why not test the windowing version I posted?
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
--
Unfortunately, I'm running 8.3.3 and to my knowledge the windowing stuff wasn't added til 8.4.
Dave
> Dave Crooke <dcr...@gmail.com> wrote:
>
>> create table data
>> (id_key int,
>> time_stamp timestamp without time zone,
>> value double precision);
>>
>> create unique index data_idx on data (id_key, time_stamp);
>
>> I need to find the most recent value for each distinct value of
>> id_key.
>
> Well, unless you use timestamp WITH time zone, you might not be able
> to do that at all. There are very few places where timestamp
> WITHOUT time zone actually makes sense.
I don't think PostgreSQL keeps track of actual time zone values, just
as it doesn't keep track of the character encoding of TEXT columns.
Unless suppressed with WITHOUT TIME ZONE, PostgreSQL makes up some
time zone on demand. This makes TIMESTAMP WITH TIME ZONE not that
useful, and it's often to use TIMESTAMP WITHOUT TIME ZONE with times
in UTC.
True -- TIMESTAMP WITH TIME ZONE is always stored in UTC, which makes
it part of a consistent time stream. If you use TIMESTAMP WITHOUT
TIME ZONE, then unless you go to a lot of trouble you have a gap in
your time line in the spring and an overlap in autumn. With enough
work you can dance around that, but it's a heck of lot easier when
you can count on the UTC storage.
It sounds like you've successfully managed to find a way to dance
around it, so it might not be worth trying to refactor now; but I'd
bet your code would be simpler and more robust if you worked with the
data type intended to represent a moment in the stream of time
instead of constantly trying to pin the WITHOUT TIME ZONE to a time
zone (UTC) explicitly.
-Kevin