> Configurable by #define's in lwlock.c
Given that we already have dtrace/systemtap probes around the lwlocks,
is there some way you could use those instead of extra #defines?
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Stephen,
No - that is still a TODO for me - real life getting in the way :-)
Cheers
Mark
it applies with some hunks, compiles fine and seems to work...
i'm still not sure this is what we need, some more comments could be helpful.
what kind of questions are we capable of answer with this and and what
kind of questions are we still missing?
for example, now we know "number of locks that had to wait", "total
time waiting" and "max time waiting for a single lock"... but still we
can have an inaccurate understanding if we have lots of locks waiting
little time and a few waiting a huge amount of time...
something i have been asked when system starts to slow down is "can we
know if there were a lock contention on that period"? for now the only
way to answer that is logging locks
about the patch itself:
you haven't documented either. what is the pg_stat_lock_waits view
for? and what are those fieldx it has?
i'll let this patch as "needs review" for more people to comment on it...
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
I'm pretty sure the logic of this patch is not correct.
in pgstat_init_lock_wait(LOCKTAG *locktag)
...
+ l_curr = htabent->l_counts.l_tot_wait_time;
+ INSTR_TIME_SET_CURRENT(l_start);
+ INSTR_TIME_ADD(l_curr, l_start);
+ htabent->l_counts.l_tot_wait_time = l_curr;
in pgstat_end_lock_wait(LOCKTAG *locktag)
...
+ l_start = htabent->l_counts.l_tot_wait_time;
+ INSTR_TIME_SET_CURRENT(l_end);
+ INSTR_TIME_SUBTRACT(l_end, l_start);
+ htabent->l_counts.l_tot_wait_time = l_end;
So l_start = time cumulatively waited previously + time at start of this wait.
l_end - l_start is equal to:
= time at end of this wait - ( time at start of this wait + time
cumulatively waited previously)
= (time at end of this wait - time at start of this wait) - time
cumulatively waited previously
= (duration of this wait) - time waited cumulatively previously.
That minus sign in the last line can't be good, can it?
Also
+ htabent->l_counts.l_tot_wait_time = l_end;
+
+ if (INSTR_TIME_GET_MICROSEC(l_end) >
INSTR_TIME_GET_MICROSEC(htabent->l_counts.l_max_wait_time))
+ htabent->l_counts.l_max_wait_time = l_end;
The total wait time is equal to the max wait time (which are both
equal to l_end)?
One or both of those has to end up being wrong. At this stage, is
l_end supposed to be the last wait time, or the cumulative wait time?
One of the things in the patch review checklist is about compiler
warnings, so I am reporting these:
lock.c: In function `LockAcquire':
lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards
qualifiers from pointer target type
lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards
qualifiers from pointer target type
Cheers,
Jeff
> what kind of questions are we capable of answer with this and and what
> kind of questions are we still missing?
>
> for example, now we know "number of locks that had to wait", "total
> time waiting" and "max time waiting for a single lock"... but still we
> can have an inaccurate understanding if we have lots of locks waiting
> little time and a few waiting a huge amount of time...
>
> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks
>
>
Right - there still may be other aggregates that need to be captured....
it would be great to have some more feedback from the field about this.
In my case, I was interested in seeing if the elapsed time was being
spent waiting for locks or actually executing (in fact it turned out to
be the latter - but was still very useful to be able to rule out locking
issues). However , as you mention - there maybe cases where the
question is more about part of the system suffering a disproportional
number/time of lock waits...
> about the patch itself:
> you haven't documented either. what is the pg_stat_lock_waits view
> for? and what are those fieldx it has?
>
>
Yeah, those fields are straight from the LOCKTAG structure. I need to
redo the view to be more like pg_locks, and also do the doco. However
I've been a bit hesitant to dive into these last two steps until I see
that there is some real enthusiasm for this patch (or failing that, a
feeling that it is needed...).
> i'll let this patch as "needs review" for more people to comment on it...
>
>
Agreed, thanks for the comments.
Mark
> One of the things in the patch review checklist is about compiler
> warnings, so I am reporting these:
>
> lock.c: In function `LockAcquire':
> lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards
> qualifiers from pointer target type
> lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards
> qualifiers from pointer target type
>
>
>
>
Right, will look at too.
Cheers
Mark
Aren't the huge ones already loggable from the deadlock detector?
With the max, we can at least put an upper limit on how long the
longest ones could have been. However, is there a way to reset the
max? I tried deleting data/pg_stat_tmp, but that didn't work. With
cumulative values, you can you take snapshots and then take the
difference of them, that won't work with max. If the max can't be
reset except with an initdb, I think that makes it barely usable.
> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks
I was surprised to find that running with track_locks on did not cause
a detectable difference in performance, so you could just routinely do
regularly scheduled snapshots and go back and mine them over the time
that a problem was occurring. I just checked with pgbench over
various levels of concurrency and fsync settings. If potential
slowness wouldn't show up there, I don't know how else to look for it.
Cheers,
Jeff
It seems that this patch had open TODO items at the beginning of the
CommitFest (so perhaps we should have bounced it immediately), and I
think that's still the case now, so I am going to mark this as
Returned with Feedback. A lot of good reviewing has been done,
though, so many this can be submitted for a future CommitFest in
something close to a final form.
Thanks,
...Robert