Re: indexes on functions and create or replace function

Tom Lane

unread,

Aug 28, 2008, 2:30:54 PM8/28/08

to

"Matthew Dennis" <mde...@merfer.net> writes:
> Given table T(c1 int) and function F(arg int) create an index on T using
> F(c1). It appears that if you execute "create or replace function F" and
> provide a different implementation that the index still contains the results
> from the original implementation, thus if you execute something like "select
> * from T where F(c1)" after replacing the function that it now misses rows
> that should be returned. In other words, the index isn't aware the function
> is now returning different values. That's not the correct/expected behavior
> is it? I would have expected that replacing the function would have caused
> any indexes that depend on that function to be reindexed/recreated with the
> new function implementation.

If it did that, you (or someone) would complain about the enormous
overhead imposed on trivial updates of the function. Since determining
whether the function actually did change behavior is Turing-complete,
we can't realistically try to determine that in software. So we leave
it up to the user to reindex if he makes a behavioral change in an
indexed function.

(Changing the behavior of an allegedly IMMUTABLE function has a number
of other pitfalls besides that one, btw.)

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Matthew Dennis

unread,

Aug 28, 2008, 10:21:59 PM8/28/08

to

On Thu, Aug 28, 2008 at 9:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

If it did that, you (or someone) would complain about the enormous
overhead imposed on trivial updates of the function. Since determining
whether the function actually did change behavior is Turing-complete,
we can't realistically try to determine that in software. So we leave
it up to the user to reindex if he makes a behavioral change in an
indexed function.

I have no doubt that someone would complain about it, but I think it's better than the alternative. I received no errors, no warnings and no indication whatsoever that my queries were now returning incorrect results. If people are worried about the case of changing the text/implementation of the function but not the behavior and needing to avoid the reindexing overhead, it should prevent you from doing it unless you explicitly say no reindexing is required. How about having NOREINDEX and/or REINDEX options in the create or replace function syntax? If no indexes depend on the function, no option is required. If there are indexes on the function, require one or the other (thus handling all cases). I have two main issues I'm trying to address:

1) Future queries not returning matching rows.
2) Having to manually go find all the indexes that use that function, drop them, replace the function, recreate them. This just seems like tedioum that really should live interal to PG.

Matthew Dennis

unread,

Aug 28, 2008, 10:25:49 PM8/28/08

to

On Thu, Aug 28, 2008 at 9:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

(Changing the behavior of an allegedly IMMUTABLE function has a number
of other pitfalls besides that one, btw.)

I'm interested in knowing what they are - could you point me in the right direction (I've read the docs on immutable, etc) or briefly discuss them here please? Thanks...

Christophe

unread,

Aug 28, 2008, 10:32:32 PM8/28/08

to

On Aug 28, 2008, at 3:21 PM, Matthew Dennis wrote:
> I have no doubt that someone would complain about it, but I think
> it's better than the alternative.

Determining if changing any function will cause an index to break is
not a straight-forward problem. I don't believe that PG right now
keeps a graph of which functions call which, so replacing a function
deep in the logical calling hierarchy could break the index as easily
as one mentioned at the top.

Tom Lane

unread,

Aug 28, 2008, 11:22:06 PM8/28/08

to

The main one I can think of offhand is that a call of the function might
have been folded to a constant in some cached plan somewhere, and
there's no mechanism to cause that plan to get redone. (This might or
might not get fixed in 8.4 --- since the plan no longer contains any
reference at all to the function, it's not exactly trivial to fix.)

Another thing that's sort of related to the OP's complaint is something
like a table CHECK constraint that calls a user-defined function.
If you alter the function, is the system supposed to run around and
re-verify that constraint on every row? (And if so, what's supposed to
happen on a failure?) We don't enforce any such thing at the moment.

(In fact, putting the two concepts together, it's possible that
redefining a user function that's used in a UNIQUE index might mean that
the UNIQUE condition now fails ... what should happen then?)

Matthew Dennis

unread,

Aug 29, 2008, 12:49:16 AM8/29/08

to

On Thu, Aug 28, 2008 at 5:32 PM, Christophe <x...@thebuild.com> wrote:

On Aug 28, 2008, at 3:21 PM, Matthew Dennis wrote:

I have no doubt that someone would complain about it, but I think it's better than the alternative.

Determining if changing any function will cause an index to break is not a straight-forward problem. I don't believe that PG right now keeps a graph of which functions call which, so replacing a function deep in the logical calling hierarchy could break the index as easily as one mentioned at the top.

Yes, I can see that would indeed be a problem. Are there future plans to start tracking such dependencies? It seems like it would be a good idea in general.

Christophe

unread,

Aug 29, 2008, 12:52:45 AM8/29/08

to

On Aug 28, 2008, at 5:49 PM, Matthew Dennis wrote:
> Yes, I can see that would indeed be a problem. Are there future
> plans to start tracking such dependencies? It seems like it would
> be a good idea in general.

I believe the EXECUTE statement would thwart such plans.

Alvaro Herrera

unread,

Aug 29, 2008, 1:51:22 AM8/29/08

to

Christophe wrote:
>
> On Aug 28, 2008, at 6:10 PM, Matthew Dennis wrote:
>> I'm not sure I follow. Couldn't you track which statements were
>> prepared that called a function and either reprepare (just like
>> reindex, recheck, etc) or in the case of dropping a function, refuse to
>> drop it because something depends on it?
>
> EXECUTE in PL/pgSQL accepts a string and executes it as a statement.
> (That's different from the SQL-level EXECUTE.) I'm not sure how one
> would track dependencies there.

There's no way at all in the general case (a function name could be
passed as a parameter, for example). I think Matthew is suggesting to
track dependencies at run time, but that seems a recipe for burnt
fingers and an overall performance loss, for what seems a dubious gain.

Also, you have to keep in mind that we support pluggable languages. The
function's source code is just an opaque string. It could be anything.
I doubt anyone uses PL/LOLCODE functions in production, but PL/Java
functions are more likely, and for these we only have a class and method
name, and little else.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Christophe

unread,

Aug 29, 2008, 2:01:45 AM8/29/08

to

On Aug 28, 2008, at 6:51 PM, Alvaro Herrera wrote:
> Also, you have to keep in mind that we support pluggable
> languages. The
> function's source code is just an opaque string.

Oh, ouch, right.

I think that this is one of those cases where it's better that we
simply advertise: BE AWARE OF THIS, rather than try to provide a half-
baked solution that gives the illusion of safety without the reality.

Matthew Dennis

unread,

Aug 29, 2008, 2:04:07 AM8/29/08

to

On Thu, Aug 28, 2008 at 8:15 PM, Christophe <x...@thebuild.com> wrote:

On Aug 28, 2008, at 6:10 PM, Matthew Dennis wrote:

I'm not sure I follow. Couldn't you track which statements were prepared that called a function and either reprepare (just like reindex, recheck, etc) or in the case of dropping a function, refuse to drop it because something depends on it?

EXECUTE in PL/pgSQL accepts a string and executes it as a statement. (That's different from the SQL-level EXECUTE.) I'm not sure how one would track dependencies there.

Yes, sorry about that - it should have been obvious the execute you were talking about from the context. In any case, you don't track dependencies there for the same reason you don't track what SQL my application sends. The plpgsql execute statement, as I understand it, means "take this string and execute like a client sent it to you". It's explicit in the definition/description/documentation of the function that you aren't executing a "static" thing that would have dependencies you would track. However, when you used execute, you should get no different repsonse than what a client would get under the same circumstances.

Christophe

unread,

Aug 29, 2008, 2:09:44 AM8/29/08

to

On Aug 28, 2008, at 7:04 PM, Matthew Dennis wrote:
> The plpgsql execute statement, as I understand it, means "take this
> string and execute like a client sent it to you".

Of course, the string could come from anywhere. There's no inherent
reason that I can think of (except good taste) that you could not
write a function that retrieved a string out of a field in a table,
executed it, and returned that as a value for use in an index. The
client didn't send the string along, but it still creates dependencies.

Anyway, as was correctly pointed out, tracking dependencies in PL/
pgSQL is bad enough, but PL/Java, PL/Perl...

Matthew Dennis

unread,

Aug 29, 2008, 2:22:57 AM8/29/08

to

On Thu, Aug 28, 2008 at 9:01 PM, Christophe <x...@thebuild.com> wrote:

On Aug 28, 2008, at 6:51 PM, Alvaro Herrera wrote:

Also, you have to keep in mind that we support pluggable languages. The
function's source code is just an opaque string.

Oh, ouch, right.

I think that this is one of those cases where it's better that we simply advertise: BE AWARE OF THIS, rather than try to provide a half-baked solution that gives the illusion of safety without the reality.

Yes, but in the case of pluggable languages, you still load something that constitutes the "source". In the case of PL/Java, the jar for example. Whenever it changes, the stuff that depends on it up the chain is invalidated/rechecked/whatever. When I have a table that a view references, it tracks that. When I have a function that a index references, it kind of tracks that (I can't drop it because of the dependency, but I can replace it - with no warning/error btw). It just seems a worthy goal to aspire to...

The cases about taking a string and sending it via execute don't seem to fit here for 1) cases where it is impossible to track the dependencies can be trivially constructed and 2) the very nature of the execute statement makes it obvious that it I shouldn't expect it to be tracked. Poor Analogy: In C, if foo calls bar and I remove bar I expect the compiler to tell me. If elsewhere in my code, I construct a memory address of where I believe bar should be and execute it I have no expectations for the compiler to tell me bar was removed.

Matthew Dennis

unread,

Aug 29, 2008, 2:27:17 AM8/29/08

to

On Thu, Aug 28, 2008 at 9:09 PM, Christophe <x...@thebuild.com> wrote:

On Aug 28, 2008, at 7:04 PM, Matthew Dennis wrote:

The plpgsql execute statement, as I understand it, means "take this string and execute like a client sent it to you".

Of course, the string could come from anywhere. There's no inherent reason that I can think of (except good taste) that you could not write a function that retrieved a string out of a field in a table, executed it, and returned that as a value for use in an index. The client didn't send the string along, but it still creates dependencies.

Anyway, as was correctly pointed out, tracking dependencies in PL/pgSQL is bad enough, but PL/Java, PL/Perl...

Actually because reading the string from a table prevents the function from being immutable (it could be stable), it could not be used in an index. However, you're right though that where the string came from is really not important. My point is that nobody would have expectations of execute tracking dependencies of the sql it executes for the same reason nobody has expectations that sql that lives in application code will have it's dependencies tracked by PG...

Christophe

unread,

Aug 29, 2008, 2:33:53 AM8/29/08

to

On Aug 28, 2008, at 7:22 PM, Matthew Dennis wrote:
> Yes, but in the case of pluggable languages, you still load
> something that constitutes the "source". In the case of PL/Java,
> the jar for example.

This would mean that, for example, if you changed any single function
(no matter how distant and irrelevant to the one used to create a
functional index), the jar would change, so we would have to
invalidate all functional indexes written using functions contained
in that jar. I'd certainly rebel at that: It would make deploying a
new version of the jar very expensive, and unavoidably so.

I'd have to say that the current situation falls into the category of
"slightly annoying," but it has the benefit that whether or not to
rebuild the index is left up to me.

Matthew Dennis

unread,

Aug 29, 2008, 2:37:29 AM8/29/08

to

On Thu, Aug 28, 2008 at 8:51 PM, Alvaro Herrera <alvh...@commandprompt.com> wrote:

There's no way at all in the general case (a function name could be
passed as a parameter, for example). I think Matthew is suggesting to
track dependencies at run time, but that seems a recipe for burnt
fingers and an overall performance loss, for what seems a dubious gain.

I wouldn't think that it would be that much of a performance loss (except for DDL type things where you were setting up the dependency in the first place) though you would certainly know better than me. In any case, I know it would be a lot of work and I wasn't suggesting doing it all at once, but rather as a rule of thumb going forward - a continuous improvement to be made as other work is done. If people in a better position than I to gage this really think performance would suffer a great deal or that it's not doable (or shouldn't be done), I respect that. It seems that we should at least give warnings where we can though and probably doc it somewhere. "hey user, you replaced a plpgsql function that was used in an index, your index may now be invalid. Please see deatils at xyz".

Alvaro Herrera

unread,

Aug 29, 2008, 2:41:20 AM8/29/08

to

Matthew Dennis wrote:

> The cases about taking a string and sending it via execute don't seem to fit
> here for 1) cases where it is impossible to track the dependencies can be
> trivially constructed and 2) the very nature of the execute statement makes
> it obvious that it I shouldn't expect it to be tracked. Poor Analogy: In C,
> if foo calls bar and I remove bar I expect the compiler to tell me. If
> elsewhere in my code, I construct a memory address of where I believe bar
> should be and execute it I have no expectations for the compiler to tell me
> bar was removed.

The analogy is poor, yes. A better analogy is the use of dlopen() on a
library. If the library is not present at run time, the compiler will
not tell you anything.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--

Matthew Dennis

unread,

Aug 30, 2008, 5:27:32 PM8/30/08

to

On Thu, Aug 28, 2008 at 7:45 PM, Matthew Dennis <mde...@merfer.net> wrote:

Another question though. Since I could potentially start transaction, drop indexes/checks, replace function, create indexes/checks, commit tranasaction could I deal with the case of the constant folding into the cached plan by flushing the entire cache in the same transaction? Is cache flushing transactional? The cases I have for this are infrequent in time and the overhead of reindexing things, rechecking checks/unique indexes already dwarf the performance lost to flushing the cache.

On a related note, if I had a maintenence window where I can shutdown all DB access, make the referenced changes to the functions/indexes/caches/checks and restart PG - in your opinion, are there other likely problems to changing an immutable function under those circumstances, or should that be pretty safe? In other words, I have a function that has indexes on it that does the wrong thing - what do I do to replace it?

In the thread below, we kind of got side tracked on some other stuff and I never got an answer to the questions above. Does anyone have any insight/suggestions about the best way to replace a function that is used by an index?

http://groups.google.com/group/pgsql.general/browse_thread/thread/92289ef0c2f5a109/8f96fb24bdd668e8

Peter Eisentraut

unread,

Sep 1, 2008, 1:35:50 PM9/1/08

to

Matthew Dennis wrote:
> Given table T(c1 int) and function F(arg int) create an index on T using
> F(c1). It appears that if you execute "create or replace function F"
> and provide a different implementation that the index still contains the
> results from the original implementation, thus if you execute something
> like "select * from T where F(c1)" after replacing the function that it
> now misses rows that should be returned. In other words, the index
> isn't aware the function is now returning different values. That's not
> the correct/expected behavior is it? I would have expected that
> replacing the function would have caused any indexes that depend on that
> function to be reindexed/recreated with the new function implementation.

I have added a Todo item about this. But as you see, the problem is
quite large and involved.