> On 07 Sep 2016, at 22:10, Ivan Novick <
ino...@pivotal.io> wrote:
>
> It sounds like there are a few fundamental PostgreSQL things we are pushing to get added as infrastructure that could enable the features we want:
>
> 1) Ability to store original view text in the database
In upstream this has little chance of happening; it has been up for discussion
many times and has always been rejected (not that I bear any weight on that
call, but a patch implementing it would have to solve all the pre-existing
issues raised). That being said, nothing stops us from saving the query text in
the database, either in a normal table or in a COMMENT on the actual view, as
per my earlier email in this thread.
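As a trivial illustration of the COMMENT route, a tool could generate the statement from the original text; a minimal sketch (the helper name is my own invention, and only single-quote doubling is handled here):

```python
def comment_with_query_text(view, query_text):
    """Emit a COMMENT ON VIEW statement carrying the original query
    text, doubling single quotes to form a valid SQL string literal."""
    return "COMMENT ON VIEW %s IS '%s';" % (view, query_text.replace("'", "''"))

print(comment_with_query_text("sales_v", "select * from sales where region = 'EU'"))
```

Reading it back is then a matter of obj_description() on the view's OID.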
> 2) Ability to invalidate a view instead of giving a dependency error (Would need a place in the catalog to mark views as valid/invalid)
What is the difference between an invalidated view and a dropped view? Since a
view is a compiled representation of a query, you still need to compile the
query to make it valid again, much like if it had been dropped. A transaction
involving an invalid view will still ROLLBACK, albeit I guess with a different
error message.
The one difference could be that an invalid view retains the dependency chain,
but that opens up another can of worms: what to do when database changes are
hamstrung/blocked by dependencies in abandoned invalid views?
> 3) Ability to re-compile the view and attempt to make it valid again.
See my reply to Yandong Yao earlier today for a comment on this as well as
below for an alternative approach which is less brute-force.
https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/_TpeAiFAcsw/gXLBQzl0AwAJ
> Above is basically what the customers/users of other database systems like Oracle/TD/IBM are asking for in PG based databases like GPDB.
Depending on what they use today, they are likely to want radically different
things though? In a post to -hackers, an ex-Greenplum employee gives a good
summary of the state at the time of writing (which, given that this is
base-level plumbing, I assume hasn't changed drastically since):
https://www.postgresql.org/message-id/C75261D0.3FE4%25cwelton%40greenplum.com
The conclusion being that every product is slightly broken in disjoint ways =)
Now, as stated earlier in this thread, not doing anything is clearly not an
improvement. Personally, I think we can do better than all of the above. We
need to supply good tooling, but I am not convinced that this tooling belongs
in the database (apart from an API assisting said tooling). Below is a proposal
based on previous discussions here and a prior suggestion from Heikki in
another thread.
First let’s identify which problems it is we are optimizing for:
* The DBA owns the relations, application developers own the views
* A relation can have many views
* Figuring out which views need to be changed on ALTER TABLE is hard
* Everyone wants automagic but human intervention is needed in some cases
* Fixing the views is generally punted to after-the-fact
Did I miss any?
Based on the above set, my take is that we need tooling that can assist the
relation owner in figuring out: a) what is impacted by a change; b) what can
safely be changed automatically; c) what needs human intervention to be fixed;
and d) what will break if the objects in c) are dropped/invalidated. Further,
it should ideally pre-empt problems rather than return a list of now-defunct
database objects (or invalidate them silently).
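To make the four buckets concrete, the tool's report could be structured roughly like this (a sketch only; all names here are my own invention, not an existing API):

```python
from dataclasses import dataclass, field

@dataclass
class ImpactReport:
    """The a)-d) buckets for a proposed change to a relation."""
    impacted: list = field(default_factory=list)      # a) everything touched by the change
    auto_fixable: list = field(default_factory=list)  # b) safe to recompile automatically
    needs_human: list = field(default_factory=list)   # c) require manual intervention
    collateral: list = field(default_factory=list)    # d) break if c) objects are dropped

report = ImpactReport()
report.impacted.append("v_sales_by_region")
report.needs_human.append("v_sales_by_region")
```

The same object can back both the pre-flight report and the final DDL generation step.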
Assumptions/Requirements:
* The query text is available *somewhere*; it can be in the database, but it
can also be in a Git repository, it doesn't matter (a pluggable part of the
tool, to adapt to customer needs?). The reason for wanting the original query
text is that formatting, comments, etc., which can greatly assist in fixing a
query, aren't propagated to the parsetree.
* We can extend the view text *in some way* with unittest data and queries.
It can be a separate file in a Git repo or in a database column should we
want to store it there.
* The tool should not rely on 100% coverage for the previous assumptions:
if no test data exists it shouldn't fail, and if no test query exists it
shouldn't fail. Since the tool will be retrofitted on top of production
setups, it must be possible to back-fill all the details.
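A pluggable lookup honouring these assumptions might look like this (a sketch; the plain dicts stand in for a Git checkout, a catalog table, and so on):

```python
def resolve_view_text(view, sources):
    """Try each pluggable query-text source in priority order:
    e.g. a Git checkout first, then a database table, and finally
    the parsetree-generated definition. A missing entry in one
    source never fails the lookup; we just fall through."""
    for source in sources:
        text = source.get(view)
        if text is not None:
            return text
    return None  # nothing found anywhere: the tool degrades gracefully

git_repo = {}  # this view was never checked in
catalog = {"v_totals": "select sum(amount) from t"}
print(resolve_view_text("v_totals", [git_repo, catalog]))
```

The last-resort source would always be populated, since the parsetree-generated definition can be produced for any existing view.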
With this, the tool should then, for a change of the datatype of column c in
table t, perform:
1. Identify recursively every database object that needs to be changed
when altering the type of column c.
2. The set of views identified as dependent is V; for each view v in V,
emit and run a script testing the change. If the unittest query isn't
available, mark that by running select no_test(); which includes that
information in the report. If the view text isn't available, take the
parsetree-generated definition from the database.
begin;
create table tt (like t);
insert into tt values (<testcase data 1>), .. (<testcase data n>);
alter table tt alter column c type foo;
create view v as <query text with t replaced with tt>;
select ok(<testcase query 1>);
..
select ok(<testcase query n>);
rollback;
3. Generate a report for the processing in step 2: which views can be
safely recompiled and which will break, as well as the dependency chain
outlining why they break (a view not itself impacted by the change of
c might be broken because it selects from a view that ended up being
dropped).
4. Iff the operator requests to continue, generate DDL to alter the type
by recreating the views that passed the test and dropping the views that
failed it (if any).
5. The DDL can be inspected and version controlled before being executed in
the database. If a staging environment exists, it can also be tested there
before going into production. This way the underlying view queries can be
fixed and the process repeated until we are satisfied that business logic
and processes won't suddenly fail.
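As a sketch of how the per-view script in step 2 might be generated (the helper and its parameters are hypothetical; no_test() and ok() are the placeholders from the template above, and the whole-word substitution is deliberately naive):

```python
import re

def emit_test_script(view, view_text, table, column, new_type, rows, test_queries):
    """Build the transactional per-view test script from step 2.
    Falls back to 'select no_test();' when no unittest queries exist,
    so the gap shows up in the report instead of failing the run."""
    tmp = "tt"
    # Replace whole-word references to the table with the scratch copy.
    rewritten = re.sub(r"\b%s\b" % re.escape(table), tmp, view_text)
    lines = ["begin;",
             "create table %s (like %s);" % (tmp, table)]
    if rows:
        lines.append("insert into %s values %s;" % (tmp, ", ".join(rows)))
    lines.append("alter table %s alter column %s type %s;" % (tmp, column, new_type))
    lines.append("create view %s as %s;" % (view, rewritten))
    if test_queries:
        lines.extend("select ok(%s);" % q for q in test_queries)
    else:
        lines.append("select no_test();")  # flagged in the report later
    lines.append("rollback;")
    return "\n".join(lines)

print(emit_test_script("v", "select c from t", "t", "c", "bigint",
                       ["(1)", "(2)"], ["select c from v"]))
```

Running each generated script inside begin/rollback keeps the production catalog untouched no matter how the test goes.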
The functionality we need from the database is thus to identify all dependent
objects and to test functionality much like what pgTAP offers (albeit a subset
of it).
Basically it's a twist on recompiling from the textual representation that adds
a level of certainty about the success, and also properly identifies what needs
to be addressed in case of failure. It does add work since the views need
tests, but that shouldn't be much of a showstopper; having tests is a Good
Thing. Also, if executed without any tests it would still work, just be less
user friendly and provide less value.
Without having given it too much thought, I think something like the above
could also be used in Chef (and possibly Puppet) for automating node
convergence on database changes for anyone that has a CI pipeline like that.
That would need further validation though.
Is this proposal a way forward in giving our users a better experience? Am I
missing any parts of the equation?
cheers ./daniel