Sqitch changes vs. transactional DDL


Jonathan Rogers

May 27, 2015, 4:58:39 PM
to sqitch...@googlegroups.com
PostgreSQL has transactional DDL, a feature we use and depend on daily. We commonly write psql scripts which create tables, functions and triggers, all within a single transaction so that we get all changes or none of them. I would like to use the approach to managing functions described in the Sqitch tutorial, which creates a Sqitch change for each function, separate from other DDL. Unfortunately, it seems that this approach loses the advantages of transactional DDL since each change executes in its own transaction. Is there any way to get the best of both worlds?
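For readers unfamiliar with this style, a minimal sketch of such a single-transaction psql script might look like the following (object names are illustrative only):

```sql
-- All-or-nothing schema change: if any statement fails,
-- PostgreSQL rolls back the whole transaction.
BEGIN;

CREATE TABLE widgets (
    id   serial PRIMARY KEY,
    name text NOT NULL
);

CREATE FUNCTION log_widget() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'widget changed';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widgets_log
    AFTER INSERT OR UPDATE ON widgets
    FOR EACH ROW EXECUTE PROCEDURE log_widget();

COMMIT;
```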

David E. Wheeler

May 27, 2015, 5:25:20 PM
to Jonathan Rogers, sqitch...@googlegroups.com
On May 27, 2015, at 1:58 PM, Jonathan Rogers <jonatha...@gmail.com> wrote:

> PostgreSQL has transactional DDL, a feature we use and depend on daily. We commonly write psql scripts which create tables, functions and triggers, all within a single transaction so that we get all changes or none of them. I would like to use the approach to managing functions described in the Sqitch tutorial, which creates a Sqitch change for each function, separate from other DDL. Unfortunately, it seems that this approach loses the advantages of transactional DDL since each change executes in its own transaction. Is there any way to get the best of both worlds?

Not without putting all changes you want in a single transaction into a single change deploy script with a BEGIN at the top and a COMMIT at the end. However, if you make use of the revert scripts, changes will be reverted on failure. Not quite the same, I grant you, but good enough for most folks. I tend to only collect multiple DDL statements in a single script when it’s essential that they all be committed in a single commit.

Best,

David

Jonathan Rogers

May 28, 2015, 4:04:10 AM
to David E. Wheeler, sqitch...@googlegroups.com
Thanks for the quick response. We do frequently rely on the atomicity of
a single commit when deploying changes to production. Revert scripts are
a nice idea, but using them requires significantly more work than just
putting all related changes in a transaction. A revert script also isn't
guaranteed to return the database to the state it was in before the
deploy script was executed. In particular, it is impossible to write a
revert script for a deploy script which destroys information in a table,
as when issuing a DELETE, DROP or TRUNCATE.

Though sqitch's general approach fits our workflow, I think that we will
continue to use psql's "\i" directive within deploy scripts to define or
redefine objects such as views and functions rather than making each
object a sqitch change.
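A deploy script in that style might look something like this sketch (the file names are hypothetical; \i is the psql directive that inlines another file at that point):

```sql
-- deploy/refactor_reporting.sql (hypothetical)
BEGIN;

-- Re-define views and functions from their canonical source files,
-- so each object lives in one file but deploys atomically.
\i sql/views/order_summary.sql
\i sql/functions/refresh_order_summary.sql

COMMIT;
```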

-- Jonathan Rogers

Adrian Klaver

May 28, 2015, 9:50:26 AM
to Jonathan Rogers, David E. Wheeler, sqitch...@googlegroups.com
I am not following. I have written revert scripts with DROP TABLE. Can
you be more specific?

>
> Though sqitch's general approach fits our workflow, I think that we will
> continue to use psql's "\i" directive within deploy scripts to define or
> redefine objects such as views and functions rather than making each
> object a sqitch change.
>
> -- Jonathan Rogers
>


--
Adrian Klaver
adrian...@gmail.com

Jonathan Rogers

May 28, 2015, 10:53:00 AM
to Adrian Klaver, David E. Wheeler, sqitch...@googlegroups.com
If there's a table with a bunch of rows in it and you DELETE FROM or
TRUNCATE it, how could you get those rows back without consulting some
sort of backup? An UPDATE also frequently loses information.

--
Jonathan Rogers

David E. Wheeler

May 28, 2015, 12:13:12 PM
to Jonathan Rogers, Adrian Klaver, sqitch...@googlegroups.com
On May 28, 2015, at 7:52 AM, Jonathan Rogers <jonatha...@gmail.com> wrote:

> If there's a table with a bunch of rows in it and you DELETE FROM or
> TRUNCATE it, how could you get those rows back without consulting some
> sort of backup? An UPDATE also frequently loses information.

I find that I don’t often have DML in my deployment scripts, usually only for data that rarely changes (like a table listing states or something). If I had to update such a table, I would do so in a single deploy script, and handle any other related updates (FK references, etc.), in a single transaction. It’s rare that I want to do a whole mess of DML at once; generally, I’m able to factor my database changes into discrete chunks that make sense to use as individual changes that depend on each other. But if I did have such a level of interdependency, I would put those in a single script, too.

But that script would just have DML, no DDL. DDL statements would be in their own scripts.

I wrote up my thoughts on DML deploys a while back.

http://theory.so/sqitch/data/2013/08/28/data-deployment-with-sqitch/

See especially the “Data Deploy Best Practice” section at the end. Deploy hooks are still on the to-do list.

Best,

David

Adrian Klaver

May 31, 2015, 3:52:16 PM
to Jonathan Rogers, David E. Wheeler, sqitch...@googlegroups.com
Alright, now I am a little confused :) Your original question was about
being able to destroy data. So you want to be able to recreate data? For
the simple case in Postgres I could see SELECT INTO (or CREATE TABLE AS)
old_data.table_name_old run before you do the DROP/DELETE/TRUNCATE. For
getting the data back you would need a function that looks for
table_name_old and, if it exists, pulls the data in, otherwise returns
an empty row. This assumes the table definition, and hence the mapping
of data (types) to columns, does not change. Otherwise you need to come
up with a way of dealing with that. Per David's response, I would see
breaking this into smaller chunks and using requires to tie the parts
together.
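A rough sketch of that save-and-restore idea (the schema and table names are illustrative; it assumes an old_data schema already exists and the table definition does not change between deploy and revert):

```sql
-- deploy: stash the rows before destroying them
CREATE TABLE old_data.widgets_old AS SELECT * FROM widgets;
TRUNCATE widgets;

-- revert: pull the stashed rows back and drop the stash
INSERT INTO widgets SELECT * FROM old_data.widgets_old;
DROP TABLE old_data.widgets_old;
```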


--
Adrian Klaver
adrian...@gmail.com

Jonathan Rogers

Jun 2, 2015, 7:56:04 PM
to Adrian Klaver, David E. Wheeler, sqitch...@googlegroups.com
I don't want to recreate data, but recreating it would be necessary in
order to write a Sqitch revert script to accompany a deploy script
containing data-destroying operations such as DELETE or TRUNCATE. I'm
pointing this out to emphasize that revert scripts are not a sufficient
substitute for putting several related changes into a transaction.

Rather than writing revert scripts, I want to be able to guarantee that
a set of related changes are all applied or none of them is. Postgres
provides this via transactions, which I use constantly for deploying
changes on both development and production. If I start using Sqitch, I
won't abandon this very useful tool. Sqitch doesn't require revert
scripts to exist, so I can just skip that feature AFAICT.

--
Jonathan Rogers

Jonathan Rogers

Jun 2, 2015, 8:21:11 PM
to David E. Wheeler, Adrian Klaver, sqitch...@googlegroups.com
BTW, thanks for a great tool and responses.

On 05/28/2015 12:13 PM, David E. Wheeler wrote:
> On May 28, 2015, at 7:52 AM, Jonathan Rogers <jonatha...@gmail.com> wrote:
>
>> If there's a table with a bunch of rows in it and you DELETE FROM or
>> TRUNCATE it, how could you get those rows back without consulting some
>> sort of backup? An UPDATE also frequently loses information.
>
> I find that I don’t often have DML in my deployment scripts, usually only for data that rarely changes (like a table listing states or something). If I had to update such a table, I would do so in a single deploy script, and handle any other related updates (FK references, etc.), in a single transaction. It’s rare that I want to do a whole mess of DML at once; generally, I’m able to factor my database changes into discrete chunks that make sense to use as individual changes that depend on each other. But if I did have such a level of interdependency, I would put those in a single script, too.
>

I often need to execute DML statements in deploy scripts. Often, it's in
static tables such as your states example. Sometimes, it's to augment or
correct dynamic tables.

> But that script would just have DML, no DDL. DDL statements would be in their own scripts.

I often have DML and DDL that are mutually dependent. For example, I've
added a column which refers to another table, updated the column, then
added a foreign key. Other times, I've added a constraint on a table
implemented by a CHECK or trigger after updating the table to comply
with the new constraint.
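A sketch of the add-column/backfill/add-foreign-key pattern described above, as one transaction (table and column names are made up; the backfill source is hypothetical):

```sql
BEGIN;

-- DDL: add the referencing column
ALTER TABLE orders ADD COLUMN customer_id integer;

-- DML: backfill it from existing data (hypothetical source column)
UPDATE orders SET customer_id = legacy_customer_id;

-- DDL: only now can the foreign key be enforced
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (id);

COMMIT;
```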

>
> I wrote up my thoughts on DML deploys a while back.
>
> http://theory.so/sqitch/data/2013/08/28/data-deployment-with-sqitch/
>
> See especially the “Data Deploy Best Practice” section at the end. Deploy hooks are still on the to-do list.

It's a nice idea, but I'm not sure that the extra effort to always
separate DML from DDL is worth it for my needs. I'm also not sure what
this means: "Conditions must deal with side-effects, such as foreign key
constraints." I can't see how it makes sense to describe a declarative
data constraint as a "side-effect."

After experimenting with small, concise Sqitch changes for a while, I'm
wondering if it would be possible to get the benefits of that approach
without abandoning the benefits of transactions. For example, Sqitch
could concatenate the SQL from all scripts to deploy and pass them to
psql as a unit, allowing them all to execute within a single
transaction. Of course, each deploy script would need to be free of
BEGINs and COMMITs.
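For what it's worth, psql can already do something close to this from the outside: its --single-transaction (-1) flag wraps an entire -f script, including any files it pulls in via \i, in one BEGIN/COMMIT, provided the included scripts themselves contain no BEGINs or COMMITs. A sketch (file names hypothetical):

```sql
-- deploy_all.sql: run as `psql --single-transaction -f deploy_all.sql`
\i deploy/widgets_table.sql
\i deploy/order_summary_view.sql
\i deploy/refresh_function.sql
```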

--
Jonathan Rogers

Adrian Klaver

Jun 2, 2015, 8:25:13 PM
to Jonathan Rogers, David E. Wheeler, sqitch...@googlegroups.com
So what is this referring to?:

"If there's a table with a bunch of rows in it and you DELETE FROM or
TRUNCATE it, how could you get those rows back...."

> but to write a Sqitch revert script to
> accompany a deploy script containing data-destroying operations such as
> DELETE or TRUNCATE, that would be necessary. I'm pointing this out to
> emphasize that revert scripts are not a sufficient substitute for
> putting several related changes into a transaction.

You are talking about different use cases. Sqitch is a schema change
management tool, where you may revert at some point in the future.
Transactions are a point-in-time tool, unless you really want to hold a
transaction open for some indefinite time period. So if you want Sqitch
to find changed data you will need to store it somewhere. Actually, that
applies to a transaction too, if you plan on doing a COMMIT.

>
> Rather than writing revert scripts, I want to be able to guarantee that
> a set of related changes are all applied or none of them is.

Does sqitch add --requires not work?

> Postgres
> provides this via transactions, which I use constantly for deploying
> changes on both development and production.

You can use transactions in a script.

> If I start using Sqitch, I
> won't abandon this very useful tool. Sqitch doesn't require revert
> scripts to exist, so I can just skip that feature AFAICT.
>

No, you don't, though I am still at a loss as to what you are trying to
do. Maybe a simple example of what you are doing would help me
understand.



--
Adrian Klaver
adrian...@gmail.com

David E. Wheeler

Jun 3, 2015, 1:25:44 PM
to Jonathan Rogers, Adrian Klaver, sqitch...@googlegroups.com
On Jun 2, 2015, at 4:56 PM, Jonathan Rogers <jonatha...@gmail.com> wrote:

> I don't want to recreate data, but to write a Sqitch revert script to
> accompany a deploy script containing data-destroying operations such as
> DELETE or TRUNCATE, that would be necessary. I'm pointing this out to
> emphasize that revert scripts are not a sufficient substitute for
> putting several related changes into a transaction.

Yes, by all means, if you have tightly-coupled changes, put them in a single deploy script that starts with BEGIN and ends with COMMIT.

> Rather than writing revert scripts, I want to be able to guarantee that
> a set of related changes are all applied or none of them is. Postgres
> provides this via transactions, which I use constantly for deploying
> changes on both development and production. If I start using Sqitch, I
> won't abandon this very useful tool. Sqitch doesn't require revert
> scripts to exist, so I can just skip that feature AFAICT.

It’s a matter of taste, I guess. If you use `sqitch deploy --mode all`, then in the event of a deploy failure, it will revert all the way back to where you started the deploy. Of course, it’s up to you to get the deploy scripts right (and yes, you can write unmodified data to an unlogged table to recover it in a revert script, as Adrian suggested). So it’s a guarantee of sorts. Not as strong as a single transaction, but it’d be a lot harder to put *all* of your changes into a single transaction. Harder to maintain.

Best,

David

David E. Wheeler

Jun 3, 2015, 1:33:50 PM
to Jonathan Rogers, Adrian Klaver, sqitch...@googlegroups.com
On Jun 2, 2015, at 5:21 PM, Jonathan Rogers <jonatha...@gmail.com> wrote:

> I often have DML and DDL that are mutually dependent. For example, I've
> added a column which refers to another table, updated the column, then
> added a foreign key.

Yep, I’d put those into a single deploy script, too.

> Other times, I've added a constraint on a table
> implemented by a CHECK or trigger after updating the table to comply
> with the new constraint.

You can put these into a single script, but it’s not necessary for the trigger case, since a trigger only fires for new changes. (A plain ALTER TABLE … ADD CONSTRAINT … CHECK, though, does validate existing rows unless it’s marked NOT VALID, so that case does belong with the UPDATE.)

> It's a nice idea, but I'm not sure that the extra effort to always
> separate DML from DDL is worth it for my needs. I'm also not sure what
> this means: "Conditions must deal with side-effects, such as foreign key
> constraints." I can't see how it makes sense to describe a declarative
> data constraint as a "side-effect.”

If you add a FK constraint, a side-effect is that existing rows must have values that conform to that constraint.
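A small illustration of that side-effect (tables and the cleanup criterion are hypothetical): adding the constraint validates every existing row, not just future ones, so the deploy must make the data conform first.

```sql
BEGIN;

-- First make existing rows conform...
DELETE FROM orders
 WHERE customer_id IS NOT NULL
   AND customer_id NOT IN (SELECT id FROM customers);

-- ...because ADD CONSTRAINT checks all existing rows; any order
-- pointing at a missing customer would abort the ALTER.
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (id);

COMMIT;
```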

> After experimenting with small, concise Sqitch changes for a while, I'm
> wondering if it would be possible to get the benefits of that approach
> without abandoning the benefits of transactions. For example, Sqitch
> could concatenate the SQL from all scripts to deploy and pass them to
> psql as a unit, allowing them all to execute within a single
> transaction. Of course, each deploy script would need to be free of
> BEGINs and COMMITs.

I think it might be useful to add a command that would create a single file you could distribute. Maybe call it `collect`, `assemble`, `gather`, or `dump`? Would be nice for packaging and shipping your project without a Sqitch plan and all the change files, just a single SQL file (or perhaps one for each tag in the plan, to make upgrades easier?). I’d be happy to merge such a feature into Sqitch.

Best,

David

