SQL Output of Migrations (and rollbacks)

117 views
Skip to first unread message

Austin Ziegler

unread,
May 4, 2023, 9:27:34 PM5/4/23
to elixi...@googlegroups.com
For a variety of reasons[1], we are not using Ecto Migrations, but we’re looking at using Parker Selbert’s amazing Oban and eventually Oban Pro which do. There’s a manually updated repository (https://github.com/btwb/oban_migrations_sql) which helps with Oban, but there’s no assistance with Oban Pro.

We’re using Sqitch[2], which uses database native SQL scripts and can do a few other things with the addition of pgTAP[3].

I’d love to be able to run an Ecto migration on a temporary database and have the *executed SQL* written to files in the order of execution so that I could then transfer the migration and rollback SQL to our Sqitch repo.

This could probably have other benefits for some development shops which might need to have a record of schema changes made against production databases for audit.

I’d be interested in working on this feature, but I’m not quite sure where I would start. My thoughts on this are:

1. It should be an application level configuration. (`config :app, App.Repo, migration_sql_output: true`)
2. It should optionally be configurable for an output directory (`config: app, App.Repo, migration_sql_output_path: "priv/repo/sql"` or maybe `config app, App.Repo, migration_sql_output: "priv/repo/sql").)
3. Within the output directory, forward migrations should be written to `migrations/` and rollback migrations should be written to `rollbacks/`.
4. For my immediate purposes, the migrations could be written as `<ouput_path>/migrations/<migration_filename>.sql`, replacing the `.exs`. For auditing purposes, it might be more desirable to write this as `<output_path>/migrations/<migration_run_timestamp>/<migration_filename>.sql` or `<output_path>/rollbacks/<migration_run_timestamp>/<migration_filename>.sql`.

This *may* require changes to various underlying database drivers.

Would this be worthwhile? Should it be something that's more generic (some sort of execution hook that a library can use in order to do this)?

-a

[1] The main reason is that we want to keep database migrations independent of code changes, sort of an extreme version of the Safe Ecto Migrations based on a very old Rails database zero downtime migrations blog post (https://www.cloudbees.com/blog/rails-migrations-zero-downtime).
--

José Valim

unread,
May 5, 2023, 4:02:27 AM5/5/23
to elixi...@googlegroups.com
Two questions:

1. Is the --log-migrations-sql command not enough?

2. Alternatively, could you run migrations and then execute `mix ecto.dump` to get the SQL dump of the database? I guess this would only work the first time though.

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/CAJ4ekQsbhnXLhAaxemv5u8Mcx-ie%3Dor8humP%2Bvdfx9i27zUMiQ%40mail.gmail.com.

Austin Ziegler

unread,
May 5, 2023, 10:44:20 AM5/5/23
to elixi...@googlegroups.com
Thanks for the suggestions. Neither one is *entirely* useful. The use of dumpfiles, as you note, is really only useful on the first pass and doesn’t capture changes like `ALTER TABLE foo ADD COLUMN bar TEXT` since it represents a static state.

The suggestion to capturing logs with `--log-migrations-sql` and `--log-migrator-sql` could work well for the audit case, but not so much for my case. This would involve a lot of manual log parsing/editing — and if there’s any data updates, the logs are much harder to parse without using something like https://github.com/fuelen/ecto_dev_logger or a similar function I wrote a while back to "inspect" SQL in a pipeline.

For my needs (converting Ecto migrations into Sqitch migrations), I would prefer to have each migration and corresponding rollback migration contained in its own files. I really *do* prefer Sqitch over Ecto for migrations, even though it goes against the "easy way".

Looking at a few things, I think that with a few small changes, I could write a migration monitor library through telemetry. There would need to be some changes to Ecto to support this:

1. Telemetry events would need to be emitted for the start and end of each migration file. This event should probably include the migration filename and the direction the migration is running.
2. I would need a way to hook into the startup of (all) migrations to register the telemetry handler.
3. I would need a way to hook into the end of either:
  a. each migration to capture all of the migration's telemetry events and write the migration SQL file; OR
  b. all migrations (before the task shuts down) to capture all migration telemetry events and write the migration SQL files.

That is, I need some way to serialize the telemetry handler as part of the migration process so that the migration task doesn’t stop before the files get written.

If such migration hooks aren’t something that should be exposed, then this should probably be part of ecto_sql.

-a

José Valim

unread,
May 5, 2023, 10:46:10 AM5/5/23
to elixi...@googlegroups.com
Telemetry events are a good idea. Note you can also manually run migrations with the Ecto.Migrator API, so you can hook up the before and after migrations by wrapping the API directly. That's what the Mix tasks do anyway. So perhaps we only need 1 (events before and after each file)?

Austin Ziegler

unread,
May 5, 2023, 11:53:01 AM5/5/23
to elixi...@googlegroups.com
I’ll fork ecto_sql to play with this and see what I can come up with for the additional telemetry event and a wrapper library.

-a

Micah Mayo

unread,
May 5, 2023, 12:57:10 PM5/5/23
to elixir-ecto
@Austin,

You shouldn’t need to fork ecto_sql to accomplish what you’re doing. I’ve dealt with this sort of constraint (we have to use gh-ost for migrations) via telemetry. We created a mix task (gen.ghost) that:

1. Attached an event handler to repo telemetry events
2. Start an Agent process so we can pass some data back from the handler to the mix task
3. Use the Ecto Migrator API to get the list of pending migrations
4. Iterate over migrations, passing them to a modified version phoenix release migrator, which will apply or roll back a single migration
5. The telemetry event generated by Repo will have the executed SQL query, along with the updates to the schema migrations table. We saved those via the agent.
6. Once the individual migration is complete, we grab those queries and write a JSON blob that gh-ost likes in a sub folder in priv — the file is something like <migration_num>_up.json
7. We then repeat the process in reverse, executing rollbacks on all the migrations we ran, making <migration_num>_down.json files. 

There are a lot of steps here, but it was actually pretty simple to implement. The hard part was sorting out how to get the data out of Ecto. We only do this right before submitting a PR so it doesn’t add any overhead day-to-day, but it is required to be included in any PR that has a migration. 

We like this process because:
* we keep a parallel history of the ecto migrations and the generated SQL
* developers can keep using the migration APIs and not worry too much about the SQL
* we do have an opportunity to scrutinize the actual SQL generated by the migration in review
* putting together the gh-ost migration is sort’ve tedious, and since we save the migration in the correct format, running it is earlier.

Austin Ziegler

unread,
May 5, 2023, 1:09:08 PM5/5/23
to elixi...@googlegroups.com
The fork is to explore adding telemetry events *from* the migration API as I suggested. We do not use Ecto migrations in our development flow (our database schemas are kept in entirely separate repositories), and would only be using this to capture new migrations from third party tooling like Oban so that they can be added to the schema repos using the required Sqitch changeset.

Once the new telemetry events are added, then I can get the output that I want (I think) and can process this with a "normal" mix task.

After playing with it, I will see whether the new telemetry events are useful and how it might best be used, since this is (for us) an infrequently executed task for only one dependency. At that point, I’ll open pull requests as required and possibly release a library that does what I’m trying to do.

-a

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.

Micah Mayo

unread,
May 5, 2023, 1:27:56 PM5/5/23
to elixir-ecto
So my suggestion was based on your initial description of the feature you were proposing:

We’re using Sqitch[2], which uses database native SQL scripts and can do a few other things with the addition of pgTAP[3].

 > I’d love to be able to run an Ecto migration on a temporary database and have the *executed SQL* written to files in the order of execution so that I could then transfer the migration and rollback SQL to our Sqitch repo.

This is exactly the scenario we have and how we run migrations, except with gh-ost instead of squitch. The mix task I described does exactly what you’re describing here. The fact that we in addition to third party library migrations, we also use ecto migrations to build our database during development, or that we store the captured SQL in the project repo instead of an external repo, doesn’t seem relevant to the mechanism for capturing the migration.

At any rate, this seemed a more direct path to what you were looking for so I thought I’d share. 

Austin Ziegler

unread,
May 15, 2023, 9:21:01 PM5/15/23
to elixi...@googlegroups.com
Micah,

Thanks for your input. I’ve looked over your recommendations and I *mostly* agree with your approach. It might even be something that I can add into our umbrella application only under development so we don’t have to keep a separate repository for the migration generator (if I can get this working with a dynamic Repo, then it won’t even be something that is visible to the deploy flow).

That said, I thought I’d share my recent tests and the changes that I want to propose for ecto_sql to offer that extra telemetry event. Where I’ve placed it seems to be the place where least repetition will be required, but there is some loss of data from the perspective of the migrator API vs wrapping `ecto.migrate`.


Using the capture logic in https://github.com/halostatue/ecto_migrate_capture, I’ve been able to produce the following output with `mix ecto.migrate.capture && mix ecto.rollback.capture --all` where I have separate migrations for each of Oban's 11 schema migrations.

priv/repo/migrations/migrate/20230515212821/00000000000000_ecto_migrate_preamble.sql
priv/repo/migrations/migrate/20230515212821/20230515193510_oban_version1.sql
priv/repo/migrations/migrate/20230515212821/20230515193523_oban_version11.sql
priv/repo/migrations/rollback/20230515212825/00000000000000_ecto_rollback_preamble.sql
priv/repo/migrations/rollback/20230515212825/20230515193510_oban_version1.sql
priv/repo/migrations/rollback/20230515212825/20230515193523_oban_version11.sql


The code in EctoMigrateCapture is a bit messy, and I probably could have chosen to record an event stream that got filtered later, but this works reasonably well. The biggest issue is that the Ecto.Migrator API returns pending migrations as `{state, version, string}` such as `{:down, 20230515193510, "oban_version_1"}` and because I’m only emitting the module in the telemetry event (`EctoMigrateCapture.Repo.Migrations.ObanVersion1`), that module is converted back as `oban_version1` instead of `oban_version_1`.

I’ve lifted the formatting code from ecto_dev_logger, but I could have just as well lifted it from code we implemented internally four or five years ago. The `preamble` code are queries that are executed before the first migration is run, generally to find which schema version is the starting point (these are marked with `schema_migration: true`, IIRC). I had to *skip* `schema_migration: true` as a filter because Oban doesn't currently implement it in its delegated migrations (and this was a bit of a "footnote" update to a point release of ecto_sql).

For my needs, I would probably *eliminate* all queries related to the `schema_migrations` table, since it’s completely unused by Sqitch. It would be useful to have those queries tagged *explicitly* in telemetry, but string matching is probably sufficient for my needs (I wouldn’t be overriding the name of the schema migrations name). I probably need to add an index value so that it’s clear in which *order* the migration vs rollback ran. I also probably need to have it generate both in sequence, so this would definitely be a custom task for me.

So, to my feature request questions / suggestions:

1. I think that the new telemetry event should be added, even if it’s somewhat different than what I provided. It would be great if we could get *both* the module name and the "string" version name, but as the module *mostly* converts back to the string nicely, I think that is a nice-to-have, not a must-have. (I implemented this as a span, but I really only *need* the start event; other peoples' needs may benefit from the *stop* event.)

2. I think that EctoSQL should provide a query string formatter with callbacks to EctoSQL adapter modules for dialect differences, as well as an option in the formatter function for a callback to user code. I’d be happy to share what we’ve done at Kinetic (I think we handle a few possible data type cases better for normal SQL inspection, but especially as PostgreSQL types are so extensible…), but what @fuelen has done with fuelen/ecto_dev_logger is also a really good start.

3. Even though I will probably adjust my immediate approach to do what has been suggested by Micah as my needs are very *custom*, I *still* think that being able to write the executed SQL to a file during migration would be a good optional flag to add to `ecto.migrate` and `ecto.roleback`.

4. I think that the query capture via telemetry should be documented somewhere in EctoSQL documentation in the event that someone *else* has a similar need.

-a


Reply all
Reply to author
Forward
0 new messages