Flywaydb java based migration and "history preserving approach"

220 views
Skip to first unread message

joseph...@gmail.com

unread,
Jan 3, 2016, 6:10:23 PM1/3/16
to jOOQ User Group
Hi

When reading the documentation, I was surprised to see the migration scripts for flyway were all sql files and not java code, yet flyway provides such hooks as well (http://flywaydb.org/documentation/migration/java.html).

Is there a reason for sticking with sql files ?

Still, I've rewritten the "Using jOOQ with Flyway" part of the documentation http://www.jooq.org/doc/3.7/manual-single-page/#jooq-with-flyway with the java based migration and jOOQ createTable statements and the like. I could contribute it back if you wish so, let me know.

Still, while working on it and thinking how we currently do migrations at work, with sql files, I felt like the approach taken in the documentation was a bit "incomplete": each time a new migration is added, the model is generated again and then one should look for compilation error. Don't get me wrong, it's already pretty and for sure needed. Yet the complete disparition of the previous model bothers me. At work we do like this and from times to times it bits us back. How was it before? Why did we change this stuff this way? Is it really the correct name for the old Foo columns?

As such, I tried the following approach:
- a model package, containing mostly empty classes subclassing the latest generated version
- a package per previous version, named currently model.previous.vX. This contains the generated code "at the time" and is used for the migration scripts.

When some change is to be done, I :
- create a new java migration, with just strings for the new stuff and using the latest version model for the current state
- run the new migration
- generate the new model
- copy/paste the new model in model.previous.vX+1
- update the migration script with the freshly generated model constants
- change the classes in the model package to extends this new model contents

The non migration code only uses the classes in the model package.

In the end:
- history is preserved,
- the migration scripts are typesafe,
- non migration code isn't burdened by migration management.

Does it look like a good approach?

Thanks in advance

Best
joseph

Lukas Eder

unread,
Jan 4, 2016, 4:33:32 AM1/4/16
to jooq...@googlegroups.com
2016-01-04 0:10 GMT+01:00 <joseph...@gmail.com>:
Hi

When reading the documentation, I was surprised to see the migration scripts for flyway were all sql files and not java code, yet flyway provides such hooks as well (http://flywaydb.org/documentation/migration/java.html).

Is there a reason for sticking with sql files ?

Yes! No way I'm going to migrate e.g. an Oracle database with Java, when there is PL/SQL. :-)

But YMMV of course.

Still, I've rewritten the "Using jOOQ with Flyway" part of the documentation http://www.jooq.org/doc/3.7/manual-single-page/#jooq-with-flyway with the java based migration and jOOQ createTable statements and the like. I could contribute it back if you wish so, let me know.

Sure, that would be very useful! Thank you very much for offering this.

I would need you to sign the contributor agreement first:

Then, you can contribute your changes in any way you find suitable. We currently don't host the manual sources on GitHub (yet), so a pull request is currently not possible, I'm afraid.
 
Still, while working on it and thinking how we currently do migrations at work, with sql files, I felt like the approach taken in the documentation was a bit "incomplete": each time a new migration is added, the model is generated again and then one should look for compilation error. Don't get me wrong, it's already pretty and for sure needed. Yet the complete disparition of the previous model bothers me. At work we do like this and from times to times it bits us back. How was it before? Why did we change this stuff this way? Is it really the correct name for the old Foo columns?

That's true. It would be useful to maintain versions of the schema also in the "current" Java code, not only in version control. Sure, version controlling generated sources is one way to tackle the problem (there are some discussions around that on this list). But having all versions at runtime in some form might also be useful. We have a pending feature request for this, but haven't thought this through yet:

It looks as though we can certainly learn from you here.
 
As such, I tried the following approach:
- a model package, containing mostly empty classes subclassing the latest generated version

Interesting, could you give an example about this subclassing that you're doing?
 
- a package per previous version, named currently model.previous.vX. This contains the generated code "at the time" and is used for the migration scripts.

When some change is to be done, I :
- create a new java migration, with just strings for the new stuff and using the latest version model for the current state
- run the new migration
- generate the new model
- copy/paste the new model in model.previous.vX+1
- update the migration script with the freshly generated model constants
- change the classes in the model package to extends this new model contents

The non migration code only uses the classes in the model package.

In the end:
- history is preserved,
- the migration scripts are typesafe,
- non migration code isn't burdened by migration management.

Does it look like a good approach?

It does indeed. And I'd like to learn more about how you're doing this. Specifically, if you find anything missing from jOOQ's DDL support, I'd love to know - there's still lots of room for improvement!

What I think would be very useful for the community would be a blog post with more details, in case you do write blogs. I think you've found a great solution to something that many people are thinking about, and that neither jOOQ nor Flyway currently offer an out-of-the-box solution for.

Cheers,
Lukas 

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

joseph...@gmail.com

unread,
Jan 4, 2016, 5:37:45 PM1/4/16
to jOOQ User Group
Hi Lukas

Once again, thanks for your very nice answer.

I reply inline.

Still, I've rewritten the "Using jOOQ with Flyway" part of the documentation http://www.jooq.org/doc/3.7/manual-single-page/#jooq-with-flyway with the java based migration and jOOQ createTable statements and the like. I could contribute it back if you wish so, let me know.

Sure, that would be very useful! Thank you very much for offering this.

I would need you to sign the contributor agreement first:

Sure, but actually I stumbled upon the Corporate Contributor Information part: my employer indeed works in software but has no policy I know of regarding Open Source and my contract is totally free of such restriction. I'll try to get some answer internally but I don't expect much (nor fast)... 


Then, you can contribute your changes in any way you find suitable. We currently don't host the manual sources on GitHub (yet), so a pull request is currently not possible, I'm afraid.

ok

well, actually I changed the way migration are triggered to have them done from a main method. It feels closer from "real life" to me, yet it makes the example a bit more complex. Is still fine for you ?

If fine, I think I'll start with a github project and some markdown text for the documentation.
 
Still, while working on it and thinking how we currently do migrations at work, with sql files, I felt like the approach taken in the documentation was a bit "incomplete": each time a new migration is added, the model is generated again and then one should look for compilation error. Don't get me wrong, it's already pretty and for sure needed. Yet the complete disparition of the previous model bothers me. At work we do like this and from times to times it bits us back. How was it before? Why did we change this stuff this way? Is it really the correct name for the old Foo columns?

That's true. It would be useful to maintain versions of the schema also in the "current" Java code, not only in version control. Sure, version controlling generated sources is one way to tackle the problem (there are some discussions around that on this list). But having all versions at runtime in some form might also be useful. We have a pending feature request for this, but haven't thought this through yet:

Well, I'm unsure about annotation there. What if I deleted some columns, what to annotate? And changes could be subtler, like some type change, and then how to reflect this ? What if multiple changes were done over time ?

It looks as though we can certainly learn from you here.

Not so fast ;)

Actually we have the issue currently in our production code : changes over time are only visible in the migration files.
Which has two main drawbacks :
- migrations aren't typesafe (and in 10+ years of ongoing dev, typo were made, actually they happen something like twice a year)
- in case of issue, figuring out what was before isn't always easy (especially when some migration actions were done, like altering data to fit into the new schema)

But the solution I was thinking of is totally unproven up to know, so it's a bit earlier to say how good it is.

Let see it in more details :
 
As such, I tried the following approach:
- a model package, containing mostly empty classes subclassing the latest generated version

Interesting, could you give an example about this subclassing that you're doing?

Well, that's pretty simple, for example:
package org.jooq.example.model.tables;

public class Author extends org.jooq.example.model.previous.v3.tables.Author {

}

I'm thinking of doing it for every stuff from the generated model which is used at least once outside of the migration steps.

So when a new migration is done, I simply have to change the extend to v4 and then typesafety kicks in and tells me of any issue.
 
 
(...)

Does it look like a good approach?

It does indeed.

Actually while doing it I was wondering about the verbosity of this approach. I end up with all tables/sequences/whatnot defined whereas only the delta really matters. That's why I'm unsure about it. On the other hand, type safe migration rocks :)
 
And I'd like to learn more about how you're doing this.

I will put it all on github so you can have a look. It's pretty simple right now (really just your doc sample rewritten) so easy to figure out I guess. Eager to have your feedback, I upload this all soon (as soon as possible but that may be by the end of the week only - sorry).
 
Specifically, if you find anything missing from jOOQ's DDL support, I'd love to know - there's still lots of room for improvement!

Well, one little issue was the lack of unique support at table creation, I had to add the constraint later on:
        create.createTable(AUTHOR)
                .column(AUTHOR.ID, SQLDataType.INTEGER.nullable(false))
                .column(AUTHOR.FIRST_NAME, SQLDataType.VARCHAR.length(50))
                .column(AUTHOR.LAST_NAME, SQLDataType.VARCHAR.length(50).nullable(false))
                .column(AUTHOR.DATE_OF_BIRTH, SQLDataType.DATE)
                .column(AUTHOR.YEAR_OF_BIRTH, SQLDataType.INTEGER)
                .column(AUTHOR.ADDRESS, SQLDataType.VARCHAR.length(50))
                .execute();

        create.alterTable(AUTHOR).add(constraint("PK_T_AUTHOR").primaryKey(AUTHOR.ID)).execute();

On the way I spotted as well that the constraint name isn't put in the generated stuff (at least as far as I've seen), which I guess could be an issue sometime (when changing it mainly, by dropping/create). Having it as well in the generated class would be sweet.
 

What I think would be very useful for the community would be a blog post with more details, in case you do write blogs. I think you've found a great solution to something that many people are thinking about, and that neither jOOQ nor Flyway currently offer an out-of-the-box solution for.

Not so fast again ;) lol

Anyway, I've some blog rusting somewhere, I'm not sure it's the best way to communicate around this topic. Let's see when we'll have discuss it more :)

Best

PS : I reply on the "[REQUEST FOR FEEDBACK] New jOOQ code generator: What do you want it to be?" another day

Lukas Eder

unread,
Jan 5, 2016, 2:22:41 AM1/5/16
to jooq...@googlegroups.com
Hi Joseph,

2016-01-04 23:37 GMT+01:00 <joseph...@gmail.com>:
Sure, but actually I stumbled upon the Corporate Contributor Information part: my employer indeed works in software but has no policy I know of regarding Open Source and my contract is totally free of such restriction. I'll try to get some answer internally but I don't expect much (nor fast)... 

If it doesn't work out, you can still outline the general idea, and we'll take it from there. There are no copyright issues from simply talking about ideas :-)

Or, you could write a blog post, in case of which the content belongs to you and we don't have to transfer it. That might be even better, because we won't need to review formatting, etc...

well, actually I changed the way migration are triggered to have them done from a main method. It feels closer from "real life" to me, yet it makes the example a bit more complex. Is still fine for you ?

Ehm, I don't know :-)

Still, while working on it and thinking how we currently do migrations at work, with sql files, I felt like the approach taken in the documentation was a bit "incomplete": each time a new migration is added, the model is generated again and then one should look for compilation error. Don't get me wrong, it's already pretty and for sure needed. Yet the complete disparition of the previous model bothers me. At work we do like this and from times to times it bits us back. How was it before? Why did we change this stuff this way? Is it really the correct name for the old Foo columns?

That's true. It would be useful to maintain versions of the schema also in the "current" Java code, not only in version control. Sure, version controlling generated sources is one way to tackle the problem (there are some discussions around that on this list). But having all versions at runtime in some form might also be useful. We have a pending feature request for this, but haven't thought this through yet:

Well, I'm unsure about annotation there. What if I deleted some columns, what to annotate? And changes could be subtler, like some type change, and then how to reflect this ? What if multiple changes were done over time ?

You're right.

Not so fast ;)

Actually we have the issue currently in our production code : changes over time are only visible in the migration files.
Which has two main drawbacks :
- migrations aren't typesafe (and in 10+ years of ongoing dev, typo were made, actually they happen something like twice a year)
- in case of issue, figuring out what was before isn't always easy (especially when some migration actions were done, like altering data to fit into the new schema)

Yes, the migration is only a delta. Having actual snapshots is much better. That's what you get when you check in jOOQ generated code, but that still doesn't give you access to more than one version at a time.
 
Interesting, could you give an example about this subclassing that you're doing?

Well, that's pretty simple, for example:
package org.jooq.example.model.tables;

public class Author extends org.jooq.example.model.previous.v3.tables.Author {

}

Hmm, I see. That's certainly an interesting approach, although I suspect that you will be missing out on a couple of features, then. For instance, what happens if you want to reference or alias such a table? You'll get a reference to o.j.e.m.p.v3.tables.Author:

AUTHOR.as("t");

Also, the UpdatableRecord in Author is o.j.e.m.p.v3.tables.records.AuthorRecord, not a subtype of it. In addition to this, we might add formal support for table inheritance some time in the future (Oracle / PostgreSQL ORDBMS feature). In that case, we'll make further assumptions about inheritance in generated code. This might not apply to you directly, but the changes that will be required may.

In the other thread that we're having in parallel, I was discussing why it might be a good idea (from our point of view) to make all generated classes "final":

The assumptions that you make right now when you subtype Author may break any time in a future release, even in a patch release, because you implicitly depend on internal API. It certainly works right now, but I'd be wary of this solution, and had better solved it more thoroughly, with an out-of-the-box feature.

I'm thinking of doing it for every stuff from the generated model which is used at least once outside of the migration steps.

So when a new migration is done, I simply have to change the extend to v4 and then typesafety kicks in and tells me of any issue.

How about actually referencing v4 from all of your code, and simply search-replace package names upon migration? The effect would be the same, although the diff would be much bigger. Another option: Generate a "latest" package and always refer to that from code, adding version numbers only to historic versions.

Actually while doing it I was wondering about the verbosity of this approach. I end up with all tables/sequences/whatnot defined whereas only the delta really matters. That's why I'm unsure about it. On the other hand, type safe migration rocks :)

Yep. See my suggestions above that are limited to package names instead.

Specifically, if you find anything missing from jOOQ's DDL support, I'd love to know - there's still lots of room for improvement!

Well, one little issue was the lack of unique support at table creation, I had to add the constraint later on:
        create.createTable(AUTHOR)
                .column(AUTHOR.ID, SQLDataType.INTEGER.nullable(false))
                .column(AUTHOR.FIRST_NAME, SQLDataType.VARCHAR.length(50))
                .column(AUTHOR.LAST_NAME, SQLDataType.VARCHAR.length(50).nullable(false))
                .column(AUTHOR.DATE_OF_BIRTH, SQLDataType.DATE)
                .column(AUTHOR.YEAR_OF_BIRTH, SQLDataType.INTEGER)
                .column(AUTHOR.ADDRESS, SQLDataType.VARCHAR.length(50))
                .execute();

        create.alterTable(AUTHOR).add(constraint("PK_T_AUTHOR").primaryKey(AUTHOR.ID)).execute();

Yes, that would indeed be useful. It's a pending feature request:
 
On the way I spotted as well that the constraint name isn't put in the generated stuff (at least as far as I've seen), which I guess could be an issue sometime (when changing it mainly, by dropping/create). Having it as well in the generated class would be sweet.

You're right, I had missed this feature too, in the past, but forgot to create a feature request for it. Here it is, now:

Let me know if you come to think of another missing feature.
Best,
Lukas

joseph...@gmail.com

unread,
Jan 6, 2016, 7:04:14 PM1/6/16
to jOOQ User Group
Hi Lukas

Or, you could write a blog post, in case of which the content belongs to you and we don't have to transfer it. That might be even better, because we won't need to review formatting, etc...

let's see this matter when we're done ^^ Latter always sounds good isn't it ? ;)

well, actually I changed the way migration are triggered to have them done from a main method. It feels closer from "real life" to me, yet it makes the example a bit more complex. Is still fine for you ?

Ehm, I don't know :-)

Ok, there you're :
https://github.com/cluelessjoe/jOOQ-java-flyway-example
It's pretty rough around the edges, and not only the edges ^^ Hopefully you'll still manage to make sense out of it.

So, what do we have there ? An ApplicationRunner as the production entry point, starting an Application, in which it all starts by some migrations. These migrations are written in Java.

As I'm not a big fan of maven install's time stuff generation, I also switched to Java model generation in the test resources, namely in ModelGenerator

No model versioning there, and actually I encountered a funny issue : inserting data actually requires the model to be there already AFAIK. So actually I'll have to add an extra migration step : in the third I update the schema, then generate the model, then in a fourth step I can data into the DB. Unless I'm missing something for sure.

Hmm, I see. That's certainly an interesting approach, although I suspect that you will be missing out on a couple of features, then. For instance, what happens if you want to reference or alias such a table? You'll get a reference to o.j.e.m.p.v3.tables.Author:

Well, you convinced me: I switched to copy twice the latest generated model, which should resolved the subclassing issue. Fine for you ?


In the other thread that we're having in parallel, I was discussing why it might be a good idea (from our point of view) to make all generated classes "final":

The assumptions that you make right now when you subtype Author may break any time in a future release, even in a patch release, because you implicitly depend on internal API. It certainly works right now, but I'd be wary of this solution, and had better solved it more thoroughly, with an out-of-the-box feature.

fine for me too (now)


How about actually referencing v4 from all of your code, and simply search-replace package names upon migration?

Feels weak to me. What about code being written in feature branches at the same time? Later merge and the likes? Code in projects depending on the current one I can't reach ? That could be pretty nasty. But keep on reading, another way poped up in my mind ;)
 
The effect would be the same, although the diff would be much bigger. Another option: Generate a "latest" package and always refer to that from code, adding version numbers only to historic versions.

actually when thinking more about it, in real life I would do it through different maven projects. One for the migration only, one for business code and one for running the app. I would put the models with version in the migration project and the current one in the application code. Then I would assemble all this in the "application" project. So no risk whatsoever of using some older model in business code. Sounds pretty strong to me.
 

Let me know if you come to think of another missing feature.

That's all for now, but I'm only prototyping, no real production code yet.

By the way, a side question: do you speak at conference ? I'm actually member of the ElsassJug, in Strasbourg, so next door to you so to say. If ever you had something to say/share, I would happily organize an event for you. What do you think of it ?

++
joseph

Lukas Eder

unread,
Jan 17, 2016, 3:57:19 PM1/17/16
to jooq...@googlegroups.com
Hi Joseph,

I'm sorry for the delay.

2016-01-07 1:04 GMT+01:00 <joseph...@gmail.com>:
Ok, there you're :
https://github.com/cluelessjoe/jOOQ-java-flyway-example
It's pretty rough around the edges, and not only the edges ^^ Hopefully you'll still manage to make sense out of it.

So, what do we have there ? An ApplicationRunner as the production entry point, starting an Application, in which it all starts by some migrations. These migrations are written in Java.

As I'm not a big fan of maven install's time stuff generation, I also switched to Java model generation in the test resources, namely in ModelGenerator

No model versioning there, and actually I encountered a funny issue : inserting data actually requires the model to be there already AFAIK. So actually I'll have to add an extra migration step : in the third I update the schema, then generate the model, then in a fourth step I can data into the DB. Unless I'm missing something for sure.

Interesting approach, this does look promising, but I can see that Maven's different phases can get in the way of getting this right. Obviously, as the schema migration is related to the jOOQ code generation, which is again a pre-requisite for parts of the schema migration, things can get hairy rather quickly. This is not an issue with text-based migrations, obviously, as random statements can be put in random order in these text files.

With type safe jOOQ / Flyway migrations, I suspect that migrations would need to be split into individual Maven modules, which makes me wonder, if this doesn't produce too much overhead (at least with Maven)?

In any case, I've also registered a feature request to add an example to the jOOQ repository and manual. Just as a reminder:

Let me know if you come to think of another missing feature.

That's all for now, but I'm only prototyping, no real production code yet.

OK, let me know if you have anything further to present. I'm very curious about your progress.
 
By the way, a side question: do you speak at conference ? I'm actually member of the ElsassJug, in Strasbourg, so next door to you so to say. If ever you had something to say/share, I would happily organize an event for you. What do you think of it ?

Sure, I'm always open for conference talks. Strasbourg is indeed not too far from Zurich. I'll contact you off-list for this.

Best Regards,
Lukas

joseph...@gmail.com

unread,
Jan 20, 2016, 6:35:59 PM1/20/16
to jOOQ User Group
Hi Lukas

I was thinking back on the topic and something "hit" me.

What if the DDL was distinct from the generated POJO?

For example, let's say I could write something close from :

Table table = create.createTable("AUTHOR")
                .addColumn("ID", SQLDataType.INTEGER.nullable(false))
                .addCumn("FIRST_NAME", SQLDataType.VARCHAR.length(50))
                .column("ADDRESS", SQLDataType.VARCHAR.length(50))
                .build();

Then maybe I could do something like ModelUpdater.apply(table);

Then, later on, maybe I could something close to:
Table tablev1 = table.change().removeColumn("ID").addColumn("IDENTIFIER", SQLDataType.INTEGER.nullable(false)).build();

And then:
Schema schemaV1 = Schema.of(tablev1);
Schema changes = Schema.diff().from(Schema.of(table)).to(schemaV1);
ModelUpdater.apply(changes);

And then maybe I could run something like:
GenerationTool.generate(configuration, schemaV1);

What's the big deal?

Well, currently DSLContext.createTable() can just be used to directly act on the database.

The idea here is to add a Schema in between, which could be used as well for generating stuff in the database but also kept as the meta model.

Then later on it could be used as a starting for the updated schema. And for sure it could be used as well to generate POJO. These nasty Pojos would then be purely "record" description, the schema/meta model being kept to describe the model.

Actually, I was thinking that such a "schema", with collections of tables, procedures and what's not in it, could make easier migration betweens schemas (in the case of generated pojos having as well, hard coded, the model definition).

Am I definitely too tired and it's all crap or does it make sense somehow ?


Back to the ongoing discussion, I wouldn't go for Maven: control of when generation happens it too low IMHO. Actions should happen not at install time but on startup (for the db migration work) or on demand for the pojo part.

I guess I'll some examples at some point (with 3modules : migration, domain and application).

thanks again :)

best
joseph

joseph...@gmail.com

unread,
Jan 21, 2016, 6:15:09 PM1/21/16
to jOOQ User Group, joseph...@gmail.com
Hi

Does the above reply make any sense? Actually, I fear not, and I fear even more my previous idea wasn't this great...

Actually I've forgotten to add something which I had in mind: inside this metamodel DSL, I would actually use collections, for tables and the likes. Thus I would expect comparisons between meta models to be easier to achieve. On the other hand, the generated metamodel is quite sweet as well, so there I would actually wrap the metamodel pojo around the above metamodel, for example something along the lines:
class Foo implements Table{

private final Table table = new TableBuilder("Foo").addColumn("ID", SQLDataType.INTEGER.nullable(
false))...().build();

public Column<Integer> getID(){
return table.getColumn("ID");
}
}

Eager to see if it's only nonsense :)

best
joseph

Lukas Eder

unread,
Jan 25, 2016, 11:25:39 AM1/25/16
to jooq...@googlegroups.com, joseph...@gmail.com
Hello Joseph,

I like the way this discussion is developing. Essentially, what you're suggesting is to have an abstract metadata representation of any form, and then the possibility of calculating the diff between two versions, possibly including all the DDL statements that are required to transition between the two.

I had thought about this before:

But it's a lot of work, especially if this should work for all the 21 currently supported RDBMS :)

I don't think that the best way to represent this meta model would be via CREATE TABLE statements. The semantics of concrete DDL commands is perhaps too specific to the creation of the tables. Any "command" like "removeColumn" would feel unnatural.

Also, the existing generated code has been designed specifically to be used in jOOQ's DSL to express queries. It is far from being optimised for use-cases as the one you're describing.

While jOOQ's main feature is its close-to-SQL-DSL, I think that any meta model that is used for these kinds of things should be XML based (e.g. this existing schema: http://www.jooq.org/xsd/jooq-meta-3.5.4.xsd). A potential model manipulation API / DSL would just wrap the underlying XML data structure.

But again, while these ideas seem interesting, they're lots of work. I don't think that project will be on our roadmap any time soon.

Best Regards,
Lukas

--
Reply all
Reply to author
Forward
0 new messages