Migration vs Seed when you need to manipulate data

Tianwen Chen

Sep 22, 2011, 9:05:08 PM
to Ruby or Rails Oceania, leoliang...@hotmail.com
Hi everyone,

I'd like to start a discussion on the question in the title and try to
work out some best practices.

I personally choose migrations over seeds only when we need to make
changes after the app has gone live.

From what I can see, the purpose of seeds is to populate the essential
data the app needs when you have a completely new database. The purpose
of migrations is to change the database schema, or to change data,
although people generally advise against the latter.

Let's now think about the following scenarios:
1. Initial stage of the app
Clearly, you should use seeds here instead of migrations.
2. The app is already running in production, and you need to populate
more data in some of the tables. For example, you need to add more
categories for products.
Here, you can use either a seed or a migration. With a seed, though,
you need to check for existing records in the database so that you
don't create duplicates.
3. The app is already running in production, and you need to remove
some data from the database. For example, the client wants to remove
some products as well as their categories.
Here, you would prefer a migration over a seed, since, by definition,
seeds shouldn't be used to remove data.
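The duplicate check from scenario 2 can be sketched in plain Ruby. This is a minimal sketch, not a real seeds.rb: the in-memory `CATEGORIES` array and the `find_or_create_category` / `seed_categories` helpers are hypothetical stand-ins for a categories table and its seed code. The idea is to look each record up by a natural key before inserting it, so the seed can safely be run more than once.

```ruby
# Minimal sketch, not a Rails seed file: CATEGORIES is a hypothetical
# in-memory stand-in for a categories table so the example runs on plain Ruby.
CATEGORIES = []

# Look the record up by its natural key (the name) before inserting,
# in the spirit of ActiveRecord's find_or_create_by.
def find_or_create_category(name)
  existing = CATEGORIES.find { |c| c[:name] == name }
  return existing if existing

  record = { id: CATEGORIES.size + 1, name: name }
  CATEGORIES << record
  record
end

def seed_categories
  %w[Books Music Games].each { |name| find_or_create_category(name) }
end

seed_categories
seed_categories # second run finds every category and inserts nothing
p CATEGORIES.map { |c| c[:name] } # => ["Books", "Music", "Games"]
```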

Let's now take a look at their benefits and drawbacks:

Benefits of using seed:
1. keeps the job of populating data at the initial stage separate from
migrations, so each thing lives where it belongs

Drawbacks of using seed:
1. no timestamps, so you can't trace when changes were made
2. you have to guard against inserting duplicate records whenever you
want to insert a new record
3. no rollback
4. it is not included in the db:migrate rake task, so you have to run
db:seed after every deployment until you automate this step in
Capistrano.
PS: by using seed_fu or other gems, you can now eliminate the second
drawback.

Benefits of using migration:
1. timestamps are provided
2. you can roll back the changes whenever you want
3. you can insert, update, and delete data
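The rollback benefit can be illustrated with a data migration whose `down` undoes its `up`. This plain-Ruby sketch assumes a hypothetical `CATEGORIES` array in place of the table and a hypothetical `AddMoreCategories` class shaped like a Rails migration; a real one would subclass `ActiveRecord::Migration` and run against the database.

```ruby
# Hypothetical stand-ins: an array for the categories table and a class with
# the up/down shape of a Rails migration. No database is involved.
CATEGORIES = [{ name: "Books" }]

class AddMoreCategories
  NEW_NAMES = %w[Music Games].freeze

  # Forward: insert the new categories, skipping any that already exist.
  def up
    NEW_NAMES.each do |name|
      CATEGORIES << { name: name } unless CATEGORIES.any? { |c| c[:name] == name }
    end
  end

  # Rollback: delete the categories this migration adds.
  def down
    CATEGORIES.reject! { |c| NEW_NAMES.include?(c[:name]) }
  end
end

migration = AddMoreCategories.new
migration.up
p CATEGORIES.map { |c| c[:name] } # => ["Books", "Music", "Games"]
migration.down
p CATEGORIES.map { |c| c[:name] } # => ["Books"]
```

Reversing an insert is straightforward because the migration knows which rows it added. A deletion, as in scenario 3, can only be rolled back if the removed rows are kept somewhere; when they aren't, a Rails migration's `down` would typically raise `ActiveRecord::IrreversibleMigration` instead.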

Drawbacks of using migration:
1. data changes should be kept separate from database schema changes.
(Here, I want to say that even if I put data manipulation in a db
migration, there is no harm in doing so. If you want to argue that this
is messy, why don't schema migrations look messy to you? The reason we
have migrations at all is that we want to trace when we made changes
and be able to revert them. This is what seeds aren't capable of.)

Discussion welcome :)

Kind Regards,

Tian



Tianwen Chen

Sep 22, 2011, 9:27:06 PM
to Ruby or Rails Oceania
In summary, the best practice I reckon is:

Use seeds at the initial stage. Use migrations after the initial stage.

Tian

Anthony Richardson

Sep 22, 2011, 9:39:38 PM
to rails-...@googlegroups.com
I use migrations to transform data as part of schema changes (e.g. creating default values for new columns, or copying data from one table to another when it is being broken out into a new table).
I use a rake task to change data when it is part of a business change. I ensure the rake task is idempotent.
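Copying data out when a column is broken out into its own table might look like the following. It is only a sketch: plain arrays of hashes stand in for the `products` table and the new `categories` table, and the column names and rows are hypothetical. In a Rails app, the loop would run inside the migration that creates the new table.

```ruby
# Hypothetical data: products carry a free-text category that is being
# broken out into its own categories table.
PRODUCTS = [
  { id: 1, name: "Novel",  category: "Books" },
  { id: 2, name: "Album",  category: "Music" },
  { id: 3, name: "Poetry", category: "Books" }
]
CATEGORIES = []

PRODUCTS.each do |product|
  # Drop the old column value and point the product at a category row instead.
  name = product.delete(:category)
  row = CATEGORIES.find { |c| c[:name] == name }
  unless row
    row = { id: CATEGORIES.size + 1, name: name }
    CATEGORIES << row
  end
  product[:category_id] = row[:id]
end

p CATEGORIES.map { |c| c[:name] }        # => ["Books", "Music"]
p PRODUCTS.map { |pr| pr[:category_id] } # => [1, 2, 1]
```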

Cheers,

Anthony

--
You received this message because you are subscribed to the Google Groups "Ruby or Rails Oceania" group.
To post to this group, send email to rails-...@googlegroups.com.
To unsubscribe from this group, send email to rails-oceani...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rails-oceania?hl=en.


Chris Rode

Sep 22, 2011, 10:18:14 PM
to rails-...@googlegroups.com
Personally, I have never been comfortable using seeds at all; I am of the opinion that all data insertions and alterations should happen in migrations. I agree that the generally accepted use case for seeds is the initial data load; however, I would argue that what constitutes the "initial data load" changes over the life of most applications, which risks the seed quickly becoming out of date.

Take the example of lookup data. When first creating the lookup table (in a migration), you then also create the seed data to populate it. This is then pushed to the development branch and the rest of the team gets that change.

Now, before the application is ready for production release, the lookup table and its data undergo several changes. We now have two choices:

1. We change the seed so that it incorporates the new changes, wipe the development databases and start again. OR
2. As discussed above with data changes, we simply use the migrations to translate the data across so we have a consistent history of the database over the lifetime of the project.

In my opinion, the second is much better: it minimises the number of files changed on each schema change, thus minimising the capacity for error.

The second option brings its own problem, though: the seeds are no longer relevant to run on initial deployment, because they need to be run at a specific step in the migration chain. I have not used seed_fu, so I am not sure whether it caters for this, but a cursory glance at its GitHub page suggests not. Fortunately, we already have something that does exactly that: migrations.
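Option 2, with the lookup data created and later changed through ordered migrations, can be sketched with a toy runner. The `Migration` struct, the `migrate` method, and the `statuses` lookup data are hypothetical; `APPLIED` loosely mimics the `schema_migrations` table Rails uses to record which versions have run. Each data step runs once, in version order, which is what lets a data change sit at a specific point in the chain.

```ruby
# Toy runner, not Rails itself: each migration carries a timestamp version
# (Rails prefixes migration file names this way), and APPLIED plays the role
# of the schema_migrations table so a version is never run twice.
Migration = Struct.new(:version, :change)

DB = { statuses: [] }
APPLIED = []

def migrate(migrations)
  migrations.sort_by(&:version).each do |m|
    next if APPLIED.include?(m.version)

    m.change.call(DB)
    APPLIED << m.version
  end
end

MIGRATIONS = [
  # Later change to the lookup data, recorded as its own step.
  Migration.new("20110915000000", ->(db) { db[:statuses].map! { |s| s == "closed" ? "resolved" : s } }),
  # Initial lookup data, inserted alongside the migration that created the table.
  Migration.new("20110901000000", ->(db) { db[:statuses].concat(%w[open closed]) })
]

migrate(MIGRATIONS)
migrate(MIGRATIONS) # already applied, so nothing runs
p DB[:statuses] # => ["open", "resolved"]
```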

In short, seeds seem superfluous to me in a successful project. I look forward to hearing everyone's views.

Chris Berkhout

Sep 22, 2011, 10:37:51 PM
to rails-...@googlegroups.com
Ideally, migrations will be able to transform the database structure
(and data added at any stage along the way) from nothing to the final
state. In this case, seed data could go in migrations.

However, I think it's not uncommon to have migrations (especially for
data) that are dependent on model code, which itself can change. In
that case, migrations still provide a valuable service in being able
to roll back and forward between recent versions of the database. You
probably don't want to deploy a new instance by running all
migrations anyway.

That's where the schema comes in.

Schema + seeds should == a fresh database, ready to use.

I use migrations for structure and data changes, and keep them
isolated from models wherever that can be done easily. I depend on the
schema + seeds always producing a fresh, working DB.

I think that's usually a good way to go. I'd reconsider the approach
if I had some (rare) application where data is more critical, and a
"fresh" DB isn't useful.

Cheers,
Chris

Mikel Lindsaar

Sep 22, 2011, 11:02:06 PM
to rails-...@googlegroups.com
We use seed data as the bare bones you need to get an application up and running for development.

Using seed files for the "initial app load" is not so useful; at best, they get used once. Once you have an app in production, you will never do an initial app load again; you'll be restoring from a backup.

Also, using seeds to improve or change data is not good because they are not versioned. You are better off doing this in a migration, and then adjusting your seed file so it works for the new developer coming on board.

For example, getting a new developer on board can be a pain, so we always do the following on our apps:

The README contains "this is what you need to do to get the app running" for a new developer. This includes where the official git repository is, what database is needed, any external tools (Sphinx, Redis, etc.) and installation instructions for those, any other external dependencies (S3, reporting and the like) as well as some general information about the production environment (where it is hosted and what monitoring services are on it, like newrelic.com and stillalive.com).

Note, we don't store production passwords in that README, we put them in our encrypted vault that we use for our client and application secrets as part of our Sentinel service.

Finally, all config/*.yml files are saved as config/*.example.yml with the basics of what you need for development. You copy these over to get into development quickly.

So, with all of the above in place, the new developer reads that to get into development they need to run:

rake db:schema:load
rake db:seed

And they have a usable development environment with all the required logins already set up.

My 2 cents.


Mikel Lindsaar
http://rubyx.com/
http://lindsaar.net/

Xavier Shay

Sep 25, 2011, 11:55:39 AM
to rails-...@googlegroups.com

On 22/09/11 6:05 PM, Tianwen Chen wrote:
> Hi everyone,
>
> I just want to raise a discussion as titled and try to find out some
> best practices.

I feel there are actually three levels of "seed data" in a lot of apps,
each a superset of the previous:
1) Bare minimum required data for tests to run
2) Seed data for production
3) Development data

The task that loads production data should be idempotent, but you can
get away with the others not being so (just blow away the DB and load).

Ideally your production task is smart enough to deal with changing data
and can auto-migrate it for you.
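Xavier's three levels, with an idempotent load, can be sketched as follows. The level names follow his list, but the tables and rows are hypothetical, and a plain hash stands in for the database. Only the production level really needs the idempotent guard; the test and development levels could simply wipe the data and reload.

```ruby
# Hypothetical seed layout for the three levels. ORDER lists them from
# smallest to largest; each level's data includes every level before it.
SEED_LEVELS = {
  test:        { roles: %w[admin member] },
  production:  { categories: %w[Books Music] },
  development: { users: %w[alice@example.com bob@example.com] }
}.freeze
ORDER = %i[test production development].freeze

# Collect the rows for a level plus all levels beneath it.
def seed_data_for(level)
  data = Hash.new { |hash, table| hash[table] = [] }
  ORDER.take(ORDER.index(level) + 1).each do |lvl|
    SEED_LEVELS.fetch(lvl).each { |table, rows| data[table].concat(rows) }
  end
  data
end

# Idempotent load: a row is only added if it is not already present,
# so the production task can be re-run against a live database.
def load_seeds(db, level)
  seed_data_for(level).each do |table, rows|
    existing = (db[table] ||= [])
    rows.each { |row| existing << row unless existing.include?(row) }
  end
  db
end

db = {}
load_seeds(db, :production)
load_seeds(db, :production) # second run adds nothing
p db.keys # => [:roles, :categories]
```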

Xav
