Fixtures for Dummies?

Tim Chase

May 11, 2007, 3:57:33 PM

to django...@googlegroups.com

I'm having some difficulty wrapping my head around Django's
fixtures and am looking for some good ground-up documentation.
Most of the documentation I've stumbled across seems to describe
fixtures as if the reader already knows all about them: where to
stash them and how to create them (either automated'ly or by-hand).

My understanding is that the initial_data fixture is _the_ way to
load, well, initial data into the DB. I've got some CSV files
that would make this quite handy (though they'd be easy to
transform with a little sed/awk/vim into JSON or SQL "INSERT"
statements, or whatever format is easiest to understand/work-with).

I've gathered that one can create {projname}/{appname}/fixtures/
directories. Does some sacred-named file go in this directory?
According to

http://www.djangoproject.com/documentation/django-admin/#loaddata-fixture-fixture

"each fixture has a unique name", but I'm missing the naming
conventions if there are any. My guess would be (and given
Django's sensible design, guesses are often fairly close to being
right) that one would create files here named after one's models,
so if you had a projname.appname.models.MyModel one would create
{projname}/{appname}/fixtures/mymodel.{json,py,sql,txt,csv?} file
and populate it with the associated'ly formatted data?

I'm also somewhat confused about the interplay between fixtures
and "initial SQL":

www.djangoproject.com/documentation/model-api#providing-initial-sql-data

Is one better than the other? Are they complementary? Used for
different purposes? (perhaps "Initial SQL Data" for globally
initial data, and Fixtures for testing data?)

Is there some sort of _Django Fixtures for Dummies_ writeup that
I've missed?

Thanks,

-tim

Russell Keith-Magee

May 11, 2007, 11:13:37 PM

to django...@googlegroups.com

On 5/12/07, Tim Chase <django...@tim.thechases.com> wrote:
>
> I'm having some difficulty wrapping my head around Django's
> fixtures and am looking for some good ground-up documentation.
> Most of the documentation I've stumbled across seems to describe
> fixtures as if the reader already knows all about them: where to
> stash them and how to create them (either automated'ly or by-hand).

Thanks for the feedback. I agree that we could improve on this area.
There are little bits of fixture-related details spread all over the
docs (django-admin, serialization, testing) - there is definitely room
for a single dedicated document.

> My understanding is that the initial_data fixture is _the_ way to
> load, well, initial data into the DB. I've got some CSV files
> that would make this quite handy (though they'd be easy to
> transform with a little sed/awk/vim into JSON or SQL "INSERT"
> statements, or whatever format is easiest to understand/work-with).

You are correct - initial_data fixtures are the preferred mechanism for
loading data.

We don't currently have support for CSV as a fixture format; however,
if you feel enthused, you should be able to write a CSV serialization
module. Although it sounds daunting, it actually shouldn't be that
hard - you just need to be able to get the CSV into a Python dict with
a particular structure.

If you don't want to go down that path, we do have built in support
for JSON, XML, and YAML; so you will need to get your CSV data into
one of these formats.
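
One way to get CSV data into a supported format is a small conversion script. The following is a minimal sketch, not part of Django: the function name `csv_to_fixture`, the `pk_column` parameter, and the `addressbook.phonenumbertype` model label are all made up for illustration. It assumes the CSV has a header row and one column holding the primary key, and it emits the list-of-entries structure (model, pk, fields) that fixtures use.

```python
import csv
import io
import json


def csv_to_fixture(csv_text, model_label, pk_column="id"):
    """Convert CSV text into a list of fixture entries.

    model_label is the "app.model" name and pk_column is the CSV column
    holding the primary key; both are assumptions of this sketch.
    """
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pk = row.pop(pk_column)  # everything else becomes a field
        entries.append({"model": model_label, "pk": pk, "fields": dict(row)})
    return entries


# Hypothetical data for an "addressbook.phonenumbertype" model.
SAMPLE_CSV = "id,name\n1,Home\n2,Work\n"

fixture = csv_to_fixture(SAMPLE_CSV, "addressbook.phonenumbertype")
print(json.dumps(fixture, indent=2))
```

The printed JSON could be saved as, say, initial_data.json in an app's fixtures directory.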

> I've gathered that one can create {projname}/{appname}/fixtures/
> directories. Does some sacred-named file go in this directory?

Yes, you can create those directories. You can also put a FIXTURE_DIRS
setting in your settings file, and add any other fixture directory to
the search path.
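
As a rough illustration, a settings file entry might look like the fragment below. The path is a placeholder, not a required location.

```python
# settings.py (fragment)
# Extra directories to search for fixtures, in addition to each
# application's own fixtures/ directory.
FIXTURE_DIRS = (
    "/path/to/project/shared_fixtures",  # hypothetical path
)
```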

There is no sacred-name file, though. More on this later.

> According to
>
> http://www.djangoproject.com/documentation/django-admin/#loaddata-fixture-fixture
>
> "each fixture has a unique name", but I'm missing the naming
> conventions if there are any. My guess would be (and given
> Django's sensible design, guesses are often fairly close to being
> right) that one would create files here named after one's models,
> so if you had a projname.appname.models.MyModel one would create
> {projname}/{appname}/fixtures/mymodel.{json,py,sql,txt,csv?} file
> and populate it with the associated'ly formatted data?

The fixture name is a bit like a tag. You can produce multiple files
called 'XXX.json' (or whatever format you want), and put them
throughout your application, in the various fixture directories. Then,
if you run 'loaddata XXX', Django will find and load _all_ the
fixtures named XXX, wherever they are stored in your application. The
fixture directories under applications are the default contents of the
search path; the contents of FIXTURE_DIRS is added to the search path.
If any of these directories contain a fixture file called XXX, that
fixture will be loaded.
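
The "tag" behaviour can be pictured with a small stand-alone sketch. This is not Django's actual loader; `find_fixture_files` and the temporary directory layout are invented here purely to show how one name can match several files across a search path.

```python
import os
import tempfile

FORMATS = ("json", "xml", "yaml")  # the formats named in this thread


def find_fixture_files(name, search_dirs, formats=FORMATS):
    """Return every existing file called <name>.<format> in search_dirs."""
    found = []
    for directory in search_dirs:
        for fmt in formats:
            candidate = os.path.join(directory, "%s.%s" % (name, fmt))
            if os.path.isfile(candidate):
                found.append(candidate)
    return found


# Build a throwaway layout in which two apps both ship 'sample_data'.
root = tempfile.mkdtemp()
search_dirs = []
for app in ("blog", "polls"):
    fixture_dir = os.path.join(root, app, "fixtures")
    os.makedirs(fixture_dir)
    with open(os.path.join(fixture_dir, "sample_data.json"), "w") as handle:
        handle.write("[]")
    search_dirs.append(fixture_dir)

matches = find_fixture_files("sample_data", search_dirs)
print(matches)  # one path per app
```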

Naming fixtures after models is certainly one option - however, there
isn't anything magic about fixtures named after models. The only magic
name is 'initial_data', because that fixture (however many files it is
distributed over) is loaded during syncdb.

General advice would be to name your fixtures after the task the data
contained performs. For example:
- 'initial_data' is all the data your application _must_ have to
operate - default table entries, and the like.
- 'sample_data' could be a selection of sample data (entries in a blog
for example) that allow you to test that your application works
- 'weird_cases' could be a selection of sample data that are edge
cases - e.g., a blog with no entries, or a blog entry with no text

I.e., functional, rather than nominative (if you catch my drift).

> I'm also somewhat confused about the interplay between fixtures
> and "initial SQL":
>
> www.djangoproject.com/documentation/model-api#providing-initial-sql-data
>
> Is one better than the other? Are they complementary? Used for
> different purposes? (perhaps "Initial SQL Data" for globally
> initial data, and Fixtures for testing data?)

Initial SQL was the old way of getting data into a project. The
downside is that SQL varies slightly between database backends, so a
Postgres initial SQL block won't usually run on MySQL, and so on. On
top of that, SQL isn't a particularly natural format for describing
raw data. Fixtures let you describe "just the data", and get that data
into your app, regardless of your database backend.

However, Initial SQL still has uses. In particular, if you want to use
some nifty database features, such as triggers, or setting up special
indexes, or modifying the tables that Django produces by default, you
can use initial SQL to modify the tables after they are created. And,
if you have an old app, you can continue to use Initial SQL to load
data into your application.

> Is there some sort of _Django Fixtures for Dummies_ writeup that
> I've missed?

Not so far. As noted before, it is probably worth writing. Feel free
to open a feature request.

Side note: If you want to make a _really_ valuable contribution to
Django, keep notes of everything you find confusing as a newcomer to
fixtures, and write the document yourself. I'd be happy to include it
in the official Django docs so that we can fill this particular
documentation hole. At the very least, please provide a list of things
you think should be addressed by such a document, so that when someone
(probably me) gets around to writing those docs, they know what isn't
obvious.

Yours,
Russ Magee %-)

Christian M Hoeppner

May 12, 2007, 7:36:25 AM

to django...@googlegroups.com

> If you don't want to go down that path, we do have built in support
> for JSON, XML, and YAML; so you will need to get your CSV data into
> one of these formats.

I've used YAML for this purpose before, with Symfony, and it's quite a
natural way to describe data, structured only by indentation, which should be
familiar to any pythoner.

My question about all of this is regarding the format of the actual file. For
example, in Symfony, you used:

environment:
  table:
    field: value
    field2: value2

You get the deal. Only the fields described are filled. The rest, if non-null,
are filled with either default values or null.

The environment is a neat thing I've been missing in Django so far. Depending
on the way you call your app, you get a different set of settings loaded.
Neat for development/production/staging/else.

Is there some sort of structure, enforced in order to make the file
django-readable? I'd be writing them by hand to make up testing data.

Also, is the fixtures system capable of being used as a data-transfer system?
For example, after developing, "uploading" my data to the server. It's just
my opinion, but I find it easier to execute loaddata from my project, than
remembering all the database access data and executing some sort of
administration script inside it to load some SQL I might have dumped from my
dev database.

Greetz,
Chris Hoeppner
www.pixware.org

Russell Keith-Magee

May 12, 2007, 8:31:35 AM

to django...@googlegroups.com

On 5/12/07, Christian M Hoeppner <ch...@pixware.org> wrote:
>
> The environment is a neat thing I've been missing in Django so far. Depending
> on the way you call your app, you get a different set of settings loaded.
> Neat for development/production/staging/else.

I don't know much about Symfony, but as I understand what you are
saying, the Django approach to achieving this would be to have
multiple fixtures, one for each 'environment'.

> Is there some sort of structure, enforced in order to make the file
> django-readable? I'd be writing them by hand to make up testing data.

Your fixture file has to be valid YAML, and it has to follow a
particular structure, along the lines of:

- fields: {attr1: value, attr2: value}
  model: app.model
  pk: '1'

If you want a more elaborate example, use ./manage.py dumpdata --format=yaml
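
When writing fixtures by hand, a quick structural check can catch missing keys before loading. The sketch below is an illustration only: `check_fixture` is not a Django function, and it works on the JSON form of a fixture (the same model/pk/fields entries) so that it needs nothing beyond the standard library.

```python
import json

REQUIRED_KEYS = {"model", "pk", "fields"}


def check_fixture(text):
    """Return a list of problems found in a JSON fixture string."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        return ["top level must be a list of entries"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append("entry %d: not a mapping" % index)
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(
                "entry %d: missing %s" % (index, ", ".join(sorted(missing)))
            )
        elif "." not in str(entry["model"]):
            problems.append("entry %d: model should look like app.model" % index)
    return problems


# Hypothetical hand-written fixtures for a 'blog' app.
GOOD = '[{"model": "blog.entry", "pk": "1", "fields": {"title": "Hello"}}]'
BAD = '[{"model": "entry", "pk": "1", "fields": {}}, {"pk": "2"}]'

print(check_fixture(GOOD))
print(check_fixture(BAD))
```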

> Also, is the fixtures system capable of being used as a data-transfer system?

Sure. One of the larger projects at the place I work uses fixtures for
exactly that purpose.

Yours,
Russ Magee %-)

Tim Chase

May 12, 2007, 8:46:56 AM

to django...@googlegroups.com

> You are correct - initial_data fixtures are the preferred mechanism for
> loading data.

Okay...things are starting to make a bit more sense.

I'll post a few more questions and hopefully this thread can
become the foundation for the documentation you describe.

One of the questions coming to mind would involve
FileField/ImageField and their data in fixtures. Say I have a
PhoneNumberType model for an address-book app, and it has icons
for "Home", "Work", "Cell", "Fax", etc. How would one go about
using initial_data to push the images to the proper places in the
MEDIA dir, and then link their filenames to the associated field
in the fixture file?

Totally unrelated, do YAML files need a particular extension?
I've seen YAML files with both a .yaml and a .yml extension (darn
DOS 8.3 folks :) Digging through threads on the list, it seems
.yaml is the convention. However, I didn't see anything other
than an oblique reference to YAML on the djangoproject.com site.

> General advice would be to name your fixtures after the task the data
> contained performs. For example:
> - 'initial_data' is all the data your application _must_ have to
> operate - default table entries, and the like.
> - 'sample_data' could be a selection of sample data (entries in a blog
> for example) that allow you to test that your application works
> - 'weird_cases' could be a selection of sample data that are edge
> cases - e.g., a blog with no entries, or a blog entry with no text

This is quite helpful.

So I put files such as "initial_data.json" in my various
FIXTURE_DIRS of each app to load such data, and then can create
other such fixture-tag-named files (such as sample_data.json) for
testing. This then makes sense in the context of the testing
documentation about associating fixtures with certain tests. Not
all tests need sample data, or some tests might require
pathological test-case data that .

Is there any way to optionally break these fixture files down
into more manageable chunks? Such as per-model? It would be
handy to have

initial_data.json # multiple smaller data-sets
initial_data.colors.json # akin to the X11 rgb.txt
initial_data.makesmodels.json # every known vehicle make/model

etc. in a similar fashion to what was done with the initial-sql
stuff (for backend-specific SQL, attached to a given model).
Otherwise, for a given app, these could grow to an unwieldy size.
There might be problems with interplay and sequencing, so it
might not be a great idea, or at least one to use with caution.

Just a few ideas and questions.

Hopefully this thread can help build the foundation for some
deeper and more consolidated documentation in the future.

Thanks for taking the time to help shed light on it.

-tim

Christian M Hoeppner

May 12, 2007, 9:31:06 AM

to django...@googlegroups.com

> I don't know much about Symfony, but as I understand what you are
> saying, the Django approach to achieving this would be to have
> multiple fixtures, one for each 'environment'.

This one was more of an off-topic comment. Environments in Symfony are about
settings, not about fixtures. You're able to set different settings for each
environment.

> If you want a more elaborate example, use ./manage.py dumpdata
> --format=yaml

What I get:

> [chris@homebox weblog]$ ./manage.py dumpdata --format=yaml
> Unknown serialization format: yaml
> Unable to serialize database: 'yaml'
> None
> [chris@homebox weblog]$

I've done a svn update to the trunk, just in case, and still...

Russell Keith-Magee

May 12, 2007, 10:26:46 AM

to django...@googlegroups.com

On 5/12/07, Christian M Hoeppner <ch...@pixware.org> wrote:
>

> > I don't know much about Symfony, but as I understand what you are
> > saying, the Django approach to achieving this would be to have
> > multiple fixtures, one for each 'environment'.
> This one was more of an off-topic comment. Environments in Symfony are about
> settings, not about fixtures. You're able to set different settings for each
> environment.

Django doesn't have a specifically analogous concept, but there is a
way to get the same functionality.

Write a different settings file for each 'environment', and use the
--settings option on manage.py (or the DJANGO_SETTINGS_MODULE
environment variable if you're using mod_python/fcgi) to point to each
settings file.

If there are settings common to all environments (INSTALLED_APPS
should be fairly common), you can put the common settings into a file,
then put

from common_settings import *

at the top of each of your environment-specific settings files.
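
The layering can be tried out without a real project. The sketch below writes two made-up settings modules (`common_settings` and `settings_dev`, with illustrative values) into a scratch directory and imports the environment-specific one, showing that common values carry over while overrides win.

```python
import importlib
import os
import sys
import tempfile

# Write two hypothetical settings modules into a scratch directory.
scratch = tempfile.mkdtemp()

with open(os.path.join(scratch, "common_settings.py"), "w") as handle:
    handle.write("INSTALLED_APPS = ('django.contrib.auth', 'weblog')\n")
    handle.write("DEBUG = False\n")

with open(os.path.join(scratch, "settings_dev.py"), "w") as handle:
    handle.write("from common_settings import *\n")
    handle.write("DEBUG = True  # development override\n")

# Make the scratch directory importable, then load the dev settings.
sys.path.insert(0, scratch)
settings_dev = importlib.import_module("settings_dev")

print(settings_dev.DEBUG)
print(settings_dev.INSTALLED_APPS)
```

In a real project the two files would simply sit next to manage.py, and the module would be chosen with --settings or DJANGO_SETTINGS_MODULE as described above.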

> > If you want a more elaborate example, use ./manage.py dumpdata
> > --format=yaml
> What I get:
>
> > [chris@homebox weblog]$ ./manage.py dumpdata --format=yaml
> > Unknown serialization format: yaml
> > Unable to serialize database: 'yaml'
> > None
> > [chris@homebox weblog]$
>
> I've done a svn update to the trunk, just in case, and still...

Ah - Sorry. I forgot to mention that you need to have pyyaml
(http://pyyaml.org/) installed. If it isn't, the YAML serializer will
be disabled.
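
A quick way to see whether PyYAML is importable in a given environment is sketched below. The `available_formats` list is only an illustration of the "disabled when missing" idea, not Django's internal registry.

```python
# Formats that, per this thread, need no extra package.
available_formats = ["json", "xml"]

try:
    import yaml  # provided by the PyYAML package
except ImportError:
    yaml = None  # YAML support stays switched off
else:
    available_formats.append("yaml")

print(available_formats)
```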

Yours,
Russ Magee %-)

Russell Keith-Magee

May 12, 2007, 10:36:50 AM

to django...@googlegroups.com

On 5/12/07, Tim Chase <django...@tim.thechases.com> wrote:
>

> > You are correct - initial_data fixtures are the preferred mechanism for
> > loading data.
>
> Okay...things are starting to make a bit more sense.
>
> I'll post a few more questions and hopefully this thread can
> become the foundation for the documentation you describe.
>
> One of the questions coming to mind would involve
> FileField/ImageField and their data in fixtures. Say I have a
> PhoneNumberType model for an address-book app, and it has icons
> for "Home", "Work", "Cell", "Fax", etc. How would one go about
> using initial_data to push the images to the proper places in the
> MEDIA dir, and then link their filenames to the associated field
> in the fixture file?

Hrm. That's a little harder. The FileField/ImageFields are serialized
as the file names, but none of the raw data comes along for the ride.
If your MEDIA dir contains all the files you might require, you can
serialize/deserialize the database as much as you like; the File/Image
fields will keep pointing at the files.

However, this doesn't help you move an app from one site to another.
I'm open to any suggestions on how this could be achieved efficiently
(pickling, or otherwise serializing image data into the fixture
probably wouldn't be feasible).

> Totally unrelated, do YAML files need a particular extension?
> I've seen YAML files with both a .yaml and a .yml extension (darn
> DOS 8.3 folks :) Digging through threads on the list, it seems
> .yaml is the convention. However, I didn't see anything other
> than an oblique reference to YAML on the djangoproject.com site.

Django favours .yaml; the file extension on the fixture file must
match the name under which the serializer is registered (yaml, xml,
json, etc).

YAML isn't fully publicised because it's a relatively recent addition
(compared to the other serializers), and it's not completely 'batteries
included' - you need to install PyYAML (http://pyyaml.org) for the
YAML serializer to operate. Hence, the docs are lagging behind a bit.

> So I put files such as "initial_data.json" in my various
> FIXTURE_DIRS of each app to load such data, and then can create
> other such fixture-tag-named files (such as sample_data.json) for
> testing. This then makes sense in the context of the testing
> documentation about associating fixtures with certain tests. Not
> all tests need sample data, or some tests might require
> pathological test-case data that .

Exactly.

> Is there any way to optionally break these fixture files down
> into more manageable chunks? Such as per-model?

At present, no. However, it is an interesting suggestion. Feel free to
open an enhancement ticket (and/or work up a patch!)

> There might be problems with interplay and sequencing, so it
> might not be a great idea, or at least one to use with caution.

There shouldn't be a problem with sequencing - all fixtures are
installed as a single transaction, so any ordering issues should be
sorted out. The only sequencing problem I can foresee is if two
separate fixture files define an object with the same primary key. If
this happens, the last one loaded will win, but there won't be an
error.
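
The "last one loaded wins" behaviour can be modelled with a plain dictionary keyed on model and primary key. This is a simulation under that assumption, not Django's loading code; `load_fixtures` and the sample entries are invented for the example.

```python
def load_fixtures(*fixture_files):
    """Simulate loading: a later entry replaces an earlier one with the same key."""
    table = {}
    for entries in fixture_files:
        for entry in entries:
            table[(entry["model"], entry["pk"])] = entry["fields"]
    return table


# Two hypothetical fixture files that both define blog.entry with pk 1.
first = [{"model": "blog.entry", "pk": 1, "fields": {"title": "From file A"}}]
second = [{"model": "blog.entry", "pk": 1, "fields": {"title": "From file B"}}]

table = load_fixtures(first, second)
print(table[("blog.entry", 1)]["title"])  # prints "From file B"
```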

Yours,
Russ Magee %-)
