How to pre-load model data?

Mark Phillips

Sep 8, 2017, 12:57:58 AM
to django users
I have several classes in models.py, and I would like to load some data into these models (or tables in the db), as well as add more data over time through the admin panel.

I googled for some ideas on how to pre-populate the db tables, and found references to RunSQL  (https://docs.djangoproject.com/en/1.11/ref/migration-operations/#django.db.migrations.operations.RunSQL) to execute some SQL commands during a migration (http://blog.endpoint.com/2016/09/executing-custom-sql-in-django-migration.html). 

Is this the "correct" way to accomplish my goal of populating the data tables with some data, or is there another way that more closely ties the data to the models?

Thanks!

Mark

Bernd Wechner

Sep 8, 2017, 3:07:54 AM
to django...@googlegroups.com, Mark Phillips
Mark,

It depends a bit on where you're getting your data from. One of the safest ways would be to build a fixture and use "manage.py loaddata". The fixture can be in JSON, XML, or YAML format.
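
For reference, a fixture is just a serialized list of objects. A minimal JSON example (the app/model labels and field values here are illustrative, not from this thread):

[
  {
    "model": "documents.metadata",
    "pk": 1,
    "fields": {"name": "author", "description": "The document's author"}
  }
]

Saved as e.g. metadata.json in an app's fixtures/ directory, it loads with "manage.py loaddata metadata.json".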

If it's in some other format, then of course there are other ways too. There are always a few different ways you can do anything, I think. But the way I went about preloading from a CSV source once was to just write an importer: a really small Python function that reads the CSV and uses the Django ORM to inject it into the database ...

I'm not sure what the advantage of those SQL methods is over using the ORM itself, mind you. I wouldn't go there (or better said, haven't yet ;-).

Regards,

Bernd.



James Schneider

Sep 8, 2017, 4:45:54 AM
to django...@googlegroups.com
Possibly, although for purely adding data that can be coerced through existing models, RunSQL isn't your best choice.

Is this data something that needs to be inserted repeatedly across multiple installations, or is it data used specifically by a single installed instance? Would the data added in the future rely on the data you are currently inserting? 

If these are one-off data loads, I would look at creating a manage.py management command to parse the data and insert it into the Django database using model forms. At a high level, you would deserialize the data from your source, feed it to a form, and use the form to save that row of data. I suppose you could use a formset to validate/save multiple rows in a single action, but error handling becomes problematic and it isn't any more efficient on the SQL side.
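
To make that concrete, here is a minimal sketch of such a management command. It assumes a hypothetical "documents" app with a MetaData model (name/description fields) and a CSV source; all of those names are illustrative, not anything from this thread:

import csv

from django.core.management.base import BaseCommand
from django.forms import ModelForm

from documents.models import MetaData  # hypothetical app/model


class MetaDataForm(ModelForm):
    class Meta:
        model = MetaData
        fields = ['name', 'description']


class Command(BaseCommand):
    help = "Load metadata rows from a CSV file via a ModelForm"

    def add_arguments(self, parser):
        parser.add_argument('csv_path')

    def handle(self, *args, **options):
        with open(options['csv_path'], newline='') as f:
            # Each CSV row is validated exactly as end-user input would be.
            for i, row in enumerate(csv.DictReader(f), start=1):
                form = MetaDataForm(data=row)
                if form.is_valid():
                    form.save()
                else:
                    self.stderr.write(f"Row {i} skipped: {form.errors.as_json()}")

Placed at e.g. documents/management/commands/load_metadata.py, it would run as "manage.py load_metadata path/to/data.csv".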

Form classes offer the benefit of validation of data in addition to saving the model instance. This also means that the external data would be validated the same way that an end-user entry would be, if that is important to you. Model forms also handle any M2M relationships transparently.

If you are 150% sure that the data you have is clean, then you may be able to jump straight into the ORM.



If this is data that you are passing along to those who install your app, a migration is the way to go. You might even mix the two approaches: create the management command, then simply run it from the migration. That strategy would be the most extensible if the command can also be run manually for other data provided by the customer.
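
A sketch of that mixed approach, assuming the hypothetical load_metadata command above: a data migration can call the command from RunPython. One caveat worth noting: call_command uses your current models rather than the migration's historical models, so this is only safe while the command and the models stay in sync.

from django.core.management import call_command
from django.db import migrations


def load_initial_data(apps, schema_editor):
    # "load_metadata" and the CSV filename are the hypothetical names above.
    call_command('load_metadata', 'initial_metadata.csv')


class Migration(migrations.Migration):

    dependencies = [
        ('documents', '0001_initial'),  # hypothetical dependency
    ]

    operations = [
        migrations.RunPython(load_initial_data, migrations.RunPython.noop),
    ]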

-James

callsa...@gmail.com

Sep 8, 2017, 9:04:19 AM
to Django users
Hi,

Think about the problem: do you mean initial data? If so, see this: https://docs.djangoproject.com/en/1.11/howto/initial-data/
That is the suggested Django way.

Mark Phillips

Sep 8, 2017, 3:13:17 PM
to django users
Thanks for all your replies and helpful suggestions. To answer some of your questions - 

1. Where is the data coming from? - It is textual data in a spec/spreadsheet. The data is a lot of metadata (a name and one or more values) describing attributes of scanned documents. There is what I would call a "base set" of data, meaning what we can think of now based on a review of a representative set of documents. However, as new documents are imported into the application, there may be other types of metadata with multiple values that will have to be created on the fly (hence the need for the admin forms to add metadata in the future). The import function I need is a one-off function. We need it as we develop the models and test them against various documents. Sometimes, when we are developing the app, it is easier to just delete the database and rebuild it than to back out certain migrations. So what we are looking for is a simple way to populate the metadata during development, and then one time when we go into production. Currently, we have 24 metadata names, and each one can have one to 20 values.

2. The manage.py loaddata option is appealing. However, it will take further effort to convert the spreadsheet data into one of the formats this option accepts. I think a CSV file reader is more suitable for our purposes.

3. I will have to think about the validation concepts. These are simple name-value pairs, so it is not clear what I am validating against, unless it is detecting duplicates.

4. I am also looking into a spreadsheet -> CSV file -> MySQL load data pipeline as perhaps the easiest way to complete this project. The spreadsheet is the easiest way to create and update the metadata, and then it is pretty automatic from the spreadsheet to the database.

I am open to other suggestions!

Thanks!

Mark



Jani Tiainen

Sep 8, 2017, 6:12:35 PM
to django...@googlegroups.com
Hi.

We actually implemented a spreadsheet importer for one of our clients.

It was relatively trivial: just read each row from the spreadsheet, put it into a ModelForm for validation, and save.

There are pretty good libraries for reading spreadsheets directly.
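
For instance, with openpyxl (one such library), the whole importer can be a few lines. The MetaDataForm here is the hypothetical ModelForm sketched earlier in the thread, and the two-column sheet layout is an assumption:

from openpyxl import load_workbook


def import_sheet(path):
    ws = load_workbook(path, read_only=True).active
    # Skip the header row; read cell values only.
    for name, description in ws.iter_rows(min_row=2, values_only=True):
        form = MetaDataForm(data={'name': name, 'description': description})
        if form.is_valid():
            form.save()
        else:
            print(f"Skipping {name!r}: {form.errors}")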

Mark Phillips

Sep 8, 2017, 8:37:34 PM
to django users
Brilliant idea! Did not know these importers existed. 

Thanks!

Mark

James Schneider

Sep 9, 2017, 12:31:56 AM
to django...@googlegroups.com


On Sep 8, 2017 12:12 PM, "Mark Phillips" <ma...@phillipsmarketing.biz> wrote:
Thanks for all your replies and helpful suggestions. To answer some of your questions - 

1. Where is the data coming from? - It is textual data in a spec/spreadsheet. The data is a lot of metadata (a name and one or more values) describing attributes of scanned documents. There is what I would call a "base set" of data, meaning what we can think of now based on a review of a representative set of documents. However, as new documents are imported into the application, there may be other types of metadata with multiple values that will have to be created on the fly (hence the need for the admin forms to add metadata in the future). The import function I need is a one-off function. We need it as we develop the models and test them against various documents. Sometimes, when we are developing the app, it is easier to just delete the database and rebuild it than to back out certain migrations. So what we are looking for is a simple way to populate the metadata during development, and then one time when we go into production. Currently, we have 24 metadata names, and each one can have one to 20 values.

I use a management command to do exactly that: drop the database, create the database, run migrations, and populate the database with a fair bit of data. I use FactoryBoy to generate hundreds of fake users and other objects upon reset.


If the files are only being used for development, and should not normally be loaded into the database, then I would definitely recommend a separate management command rather than building the fixture/fake data loading process directly into the migrations.



2. The manage.py loaddata option is appealing. However, it will take further effort to convert the spreadsheet data into one of the formats this option accepts. I think a CSV file reader is more suitable for our purposes.

If it is simple data that is being somewhat auto-generated, I'd consider writing a FactoryBoy definition so you can generate one or thousands of entries on demand.
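
A sketch of what such a factory_boy definition might look like, assuming a hypothetical MetaData model with name and description fields (the app path and field values are invented for illustration):

import factory

from documents.models import MetaData  # hypothetical app/model


class MetaDataFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = MetaData

    name = factory.Sequence(lambda n: f"metadata-{n}")  # guaranteed unique
    description = factory.Faker('sentence')

# One entry or thousands on demand:
#   MetaDataFactory()
#   MetaDataFactory.create_batch(1000)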


3. I will have to think about the validation concepts. These are simple name-value pairs, so it is not clear what I am validating against, unless it is detecting duplicates.

Forms can do all sorts of validation. Duplicate detection is one of them. Value formatting and type checking is another. You may have some integer values that should only be within a certain range (i.e., a person can't be 29,000,000 years old, or maybe they can, who knows...). Forms are where input is filtered and massaged to match the business logic in your models.

If you dump data directly into models or the database, then you risk dumping in data that hasn't been evaluated by your application, which may produce subtle bugs of the worst kind: things work in dev and break in prod because your input channel is different.


4. I am also looking into a spreadsheet -> CSV file -> MySQL load data pipeline as perhaps the easiest way to complete this project. The spreadsheet is the easiest way to create and update the metadata, and then it is pretty automatic from the spreadsheet to the database.

I am open to other suggestions!

I'd recommend FBoy and would eliminate Excel entirely if I could. Otherwise an Excel parser would be my next choice.

-James

Derek

Sep 9, 2017, 11:59:05 AM
to Django users
"If you dump data directly in to models or the database, then you risk dumping in data that hasn't been evaluated by your application which may produce subtle bugs of the worst kind where things work in dev and break in prod because your input channel is different."

If this is happening, it's because you have poor validation in your model clean()s. You should NEVER rely solely on a form to check data for you (at least beyond very basic sanitation); that logic properly belongs with the model. We import directly from spreadsheets to models (note: NOT directly to the database) because we make use of such logic.
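
A minimal sketch of that kind of model-level validation (the model and the rule are invented for illustration):

from django.core.exceptions import ValidationError
from django.db import models


class Person(models.Model):
    name = models.CharField(max_length=100)
    age = models.PositiveIntegerField()

    def clean(self):
        # clean() runs via full_clean(); ModelForms call it automatically,
        # but an importer that bypasses forms must call
        # instance.full_clean() itself before saving.
        if self.age > 150:
            raise ValidationError({'age': 'Age is implausibly large.'})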

Mark Phillips

Sep 9, 2017, 12:05:18 PM
to django users
James,

Thanks for your comments.

Unfortunately, the data is not auto-generated, so FactoryBoy may not help much. As far as validation goes, the data is so simple that the only validation I have come up with is to ensure the names of the metadata and metadata values are unique. The metadata and metadata values are used to describe documents imported into the application, so they need to be unique values. We have defined ~200 metadata names and values so far for the production system. These are the two models -

from django.db import models


class MetaData(models.Model):
    metadata_id = models.AutoField(primary_key=True)
    name = models.CharField('metadata name', max_length=200, unique=True)
    description = models.TextField('description')

    def __str__(self):
        return self.name


class MetaDataValue(models.Model):
    # Note: Django appends "_id" to ForeignKey columns, so this field is
    # better named "metadata" (otherwise the column becomes metadata_id_id).
    metadata_id = models.ForeignKey(MetaData, on_delete=models.CASCADE)
    value = models.CharField('value', max_length=200, unique=True)

    def __str__(self):
        return self.value

I could use FactoryBoy to generate test data for these two models, but what I really need is a simple way to fill out these two tables in the database with production data. The same data could be used for testing purposes.

I have been playing with django-import-export. It is a great solution, except it does not support InlineModelAdmin objects or material design (i.e., the import/export buttons don't appear in the admin pages). So I will look for another Django importer/exporter.

A spreadsheet seems the easiest way to keep a list of the metadata names and metadata values for now. I am also looking at a CSV -> JSON converter to use with "manage.py loaddata".
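
That CSV -> JSON conversion can be a short standalone script. This sketch assumes a CSV with "name" and "description" columns and the MetaData model above; the "documents" app label is illustrative:

import csv
import json
import sys


def csv_to_fixture(csv_path, json_path):
    with open(csv_path, newline='') as f:
        # Build one fixture object per CSV row, numbering pks from 1.
        objects = [
            {
                'model': 'documents.metadata',
                'pk': pk,
                'fields': {'name': row['name'], 'description': row['description']},
            }
            for pk, row in enumerate(csv.DictReader(f), start=1)
        ]
    with open(json_path, 'w') as out:
        json.dump(objects, out, indent=2)


if __name__ == '__main__':
    csv_to_fixture(sys.argv[1], sys.argv[2])

The output then loads with "manage.py loaddata metadata.json".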

Still open to other suggestions to simplify this problem!

Thanks again,

Mark


James Schneider

Sep 9, 2017, 4:32:13 PM
to django...@googlegroups.com


On Sep 9, 2017 8:59 AM, "Derek" <game...@gmail.com> wrote:
"If you dump data directly in to models or the database, then you risk dumping in data that hasn't been evaluated by your application which may produce subtle bugs of the worst kind where things work in dev and break in prod because your input channel is different."

If this is happening its because you have poor validation on your model clean(s). You should NEVER just rely on a form to check data for you (at least beyond very basic sanitation); that logic properly belongs with the model.  We import direct from spreadsheets to models (note - NOT direct to the database) because we make use of such logic.

That's not what I'm saying. Validation at the model level should apply to all instances of the model in any situation. Validation at the form level can be much more context-specific. The two levels of validation are not mutually exclusive; rather, they are complementary.

I'm not saying that there shouldn't be model validation. Quite the opposite, there absolutely should be. I'm saying that the import of the data should follow the same workflow an end-user would use to enter it, to ensure, where possible, that the data is correct in the context in which it is being entered.

In simple cases, a model is only used in a single context (in only one spot in the application and only via one form), and model and form validation would usually be the same. In advanced cases, the same model can be referenced via multiple pages, and each page may place a different set of restrictions on what data may be entered or how it is formatted. This is typical when a model is related to several other models.

Hypothetically, if you have a Widget model, it may have a long list of available colors to choose from in a choices= list. The model validation would be responsible for ensuring that the color selected is one of the colors in that entire list.

However, Widgets may only be available in red and blue when paired with a FooBar model. The form for FooBarWidgets would then limit the color choices to just red and blue, and would validate that the color submitted is one of those. The model validation wouldn't complain, because red and blue are in the long list of colors available.

Importing the data directly into the model could potentially mean that a Widget paired with a FooBar gets set to green, which is invalid. The model validation wouldn't help in this case, because green is technically a valid color for Widgets, but in the context of FooBars, it isn't.
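
In code, James's hypothetical Widget/FooBar scenario might look like this sketch: the model accepts any listed color, while the context-specific form narrows the choices (all names here are invented for illustration):

from django import forms
from django.db import models


class Widget(models.Model):
    COLORS = [('red', 'Red'), ('blue', 'Blue'), ('green', 'Green')]
    color = models.CharField(max_length=10, choices=COLORS)


class FooBarWidgetForm(forms.ModelForm):
    class Meta:
        model = Widget
        fields = ['color']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Paired with a FooBar, only red and blue are allowed; the narrowed
        # choices are also what the form validates submissions against.
        self.fields['color'].choices = [('red', 'Red'), ('blue', 'Blue')]

Model validation alone would accept green here; the form rejects it in the FooBar context.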

That's why I mentioned that you need to be 150% confident in your data generation when importing via the ORM directly to avoid "impossible" data in your models, which is worse than missing data IMO.

-James