Importing file to Models - Temporary store to allow confirming?


Victor Hooi

Jun 26, 2011, 2:49:09 AM
to django...@googlegroups.com
heya,

We have a CSV file that we are importing into a Django application, and then creating the appropriate models and relationships.

At the first page, we have a file upload form where the user selects a file.

We then parse the file, and return a second page showing them what would be created, any validation errors etc.

The user can then decide whether to proceed or not (or possibly to correct any areas on-screen).

What would be the best way of storing the temporary interim models, before they actually hit the database proper?

The CSV file will be fairly big, possibly around 200 KB in size, and will create several hundred model instances.

Should I store this in the database somewhere, and label those models "temporary"? It seems a bit heavy just for a confirm, and I'm not sure if it's appropriate use of the database. Or is there some way we could store it in Django sessions? Or any other way to do it?

Cheers,
Victor

Shawn Milochik

Jun 26, 2011, 2:36:32 PM
to django...@googlegroups.com
If you're using Django 1.2 or higher you can take advantage of the
multi-database support by adding the 'using' kwarg to your model.save().
This will allow you to ensure that the model saves successfully (is
valid) in the 'holding' database, and move it to your 'live' database later.
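
For illustration, here is a minimal sketch of the settings this first option assumes. The alias name "holding" and the SQLite file names are hypothetical; only the DATABASES setting and the `using` keyword on save() are Django's own.

```python
# settings.py fragment (sketch only).
# The "holding" alias and both file names are assumptions for illustration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "live.sqlite3",
    },
    "holding": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "holding.sqlite3",
    },
}

# Elsewhere, in the import code (not executed here):
#   record.save(using="holding")   # stage the row in the holding database
#   record.save(using="default")   # later, once the user has confirmed
```

The holding database needs the same tables as the live one, so it has to be created and migrated separately.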

You could add a field to the model indicating whether it's 'live' or
'pending,' load all the temp models as pending, then just flip the flag
as each one is "approved." That would front-load all the processing.

You can use a ModelForm to accept the CSV data (preferably in dict form,
from csv.DictReader) to validate the data, then just check for
is_valid() without saving anything to the database. You could then store
those dictionaries somewhere (such as MongoDB) for retrieval when you
actually want to save them to the database.
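
A rough sketch of that third option, using only the standard library so it runs on its own. The column names and the `validate_row` helper are hypothetical; in a real project `validate_row` would probably be replaced by building a ModelForm with `data=row`, calling is_valid(), and reading `form.errors`.

```python
import csv
import io

REQUIRED_FIELDS = ("name", "email")  # hypothetical CSV columns


def validate_row(row):
    """Stand-in for a ModelForm's is_valid(); returns a dict of field errors."""
    errors = {}
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            errors[field] = "This field is required."
    email = (row.get("email") or "").strip()
    if email and "@" not in email:
        errors["email"] = "Enter a valid email address."
    return errors


def parse_upload(text):
    """Split CSV text into rows that passed validation and rows that failed."""
    valid, invalid = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        errors = validate_row(row)
        if errors:
            invalid.append({"line": line_no, "row": dict(row), "errors": errors})
        else:
            valid.append(dict(row))
    return valid, invalid


sample = "name,email\nAda,ada@example.com\n,broken\n"
valid_rows, invalid_rows = parse_upload(sample)
```

Nothing touches the database here: `valid_rows` holds the dicts to store for later, and `invalid_rows` holds what the confirmation page would need to show the errors per line.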

Each of these options has different advantages. I don't know about
performance. In any case, you may have to use asynchronous processing
(via ZeroMQ or django-celery) so the page can refresh before your Web
server times out.

Use whichever seems best for your needs, and maybe someone else has
another option. I'd prefer option #3 (the ModelForm one), because it keeps all the temporary
data out of your production database and makes good use of the
validation properties of the ModelForm class.


Victor Hooi

Jun 28, 2011, 2:43:34 AM
to django...@googlegroups.com

Shawn,

Thanks for the quick reply =).

If we go with the third approach, what advantages are there to persisting the models in MongoDB (or something similar like Redis or Cassandra), as opposed to a traditional RDBMS? (I just want to get a good handle on the issues).

Furthermore, is there a particular reason you picked MongoDB over the other NoSQL solutions?

Also, in terms of actually persisting the temporary models to MongoDB, I can just use PyMongo and store the dicts themselves - and just use a nested dict to represent all the model instances from a given import file. Any issues with that approach?

Thanks for the tip about using asynchronous processing for the import file - I didn't think the file would be that big/complex to process, but I suppose it's probably the better approach to do it all asynchronously, instead of just hoping it won't grow larger in the future. In this case, I suppose I can just use Ajax on the page itself to check on the status of the queue?

Cheers,
Victor

Shawn Milochik

Jun 28, 2011, 10:18:27 AM
to django...@googlegroups.com
On 06/28/2011 02:43 AM, Victor Hooi wrote:
>
> Shawn,
>
> Thanks for the quick reply =).
>
> If we go with the third approach, what advantages are there to
> persisting the models in MongoDB (or something similar like Redis or
> Cassandra), as opposed to a traditional RDBMS? (I just want to get a
> good handle on the issues).
>
The schema-less nature will give you the ability to "just work."

> Furthermore, is there a particular reason you picked MongoDB over the
> other NoSQL solutions?
>

No. I heard about it on FLOSS Weekly, checked it out, and I love it. Also,
pymongo is great and I'm a Python guy. I've read some comparisons among
the other "NoSQL" databases and some are better for some uses but
MongoDB is great for me.

> Also, in terms of actually persisting the temporary models to MongoDB,
> I can just use PyMongo and store the dicts themselves - and just use
> a nested dict to represent all the model instances from a given import
> file. Any issues with that approach?
>

No issues that I can see. If you're using pymongo then you can just
treat everything as a Python dictionary and MongoDB will consume it no
problem (assuming you stick to certain data types).
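
To make the "certain data types" caveat concrete, here is a small sketch that flattens each row into plain strings, numbers, lists and dicts before storage. The document layout (`import_id`, `status`, `rows`) and both function names are hypothetical, and turning Decimal and date values into strings is just one simple choice. `json.dumps` is used only as a rough stand-in check that nothing exotic is left; the actual PyMongo call is shown as a comment because it needs a running server.

```python
import datetime
import decimal
import json
import uuid


def to_storable(value):
    """Recursively convert a value into plain, document-friendly types."""
    if isinstance(value, dict):
        return {str(k): to_storable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_storable(v) for v in value]
    if isinstance(value, decimal.Decimal):
        return str(value)  # keep precision by storing as text
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    return value


def build_import_document(filename, rows):
    """One nested dict representing every row from a single import file."""
    return {
        "import_id": uuid.uuid4().hex,
        "filename": filename,
        "status": "pending",
        "rows": [to_storable(row) for row in rows],
    }


doc = build_import_document(
    "people.csv",
    [{"name": "Ada",
      "balance": decimal.Decimal("10.50"),
      "joined": datetime.date(2011, 6, 26)}],
)
json.dumps(doc)  # raises TypeError if a non-plain type slipped through

# With PyMongo this could then be stored with something like:
#   collection.insert_one(doc)   # `collection` is a hypothetical handle
```

Keeping one document per import file means the confirm step can fetch everything with a single lookup on `import_id`.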

> Thanks for the tip about using asynchronous processing for the import
> file - I didn't think the file would be that big/complex to process,
> but I suppose it's probably the better approach to do it all
> asynchronously, instead of just hoping it won't grow larger in the
> future. In this case, I suppose I can just use Ajax on the page itself
> to check on the status of the queue?
>

Sure, although Comet would be better for this use-case. You could also
just e-mail the user from the background process after it's done with a
link to the URL for the next step. AJAX is easier than Comet (no service
to install or set up), and it probably wouldn't generate enough extra
bandwidth to really bother you.
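
As a rough illustration of the polling idea, the sketch below has a background worker record its progress and a helper that produces the JSON an AJAX call could fetch. Everything here is hypothetical: the `JOBS` dict stands in for a cache or database table, and the thread stands in for a real task queue such as django-celery. An in-process dict like this would not be shared between multiple web server processes.

```python
import json
import threading

JOBS = {}  # job_id -> status dict; stand-in for a cache or database table
JOBS_LOCK = threading.Lock()


def run_import(job_id, rows):
    """Background worker: process rows and record progress as it goes."""
    with JOBS_LOCK:
        JOBS[job_id] = {"state": "running", "done": 0, "total": len(rows)}
    for index, _row in enumerate(rows, start=1):
        # Real parsing and validation of _row would happen here.
        with JOBS_LOCK:
            JOBS[job_id]["done"] = index
    with JOBS_LOCK:
        JOBS[job_id]["state"] = "finished"


def status_payload(job_id):
    """What a polling endpoint could return to the browser as JSON."""
    with JOBS_LOCK:
        status = dict(JOBS.get(job_id, {"state": "unknown"}))
    return json.dumps(status, sort_keys=True)


worker = threading.Thread(target=run_import, args=("job-1", ["a", "b", "c"]))
worker.start()
worker.join()  # a real view would return immediately instead of joining
final_status = status_payload("job-1")
```

The page would call the status endpoint every few seconds and move on to the confirmation step once `state` is "finished".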

> Cheers,
> Victor
>
