I have a CSV file that I have to import into my Django website.
Say I have three models involved: category, sub_category and product.
### models.py ###
from django.db import models

class Category(models.Model):
    name = models.CharField(max_length=100)

class SubCategory(models.Model):
    name = models.CharField(max_length=100)
    parent_category = models.ForeignKey(Category)

class Product(models.Model):
    name = models.CharField(max_length=200)
    sub_category = models.ForeignKey(SubCategory)
##########
Say this is my CSV file (category, sub_category, product):
### file.csv ###
clothing man;trousers;levis 501
clothing woman;shirt;nice shirt
[...]
##########
I am not sure which way to go. Do you have any hints or web
references to look at?
Thanks, Fabio.
--
Fabio Natali
It's not that tricky, is it?
Read the CSV file, split out the fields.
Get or create the category
Get or create the subcategory
Get or create the product
In code:

import csv

data = csv.reader(open('/path/to/csv', 'r'), delimiter=';')
for row in data:
    # get_or_create returns an (object, created) tuple, so unpack it
    category, _ = Category.objects.get_or_create(name=row[0])
    sub_category, _ = SubCategory.objects.get_or_create(
        name=row[1], defaults={'parent_category': category})
    product, _ = Product.objects.get_or_create(
        name=row[2], defaults={'sub_category': sub_category})
http://docs.python.org/library/csv.html
Cheers
Tom
Hey Tom, that's very kind of you, so helpful and fast!
I'll use that in my real scenario (which is a bit more complicated).
I'll be back here soon, reporting success :-) or asking for more help!
Cheers!
--
Fabio Natali
It works like a charm! Thanks again Tom.
--
Fabio Natali
There are a few potential problems with the csv handling as used here.
Firstly, the file should be opened in binary mode. On Unix-based
systems binary mode is effectively the same as text mode, but you may
run into problems when you move the code to another environment
(Windows).
Secondly, the opened file should always be closed -- especially in a
(web) application that may run for a long time.
You can do it like this:
f = open('/path/to/csv', 'rb')
data = csv.reader(f, delimiter=';')
for row in data:
    ...
f.close()
Or you can use the new Python construct "with".
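For example, a minimal sketch reusing Tom's loop and the ';'-delimited
file from above:

import csv
# assuming the Category/SubCategory/Product models from above are imported

# "with" closes the file automatically, even if an exception is raised
with open('/path/to/csv', 'rb') as f:
    for row in csv.reader(f, delimiter=';'):
        category, _ = Category.objects.get_or_create(name=row[0])
        sub_category, _ = SubCategory.objects.get_or_create(
            name=row[1], defaults={'parent_category': category})
        Product.objects.get_or_create(
            name=row[2], defaults={'sub_category': sub_category})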
P.
Hey Petr! Thank you so much, I immediately followed your advice. The
file is now closed at the end of the story.
Cheers, Fabio.
--
Fabio Natali
Hi Andre, I didn't forget the tips you gave me in a previous thread
about Celery and DSE. Actually I've been studying them for a while. At
the moment things don't seem too heavy. I think I'll use them in future
development.
Thank you very much, Fabio.
--
Fabio Natali
Dear Anler, thank you for sharing your experience and your code. That's
very kind of you. I'll study it and come back to you with questions.
--
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics
Well I can count the lines in each file in a few seconds, so I think
the SQL stuff is slowing everything down (using Postgres through
psycopg2).
>
> If it's only a few MB, I see little reason to go as far as writing it in
> C. Unless you are performing the same import tens of thousands of times,
> and the overhead in Python adds up so much that you get problems.
>
> But, quite frankly, you'll max out MySQL INSERT performance before you max
> out Python's performance lol - as long as you don't use the ORM for inserts
> :)
When you say 'as long as you don't use the ORM for inserts', do you
mean don't do:

currentDataset.comment = "blah"
currentDataset.name = "abc12"
currentDataset.relatedObject = otherCurrentObject.id

etc., etc.?
Are you saying I should be doing all that in Python, but using raw SQL
instead of the fancy Python object-like way? Like this:
https://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly
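Something roughly like this, I guess? (A rough sketch -- the table and
column names below are made up for illustration; the real ones depend
on the app label and model definitions.)

from django.db import connection

# Hypothetical table/column names, just to illustrate the idea
rows = [
    ("blah", "abc12", 1),
    ("another comment", "def34", 2),
]

cursor = connection.cursor()
cursor.executemany(
    "INSERT INTO myapp_dataset (comment, name, related_object_id) "
    "VALUES (%s, %s, %s)",
    rows,
)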
> when you say 'as long as you don't use the ORM for inserts',
https://bitbucket.org/weholt/dse2
And yes, I'm the author of DSE ;-)
Regards,
Thomas
--
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org
OTOH, if (like the OP) you have a few small (<1k) CSV files to import,
then going through the ORM is easy, simple and quick to code. The fact
that you can get it to run 1 minute quicker by not using the ORM is
not relevant, unless importing CSV files is the main task of your
server, and it will be doing it 24x7.
Premature optimization is the root of all evil.
Cheers
Tom