Re: How to Handle XML and CSV processing in Django

Russell Keith-Magee

May 14, 2013, 12:56:43 AM
to django...@googlegroups.com

On Tue, May 14, 2013 at 10:27 AM, Muhammad Ali <mgma...@gmail.com> wrote:
Hi All,

I am currently learning Django and getting comfortable with it (having somewhat covered the official tutorial. :) )

I want to develop a Django-based app using open datasets that I downloaded from the web. But I hit a wall when I realized that I don't know how to handle processing the XML or CSV files once I have them.

Am I going to import these files into my app's or project's models.py file? Or, are they to be processed in a different Python file and then be called into one of the other generic Django modules [models, views, etc.] ?

Essentially, there's nothing particularly Django-specific about this question. You have a CSV or XML file, and you need to write some Python code to read and process it. The only Django-specific part is when you use Django's model API to create data instances. If I assume you've got a data file containing personal information, and a Django model called Person, your import tool will end up looking something like:

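# "datafile" is assumed to yield one parsed row (a list of values) at a time.
for line in datafile:
    person = Person()
    person.name = line[0]
    person.age = line[1]
    person.address = line[2]
    person.save()  # writes the new row to the database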

That is, you just iterate through the data file, and create instances of your Django object one by one as you read your data. This example assumes that "line" provides parsed data in a list-like structure where element 0 is the name, element 1 is the age, and so on; if you're using CSV, that's a common format (and Python's built-in csv module parses each row into a list of this shape). If you're using XML, the code will be slightly different (since you're going to be dealing with XML APIs), but the same broad principle stands.
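To make the loop above concrete, here is a minimal sketch of both readers using only the standard library (csv and xml.etree.ElementTree). Several things here are assumptions for illustration and not from the original post: the column order (name, age, address), the XML layout (<people><person>...</person></people>), the helper names rows_from_csv, rows_from_xml and import_people, and FakePerson, which is a stand-in so the sketch runs without Django. In a real project you would pass your own Person model instead.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_from_csv(fileobj):
    """Yield (name, age, address) tuples from a headerless CSV file object."""
    for line in csv.reader(fileobj):
        if not line:  # csv.reader yields an empty list for blank lines
            continue
        # CSV values arrive as strings, so convert the age explicitly.
        yield line[0], int(line[1]), line[2]


def rows_from_xml(fileobj):
    """Yield the same tuples from XML with one <person> element per record."""
    root = ET.parse(fileobj).getroot()
    for node in root.iter("person"):
        yield (
            node.findtext("name"),
            int(node.findtext("age")),
            node.findtext("address"),
        )


def import_people(rows, person_class):
    """Create and save one person_class instance per row; return the count."""
    count = 0
    for name, age, address in rows:
        person = person_class()
        person.name = name
        person.age = age
        person.address = address
        person.save()
        count += 1
    return count


class FakePerson:
    """Hypothetical stand-in for a Django Person model (no database involved)."""

    saved = []

    def save(self):
        # A real model would write to the database here.
        FakePerson.saved.append((self.name, self.age, self.address))


# Small in-memory samples in place of real downloaded files.
csv_data = io.StringIO("Alice,30,1 Main St\nBob,25,2 High St\n")
xml_data = io.StringIO(
    "<people><person><name>Carol</name><age>41</age>"
    "<address>3 Low Rd</address></person></people>"
)

csv_count = import_people(rows_from_csv(csv_data), FakePerson)
xml_count = import_people(rows_from_xml(xml_data), FakePerson)
```

With a real file you would open it with open(path, newline="") for CSV, as the csv module documentation recommends, and pass the file object to the reader function. Keeping the parsing separate from the saving means only the reader changes when the source format does.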
 
Or are the raw (CSV, XML or other) dataset files supposed to be bundled in a certain way with Django? If so, how do I go about it? And how do I handle frequently-updated data and/or datasets?

There are officially supported serialisation formats (XML, JSON and YAML) that you can use to load data directly into your Django models. However, those files need to be in a specific format, and it's unlikely that your open data source is providing files in that format. You *could* go to the trouble of converting your data file into a "Django compatible" format, but in all honesty, it's going to be easier to just write your own parser.

Yours,
Russ Magee %-)

Muhammad Ali

May 14, 2013, 1:38:09 AM
to django...@googlegroups.com
Russell,

Thank you very much for the insight.

This was very helpful. :)

As for the "Django-specific format" that you mentioned, what are you referring to here?

Thanks for your time.

Russell Keith-Magee

May 14, 2013, 1:50:10 AM
to django...@googlegroups.com
On Tue, May 14, 2013 at 1:38 PM, Muhammad Ali <mgma...@gmail.com> wrote:
Russell,

Thank you very much for the insight.

This was very helpful. :)

As for the "Django-specific format" that you mentioned, what are you referring to here?

If you've got a tutorial project, try running:

./manage.py dumpdata auth --format=json --indent=2

That will dump the contents of the contrib.auth app in the database (i.e., all your users and permissions) in the accepted JSON format.
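For reference, the JSON that dumpdata emits is a list of records, each with a "model" label, a "pk", and a "fields" mapping. A rough sketch of converting parsed rows into that shape might look like the following. The app label "people", the field names, and the helper to_fixture are assumptions for illustration; the field names would have to match your own model for loaddata to accept the file.

```python
import json


def to_fixture(rows, model_label="people.person"):
    """Build fixture-style records from (name, age, address) rows.

    model_label is "<app_label>.<model_name>"; "people.person" is hypothetical.
    """
    return [
        {
            "model": model_label,
            "pk": pk,
            "fields": {"name": name, "age": age, "address": address},
        }
        for pk, (name, age, address) in enumerate(rows, start=1)
    ]


# Sample rows standing in for data parsed from a CSV or XML file.
rows = [("Alice", 30, "1 Main St"), ("Bob", 25, "2 High St")]

# Same indentation as the dumpdata command above (--indent=2).
fixture_json = json.dumps(to_fixture(rows), indent=2)
```

Writing fixture_json to a file and loading it with loaddata is the "convert to a Django compatible format" route; for most one-off imports, creating model instances directly is less work.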

For more details, look at the documentation for the loaddata [1] and dumpdata [2] management commands, as well as the documentation about serializers [3].


Yours,
Russ Magee %-)

Muhammad Ali

May 14, 2013, 12:41:41 PM
to django...@googlegroups.com
Thanks a lot for these resources.

All the best,
Muhammad

phi...@bailey.st

May 14, 2013, 12:54:55 PM
to django...@googlegroups.com
Hi Muhammad,

If you have a CSV dataset, you can try importing the data into MySQL by
following my short tutorial.


http://bailey.st/blog/2012/02/22/bits-of-python-import-a-csv-file-into-a-mysql-database/


Best,

Phillip


--
www.bailey.st

Muhammad Ali

May 15, 2013, 12:38:17 PM
to django...@googlegroups.com
Phillip, thank you for the tutorial. That comes in very handy as well.

All the best. :)
