Model Generation for CSV, XLS Files


Muskan Vaswan

Nov 25, 2020, 2:53:06 PM
to Django developers (Contributions to Django itself)
Hi everyone,
I'm Muskan. I'm very new to this community, though not that new to Django itself. Contributing to Django is something I would really like to do, and I might also be participating in GSoC for it.

I have a vague idea of what I want to fix, because I have used Django myself and just want to add, as a developer, the things I would have wanted as a user. So my question is based around this idea of mine.

I want to know what functionality already exists that makes it easier to load a large data set directly into a Django model from a CSV file (going with the simplest format for now). When I had to do it as a user it took me quite a while and was a lot more complicated than I had expected, especially after being used to the smooth workflow of loaddata. I could not find any better methods; if there really are no other methods, this is something I would like to work on. So this is just to confirm whether my research was thorough enough (it very possibly wasn't).

Thank you! I'm excited to begin helping out!

Arvind Nedumaran

Nov 25, 2020, 6:22:22 PM
to Django developers

Hi Muskan,

There isn’t anything that directly lets you load a CSV and generate models for it, as far as I know (I may be wrong).

But check out the Django documentation’s HOWTO on integrating with legacy databases - https://docs.djangoproject.com/en/3.1/howto/legacy-databases/.

One possible solution for what you’re trying to do might be a flag for the ‘inspectdb’ command that lets you pass a CSV file. Just a thought.
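For context, the existing command only introspects an already-configured database connection; the documented usage from that HOWTO is:

```
python manage.py inspectdb > models.py
```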

Good luck with GSoC.

Onward,
Arvind


Philip Mutua

Nov 26, 2020, 10:00:44 AM
to django-d...@googlegroups.com
Hi Muskan,

You need to implement some functionality to upload the CSV file, read it, and then insert the data into the database in bulk. Django has a way to add multiple objects at once using bulk_create. Here is a full example; I hope it will help.
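A minimal sketch of that approach, assuming a hypothetical `Person` model whose `name` and `email` fields match the CSV header:

```
import csv

from myapp.models import Person  # hypothetical model with name/email fields


def import_people(csv_path):
    """Read a CSV file with a header row and insert its rows in bulk."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        people = [Person(name=row["name"], email=row["email"]) for row in reader]
    # One bulk INSERT (in batches) instead of one query per row.
    Person.objects.bulk_create(people, batch_size=500)
```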



Shoury Sharma

Nov 26, 2020, 10:55:30 AM
to django-d...@googlegroups.com
Hello!
I'm also looking forward to GSoC '21. If you prefer, we could try merging these functionalities by submitting patches.
Regards


Muskan Vaswan

Nov 26, 2020, 1:24:51 PM
to Django developers (Contributions to Django itself)
Thank you all for your responses, I really appreciate it. This seems to be a very active and helpful community; I already love Django even more!

Muskan Vaswan

Nov 26, 2020, 1:24:51 PM
to django-d...@googlegroups.com
I would like to tackle this on my own first if possible, especially for GSoC. I hope you understand; I appreciate your offer.


Jon Dufresne

Nov 26, 2020, 2:09:13 PM
to django-d...@googlegroups.com
Is django-import-export at all along the lines of what you're looking for?


> django-import-export is a Django application and library for importing and exporting data with included admin integration.
> ...
> support multiple formats (Excel, CSV, JSON, ...)
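For reference, using it programmatically (outside the admin) looks roughly like the sketch below; `Book` is a hypothetical model, and the exact API is described in the django-import-export documentation:

```
import tablib
from import_export import resources

from myapp.models import Book  # hypothetical model


class BookResource(resources.ModelResource):
    class Meta:
        model = Book


resource = BookResource()
with open("books.csv") as f:
    dataset = tablib.Dataset().load(f.read(), format="csv")

result = resource.import_data(dataset, dry_run=True)  # validate first
if not result.has_errors():
    resource.import_data(dataset, dry_run=False)
```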


Muskan Vaswan

Nov 26, 2020, 11:50:56 PM
to Django developers (Contributions to Django itself)
django-import-export is along the lines of what I want to implement; it's how I eventually managed to import a CSV file as a user. There are drawbacks to the library, though, the biggest one being that it's not the easiest to use without the admin integration, which isn't always possible in large projects.
I wanted to implement something similar to loaddata, but for .csv or .xls files rather than just the JSON fixtures produced by dumpdata.

```python manage.py dumpdata > dumped_data.json``` and ```python manage.py loaddata dumped_data.json``` do an excellent job of taking data from existing Django models and reusing it. I was thinking about adding similar functionality for loading external datasets into models.

So something like ```python manage.py loaddata dataset.csv <app>.<model>```.
Do you think this would be helpful?

P.S. I'm very new to this, so I'm not sure how to go about making a proposal in the community; if I'm doing this wrong, please point me in the right direction.

Mariusz Felisiak

Nov 27, 2020, 12:25:24 AM
to Django developers (Contributions to Django itself)
Hi,

Please take a look at the existing ticket #5253, which was rejected. We would need to reach a strong consensus on this mailing list to reopen a closed ticket (see the triaging guidelines with regard to wontfix tickets), but I think that Russ' comment is still valid.

Best,
Mariusz

Ryan Gedwill

Nov 27, 2020, 10:50:17 AM
to django-d...@googlegroups.com
I recently developed a feature for my company that allows mass upload of users and a few other core models. I did use Django REST framework for the core file-upload piece, which may not be in the scope of this group, but it was fairly simple, and if you wanted to do it for GSoC it'd be pretty straightforward. The only part from DRF that would need to be replicated is the deserialization of the file.

Paolo Melchiorre

Nov 27, 2020, 12:49:06 PM
to django-d...@googlegroups.com
On Fri, Nov 27, 2020 at 5:51 AM Muskan Vaswan <muskan...@gmail.com> wrote:
> I want to know what functionality already exists that makes it easier to load a large data set directly into a Django model from a CSV file (going with the simplest format for now).

Hi Muskan,

In PostgreSQL I use COPY ... FROM to load a large set of data directly
into a database table from a CSV file:
https://www.postgresql.org/docs/current/sql-copy.html

In a Django project I used django-postgres-copy to load a CSV file
with 1M lines in 1 second:
https://github.com/california-civic-data-coalition/django-postgres-copy
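Under the hood this relies on PostgreSQL's COPY. A rough sketch of doing the same thing by hand through Django's connection, assuming the PostgreSQL backend with psycopg2 and a hypothetical `myapp_person` table whose columns match the CSV header:

```
from django.db import connection


def copy_people(csv_path):
    """Bulk-load a CSV straight into PostgreSQL using COPY (psycopg2 only)."""
    with open(csv_path) as f, connection.cursor() as cursor:
        cursor.copy_expert(
            "COPY myapp_person (name, email) "
            "FROM STDIN WITH (FORMAT csv, HEADER true)",
            f,
        )
```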

I know my answer is not useful for the other supported database backends;
I only wanted to share my experience.

Best,
Paolo

--
Paolo Melchiorre

https://www.paulox.net

Muskan Vaswan

Nov 28, 2020, 1:15:31 AM
to Django developers (Contributions to Django itself)

Hi Markzus,

Issue #5253 that you pointed me to was indeed very helpful. What I suggested is along the lines of what Adam suggested and tried to accomplish 12 years ago, which shows that there has been room for this feature for a very long time.

At the same time, all of the points brought up by Russ also make sense, but I do think we can resolve those issues to some extent. Allow me to address the details one by one.

Russ pointed out the need for a standardized format in CSV files, and in turn Adam suggested that the first line contain the table name. I have a slightly different suggestion. It is a well-known convention to put the column names as comma-separated values in the first line of the CSV file. I suggest we leave it at that.

The column names in the first line would correspond to the field names of the model created by the user, which, instead of being referenced in the file itself, could be passed as a command-line argument.

This of course operates under the assumption that only one table of data will be in a single file, which isn't a very bold assumption to make. I'll elaborate on why in just a bit.

I'm going to try to do a better job of explaining the context of the addition I'm describing. Say you get your data off a random site on the internet in the form of an Excel sheet. If you want to use it in Django as a model, this is what you have to do right now: convert it from Excel to CSV, then write a script something like the one that https://groups.google.com/g/django-developers/c/o1dFA31YwOk/m/XchFFWjnBQAJ talks about (the views file there is the code I'm referring to), which reads the file line by line and creates model objects from it. Creating a sub-app just to be able to upload your dataset into your model seems like a stretch. This process could be made seamless if we built it into Django itself, which is going to take effort, but I do think it would be useful to users.

So the whole process could simply boil down to `python manage.py loaddata dataset.csv <app>.<ModelName>`.
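As a rough illustration only, a standalone management command along those lines might look like the sketch below; the command name, arguments, and the assumption that every CSV column maps one-to-one to a model field are hypothetical, and type coercion and validation are the hard parts:

```
import csv

from django.apps import apps
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Load rows from a CSV file (with a header line) into a model."

    def add_arguments(self, parser):
        parser.add_argument("csv_path")
        parser.add_argument("model_label")  # e.g. "myapp.Person"

    def handle(self, *args, **options):
        model = apps.get_model(options["model_label"])
        with open(options["csv_path"], newline="") as f:
            # Assumes the header names match the model's field names.
            objs = [model(**row) for row in csv.DictReader(f)]
        model.objects.bulk_create(objs)
        self.stdout.write("Loaded %d rows into %s" % (len(objs), options["model_label"]))
```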

 

If you consider this, then the assumption that one table corresponds to a single CSV file really is pretty reasonable, I think.

So I'm not suggesting that we reopen the issue; I do, however, think that for Django, which is so user-friendly and time-efficient, uploading a CSV file into a table really shouldn't be such a long process.

Let me know if I made any sense.

Mariusz Felisiak

Nov 28, 2020, 3:42:29 AM
to Django developers (Contributions to Django itself)
> Russ pointed out the need for a standardized format in CSV files, and in turn Adam suggested that the first line contain the table name. I have a slightly different suggestion. It is a well-known convention to put the column names as comma-separated values in the first line of the CSV file. I suggest we leave it at that.

-1
I wouldn't call it a convention. I've imported thousands of CSV files in dozens of different formats in my life. The lack of a standard is still a strong argument against including this in Django; it's not something that we want to maintain. It sounds like a third-party package with a custom serializer is the best way to proceed.

Best,
Mariusz
not Markzus ;)

Muskan Vaswan

Nov 28, 2020, 9:05:42 AM
to Django developers (Contributions to Django itself)
Alright Mariusz (sorry for messing up your name), then I think that puts an end to this particular conversation. I appreciate your feedback and help; the same goes for everyone. I'll try to make myself useful in some other way then!

Denis Urman

Nov 28, 2020, 1:42:10 PM
to django-d...@googlegroups.com
The csv module in the Python standard library has solved the issues mentioned in this thread vis-à-vis format; there is no need to reinvent the wheel. There are options and flags for field names and delimiters. Before relying on a third-party dependency, one must demonstrate the necessity. What edge cases does csv not account for, and why is it our responsibility to solve them?
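For example, the stdlib module already copes with non-default formats, such as a semicolon-delimited file without a header row (the field names here are made up):

```
import csv

with open("data.csv", newline="") as f:
    reader = csv.DictReader(f, fieldnames=["name", "email"], delimiter=";")
    for row in reader:
        print(row["name"], row["email"])
```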

Serializing/deserializing from a CSV to a Model is trivial if the Models are already written. We need to figure out how to handle arrays/sets for normalized databases with foreign keys. For example, in one of my projects an Invoice has a Vendor (many-to-one). Perhaps a flag for which attributes to fetch for the reverse set, and which collection type (list, dictionary, etc.) to return them as?
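Resolving that kind of relation during import might look something like this sketch (Invoice and Vendor are hypothetical stand-ins for my models):

```
import csv

from myapp.models import Invoice, Vendor  # hypothetical models


def import_invoices(csv_path):
    """Deserialize invoices from a CSV, resolving the vendor foreign key by name."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            vendor, _ = Vendor.objects.get_or_create(name=row["vendor"])
            Invoice.objects.create(
                vendor=vendor,
                number=row["number"],
                amount=row["amount"],
            )
```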

I'd love to jump on this and come up with an RFC or something. I have a huge amount of business logic related to (de)serialization in my app.


Akash T S

May 18, 2021, 8:56:27 AM
to Django developers (Contributions to Django itself)
Hi, I need help.