Import a CSV file in a Django view

Ronaldo Mata

Jul 20, 2020, 7:09:35 PM

to django...@googlegroups.com

How do I deal with encoding when reading a CSV file in a view?

I have a view for uploading a CSV file; in this view I read the file and save each row as a new record.

My bug appears when I try to upload a CSV file with a different encoding (not UTF-8).

How can I handle this in Django (using request.FILES)? I was researching and found chardet, but I don't know how to pass it a file from request.FILES. I need help, please.

Liu Zheng

Jul 21, 2020, 1:46:51 PM

to Django users

Hi. First of all, I think it's impossible to perfectly detect encoding without further information. See the answer in this SO post: https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text There are many packages and tools to help detect encoding format, but keep in mind that they are only giving educated guesses. (Most of the time, the guess is correct, but do check the dev page to see whether there are known issues related to your problem.)

Now let's say you have decided to use chardet. Check its doc page for the usage: https://chardet.readthedocs.io/en/latest/usage.html#usage You'll have more than one solution. Here are some examples:

1. If the files uploaded to your server are all expected to be small csv files (less than a few MB and not many users do it concurrently), you can do the following:

# in the view that handles the uploaded file (assume the file input name is just "file")
import chardet

file_content = request.FILES['file'].read()
result = chardet.detect(file_content)  # a dict; the guessed encoding is in result['encoding']

2. Also, chardet seems to support incremental (line-by-line) detection https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally

Given this, we can also read from request.FILES line by line and pass each line to chardet:

from chardet.universaldetector import UniversalDetector

#somewhere in a view function
detector = UniversalDetector()
file_handle = request.FILES['file']
for line in file_handle:
    detector.feed(line)
    if detector.done: break
detector.close()
# result available as a dict at detector.result
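
For cases where adding chardet is not an option, a rough stdlib-only fallback is sketched here. It is not chardet's algorithm: it only checks for a byte-order mark and then tries a short list of candidate encodings. The name guess_encoding and the fallback list are assumptions made for illustration, and like any detector it only gives an educated guess.

```python
import codecs

# BOMs are checked longest-first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw, fallbacks=("utf-8", "cp1252")):
    """Return a codec name that can decode `raw` (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    for name in fallbacks:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails to decode.
    return "latin-1"
```

In a view, the bytes would come from request.FILES['file'].read(), as in example 1.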

Ronaldo Mata

Jul 22, 2020, 10:26:42 AM

to django...@googlegroups.com

Hi Liu, thanks for your answer.

This has been a headache. I am trying to read the file using csv.DictReader. Initially I had an error when getting the dict keys while iterating over the rows, and I thought it could be the encoding, so I wanted to prepare the view to use the correct encoding. That is why I asked my question.

1) Your first approach doesn't work: if I send a UTF-8 file, chardet returns ascii as the encoding. It seems request.FILES['file'].read() returns bytes with that encoding.

2) In the end I realized that the problem was the delimiter of the CSV, but predicting it is another problem.

Anyway, it was a task that I had to do and that was my limitation. I think there must be a library that does all this; uploading a CSV file is common practice in many web apps.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/64307441-0e65-45a2-b917-ece15a4ea729o%40googlegroups.com.

Kovy Jacob

Jul 22, 2020, 10:29:45 AM

to django...@googlegroups.com

Could you just use the standard python csv module?

Ronaldo Mata

Jul 22, 2020, 10:40:59 AM

to django...@googlegroups.com

Hi Kovy, I'm using the csv module, but I need to handle the delimiters of the files: sometimes they come separated by ",", other times by ";" and rarely by "|".

Kovy Jacob

Jul 22, 2020, 10:44:16 AM

to django...@googlegroups.com

Ah, so is the problem that you don’t always know what the delimiter is when you read it? If yes, what is the use case for this? You might not need a universal solution, maybe just put all the info into a csv yourself, manually.

Ronaldo Mata

Jul 22, 2020, 10:47:37 AM

to django...@googlegroups.com

Yes, the problem here is that the files will be uploaded by the user, so I don't know what delimiter I will receive. This is not a base command that I am using; it is logic that I want to incorporate in a view.

Kovy Jacob

Jul 22, 2020, 11:01:17 AM

to django...@googlegroups.com

Maybe first use the standard file open to read the file into a variable, search that variable for the different delimiters using standard string manipulation, etc., and then open it using the corresponding delimiter.
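
A minimal sketch of that idea, assuming the file content has already been decoded to a string: count each candidate delimiter in the first line and pick the most frequent one. The name guess_delimiter and the candidate list are hypothetical. It ignores quoting, so a delimiter character inside a quoted field can mislead it.

```python
def guess_delimiter(text, candidates=(",", ";", "|")):
    """Return the candidate that occurs most often in the first line, or None."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    counts = {d: first_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    # If no candidate appears at all, report that nothing was found.
    return best if counts[best] > 0 else None
```

The returned character can then be passed as the delimiter argument of csv.reader or csv.DictReader.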

Kovy Jacob

Jul 22, 2020, 11:04:45 AM

to django...@googlegroups.com

That’s probably not the proper answer, but that’s the best I can do. Sorry :-(

Liu Zheng

Jul 22, 2020, 11:12:49 AM

to django...@googlegroups.com

Hi, glad you solved the problem. Yes, both request.FILES['file'] and the chardet file handler are binary handlers. A binary handler presents the raw data. chardet takes a sequence of raw data and then detects the encoding format. With its prediction, if you want to open that piece of data in text mode, you can use the .decode(<encoding format>) method of the bytes object to get a Python string.
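
The decode step described above can be sketched as follows, using io.BytesIO as a stand-in for request.FILES['file']. The helper name rows_from_upload is hypothetical, and the encoding and delimiter are assumed to be known already (for example, from a detector).

```python
import csv
import io


def rows_from_upload(binary_file, encoding, delimiter=","):
    """Decode a binary file-like object and parse it into a list of dict rows."""
    binary_file.seek(0)  # rewind in case a detector already consumed the stream
    text = binary_file.read().decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)


# Stand-in for an uploaded file: Latin-1 bytes with ";" as the delimiter.
upload = io.BytesIO("name;city\nJosé;Mérida\n".encode("latin-1"))
rows = rows_from_upload(upload, "latin-1", delimiter=";")
```

The seek(0) call matters when the same handle was iterated earlier, since the read position will no longer be at the start.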

Kovy Jacob

Jul 22, 2020, 11:15:03 AM

to django...@googlegroups.com

Cool! I’m so happy I was able to help you!! Good luck!

Liu Zheng

Jul 22, 2020, 11:15:16 AM

to django...@googlegroups.com

What I meant was that you can only feed binary data or binary handlers to chardet. You can decode the binary data according to the detection results afterward.

Kovy Jacob

Jul 22, 2020, 11:17:37 AM

to django...@googlegroups.com

I’m confused. I don’t know if I can help.

Ronaldo Mata

Jul 22, 2020, 12:05:06 PM

to django...@googlegroups.com

Hi Kovy, this is not solved. Liu Zheng, using chardet.detect(request.FILES['file'].read()) returns the encoding "ascii", which is not correct. For example, I uploaded a file encoded as UTF-7 and the result is wrong. Then I tried request.FILES['file'].read().decode('ascii') and it does not work; it returns bad data. For example, the string "@" comes back as "+AEA-".

Liu Zheng

Jul 22, 2020, 12:25:53 PM

to django...@googlegroups.com

Hi,

Are you sure that the file used for detection is the same file that you opened and decoded, and that gave you incorrect information?

By the way, ASCII is a proper subset of UTF-8. If chardet said it is ascii, decoding it using utf-8 should always work.
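
To illustrate the subset point, and why a file saved as UTF-7 can plausibly be reported as ascii: UTF-7 represents text using only ASCII bytes, so a check at the byte level sees nothing outside ASCII. A small sketch using only the standard library (the sample data is made up):

```python
# Bytes as they might arrive from a file saved as UTF-7; "+AEA-" encodes "@".
raw = b"name,email\nana,ana+AEA-example.com\n"

# Every byte is in the ASCII range, so ascii and utf-8 decode it identically...
only_ascii_bytes = all(byte < 128 for byte in raw)
as_ascii = raw.decode("ascii")
as_utf8 = raw.decode("utf-8")

# ...but only the utf-7 codec turns "+AEA-" back into "@".
as_utf7 = raw.decode("utf-7")
```

So decoding such a file as ascii or utf-8 succeeds without error but leaves sequences like "+AEA-" in the text.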

If your file contains non-ascii UTF-8 bytes, maybe it’s a bug in chardet? You can try it directly, without mixing it with django’s requests first. Make sure you can detect and decode the file locally in a test program. Then put it into the app.

If you share the file, I'm also glad to help you try it.

Ronaldo Mata

Jul 24, 2020, 10:09:44 AM

to django...@googlegroups.com

Yes, I will try it. I will let you know if anything comes up.

Jani Tiainen

Jul 24, 2020, 3:43:48 PM

to django...@googlegroups.com

Hi,

I can highly recommend using pandas to read CSV. It does a pretty good job of guessing a lot of things without extra config.

Of course it's one more extra dependency.

Ronaldo Mata

Jul 24, 2020, 5:13:40 PM

to django...@googlegroups.com

Hi. Pandas needs to know the encoding and delimiter beforehand when you use pd.read_csv(filepath, encoding=" ", delimiter=" "). I think it is the same problem 🤔

Liu Zheng

Jul 24, 2020, 10:33:44 PM

to django...@googlegroups.com

Yes. You are right. Pandas' default behavior is as follows:

encoding = sys.getfilesystemencoding() or "utf-8"

I tried to open a simple CSV encoded as "utf16-LE" (popular on Windows), and got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
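
The same error can be reproduced without pandas. A minimal sketch using only the standard library (the sample data is invented): UTF-16 LE files typically start with the byte-order mark 0xFF 0xFE, and 0xFF is never a valid first byte in UTF-8.

```python
import codecs

# Build UTF-16 LE bytes with an explicit byte-order mark, as Windows tools often write.
raw = codecs.BOM_UTF16_LE + "id,name\n1,Ana\n".encode("utf-16-le")

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# The "utf-16" codec reads the BOM, picks the byte order, and strips the mark.
text = raw.decode("utf-16")
```

Checking for a byte-order mark before decoding is therefore a cheap first test for this particular case.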

Naresh Jonnala

Jul 25, 2020, 11:37:15 AM

to Django users

Hi,

I am not sure whether this will help, but I still want to add a piece of code.

import csv

sniffer = csv.Sniffer()
# sniff() expects decoded text (str), not bytes
dialect = sniffer.sniff(<first line of csv>)

# the sniffed settings are attributes of the returned dialect class:
dialect.__dict__
mappingproxy({'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n',
'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ',',
'quotechar': '"', 'skipinitialspace': False})

lineterminator = dialect.lineterminator
quoting = dialect.quoting
doublequote = dialect.doublequote
delimiter = dialect.delimiter
quotechar = dialect.quotechar
skipinitialspace = dialect.skipinitialspace

# pass the dialect itself; it is a class, not a mapping, so **dialect would raise a TypeError
csv.DictReader(self.file_open, dialect=dialect)

Try this.
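
A self-contained variant of the snippet above, as a sketch only. It assumes the upload has already been decoded to a string, restricts the sniffer to the three delimiters mentioned earlier in the thread through the delimiters argument of sniff(), and falls back to the default comma dialect when sniffing fails. The name read_rows and the 4096-character sample size are arbitrary choices for illustration.

```python
import csv
import io


def read_rows(text, delimiters=",;|"):
    """Sniff the CSV dialect from a sample of `text` and return a list of dict rows."""
    sample = text[:4096]
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # The sniffer could not decide; fall back to the comma-separated default.
        dialect = csv.excel
    reader = csv.DictReader(io.StringIO(text, newline=""), dialect=dialect)
    return list(reader)


rows = read_rows("name;age\nAna;30\nLuis;25\n")
```

The sniffer is a heuristic, so it is worth validating the resulting column names before saving any records.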

-

Naresh Jonnala

Hindustan.

Ronaldo Mata

Jul 25, 2020, 5:41:33 PM

to django...@googlegroups.com

Hi Naresh Jonnala.

Yes, it works for detecting the delimiter of a CSV file, but I still don't know how to detect the current encoding of the CSV file 🤔

I need to know how to implement a good CSV upload view in Django.
