Import csv file on django view


Ronaldo Mata

Jul 20, 2020, 7:09:35 PM
to django...@googlegroups.com
How do I deal with encoding when reading a CSV file in a view?

I have a view for uploading a CSV file. In this view I read the file and save each row as a new record.

My bug appears when I try to upload a CSV file with a different encoding (not UTF-8).

How can I handle this in Django (using request.FILES)? I was researching and found chardet, but I don't know how to pass request.FILES to it. I need help, please.

Liu Zheng

Jul 21, 2020, 1:46:51 PM
to Django users
Hi. First of all, I think it's impossible to perfectly detect encoding without further information. See the answer in this SO post: https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text There are many packages and tools to help detect encoding format, but keep in mind that they are only giving educated guesses. (Most of the time, the guess is correct, but do check the dev page to see whether there are known issues related to your problem.)
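
To illustrate why detection can only ever be an educated guess, here is a minimal sketch using only the Python standard library. The sample byte is a made-up illustration, not taken from any real upload: the same byte is valid in more than one single-byte encoding and yields different text in each.

```python
# One byte, two equally valid interpretations.
raw = b"\xe9"

as_latin1 = raw.decode("latin-1")  # Latin small letter e with acute
as_cp1251 = raw.decode("cp1251")   # Cyrillic small letter short i

# Both decodes succeed without error, so the bytes alone cannot
# tell us which encoding the author of the file intended.
print(as_latin1, as_cp1251)
```

A detector has to fall back on statistics about which interpretation looks more like real text, which is why short or unusual files are the ones most likely to be guessed wrong.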

Now let's say you have decided to use chardet. Check its doc page for usage: https://chardet.readthedocs.io/en/latest/usage.html#usage There is more than one possible solution. Here are some examples:

1. If the files uploaded to your server are all expected to be small csv files (less than a few MB and not many users do it concurrently), you can do the following:

# In the view that handles the uploaded file (assume the file input name is just "file"):
import chardet

file_content = request.FILES['file'].read()
result = chardet.detect(file_content)  # dict; result['encoding'] holds the guess

2. Also, chardet seems to support incremental (line-by-line) detection https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally

Given this, we can also read from request.FILES line by line and pass each line to chardet:

from chardet.universaldetector import UniversalDetector

# somewhere in a view function
detector = UniversalDetector()
file_handle = request.FILES['file']
for line in file_handle:
    detector.feed(line)
    # stop early once the detector is confident enough
    if detector.done:
        break
detector.close()
# result available as a dict at detector.result




Ronaldo Mata

Jul 22, 2020, 10:26:42 AM
to django...@googlegroups.com
Hi Liu, thanks for your answer.

This has been a headache. I am trying to read the file using csv.DictReader. Initially I had an error trying to get the dict keys when iterating over the rows, and I thought it could be the encoding, so I wanted to prepare the view to use the correct encoding. That is why I asked my question.

1) Your first approach doesn't work: if I send a UTF-8 file, chardet returns ascii as the encoding. It seems request.FILES['file'].read() returns bytes with that encoding.

2) In the end I realized that the problem was the delimiter of the CSV file, but predicting it is another problem.

Anyway, it was a task that I had to do, and that was my limitation. I think there must be a library that does all this; uploading a CSV file is common practice in many web apps.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/64307441-0e65-45a2-b917-ece15a4ea729o%40googlegroups.com.

Kovy Jacob

Jul 22, 2020, 10:29:45 AM
to django...@googlegroups.com
Could you just use the standard python csv module?

Ronaldo Mata

Jul 22, 2020, 10:40:59 AM
to django...@googlegroups.com
Hi Kovy, I'm using the csv module, but I need to handle the delimiters of the files: sometimes they come separated by ",", other times by ";", and rarely by "|".

Kovy Jacob

Jul 22, 2020, 10:44:16 AM
to django...@googlegroups.com
Ah, so is the problem that you don’t always know what the delimiter is when you read it? If yes, what is the use case for this? You might not need a universal solution, maybe just put all the info into a csv yourself, manually.

Ronaldo Mata

Jul 22, 2020, 10:47:37 AM
to django...@googlegroups.com
Yes, the problem here is that the files will be loaded by the user, so I don't know what delimiter I will receive. This is not a base command that I am using, it is the logic that I want to incorporate in a view

Kovy Jacob

Jul 22, 2020, 11:01:17 AM
to django...@googlegroups.com
Maybe first read the file into a variable using the standard file open, search that variable for the different delimiters using standard string manipulation and so on, and then open it using the corresponding delimiter.
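
The string-search idea above could be sketched roughly as follows. This is only an illustration: the function name guess_delimiter, the candidate list, and the comma fallback are assumptions made for the example, and the text is assumed to be already decoded.

```python
def guess_delimiter(text, candidates=(",", ";", "|")):
    """Return the candidate that occurs most often in the first line."""
    first_line = text.splitlines()[0] if text else ""
    counts = {d: first_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    # Fall back to a comma when no candidate appears at all.
    return best if counts[best] > 0 else ","


sample = "name;age;city\nAna;30;Caracas\n"
print(guess_delimiter(sample))
```

Counting characters in the header line is crude: a quoted field that itself contains a comma or semicolon can mislead it, so it is best treated as a first guess.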

Kovy Jacob

Jul 22, 2020, 11:04:45 AM
to django...@googlegroups.com
That’s probably not the proper answer, but that’s the best I can do. Sorry :-(

Liu Zheng

Jul 22, 2020, 11:12:49 AM
to django...@googlegroups.com
Hi, glad you solved the problem. Yes, both request.FILES['file'] and the chardet file handle are binary handles. A binary handle presents the raw data. chardet takes a sequence of raw bytes and then detects the encoding. With its prediction, if you want to read that piece of data as text, you can use the .decode(<encoding format>) method of the bytes object to get a Python string.
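
As a rough sketch of the decode step described above, assuming the encoding name has already been obtained somehow (for example from a detector's result): the helper name rows_from_bytes is hypothetical, and in a view the raw bytes would come from request.FILES['file'].read() rather than from a literal.

```python
import csv
import io


def rows_from_bytes(raw, encoding, delimiter=","):
    """Decode raw upload bytes and parse them as CSV rows (dicts)."""
    text = raw.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)


raw = "name,city\nJosé,Mérida\n".encode("utf-8")
rows = rows_from_bytes(raw, "utf-8")
print(rows)
```

Decoding the whole upload at once keeps the example short; for large files, wrapping the binary handle in io.TextIOWrapper would avoid holding two copies in memory.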

Kovy Jacob

Jul 22, 2020, 11:15:03 AM
to django...@googlegroups.com
Cool! I’m so happy I was able to help you!! Good luck!

Liu Zheng

Jul 22, 2020, 11:15:16 AM
to django...@googlegroups.com

What I meant was that you can only feed binary data or binary handles to chardet. You can decode the binary data according to the detection result afterward.

Kovy Jacob

Jul 22, 2020, 11:17:37 AM
to django...@googlegroups.com
I’m confused. I don’t know if I can help.

On Jul 22, 2020, at 11:11 AM, Liu Zheng <firstd...@gmail.com> wrote:

Ronaldo Mata

Jul 22, 2020, 12:05:06 PM
to django...@googlegroups.com
Hi Kovy, this is not solved. Liu Zheng, using chardet.detect(request.FILES['file'].read()) returns the encoding "ascii", which is not correct. For example, I uploaded a file encoded as UTF-7 and the result was wrong. Then I tried request.FILES['file'].read().decode('ascii') and it doesn't work: it returns bad data. For example, for the string "@" it returns the string "+AEA-".
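
One possible explanation for this result, offered as a guess rather than a diagnosis: UTF-7 represents every character using only 7-bit ASCII bytes, so a detector that looks at byte values may reasonably report ASCII. The sketch below uses only the standard library and the "+AEA-" value from the message above; whether chardet can recognize UTF-7 at all is not something it verifies.

```python
raw = b"+AEA-"  # the bytes reported above for the "@" character

# Every byte is below 128, so the data is also valid ASCII...
print(all(byte < 128 for byte in raw))
print(raw.decode("ascii"))

# ...but only the UTF-7 codec turns it back into the intended text.
print(raw.decode("utf-7"))
```

If UTF-7 uploads really need to be supported, the encoding would probably have to be supplied by the user or tried explicitly, since the bytes look like plain ASCII.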

Liu Zheng

Jul 22, 2020, 12:25:53 PM
to django...@googlegroups.com
Hi,

Are you sure that the file used for detection is the same file that you opened and decoded, and that gave you incorrect information?

By the way, ASCII is a proper subset of UTF-8. If chardet says it's ASCII, decoding it as UTF-8 should always work.

If your file contains non-ASCII UTF-8 bytes, maybe it's a bug in chardet? You can try it directly first, without mixing it with Django's request handling. Make sure you can detect and decode the file locally in a test program, then put it into the app.

If you share the file, I'm also glad to help you try it.

Ronaldo Mata

Jul 24, 2020, 10:09:44 AM
to django...@googlegroups.com
Yes, I will try it. I will let you know if anything comes up.

Jani Tiainen

Jul 24, 2020, 3:43:48 PM
to django...@googlegroups.com
Hi,

I can highly recommend using pandas to read CSV files. It does a pretty good job of guessing a lot of things without extra config.

Of course, it's one more dependency.


Ronaldo Mata

Jul 24, 2020, 5:13:40 PM
to django...@googlegroups.com
Hi, pandas requires you to know the encoding and delimiter beforehand when you use pd.read_csv(filepath, encoding=" ", delimiter=" "). I think that is the same problem 🤔

Liu Zheng

Jul 24, 2020, 10:33:44 PM
to django...@googlegroups.com
Yes, you are right. Pandas' default behavior is as follows:

encoding = sys.getfilesystemencoding() or "utf-8"

I tried to open a simple CSV file encoded as UTF-16-LE (popular on Windows) and got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
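
The byte 0xff at position 0 is consistent with a UTF-16-LE byte order mark (FF FE) at the start of the file. Files that carry a BOM can be recognized without any guessing; a minimal standard-library sketch follows. The helper name encoding_from_bom is hypothetical, and files without a BOM still need a detector or user input.

```python
import codecs

# Check the longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


raw = codecs.BOM_UTF16_LE + "a,b\n1,2\n".encode("utf-16-le")
encoding = encoding_from_bom(raw)
print(encoding, repr(raw.decode(encoding)))
```

The "utf-16", "utf-32", and "utf-8-sig" codecs consume the BOM while decoding, so the resulting text does not start with a stray marker character.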

Naresh Jonnala

Jul 25, 2020, 11:37:15 AM
to Django users
Hi,

I am not sure whether this will help, but I still want to add a piece of code.

import csv

sniffer = csv.Sniffer()
dialect = sniffer.sniff(<first line of csv>)

dialect.__dict__
mappingproxy({'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n',
'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ',',
'quotechar': '"', 'skipinitialspace': False})

lineterminator = dialect.lineterminator
quoting = dialect.quoting
doublequote = dialect.doublequote
delimiter = dialect.delimiter
quotechar = dialect.quotechar
skipinitialspace = dialect.skipinitialspace

# pass the sniffed dialect to the reader via the dialect keyword
csv.DictReader(self.file_open, dialect=dialect)

Try this.
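
For reference, a self-contained version of the Sniffer approach might look like the sketch below. It assumes the text has already been decoded; the helper name read_rows, the 4096-character sample size, and the restricted delimiter set ",;|" (taken from the delimiters mentioned earlier in the thread) are choices made for the example.

```python
import csv
import io


def read_rows(text, delimiters=",;|"):
    """Sniff the dialect from a sample of text, then parse all rows."""
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=delimiters)
    reader = csv.DictReader(io.StringIO(text, newline=""), dialect=dialect)
    return list(reader)


sample = "name;age;city\nAna;30;Caracas\nLuis;25;Lima\n"
rows = read_rows(sample)
print(rows[0])
```

csv.Sniffer raises csv.Error when it cannot determine a delimiter, so a real view would likely catch that and report a validation error to the user.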

-
Naresh Jonnala
Hindustan.

Ronaldo Mata

Jul 25, 2020, 5:41:33 PM
to django...@googlegroups.com
Hi Naresh Jonnala.

Yes, that works for detecting the delimiter of the CSV file, but I still don't know how to detect the encoding of the file 🤔

I need to know how to implement a good CSV upload view in Django.