'utf-8' codec can't decode byte 0x93 in position 31, even though I use read().decode('utf-8') when I read an UploadedFile I get from a file selector in a Django view


fábio andrews rocha marques

Sep 21, 2017, 7:07:53 PM
to Django users
I have a template called cadastrarnovomaterial.html: a page with a text field and a file selection button. The user should pick a .csv file with the file selector. It looks like this:
```html
<meta charset="utf-8"/>
<!-- "Cadastrar um novo material" = "Register a new material"; the subtitle reads "Register a new material that can be used to create tests". -->
<h1>Cadastrar um novo material</h1>
<b>'Cadastre um novo material que pode ser usado para criar provas'</b>
<!-- enctype="multipart/form-data" is required for the file input's contents to be uploaded. -->
<form action="{% url 'terminarcadastronovomaterial' %}" method="post" enctype="multipart/form-data" class="form-horizontal" accept-charset="UTF-8">
{% csrf_token %}
<label for="nomematerial">Nome: </label>
{% if nomematerialcadastrar %}
<input id="nomematerial" type="text" name="nomematerial" value="{{ nomematerialcadastrar }}"/>
{% else %}
<input id="nomematerial" type="text" name="nomematerial" value=""/>
{% endif %}
<div class="form-group">
    <label for="arquivocsv" class="col-md-3 col-sm-3 col-xs-12 control-label">Arquivo csv: </label>
    <div class="col-md-8">
        <input type="file" name="arquivocsv" id="arquivocsv" required class="form-control">
    </div>
</div>
<input type="submit" value="Cadastrar novo material"/>
</form>
```

Then I have the view associated with the URL in the form's action, 'terminarcadastronovomaterial'. It first checks that the uploaded file name ends in .csv, then reads the file and goes through it line by line:
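```python
from django.http import HttpResponse


def terminarcadastronovomaterial(request):
    if request.method == 'POST':
        nomematerial = request.POST['nomematerial']
        arquivo = request.FILES.get('arquivocsv')
        if arquivo.name.endswith('.csv'):
            # UploadedFile.read() returns bytes, so they are decoded to str here.
            file_data = arquivo.read().decode('utf-8')
            lines = file_data.split("\n")
            for line in lines:
                print("nova linha")
                fields = line.split(";")
                for field in fields:
                    if ',' in field:
                        # A field containing commas holds a list of distractors.
                        distratores = field.split(",")
                        print("distratores")
                        for distrator in distratores:
                            print(distrator)
                        print("fim distratores")
                    else:
                        print(field)
    # Django views must return an HttpResponse; an empty one is used as a placeholder here.
    return HttpResponse()
```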
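The loop above splits each line on ';' and any field containing a comma on ','. As a hedged alternative, the standard library's csv module can do the outer split and also copes with quoted fields and \r\n line endings. A minimal sketch, assuming the upload has already been decoded to a str and that ';' is the delimiter; parse_material_rows is a hypothetical helper name, not a Django API:

```python
import csv
import io


def parse_material_rows(text, delimiter=";"):
    """Hypothetical helper: split decoded CSV text into rows, turning comma-separated fields into lists."""
    rows = []
    # csv.reader strips line endings (\n or \r\n) and treats straight double quotes as quoting.
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        row = []
        for field in fields:
            if "," in field:
                # A field containing commas holds the list of distractors.
                row.append(field.split(","))
            else:
                row.append(field)
        rows.append(row)
    return rows


rows = parse_material_rows("ônibus;じてんしゃ,でんしゃ\r\n")
# rows == [['ônibus', ['じてんしゃ', 'でんしゃ']]]
```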

This code works well when the .csv contains only plain ASCII characters. The problem is with non-ASCII characters: the table in my .csv contains words like "ônibus" and じてんしゃ. When I run my Django app and try to read the CSV, Django returns the error 'utf-8' codec can't decode byte 0x93 in position 31 (position 31 is where "ônibus" appears in the CSV).
I've searched a lot for how to read a UTF-8 CSV in Python, but everything I find opens a file by path instead of using a file uploaded through a file selector (which is my case with Django). How do I solve this? Is file.read().decode('utf-8') correct, or should I use something else?
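One way to investigate is to let the UnicodeDecodeError report where decoding stopped: its start attribute is the index of the first rejected byte, which can be compared with the position in the message (31 here) and with the bytes around it. A minimal sketch, assuming data holds the raw bytes returned by arquivo.read(); first_undecodable_byte is a hypothetical helper and the sample bytes are made up for illustration:

```python
def first_undecodable_byte(data, encoding="utf-8", context=8):
    """Hypothetical helper: return (position, bad byte, nearby bytes), or None if data decodes."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the codec rejected.
        start = exc.start
        return start, data[start:start + 1], data[max(0, start - context):start + context]
    return None


sample = b"nome;\x93nibus;carro"  # invented bytes standing in for an upload
print(first_undecodable_byte(sample))
# (5, b'\x93', b'nome;\x93nibus;c')
```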

[Attached image: csvjapa.png]

James Bennett

Sep 21, 2017, 11:43:42 PM
to django...@googlegroups.com
The issue you're running into is that the sequence of bytes you passed in is not UTF-8, but you're trying to decode it as UTF-8. 

The character 'ô' -- that's U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX -- is 0xC3 0xB4 in UTF-8. And the byte 0x93 can never be valid as the beginning of a UTF-8-encoded code point.
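Both facts can be confirmed in a Python 3 shell. The sketch below uses only built-in str/bytes methods; the bit test relies on the UTF-8 rule that bytes of the form 10xxxxxx are continuation bytes:

```python
# 'ô' is U+00F4; UTF-8 encodes it as the two bytes C3 B4.
encoded = "\u00f4".encode("utf-8")
print(encoded)  # b'\xc3\xb4'

# 0x93 is 0b10010011. Bytes matching 10xxxxxx are continuation bytes,
# so such a byte can never start a UTF-8 sequence.
is_continuation = (0x93 & 0b11000000) == 0b10000000
print(is_continuation)  # True

try:
    b"\x93".decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)  # invalid start byte
```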

I can't tell you what encoding does use 0x93 for that character, but it is not UTF-8.
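Since the source encoding is the open question, one hedged way to narrow it down is to decode the suspicious byte with a few candidate single-byte encodings and look at the result. The candidate list is an assumption, not an exhaustive search; with Python's codec tables, 0x93 comes out as 'ô' in the DOS code pages cp437 and cp850, and as a curly quote in windows-1252:

```python
import unicodedata

# Candidate encodings to try; this list is illustrative, not exhaustive.
candidates = ["cp1252", "cp850", "cp437", "latin-1"]

results = {}
for name in candidates:
    char = b"\x93".decode(name)
    results[name] = char
    # Report the code point and Unicode name; control characters have no name.
    print(f"{name:8} -> U+{ord(char):04X} {unicodedata.name(char, '<no name>')}")
```

Keep in mind that none of these single-byte code pages can represent Japanese text such as じてんしゃ.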

Antonis Christofides

Sep 22, 2017, 1:32:39 AM
to django...@googlegroups.com

Hi,

In addition to what James said, it is unusual to write f.read().decode('utf-8'). Usually the result of f.read() is already decoded (unless you opened the file in binary mode, which you would not normally do for a file containing text). It depends on how you opened the file, and the mechanics differ between Python 2 and Python 3.
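In Python 3 the difference looks like this: text mode with an explicit encoding returns str from read(), while binary mode returns bytes that still need decoding. Django's uploaded files are read as bytes, which is why the original view decodes them; wrapping a binary stream in io.TextIOWrapper is another way to get text. In this sketch a temporary file and an io.BytesIO object stand in for a real file and a real upload:

```python
import io
import os
import tempfile

content = "ônibus;じてんしゃ\n".encode("utf-8")

# Create a throwaway file holding UTF-8 bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

try:
    # Text mode: read() returns str, already decoded with the given encoding.
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Binary mode: read() returns bytes, so decoding is explicit.
    with open(path, "rb") as f:
        raw = f.read()
    decoded = raw.decode("utf-8")
finally:
    os.remove(path)

# An upload behaves like a binary stream; io.BytesIO stands in for it here.
wrapped = io.TextIOWrapper(io.BytesIO(content), encoding="utf-8")
first_line = wrapped.readline()
```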

BTW, I've been writing a series of posts on encodings, starting from https://djangodeployment.com/2017/06/19/encodings-part-1/. I'm interested in feedback.

Regards,

Antonis

Antonis Christofides
http://djangodeployment.com
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/2838894a-3a6b-4712-a53a-6b30ba53474f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Melvyn Sopacua

Sep 22, 2017, 1:34:35 AM
to django...@googlegroups.com
0x93 is the windows-1252 code for the opening curly ("fancy") double quote, “. I think the quoting in your CSV is incorrect.
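To illustrate the quoting problem: Python's csv module treats only the straight double quote as the quote character by default, so a field wrapped in the windows-1252 curly quotes 0x93/0x94 (U+201C/U+201D) is not protected and gets split at the delimiter. The sample row is made up for illustration:

```python
import csv

# Correctly quoted row: the ';' inside the quotes belongs to the first field.
straight = '"ônibus; carro";moto'

# The same row with curly quotes, decoded from windows-1252 bytes
# (0x93 = U+201C, 0xF4 = 'ô', 0x94 = U+201D).
curly = b"\x93\xf4nibus; carro\x94;moto".decode("cp1252")

straight_fields = next(csv.reader([straight], delimiter=";"))
curly_fields = next(csv.reader([curly], delimiter=";"))
# straight_fields == ['ônibus; carro', 'moto']
# curly_fields == ['\u201cônibus', ' carro\u201d', 'moto']
```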

http://www.i18nqa.com/debug/utf8-debug.html



--
Melvyn Sopacua

fábio andrews rocha marques

Sep 27, 2017, 12:18:00 PM
to Django users
I've managed to solve it! The problem wasn't in the CSV but in the print calls I used: it turns out the Windows 8 command prompt can't print Japanese characters. So I solved it by writing the output to a file instead of the command prompt.
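A minimal sketch of that workaround, writing each value to a text file opened with an explicit UTF-8 encoding instead of printing to the console. The output path is a hypothetical choice (here inside the system temp directory):

```python
import os
import tempfile

values = ["ônibus", "じてんしゃ"]

# Hypothetical output file; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "saida.txt")

# Opening with encoding="utf-8" avoids depending on the console's code page.
with open(out_path, "w", encoding="utf-8") as out:
    for value in values:
        print(value, file=out)

# Read the file back to confirm the characters survived.
with open(out_path, encoding="utf-8") as check:
    written = check.read().splitlines()
```

Since Python 3.6 (PEP 528), writes to the Windows console go through its Unicode API, so printing may work directly there, although the console font still has to include the glyphs.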