UnicodeDecodeError for Decoding Uploaded Excel Files

Jessica Le

Jan 17, 2014, 4:58:15 PM
to we...@googlegroups.com
Hi all, 

I'm trying to build an application that lets users upload an .xls file, which my program then reads and parses. However, I'm running into encoding errors with the uploaded Excel files. I have searched Stack Overflow and Google, but none of the suggestions I found work.

Here is my code:

def POST(self):
    x = web.input(calendar_file={}, ref_id='')
    if x:
        ref_id = (x.ref_id if x.ref_id else "")
        filepath=x.calendar_file.filename # filename as sent by the client (Windows-style slashes are not replaced here)
        fn=filepath.split('/')[-1] # splits the path and chooses the last part (the filename)
        filename = "%s/Users/jl98567/Documents/xMatters_calendar_app/test/" + fn
        fullpath = os.path.join('c:', filename % (ref_id))
        content = x["calendar_file"].file.read()
        with open(fullpath, 'w') as f_out:
            if not f_out:
                raise Exception("Unable to open %s for writing. " % (fullpath))
            f_out.write(content)
    print str(x['calendar_file'].value.encode('utf8','ignore'))
    raise web.seeother('/upload?ref_id=%s&filename=%s' % (ref_id, filename))


Here is the error:

<type 'exceptions.UnicodeDecodeError'> at /

'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Python: C:\Users\jl98567\Documents\xMatters_calendar_app\schedule_web.py in POST, line 45
Web: POST http://localhost:8080/

line 45 is: print str(x['calendar_file'].value.encode('utf8','ignore'))

I have tried decoding it and then encoding it similar to this one:

print str(x['calendar_file'].value.decode('utf-8').encode('utf8','ignore'))

but it still doesn't work.

Any suggestions?

Thanks much! 

Hugo Lol

Jan 17, 2014, 5:15:09 PM
to we...@googlegroups.com
What if you just do this:

print x['calendar_file'].value

Also, why are you converting that to str after encoding to UTF-8? In Python 2, str is already a byte string, and calling .encode() on a byte string makes Python first decode it implicitly as ASCII, which is where the 0xd0 byte fails. Try removing the str conversion. Also, try decoding instead of encoding: x['calendar_file'].value.decode("utf-8").

This is the ugly side of Python 2: the Unicode errors and all that stuff.
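
To illustrate where the message comes from, here is a minimal sketch. It assumes the upload begins with the usual signature bytes of a legacy .xls file (the attached file is not reproduced here, so the header below is an assumption, though it matches the 0xd0 at position 0 in the traceback). The helper name try_ascii_decode is made up for this example. The sketch performs the ASCII decode explicitly, which is roughly what Python 2 does behind the scenes when .encode() is called on a byte string.

```python
# Assumed first bytes of a legacy .xls file (hypothetical stand-in for the upload).
xls_header = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def try_ascii_decode(data):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc


# The very first byte is outside the ASCII range, so decoding stops at position 0.
result = try_ascii_decode(xls_header)
print(result)
```

The point is that the failure has nothing to do with choosing the right codec: the data is not text in the first place.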

Hope it helps! 

Jessica Le

Jan 17, 2014, 8:01:21 PM
to we...@googlegroups.com
Hi Hugo! Thanks for your reply. 

I tried them all, but none of the encoding and decoding methods work. Still receiving the Unicode error. If I try the 

print x['calendar_file'].value

it works, but it doesn't translate it correctly. It just prints out a bunch of garbage. 

Hugo Lol

Jan 20, 2014, 6:52:14 PM
to we...@googlegroups.com
Can you upload the Excel file or something so I can test this on my PC?

Jessica Le

Jan 22, 2014, 11:15:55 PM
to we...@googlegroups.com
Here are my GET and POST methods, as well as the attached test data:

class index:
    def GET(self):
        web.header("Content-Type","text/html; charset=utf-8")
        return render.index(form)
    def POST(self):
        x = web.input(calendar_file={}, ref_id='')
        if x:
            ref_id = (x.ref_id if x.ref_id else "")
            filepath=x.calendar_file.filename # filename as sent by the client (Windows-style slashes are not replaced here)
            fn=filepath.split('/')[-1] # splits the path and chooses the last part (the filename)
            filename = "%s/Users/jl98567/Documents/xMatters_calendar_app/test/" + fn
            fullpath = os.path.join('c:', filename % (ref_id))
            content = x["calendar_file"].file.read()
            with open(fullpath, 'w') as f_out:
                if not f_out:
                    raise Exception("Unable to open %s for writing. " % (fullpath))
                f_out.write(content)

        print x['calendar_file'].value
        
        raise web.seeother('/upload?ref_id=%s&filename=%s' % (ref_id, filename))

Thank you so much! 


--
You received this message because you are subscribed to a topic in the Google Groups "web.py" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/webpy/_i64Ym_Ubxg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to webpy+un...@googlegroups.com.
To post to this group, send email to we...@googlegroups.com.
Visit this group at http://groups.google.com/group/webpy.
For more options, visit https://groups.google.com/groups/opt_out.

testdata.xls

Jessica Le

Feb 3, 2014, 8:52:21 PM
to we...@googlegroups.com
So after doing more research and testing, it seems the built-in file upload function only works flawlessly for .txt files. Is there a library out there for other kinds of files?

David Kopec

Feb 4, 2014, 10:08:59 PM
to we...@googlegroups.com
Hi Jessica,

I'm not sure this has to do with web.py.  Are you able to print the file when opening it locally in python using the same code?

Jessica Le

Feb 4, 2014, 11:04:40 PM
to we...@googlegroups.com
Hi David, 

So when I try to print it locally via the terminal, I just get a bunch of webdings style text like this:

 ■   ♠☺☻                 ☺   ☻╒═╒£.←►ô +,∙«0   ░     ☺   H   ↨   P   ♂   X
 ♀   ï   ☻   Σ♦  ♥     ♫ ♂       ♂       ♂       ♂       ▲►  ☺      Sheet1
   ▲   ♂   Worksheets ♥   ☺

... basically unreadable. The method works flawlessly for .txt files, so I know it has to do with the .xls format. I tried converting the .xls files to .csv too, but same thing: it's unreadable.

Thanks for your input David!


Ben Corneau

Feb 5, 2014, 11:57:08 PM
to we...@googlegroups.com
Not sure if this will resolve your issue, but when you open the file for writing, you'll want it in binary mode when running on Windows.

 with open(fullpath, 'wb') as f_out:
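
A rough sketch of why the mode matters. In Python 2 on Windows, a file opened with 'w' is in text mode, and every newline byte written is expanded to a carriage return plus newline, which changes a binary payload. The code below only simulates that expansion with a byte replace so it behaves the same on any platform; the payload and the helper names write_binary and read_binary are made up for the example.

```python
import os
import tempfile


def write_binary(path, data):
    """Write bytes unchanged by opening the file in binary mode."""
    with open(path, "wb") as f_out:
        f_out.write(data)


def read_binary(path):
    """Read the file back as raw bytes."""
    with open(path, "rb") as f_in:
        return f_in.read()


# Hypothetical binary payload that happens to contain newline bytes.
payload = b"\xd0\xcf\x11\xe0\n\x00\n\x01"

# Simulation of what Windows text mode would effectively store on disk.
text_mode_result = payload.replace(b"\n", b"\r\n")

# Binary mode keeps the bytes exactly as they were uploaded.
path = os.path.join(tempfile.mkdtemp(), "upload.xls")
write_binary(path, payload)
round_trip = read_binary(path)

print(round_trip == payload, text_mode_result == payload)
```

Any extra byte inserted in the middle of an .xls file shifts everything after it, which would be consistent with Excel reporting the saved file as corrupt.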



Kevin Houlihan

Feb 6, 2014, 5:14:20 AM
to we...@googlegroups.com
Hi Jessica,

xls files are binary files, so gibberish is what you should expect if you dump them to the terminal, and decoding them as utf8 (or any other character encoding) is not going to have meaningful results. You might see the odd embedded string but that's about it. You need to understand the binary format if you want to get anything out of the file. Microsoft have a primer on it here: http://msdn.microsoft.com/en-us/library/gg615597(v=office.14).aspx
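
As a small illustration of "binary format": legacy .xls files are stored in an OLE2 compound file container, which starts with a fixed 8-byte signature whose first byte is 0xd0. The sketch below checks for that signature; the function name looks_like_xls is hypothetical, and the check is only a necessary condition, since other legacy Office formats use the same container.

```python
# Fixed signature at the start of an OLE2 compound file (used by legacy .xls).
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_xls(data):
    """Return True if the bytes start with the OLE2 container signature."""
    return data[:8] == OLE2_SIGNATURE


# Hypothetical samples: a fake binary header and a plain-text CSV fragment.
binary_sample = OLE2_SIGNATURE + b"\x00" * 16
text_sample = b"name,date\nAnn,2014-01-17\n"

print(looks_like_xls(binary_sample), looks_like_xls(text_sample))
```

A check like this can confirm that the bytes arriving from the upload are intact binary data before any parsing is attempted.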

Alternatively you could use a python module that understands the format, if there are any that are up to the task! Stackoverflow directed me to this one: http://www.python-excel.org/
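
For example, a minimal sketch using xlrd, a third-party package that I believe is among those listed on that site. It assumes xlrd is installed and that the uploaded bytes are a valid .xls workbook; the function name read_first_sheet is made up for this example.

```python
def read_first_sheet(content):
    """Parse .xls bytes and return the first worksheet as a list of rows."""
    # Third-party dependency, imported here so the module loads without it.
    import xlrd

    # file_contents lets xlrd parse bytes already in memory instead of a path.
    book = xlrd.open_workbook(file_contents=content)
    sheet = book.sheet_by_index(0)
    return [sheet.row_values(i) for i in range(sheet.nrows)]
```

In a web.py handler this could be called with the raw bytes read from the upload, for instance the value of x["calendar_file"].file.read() from the code earlier in the thread, without any encode or decode step in between.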

Regards,
Kevin



James Tyra

Feb 3, 2014, 8:58:05 PM
to we...@googlegroups.com

Jessica, just so you know, I cannot imagine a reason to run Python's "encode" or "decode" methods on an uploaded Excel file, or on any binary file. These methods convert between text and bytes using a character encoding, so they assume the data is text. Applied to an Excel file or any other non-text file, they will either fail or corrupt the data.
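
A short sketch of the corruption, assuming the file starts with the usual legacy .xls signature bytes (an assumption, since the attachment is not shown here). Decoding with errors ignored and encoding again silently drops the bytes that are not valid UTF-8. The helper name text_round_trip is hypothetical.

```python
# Assumed first bytes of a legacy .xls file.
original = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def text_round_trip(data):
    """Treat bytes as UTF-8 text, ignoring errors, then encode them back."""
    return data.decode("utf-8", "ignore").encode("utf-8")


mangled = text_round_trip(original)

# The round trip does not raise, but the result no longer matches the input.
print(len(original), len(mangled), mangled == original)
```

The 'ignore' option hides the error rather than fixing it, which is why the data should stay as bytes from upload to disk.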

In order to process Microsoft xls or xlsx files with Python you are going to need some type of special library.

I recommend you take a step back for a second and explain to us what you're trying to accomplish.

Jessica Le

Feb 10, 2014, 7:21:24 PM
to we...@googlegroups.com
I don't quite understand why I would want to encode/decode the .xls files either, but for some reason the upload method from the web.py library throws the UnicodeDecodeError when I try to print out the values from the uploaded Excel file. The upload method works fine for text files, though. I looked into the UnicodeDecodeError on Stack Overflow, and it seemed like that was the way to go...

All I want to do is let the user be able to upload an Excel file using REST. It shouldn't be this hard :( 



Kevin Houlihan

Feb 11, 2014, 5:28:34 AM
to we...@googlegroups.com
Hi Jessica,

In the original code you posted it looks like you had already saved the file to disk before you went about trying to print it. Did this result in a valid file that you could open in excel?

Regards,
Kevin

Jessica Le

Feb 11, 2014, 2:42:04 PM
to we...@googlegroups.com
No, it results in a corrupt Excel file. =(

David Kopec

Feb 11, 2014, 2:52:38 PM
to we...@googlegroups.com
Is it possible the file is somehow being encoded upon upload? That would explain the corruption, right? Is the POST specifying that it is transferring binary data? Is the MIME type set to Excel? Just ideas; I'm not sure what's going on.
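
One way to narrow this down, sketched under the assumption that both the original file and the saved upload are available on disk: compare their digests. If they differ, the bytes changed somewhere between the browser and the saved file. The function names sha256_of and same_bytes are made up for this example.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f_in:
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """Return True if both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Comparing file sizes first is a cheaper check: a saved copy that is slightly larger than the original would point toward newline translation on write.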

Jessica Le

Feb 12, 2014, 3:19:14 PM
to we...@googlegroups.com
I'm not sure whether the file is being encoded upon upload... I'm trying to find more documentation on the web.input method because I think it may have to do with this problem. Perhaps it doesn't support Excel files?

Here is the code I'm using to print out the contents of the file:

x = web.input(calendar_file={}, ref_id='') # picks up the uploaded contents of the file
print x['calendar_file'].value

Maybe I'm using the method wrong, but there isn't much documentation on this... 

This is what I'm using: http://webpy.org/cookbook/fileupload
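
For reference, a minimal sketch that pulls together the suggestions made in this thread: keep the upload as raw bytes, strip any client-side directory from the filename, and write in binary mode. It assumes only that the upload object exposes .filename and .file attributes, as x.calendar_file does in the code above. The names safe_filename and save_upload are hypothetical, and this is not taken from the web.py cookbook.

```python
import os


def safe_filename(client_name):
    """Keep only the final path component of a client-supplied filename."""
    # Some browsers on Windows send a full path with backslashes.
    return os.path.basename(client_name.replace("\\", "/"))


def save_upload(field, dest_dir):
    """Write an uploaded file to dest_dir unchanged and return its path."""
    path = os.path.join(dest_dir, safe_filename(field.filename))
    # 'wb' avoids newline translation; no encode or decode step is applied.
    with open(path, "wb") as f_out:
        f_out.write(field.file.read())
    return path
```

Inside the POST handler this might be called as save_upload(x.calendar_file, some_directory), where some_directory is a placeholder for wherever uploads should live. Printing the raw contents of an .xls file will still look like garbage, since the file is binary.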