Uploading & Encoding .xls files on Web Application


Jessica Le

Feb 4, 2014, 2:55:39 PM
to python...@googlegroups.com
Hi all, 

I'm trying to create a web application that lets users upload an .xls file, which I then feed into my program that reads and parses it. I am currently using Python 2.7 with the web.py framework.

However, I am having issues with the UTF-8 encoding of the Excel files. I have searched Stack Overflow and Google extensively, but none of the suggested fixes work. This method seems to work only for .txt and .csv files; images and .pdf files fail as well, so I'm not sure whether web.py's built-in upload support simply doesn't handle Excel files. When I upload an Excel file, it just spits out unreadable content like the following:

 ■   ♠☺☻                 ☺   ☻╒═╒£.←►ô +,∙«0   ░     ☺   H   ↨   P   ♂   X
 ♀   ï   ☻   Σ♦  ♥     ♫ ♂       ♂       ♂       ♂       ▲►  ☺      Sheet1
   ▲   ♂   Worksheets ♥   ☺


Here is my code:

  1. class index:
  2.     def GET(self):
  3.         web.header("Content-Type","text/html; charset=utf-8")
  4.         return render.index(form)
  5.     def POST(self):
  6.         x = web.input(calendar_file={}, ref_id='')
  7.         if x:
  8.             ref_id = (x.ref_id if x.ref_id else "")
  9.             filepath = x.calendar_file.filename.replace('\\', '/') # replaces the windows-style slashes with linux ones.
  10.             fn = filepath.split('/')[-1] # splits the path and chooses the last part (the filename)
  11.             filename = "%s/Users/jl98567/Documents/xMatters_calendar_app/test/" + fn
  12.             fullpath = os.path.join('c:', filename % (ref_id))
  13.             content = x["calendar_file"].file.read()
  14.             with open(fullpath, 'w') as f_out:
  15.                 if not f_out:
  16.                     raise Exception("Unable to open %s for writing. " % (fullpath))
  17.                 f_out.write(content)
  18.         print x['calendar_file'].value
  19.         raise web.seeother('/upload?ref_id=%s&filename=%s' % (ref_id, filename))

Now, when I change line 18 to encode the value:

print x['calendar_file'].value.encode('utf-8')

I get the following error:

<type 'exceptions.UnicodeDecodeError'> at /

'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)


The weird thing is that I know encoding to UTF-8 works in my other application, which isn't web-based and doesn't use the web.py file upload method. So I can't see what the problem is here.
For example:

content = str(sheet.cell(row,0).value.encode('utf8'))

That works perfectly fine.
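A minimal sketch of why the two cases behave differently (Python 3 syntax, whereas the thread uses Python 2.7). xlrd returns cell values as Unicode text, so encoding them to UTF-8 works. The upload's `.value` is the raw bytes of the whole file, and in Python 2 calling `.encode('utf-8')` on a byte string first decodes it implicitly with the ASCII codec. An Excel 97-2003 .xls file is an OLE2 compound document whose first eight bytes are D0 CF 11 E0 A1 B1 1A E1, so byte 0xd0 at position 0 is the first thing that fails. Python 3 has no implicit decode, so the sketch writes it out as an explicit `.decode("ascii")`; `ascii_decode_error` is a hypothetical helper name.

```python
# Python 3 sketch; ascii_decode_error is a hypothetical helper name.


def ascii_decode_error(data):
    """Return the message Python gives when `data` (bytes) is decoded as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Like an xlrd cell value: already text, so encoding it to UTF-8 just works.
cell_value = u"Sheet1"
cell_bytes = cell_value.encode("utf-8")

# Like the uploaded .xls: raw bytes that start with the OLE2 signature.
upload = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 504

message = ascii_decode_error(upload)
# message == "'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)"
```

The upshot is that the upload should not be encoded at all: it is already bytes. xlrd's `open_workbook` can take those bytes directly through its `file_contents` argument and decode the strings inside the workbook itself.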


Any suggestions?

Thanks much! 

John Yeung

Feb 4, 2014, 5:36:47 PM
to python-excel
Well, .xls is not really what I would call "encoded in UTF-8". If
there are any strings present in the .xls file, then those are encoded
in UTF-8. But most of the file is binary.
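Because most of the file is binary, one way to sanity-check an upload is to look at its first bytes instead of trying to decode them. A minimal sketch, assuming the upload has already been read into a bytes object (as `content` on line 13 is); `sniff_spreadsheet` is a hypothetical helper, and the ZIP check only shows that the data is some ZIP archive (which is what an .xlsx is), not that it is a valid workbook.

```python
# Hypothetical helper: classify an upload by its leading "magic" bytes.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound document (.xls, Excel 97-2003)
ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP local file header (.xlsx is a ZIP archive)


def sniff_spreadsheet(data):
    """Return "xls", "zip" or "unknown" from the first bytes of `data` (bytes, not text)."""
    if data.startswith(OLE2_SIGNATURE):
        return "xls"
    if data.startswith(ZIP_SIGNATURE):
        return "zip"  # possibly .xlsx; could also be any other ZIP-based file
    return "unknown"
```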

So the main thing you need to do is make sure you are working with a
stream of raw bytes, not characters. Now, I have practically no Web
experience, and I don't know how to make sure web.py is reading in a
stream of bytes (investigate lines 6 and 13), but I would imagine that
setting the content type to "text/html; charset=utf-8" isn't right.

Later, once you've got that sorted out, when it's time to write out
the content, don't do

with open(fullpath, 'w') as f_out:

This is definitely not going to work. It needs to be

with open(fullpath, 'wb') as f_out:
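A minimal sketch of a binary-safe save, assuming `stream` is a file-like object opened for binary reading, such as the `.file` attribute that line 13 of the original code reads from; `save_upload` is a hypothetical name. In text mode, Python 2 on Windows would turn every 0x0A byte into 0x0D 0x0A and corrupt the workbook, and Python 3 refuses to write bytes to a text-mode file at all.

```python
import shutil


def save_upload(stream, path, chunk_size=64 * 1024):
    """Copy an uploaded file object to `path` byte for byte (hypothetical helper).

    `stream` must be opened for binary reading. Mode "wb" disables newline
    translation, so 0x0A bytes inside the workbook are written unchanged.
    """
    with open(path, "wb") as f_out:
        # Copy in chunks so the whole upload never has to sit in memory.
        shutil.copyfileobj(stream, f_out, chunk_size)
```

With web.py this would be called as something like `save_upload(x["calendar_file"].file, fullpath)` in place of the read-then-write on lines 13 to 17.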

Perhaps someone familiar with web.py will offer more specific advice.
In fact, you currently don't have a Python Excel issue at all, but
just a how-to-handle-binary-files-on-the-Web issue. So if web.py has
a mailing list or forum, go check that out instead! Happy hunting!

John Y.

Adrian Klaver

Feb 4, 2014, 5:44:16 PM
to python...@googlegroups.com
See here for content types:

http://en.wikipedia.org/wiki/Internet_media_type

for excel you are looking for

http://en.wikipedia.org/wiki/Internet_media_type#Prefix_vnd

application/vnd.ms-excel: Microsoft Excel files

so something like:

content_type="application/vnd.ms-excel"
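A Content-Type like this mainly matters when the application sends a spreadsheet back to the browser; a browser form upload typically arrives as a multipart/form-data request. A hedged sketch of building download headers; the lookup table and `download_headers` are hypothetical, and the .xlsx entry is the standard Office Open XML spreadsheet type.

```python
import os

# Hypothetical lookup table for spreadsheet downloads.
SPREADSHEET_TYPES = {
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def download_headers(filename):
    """Return (name, value) header pairs for sending a spreadsheet as a download."""
    ext = os.path.splitext(filename)[1].lower()
    content_type = SPREADSHEET_TYPES.get(ext, "application/octet-stream")
    # Drop characters that could break out of the quoted header value.
    safe_name = "".join(c for c in filename if c not in '"\r\n')
    return [
        ("Content-Type", content_type),
        ("Content-Disposition", 'attachment; filename="%s"' % safe_name),
    ]
```

In web.py each pair could be passed to `web.header(name, value)`, the same call the original GET handler uses for text/html.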



--
Adrian Klaver
adrian...@gmail.com

John Machin

Feb 5, 2014, 4:59:59 AM
to python...@googlegroups.com
On 5/02/2014 9:36 AM, John Yeung wrote:
> Well, .xls is not really what I would call "encoded in UTF-8". If
> there are any strings present in the .xls file, then those are encoded
> in UTF-8.
Hi John Y,

For "modern" XLS files, strings are encoded in ISO 8859-1 if they can be
(1 byte per character) otherwise in UTF-16LE (2 bytes per character).
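A simplified sketch of decoding the two forms, assuming the character bytes and the string's option-flags byte have already been pulled out of the record; in BIFF8 the lowest bit of that flags byte selects the form. `decode_biff8_chars` is a hypothetical helper. Real strings can also carry rich-text and phonetic data and can be split across CONTINUE records, none of which is handled here.

```python
def decode_biff8_chars(raw, flags):
    """Decode the character bytes of a BIFF8 string (simplified, hypothetical helper).

    Bit 0 of the option-flags byte selects the storage form:
    0 -> one byte per character (ISO 8859-1), 1 -> two bytes per character (UTF-16LE).
    """
    if flags & 0x01:
        return raw.decode("utf-16-le")
    return raw.decode("latin-1")
```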

Older files (Excel 95 and earlier) have a CODEPAGE record which provides
a codepage number (for example, 1252) which is used by xlrd to derive
the encoding (for same example: "cp1252") which is used to translate to
Unicode.
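The codepage-to-encoding step can be sketched like this. It is a simplified stand-in, not xlrd's actual table: it assumes the common "cp" + number naming and special-cases only codepage 1200 (UTF-16LE); `encoding_from_codepage` is a hypothetical name.

```python
import codecs


def encoding_from_codepage(codepage):
    """Map an Excel CODEPAGE value to a Python codec name (simplified, hypothetical helper).

    Only codepage 1200 (UTF-16LE) is special-cased; everything else is assumed
    to follow the "cp" + number pattern, e.g. 1252 -> "cp1252".
    """
    if codepage == 1200:
        return "utf_16_le"
    name = "cp%d" % codepage
    codecs.lookup(name)  # raises LookupError if Python has no codec by that name
    return name
```

For example, a byte 0x80 in a codepage-1252 file decodes to the euro sign: `b"\x80".decode(encoding_from_codepage(1252))`.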

XLSX files are ZIP files, i.e. also binary.
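Because an .xlsx is a ZIP archive, its raw bytes can be opened with the standard `zipfile` module, with no text decoding involved. A minimal sketch, assuming the upload is already in memory as bytes; `list_xlsx_parts` is a hypothetical helper.

```python
import io
import zipfile


def list_xlsx_parts(data):
    """Return the member names of an .xlsx given its raw bytes (hypothetical helper).

    The bytes are wrapped in an in-memory buffer so zipfile can read them
    exactly as they arrived.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()
```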

In any case, as you say, the OP's problem is binary file handling, and
it's not specific to Excel files.

Also waiting for a web file-handling expert to chip in ...
John M

John Yeung

Feb 5, 2014, 9:24:19 AM
to python-excel
On Wed, Feb 5, 2014 at 4:59 AM, John Machin <sjma...@lexicon.net> wrote:
> On 5/02/2014 9:36 AM, John Yeung wrote:
>> If there are any strings present in the .xls file, then those
>> are encoded in UTF-8.
>
> For "modern" XLS files, strings are encoded in ISO 8859-1 if they can be (1
> byte per character) otherwise in UTF-16LE (2 bytes per character).

Ah. I can see how I conflated those into a hazy, incorrect memory of UTF-8.

John Y.