How do I get pandas to read an Excel file attached to an HTTP POST message?

Arpan Sen

Feb 22, 2018, 3:42:03 PM
to PyData
Hi all,

I am trying to read the raw body of an HTTP POST request (an uploaded .xlsx file) into pandas.read_excel, and I keep getting an XLRDError. Can anyone please help?

Here's my server (echo1.py):

import falcon
import json
import pandas
from StringIO import StringIO

class Echo(object):
  def on_post(self, req, resp):
    data = req.stream.read()
    f1 = pandas.read_excel(StringIO(data), encoding='utf-8') # ERROR IN THIS LINE
    resp.body = json.dumps({'aa' : 'bb'})
    resp.status = falcon.HTTP_200

api = falcon.API()
api.add_route('/', Echo())
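A minimal Python 3 sketch of the server-side step, assuming the client sends the raw .xlsx bytes as the whole request body (not as a multipart form). Under Python 3, `req.stream.read()` returns bytes, and `io.BytesIO` is the binary counterpart of `StringIO`. `read_xlsx_bytes` is a hypothetical helper name, and the ZIP check is an added safeguard rather than something pandas requires: an .xlsx workbook is a ZIP archive, so a body that is not one is rejected with a message showing its first bytes instead of xlrd's BOF error. No `encoding` argument is passed, since the workbook is binary.

```python
import io
import zipfile

import pandas as pd


def read_xlsx_bytes(data: bytes) -> pd.DataFrame:
    """Parse the raw bytes of an .xlsx file (e.g. an HTTP request body).

    Hypothetical helper for illustration; not part of pandas or falcon.
    """
    buf = io.BytesIO(data)  # binary buffer: .xlsx is binary, not text
    # An .xlsx workbook is a ZIP container. Anything else, such as a
    # multipart/form-data body starting with '--<boundary>', cannot be
    # read as .xlsx, so fail early and show the first bytes received.
    if not zipfile.is_zipfile(buf):
        raise ValueError(
            "request body is not an .xlsx (ZIP) file; starts with %r" % data[:8]
        )
    buf.seek(0)  # is_zipfile may move the read position; rewind first
    return pd.read_excel(buf)
```

In `on_post` this would be called as `read_xlsx_bytes(req.stream.read())`. Reading the workbook from memory still needs an Excel engine such as openpyxl installed, just as reading it from disk does.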

I am running the server with:

gunicorn -w 2 -b 0.0.0.0:9100 echo1:api

Here's my client:
import requests
h = {'charset': 'utf-8', 'content-type': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 'content-disposition': 'form-data', 'filename': 'report.xlsx'}
with open('report.xlsx', 'rb') as f:
    r = requests.post('http://localhost:9100', headers=h, files={'report.xlsx': f})
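With `files=`, requests wraps the upload in a multipart/form-data envelope, so the body starts with `--<boundary>` rather than with the file itself, which fits the `found '--c17067'` at the end of the traceback. Because the headers above also set `content-type` by hand, requests keeps that value instead of its own multipart Content-Type, so the boundary is never announced to the server either. The sketch below (assuming a current requests release; `XLSX_MIME`, `raw_upload` and `multipart_upload` are hypothetical names) builds both kinds of request without sending them, so their bodies can be compared.

```python
import requests

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def raw_upload(url: str, payload: bytes) -> requests.PreparedRequest:
    # data=<bytes> is sent verbatim: the body is exactly the .xlsx file.
    req = requests.Request("POST", url, data=payload,
                           headers={"Content-Type": XLSX_MIME})
    return req.prepare()


def multipart_upload(url: str, payload: bytes) -> requests.PreparedRequest:
    # files= wraps the bytes in a multipart/form-data envelope; requests
    # generates the boundary and a matching Content-Type header.
    req = requests.Request("POST", url,
                           files={"report": ("report.xlsx", payload, XLSX_MIME)})
    return req.prepare()
```

To send the file as the raw body, `requests.post('http://localhost:9100', data=f.read(), headers={'Content-Type': XLSX_MIME})` is the one-line equivalent of `raw_upload`; a prepared request can also be sent with `requests.Session().send(...)`.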

And here's the error:
[2018-02-23 00:42:16 +0530] [96230] [ERROR] Error handling request /
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/Library/Python/2.7/site-packages/gunicorn/workers/sync.py", line 176, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/Library/Python/2.7/site-packages/falcon/api.py", line 227, in __call__
    responder(req, resp, **params)
  File "/Users/apple/Downloads/echo1.py", line 11, in on_post
    f1 = pandas.read_excel(StringIO(data), encoding='utf-8')
  File "/Library/Python/2.7/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/io/excel.py", line 230, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/Library/Python/2.7/site-packages/pandas/io/excel.py", line 292, in __init__
    self.book = xlrd.open_workbook(file_contents=data)
  File "/Library/Python/2.7/site-packages/xlrd/__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "/Library/Python/2.7/site-packages/xlrd/book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "/Library/Python/2.7/site-packages/xlrd/book.py", line 1271, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "/Library/Python/2.7/site-packages/xlrd/book.py", line 1265, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '--c17067'

I have tried opening report.xlsx from disk with pandas directly, and that works just fine.
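If the client keeps using `files=`, the server has to unwrap the multipart envelope before pandas sees the bytes. One stdlib-only way in Python 3 is the `email` package, which can split a multipart body once the request's Content-Type header (the part carrying the boundary) is placed in front of it. This is a sketch under those assumptions, not falcon's own multipart handling; `file_parts` is a hypothetical helper, and it holds the whole body in memory, so it suits small uploads.

```python
from email.parser import BytesParser
from email.policy import default


def file_parts(body: bytes, content_type: str) -> dict:
    """Map each uploaded filename in a multipart/form-data body to its bytes.

    Hypothetical helper for illustration, built on the stdlib email parser.
    """
    # The parser reads the boundary from a Content-Type header, so put the
    # request's header (e.g. 'multipart/form-data; boundary=...') in front.
    head = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(head + body)
    files = {}
    for part in message.iter_parts():
        name = part.get_filename()  # from Content-Disposition: ...; filename="..."
        if name is not None:        # skip plain form fields without a filename
            files[name] = part.get_payload(decode=True)  # raw bytes of the file
    return files
```

In `on_post` it could be used as `parts = file_parts(req.stream.read(), req.content_type)` followed by `pandas.read_excel(io.BytesIO(parts['report.xlsx']))`. This only works if the client stops overriding `content-type`, because the boundary travels in that header. The `cgi` module, often suggested for this job, was deprecated in Python 3.11 and removed in 3.13.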

Thanks!


