Re: XLRD issue

Chris Withers

May 7, 2012, 4:02:15 PM
to Bryan Wyatt, python...@googlegroups.com
Hi Bryan,

On 07/05/2012 20:59, Bryan Wyatt wrote:
> Hopefully this is the right contact method. If not, my apologies.

The correct way is always the mailing list ;-)

> I am working with XLRD trying to get it to open up sheets from
> http://www.eia.gov/totalenergy/data/monthly/
>
> eg:
> http://www.eia.gov/totalenergy/data/monthly/query/mer_data_excel.asp?table=T01.01
>
> I keep getting the following error:
> xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF
> record; found '<html><h'

That should give you a hint: that url is returning html, not an excel file.
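
A quick way to act on that hint before calling xlrd is to look at the first few bytes yourself. This is only an illustrative sketch: the function name `sniff_format` is hypothetical, it checks for the two common Excel containers (an OLE2 compound document for .xls and a zip archive for .xlsx) plus a crude test for leading HTML markup, and it does not recognize the older raw BIFF streams that xlrd can also read.

```python
# Leading "magic" bytes of the two common Excel containers.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # compound document holding a .xls workbook
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx files are zip archives (so is any other zip)


def sniff_format(data: bytes) -> str:
    """Guess the file type of `data` from its first bytes (a heuristic, not a validator)."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    # Ignore leading whitespace and case before looking for HTML markup.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<html", b"<!doctype html", b"<table")):
        return "html"
    return "unknown"
```

For the bytes quoted in the error above, `sniff_format(b"<html><h")` returns `"html"`, which is the case to route to an HTML parser rather than xlrd.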

Reading that HTML may tell you what's gone wrong.
Alternatively, eia.gov may be using a rather crappy technique that serves up
HTML in a way that Excel opens as a spreadsheet. If that's what's going on,
you want an HTML parser such as BeautifulSoup to extract the data, not xlrd.
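
If it does turn out to be an HTML table, here is a dependency-free sketch of the extraction step. It swaps in the standard library's `html.parser` in place of BeautifulSoup, assumes a single non-nested table, and returns every cell as a whitespace-collapsed string. `TableRows` and `html_table_rows` are hypothetical names, and the markup eia.gov actually serves has not been checked.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into <tr> rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # row being built, if any
        self._cell: list[str] | None = None  # text fragments of the open cell

    def _end_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace the way a browser would display them.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self) -> None:
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # tolerate a missing </td>
            if self._row is None:
                self._row = []  # cell that appears outside any <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self) -> None:
        super().close()
        self._end_row()  # flush a row left open at the end of the input


def html_table_rows(html_text: str) -> list[list[str]]:
    """Return the table cells in `html_text` as a list of rows of strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

Closing the open cell whenever the next `<td>` or `<tr>` starts lets the parser cope with HTML that omits end tags, which browsers tolerate. Converting cells such as "1,234.5" to numbers is left to the caller.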

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

Craig Barnes

May 7, 2012, 5:00:48 PM
to python...@googlegroups.com

Hi Bryan,

On 7 May 2012 21:02, Chris Withers <ch...@simplistix.co.uk> wrote:
> That should give you a hint: that url is returning html, not an excel file.

The URL looks like it returns an xls file OK when clicked in a browser, but when I open it using urllib2 it sends HTML. You probably want to find out how this URL is intended to work.
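
One way to find out how the URL is meant to work is to compare what the server says it is sending with what it actually sends. A minimal sketch using `urllib.request` (the Python 3 successor to urllib2): `describe_response` and `probe` are hypothetical helpers, and which headers eia.gov actually returns is an open question that running `probe` against the URL would answer.

```python
from __future__ import annotations

import urllib.request
from email.message import Message


def describe_response(headers: Message, first_bytes: bytes) -> dict[str, object]:
    """Summarize what the headers declare versus what the body looks like."""
    head = first_bytes.lstrip().lower()
    return {
        # Declared media type; email.message defaults to "text/plain" if absent.
        "content_type": headers.get_content_type(),
        # Filename from Content-Disposition, or None if the server sent none.
        "filename": headers.get_filename(),
        "body_looks_like_html": head.startswith((b"<html", b"<!doctype", b"<table")),
    }


def probe(url: str, timeout: float = 30.0) -> dict[str, object]:
    """Fetch `url` and describe the response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers is an http.client.HTTPMessage, a Message subclass.
        return describe_response(response.headers, response.read(512))
```

A Content-Type such as `application/vnd.ms-excel`, or a Content-Disposition filename ending in `.xls`, combined with an HTML body would be consistent with a browser saving the download as a spreadsheet while xlrd rejects it.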

HTH

--
Craig

'The first time any man's freedom is trodden on - we are all damaged.' Jean-Luc Picard
()  ascii ribbon campaign - against html mail
/\

David Avraamides

May 7, 2012, 5:16:38 PM
to python...@googlegroups.com
That URL downloads an HTML table named with an .xls extension. Excel is just reading it for you, as it does for other formats it knows (TDF, CSV, etc.).

If you click on that link and open the downloaded file in a text editor, you'll see it's just a big HTML table.

--
You received this message because you are subscribed to the Google Groups "python-excel" group.
To post to this group, send an email to python...@googlegroups.com.
To unsubscribe from this group, send email to python-excel...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/python-excel?hl=en-GB.