Which Python package can open Excel HTML format?


Victor Hsu

Aug 26, 2016, 9:00:30 AM
to python-excel
Hi,

Can anyone recommend a package for working with Excel HTML data in Python?
The header of the Excel file I received looks like this:

<html xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 <head><meta http-equiv=Content-Type content="text/html" charset=utf-8></head><html xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 <head><meta http-equiv=Content-Type content="text/html" charset=utf-8></head>
<head><style>
br {mso-data-placement:same-cell;}
</style></head>
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 9">
<style>


 

Charlie Clark

Aug 26, 2016, 9:19:54 AM
to python...@googlegroups.com
On .08.2016 at 15:00, Victor Hsu <victo...@compal.com> wrote:

> Is there any package recommendation to operate the Excel html data with
> python ?
> The Header of Excel file I got is below format?

This isn't actually an Excel file. It looks like someone used Excel's
"Save as HTML" option, or a library that produces the same kind of output.
Your best option is probably to run it through a headless version of
LibreOffice or OpenOffice to convert it into something useful. None of the
Python Excel libraries can read this kind of file.
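
One way to confirm what a file really is, whatever its extension says, is to look at its first bytes. The sketch below is only an illustration: the function names are made up for this example, and the HTML check is a rough heuristic rather than a full format detector. It relies on two well-known signatures: legacy binary .xls files are OLE2 containers, and .xlsx files are ZIP archives.

```python
# Well-known file signatures ("magic bytes").
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls (OLE2 container)
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a ZIP archive


def sniff_spreadsheet(data: bytes) -> str:
    """Guess the real format of a 'spreadsheet' from its leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    # Skip an optional UTF-8 BOM and leading whitespace, then look for markup.
    head = data.lstrip(b"\xef\xbb\xbf \t\r\n")[:512].lower()
    if head.startswith((b"<html", b"<!doctype html", b"<table")) or b"<html" in head:
        return "html"
    return "unknown"


def sniff_file(path: str) -> str:
    """Classify a file on disk by reading its first 1 KiB (hypothetical helper)."""
    with open(path, "rb") as fh:
        return sniff_spreadsheet(fh.read(1024))
```

A file whose header starts like the one posted above would be reported as "html" even though it carries an .xls extension.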

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Victor Hsu

Aug 26, 2016, 9:44:28 AM
to python-excel
Thanks.

I get this kind of Excel file from JIRA or the DevSuite bug tracking system.

I can use win32com to drive Excel and convert the file to a format that other tools can use.

However, the problem is that my tool then requires Excel to be installed, and some users don't have it.

So I am wondering whether there is a library that can do this.

By the way, what does "headless version" mean?




Charlie Clark

Aug 26, 2016, 9:48:52 AM
to python...@googlegroups.com
On .08.2016 at 15:44, Victor Hsu <victo...@compal.com> wrote:

> thanks..
>
> I got this kind of Excel file from JIRA or DevSuite Bug tracking system.
>
> I can use Win32com to call Excel and convert to some format others can
> use.

The other option is to use xlwings to remote control Excel.

> However, the problem here is my tool must have Excel installed. Some of
> users doesn't have that.
>
> So I am thinking whether there is library to do that.
>
> btw, what does "Headless version" mean?

It means you can run it from the command line without the GUI ever
appearing. Really useful for converting files.
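
As a rough illustration of the headless approach, the sketch below drives LibreOffice's command-line converter from Python. It assumes LibreOffice is installed and that its `soffice` (or `libreoffice`) executable is on the PATH; the function names are invented for this example. Whether LibreOffice picks its spreadsheet import filter for an HTML file with an .xls extension has not been verified here, so an explicit `--infilter` option may be needed in practice.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, target="xlsx", soffice="soffice"):
    """Build the argument list for a headless LibreOffice conversion."""
    return [soffice, "--headless", "--convert-to", target, "--outdir", outdir, src]


def convert(src, outdir, target="xlsx"):
    """Run the conversion and return the expected output path."""
    # The executable name differs between platforms and packages.
    exe = shutil.which("soffice") or shutil.which("libreoffice")
    if exe is None:
        raise FileNotFoundError("LibreOffice (soffice) was not found on PATH")
    subprocess.run(build_convert_command(src, outdir, target, exe), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(outdir) / (Path(src).stem + "." + target)
```

No window is opened at any point, which is what makes this usable from scripts and on servers.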

John Yeung

Aug 26, 2016, 10:36:11 AM
to python-excel
On Fri, Aug 26, 2016 at 9:48 AM, Charlie Clark
<charli...@clark-consulting.eu> wrote:
> On .08.2016 at 15:44, Victor Hsu <victo...@compal.com> wrote:
>> I can use Win32com to call Excel and convert to some format others can
>> use.
>
> The other option is to use xlwings to remote control Excel.

xlwings relies on win32com, if you're on Windows.

John Y.

Charlie Clark

Aug 26, 2016, 10:53:23 AM
to python...@googlegroups.com
On .08.2016 at 16:36, John Yeung <gallium....@gmail.com> wrote:

> xlwings relies on win32com, if you're on Windows.

Yes, but it also works on MacOS. More importantly, it provides a nicer API
for Excel for anything more sophisticated than opening and closing a file.

John Yeung

Aug 26, 2016, 11:06:17 AM
to python-excel
On Fri, Aug 26, 2016 at 10:53 AM, Charlie Clark
<charli...@clark-consulting.eu> wrote:
> On .08.2016 at 16:36, John Yeung <gallium....@gmail.com> wrote:
>
>> xlwings relies on win32com, if you're on Windows.
>
> Yes, but it also works on MacOS. More importantly, it provides a nicer API
> for Excel for anything more sophisticated than opening and closing a file.

But OP already said he was fine using win32com, so he's not on a Mac.
(And he's not really doing much more than opening and closing a file.)
He also said that he is looking for an alternative that doesn't
involve the Excel program. You already mentioned
LibreOffice/OpenOffice in your first response, which I guess is viable
but extremely heavyweight. xlwings doesn't get him any closer to his
goal.

John Y.

Álvaro Justen [Turicas]

Aug 26, 2016, 6:14:41 PM
to python...@googlegroups.com
You can use the rows library to extract information from HTML (data
inside a <table>, or selected using XPath) -- it uses lxml under the hood
but has a pretty simple and automatic API (it automatically identifies and
converts data types, for example). Take a look at:
https://github.com/turicas/rows

Regards,
Álvaro Justen "Turicas"
http://turicas.info/ http://twitter.com/turicas
http://CursoDeArduino.com.br/ http://github.com/turicas
+55 21 9 9898-0141

Victor Hsu

Aug 27, 2016, 10:13:05 AM
to python-excel
Thanks, everyone, for the suggestions.

It seems that the file exported from JIRA or DevSuite is HTML rather than a standard Excel file (although the file extension is .xls and Excel recognizes the format).
I think I can parse the HTML file with BeautifulSoup and convert it into a real Excel 2010 file easily.

Thanks.

Victor
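
The parse-and-convert idea could look roughly like the sketch below. Note the substitutions: it uses the standard library's `html.parser` in place of BeautifulSoup so that the parsing part runs without extra installs, and it assumes the third-party openpyxl package for writing the .xlsx file. The class and function names are made up for this example, and nested tables, merged cells, and cell formatting are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append("\n")  # keep in-cell line breaks

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows


def write_xlsx(rows, path):
    """Write rows to a real .xlsx file (assumes openpyxl is installed)."""
    from openpyxl import Workbook
    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)
    wb.save(path)
```

Every cell comes out as text here, so numbers and dates would still need converting before they behave as such in Excel. The `<br>` handling matters for this kind of export, because the posted header's `br {mso-data-placement:same-cell;}` rule means line breaks are meant to stay inside a single cell.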

