Reading embedded files in Excel


martin.o...@gmail.com

Feb 3, 2016, 3:01:25 PM
to openpyxl-users
Hello,

I have an .xlsx file with embedded objects, both PDF & Word docx files. I can double click them in Excel and view them. When I iterate over the cells, the "value" attribute is None, as is the internal_value attribute of the cell with the embedded object.

How do I (can I?) access the embedded file?

Thanks

Charlie Clark

Feb 4, 2016, 5:47:56 AM
to openpyx...@googlegroups.com
On .02.2016, at 21:01, <martin.o...@gmail.com> wrote:

> Hello,
>
> I have an .xlsx file with embedded objects, both PDF & Word docx files. I
> can double click them in Excel and view them. When I iterate over the
> cells, the "value" attribute is None, as is the internal_value attribute
> of the cell with the embedded object.

People put the strangest things in Excel files it seems!

> How do I (can I?) access the embedded file?

Because Excel files are just zip archives I'd simply unzip the file and
extract the files like that.
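
A minimal sketch of that idea, using only the standard library. The function name `unzip_workbook` and the target directory are made up for illustration; `zipfile.ZipFile.extractall` and `namelist` are the real calls doing the work.

```python
import zipfile


def unzip_workbook(xlsx_path, target_dir):
    """Unpack every member of the .xlsx archive into target_dir.

    Returns the list of member names so the caller can see what was inside.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

Calling something like `unzip_workbook('Sample.xlsx', 'Sample_unzipped')` should leave the whole package on disk; embedded objects, if the workbook has any, normally sit under `xl/embeddings/`.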

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

martin.o...@gmail.com

Feb 4, 2016, 9:39:08 AM
to openpyxl-users
Right you are, Charlie. I figured it'd be more complex than that. For anyone who stumbles upon this issue, here's a Python snippet to extract any embedded objects:

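import zipfile

filename = 'Sample.xlsx'
# Directory to extract into, named after the workbook (here 'Sample').
prefix = filename.split('.xlsx')[0]
# Member names inside the archive are relative to the archive root,
# so the embedded objects are found under this path.
embed_path = 'xl/embeddings/'

embedded = []

with zipfile.ZipFile(filename, 'r') as fd:
    for zipinfo in fd.infolist():
        fn = zipinfo.filename
        # Ignore the directory itself, we just want the files.
        if fn != embed_path and fn.startswith(embed_path):
            embedded.append(fn)
            print("Extracting %s" % fn)
            # Files end up in <prefix>/xl/embeddings/.
            fd.extract(zipinfo, prefix)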

Charlie Clark

Feb 4, 2016, 9:47:04 AM
to openpyx...@googlegroups.com
On .02.2016, at 15:39, <martin.o...@gmail.com> wrote:

> Right you are Charlie. I figured it'd be more complex than that. For
> anyone who stumbled upon this issue, here's a Python snippet to extract
> any embedded objects:
>
> import zipfile
>
> filename = 'Sample.xlsx'
>
> prefix = filename.split('.xlsx')[0]

os.path provides nice utilities for this
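
For instance, `os.path.splitext` splits off only the final extension, whereas `str.split('.xlsx')` cuts at the first occurrence of that text anywhere in the name. A small sketch (the helper name `strip_extension` is just for illustration):

```python
import os.path


def strip_extension(filename):
    """Return filename without its last extension, keeping any directory part."""
    root, _ext = os.path.splitext(filename)
    return root
```

With a name such as `'my.xlsx.backup.xlsx'` this keeps `'my.xlsx.backup'`, while `filename.split('.xlsx')[0]` would return just `'my'`.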

> embed_path = '%s/xl/embeddings/' % prefix
>
> embedded = []
>
> with zipfile.ZipFile(filename, 'r') as fd:
>
>     for zipinfo in fd.infolist():
>
>         fn = zipinfo.filename
>
>         # Ignore the directory itself, we just want the files.
>
>         if fn != embed_path and fn.startswith(embed_path):

Use fd.namelist() to get just the files.
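
A rough sketch of how the loop might look with `namelist()`, which returns plain member names rather than `ZipInfo` objects. The function name `extract_embedded` and the `target_dir` parameter are assumptions for illustration, and the `xl/embeddings/` path is assumed to be relative to the archive root. The sketch still skips any name ending in `/`, in case the archive carries an explicit directory entry.

```python
import zipfile

# Assumed location of embedded objects inside the .xlsx archive.
EMBED_PATH = 'xl/embeddings/'


def extract_embedded(xlsx_path, target_dir='.'):
    """Extract only the members under xl/embeddings/ and return their names."""
    with zipfile.ZipFile(xlsx_path) as archive:
        names = [
            name for name in archive.namelist()
            if name.startswith(EMBED_PATH) and not name.endswith('/')
        ]
        for name in names:
            # extract() recreates the member's path below target_dir.
            archive.extract(name, target_dir)
    return names
```

Something like `extract_embedded('Sample.xlsx', 'Sample')` would then return the extracted member names and leave the files under `Sample/xl/embeddings/`.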