Find external links

Andrea Gavana

Apr 23, 2012, 8:27:25 AM
to python...@googlegroups.com
Hi All,

I have surfed the web without much success, so I thought I would ask
for advice here. I am trying to find a way to find and enumerate the
internal/external links in an Excel spreadsheet (for a very large
number of Excel files). By internal/external links I mean the names of
the linked documents inside an Excel spreadsheet as outlined here:

http://msdn.microsoft.com/en-us/library/aa195780(v=office.11).aspx

I have tried with an Excel macro and also with pywin32, and I got
almost there. However, the drawback of this approach is that VB macros
and pywin32 scripts need to actually *run* Excel to find out what's
inside, and with so many files (and some of them are corrupted, old,
messy, etc...) after a while Excel becomes unstable and my whole
script crashes.

On the other hand, xlrd just reads the Excel file without actually
running Excel, so I thought it would be a good solution for my problem.
The issue is, I am not sure if this information (i.e., the LinkSources
array) can be retrieved using xlrd or not. If it's possible, could you
please let me know where I should look in the documentation/sources?
If it is not possible, does anyone know how/where I should be
modifying the xlrd source code to extract that information from an
Excel file?

Thank you in advance for your help.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/
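
For readers following along, the Excel-driven approach described above might look roughly like the sketch below. It is only an illustration, not the poster's actual script: the helper names `list_link_sources` and `scan_with_excel` are made up here, and it assumes pywin32 and a local Excel installation. `LinkSources` is the Excel object-model method from the MSDN page linked above, called with `xlExcelLinks` (value 1).

```python
XL_EXCEL_LINKS = 1  # value of the xlExcelLinks constant in Excel's object model


def list_link_sources(workbook):
    """Return the workbook's Excel link sources as a list (empty if none)."""
    links = workbook.LinkSources(XL_EXCEL_LINKS)  # comes back empty/None when there are no links
    return list(links) if links else []


def scan_with_excel(paths):
    """Open each file read-only in a single Excel instance (hypothetical helper)."""
    import win32com.client  # imported lazily: needs pywin32 and Excel on Windows

    excel = win32com.client.Dispatch("Excel.Application")
    excel.DisplayAlerts = False
    results = {}
    try:
        for path in paths:
            try:
                # Positional arguments: Filename, UpdateLinks=0, ReadOnly=True
                wb = excel.Workbooks.Open(path, 0, True)
            except Exception as exc:  # corrupted or unreadable file
                results[path] = exc
                continue
            try:
                results[path] = list_link_sources(wb)
            finally:
                wb.Close(False)  # close without saving
    finally:
        excel.Quit()
    return results
```

Because every file goes through a live Excel process, this shares the stability problem described in the post; it is shown only to make the `LinkSources` call concrete.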

John Machin

Apr 23, 2012, 4:28:33 PM
to python...@googlegroups.com


On Monday, April 23, 2012 10:27:25 PM UTC+10, Infinity77 wrote:
> Hi All,
>
> I have surfed the web without much success, so I thought I may ask
> for advice here. I am trying to find a way to find and enumerate the
> internal/external links in an Excel spreadsheet (for a very large
> number of Excel files). By internal/external links I mean the names of
> the linked documents inside an Excel spreadsheet as outlined here:
>
> http://msdn.microsoft.com/en-us/library/aa195780(v=office.11).aspx
>
> [snip]
>
> On the other hand, xlrd just reads the Excel file without actually
> running Excel, so I thought it would be a good solution for my problem
> The issue is, I am not sure if this information (i.e., the LinkSources
> array) can be retrieved using xlrd or not. If it's possible, could you
> please let me know where I should look in the documentation/sources?

The xlrd documentation is not very long at all. Reading the whole of the documentation is highly recommended.
 

> If it is not possible, does anyone know how/where I should be
> modifying the xlrd source code to extract that information from an
> Excel file?

xlrd does not currently extract the external references.

The best introduction is http://sc.openoffice.org/excelfileformat.pdf section 4.10.3 References in BIFF8 ... depending on your definition of "old", you may need the earlier section about BIFF5.

You would also need the file [MS-XLS].pdf from http://msdn.microsoft.com/en-us/library/cc313154%28v=office.12%29.aspx

The source file to work on would be book.py

Cheers,
John
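
For anyone picking this up later, here is a rough sketch of what reading those records involves, based on the SUPBOOK layout described in the OpenOffice document cited above. It is not xlrd code and not an xlrd API: the function names (`iter_biff_records`, `read_unicode_string`, `external_workbooks`) are made up for illustration, and it assumes you already have the raw BIFF8 Workbook stream as bytes (extracting it from the OLE2 container is not shown). It ignores CONTINUE records and rich-text or phonetic strings, and it returns the document name in its encoded form without decoding the path control characters.

```python
import struct

SUPBOOK = 0x01AE  # BIFF8 record id for a supporting-workbook block


def iter_biff_records(stream):
    """Yield (record_id, payload) pairs from a raw Workbook stream."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_id, length = struct.unpack_from("<HH", stream, pos)
        pos += 4
        yield rec_id, stream[pos:pos + length]
        pos += length


def read_unicode_string(data, pos):
    """Read a BIFF8 string with a 16-bit length; return (text, next_pos)."""
    nchars, flags = struct.unpack_from("<HB", data, pos)
    pos += 3
    if flags & 0x0C:
        raise ValueError("rich-text/phonetic strings not handled in this sketch")
    if flags & 0x01:  # uncompressed: two bytes per character, UTF-16LE
        end = pos + 2 * nchars
        return data[pos:end].decode("utf-16-le"), end
    end = pos + nchars  # compressed: one byte per character
    return data[pos:end].decode("latin-1"), end


def external_workbooks(stream):
    """Return [(encoded_document_name, [sheet names])] for external SUPBOOKs."""
    found = []
    for rec_id, payload in iter_biff_records(stream):
        if rec_id != SUPBOOK or len(payload) < 4:
            continue
        if payload[2:4] in (b"\x01\x04", b"\x01\x3a"):
            continue  # internal-reference or add-in marker, not a linked file
        nsheets = struct.unpack_from("<H", payload, 0)[0]
        name, pos = read_unicode_string(payload, 2)
        sheets = []
        for _ in range(nsheets):  # zero sheets typically means a DDE/OLE link
            sheet, pos = read_unicode_string(payload, pos)
            sheets.append(sheet)
        found.append((name, sheets))
    return found
```

Older BIFF5-era files use a different layout, so this applies only to the BIFF8 case; check every detail against the two specifications before relying on it.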

Andrea Gavana

Apr 23, 2012, 4:52:03 PM
to python...@googlegroups.com
On 23 April 2012 22:28, John Machin wrote:
>
>
> On Monday, April 23, 2012 10:27:25 PM UTC+10, Infinity77 wrote:
>>
>> Hi All,
>>
>>     I have surfed the web without much success, so I thought I may ask
>> for advice here. I am trying to find a way to find and enumerate the
>> internal/external links in an Excel spreadsheet (for a very large
>> number of Excel files). By internal/external links I mean the names of
>> the linked documents inside an Excel spreadsheet as outlined here:
>>
>> http://msdn.microsoft.com/en-us/library/aa195780(v=office.11).aspx
>>
>> [snip]
>>
>> On the other hand, xlrd just reads the Excel file without actually
>> running Excel, so I thought it would be a good solution for my problem
>> The issue is, I am not sure if this information (i.e., the LinkSources
>> array) can be retrieved using xlrd or not. If it's possible, could you
>> please let me know where I should look in the documentation/sources?
>
> The xlrd documentation is not very long at all. Reading the whole of the
> documentation is highly recommended.

I could quote most of it from memory... my IP address has got to be the
most frequent visitor of the xlrd documentation page.

>> If it is not possible, does anyone know how/where I should be
>> modifying the xlrd source code to extract that information from an
>> Excel file?
>
> xlrd does not currently extract the external references.
>
> The best introduction is http://sc.openoffice.org/excelfileformat.pdf
> section 4.10.3 References in BIFF8 ... depending on your definition of
> "old", you may need the earlier section about BIFF5.
>
> You would also need the file [MS-XLS].pdf from
> http://msdn.microsoft.com/en-us/library/cc313154%28v=office.12%29.aspx
>
> The source file to work on would be book.py

Thanks, I'll take a look at it and see if I can get something
meaningful out of my coding.

Derek McHugh

Feb 2, 2015, 11:32:19 AM
to python...@googlegroups.com
Hi Andrea,

Did you have any luck with this? Been looking around for something similar to no avail.

Derek