Is there an efficient way to read named ranges from Excel into Python?

4,716 views
Skip to first unread message

tobias....@gmail.com

unread,
Jan 21, 2019, 10:33:09 AM1/21/19
to openpyxl-users

I have an Excel file with several sheets and lots of named ranges. Those are different parameters which I want to feed into a model I am developing with Python. Now the question is how I can make those parameters available in Python within a reasonable time.

I have tried different modules for reading the Excel data in Python and the one I am most comfortable with is openpyxl (which also seems like one of the most popular). The issue I am having now is the runtime which is unbearably long. For a range of just 300 rows x 100 columns = 30,000 cells it takes more than 30min, which is frustrating since other programs can read those data within seconds.
I am not an expert in Python or data structures so I appreciate any hint, where I could improve my code, so it won't take several hours for bigger ranges.


wb = openpyxl.load_workbook(path, data_only=True, read_only=True)
parameter_names
= [i.name for i in wb.get_named_ranges()]
parameters
= {}
for parameter in parameter_names: ws, cellrange = next(wb.defined_names[parameter].destinations)
if len(wb[ws][cellrange][0]) > 1: # 2d list
parameters[parameter] = [[wb[ws][cellrange][row][column].value for column in range(len(wb[ws][cellrange][row]))] for row in range(len(wb[ws][cellrange]))]
else: # 1d list
parameters[parameter] = [wb[ws][cellrange][row][0].value for row in range(len(wb[ws][cellrange]))]



It would be great if someone who had a similar problem or just knows how to deal with such issues can share their insights. Either by showing me how I can modify my own code (which would be ideal) or by suggesting a different approach. Thanks a lot!

Charlie Clark

unread,
Jan 21, 2019, 11:10:29 AM1/21/19
to openpyx...@googlegroups.com
Is there any chance you can provide a sample file?

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

tobias....@gmail.com

unread,
Jan 22, 2019, 5:34:17 AM1/22/19
to openpyxl-users
Here is an examplary sheet.
testfile.xlsx

Charlie Clark

unread,
Jan 22, 2019, 6:21:53 AM1/22/19
to openpyx...@googlegroups.com
Am .01.2019, 11:34 Uhr, schrieb <tobias....@gmail.com>:

> Here is an examplary sheet.

Thanks. The use of nested comprehensions in the code makes it hard to read
and I think maybe creating unnecessary loops. Another to note is that
columnar access in read-only mode is very inefficient. This is because
openpyxl has to continually reparse the worksheet for each row in the
column.

I'm not quite sure what you want but it looks like lists of the values for
the various names, eg. for

NewWarehouses

You want ['coordN', 'coordO', 'coordS', 'kmenge', 'knw', 'kow',
'NewWarehouses', 'OldWarehouses', 'smenge', 'snw', 'sow', 'Supplier']
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 100]

Is that right?

Charlie Clark

unread,
Jan 22, 2019, 7:28:54 AM1/22/19
to openpyx...@googlegroups.com
Am .01.2019, 12:21 Uhr, schrieb Charlie Clark
<charli...@clark-consulting.eu>:

> Is that right?

Assuming it is then the following code should work:

from openpyxl import load_workbook
from openpyxl.worksheet.cell_range import CellRange

wb = load_workbook("scratchpad/names.xlsx", data_only=True, read_only=True)

parameters = {}

for dn in wb.defined_names.definedName[:]:
sheet_name, cell_range = list(dn.destinations)[0]
cr = CellRange(cell_range)
width = cr.max_col - cr.min_col
ws = wb[sheet_name]
if not width:
cells = [cell.value for cell in [row[0] for row in ws[cell_range]]]
else:
cells = []
for row in ws[cell_range]:
cells.append([c.value for c in row])
parameters[dn.name] = cells


NB. that if you have lots of ranges in the same worksheet then things
might get faster if you don't use read-only mode.

tobias....@gmail.com

unread,
Jan 22, 2019, 7:53:06 AM1/22/19
to openpyxl-users
Exactly. Those are the named ranges and the example values are the values for 'NewWarehouses'.

Reading the one-dimensional arrays isn't fast, but it is done within a tolerable time. But looking at bigger (and 2d) fields, the runtime is prohibitively long.
For example for the named range "snw". Without the list comprehension it looks like the following. I thought there was no difference between list comprehension and a standard for loop regarding efficiency!? Or is there a better approach?

ws, cellrange = next(wb.defined_names["snw"].destinations)
snw = [[]]*len(wb[ws][cellrange]) # create as many subsets as rows in cellrange
for row in range(len(wb[ws][cellrange])):
print(row)
for col in range(len(wb[ws][cellrange][row])):
snw[row].append(wb[ws][cellrange][row][col].value)
print(snw)


Ironically, I included the read-only mode in order to speed up the reading process (I read that somewhere). Without the read-only mode it is indeed faster, albeit still quite slow...
Thanks for your help!

tobias....@gmail.com

unread,
Jan 22, 2019, 9:09:14 AM1/22/19
to openpyxl-users
With your code it works like a charm. Thank you so much for your help! I'll try to get the hang of the difference in computing time....

Charlie Clark

unread,
Jan 22, 2019, 9:58:43 AM1/22/19
to openpyx...@googlegroups.com
Am .01.2019, 15:09 Uhr, schrieb <tobias....@gmail.com>:

> With your code it works like a charm. Thank you so much for your help!
> I'll try to get the hang of the difference in computing time....

You must have some kind of n*p loop in there. Nested comprehensions can be
very tricky to debug. They also don't really save much time for anything
less than several thousand elements. Use them for any kind of throwaway
data structure, but only if you can tell at a glance what they're doing:
there are no bonuses in Python for fitting everything in one line!

In read-only mode you have to avoid lots of distinct random access. For
example, ws['K1'] and ws['K2'] will require all the cells in a worksheet
up to K1 and K2 to be parsed so it usually makes sense

I found your code ran slow in both modes but by commenting out the 2D
branch found it OK. So this is where the problem is, almost certainly down
to using x * y instead of just looping over the rows; in addition, of
course, to calling ws[cell_range] four times, though that's not really
noticeable in this small workbook.

Charlie

PS fwiw in openpyxl 2.6 you can use the values_only parameter with
iter_rows. So you end up with something like:

cells = []
for row in ws.iter_rows(min_row=cr.min_row, min_col=cr.min_col,
max_row=cr.max_row, max_col=cr.max_col, values_only=True):
cells.append(row)

I suppose it might make sense to allow iter_rows(*cr.bounds) but that
would require changing the signature of CellRange.bounds. Might make more
sense to allow it to be passed explicitly so the call can be less verbose.

Suleman Aziz

unread,
Feb 7, 2023, 1:21:54 PM2/7/23
to openpyxl-users
HI, 

I'm new to python and got a task to get all name ranges in a workbook and then read values from those name ranges but It's important to get the ranges names as well since I need to put those values in a model workbook using the same ranges names. 

I have tried the code mentioned above but that doesn't work e.g.

wb.get_named_ranges()  (throws an exception that 'workbook' object has no attribute 'get_names_ranges' - I read the documentation and it seemed this method is deprecated )

Similarly, 
wb.defined_names.definedName[:] (throws an exception that 'DefinedNamesDict' object has no attribute 'definedName' )

however, the following code works perfectly fine: 

wb.defined_names["TestRange"] but the issue is I don't have a names list of all name ranges and it will be dynamic that would vary model by model. 

Any help would be much appreciated! 

Regards,
Suleman

Suleman Aziz

unread,
Feb 7, 2023, 3:37:10 PM2/7/23
to openpyxl-users
I'm able to make it work with the following code. I'm not sure if this is the most optimized way to achieve the result I'm looking for: 

wb = load_workbook(path, data_only=True)
names = []

for dn in wb.defined_names:
names.append(dn)

#Filter Range names
names = filter_names_list(names) 
print(names)
Name, Range = [], []

for n in names:
dn = wb.defined_names[n]
print(dn.name)
print(dn.attr_text)


sheet_name, cell_range = list(dn.destinations)[0]
print(sheet_name)
print(cell_range) 

Please let me know if there is a better way to do it. 
Reply all
Reply to author
Forward
0 new messages