I have an Excel file with several sheets and lots of named ranges. Those are different parameters which I want to feed into a model I am developing with Python. Now the question is how I can make those parameters available in Python within a reasonable time.
I have tried different modules for reading the Excel data in Python
and the one I am most comfortable with is openpyxl (which also seems
like one of the most popular).
The issue I am having now is the runtime, which is unbearably long. For a
range of just 300 rows x 100 columns = 30,000 cells it takes more than
30 minutes, which is frustrating since other programs can read that data
within seconds.
I am not an expert in Python or data structures, so I would appreciate any
hints on where I could improve my code so it won't take several hours for
bigger ranges.
wb = openpyxl.load_workbook(path, data_only=True, read_only=True)
parameter_names = [i.name for i in wb.get_named_ranges()]
parameters = {}
for parameter in parameter_names:
    ws, cellrange = next(wb.defined_names[parameter].destinations)
    if len(wb[ws][cellrange][0]) > 1:  # 2d list
        parameters[parameter] = [[wb[ws][cellrange][row][column].value
                                  for column in range(len(wb[ws][cellrange][row]))]
                                 for row in range(len(wb[ws][cellrange]))]
    else:  # 1d list
        parameters[parameter] = [wb[ws][cellrange][row][0].value
                                 for row in range(len(wb[ws][cellrange]))]
It would be great if someone who has had a similar problem, or who just knows how to deal with such issues, could share their insights: either by showing me how I can modify my own code (which would be ideal) or by suggesting a different approach. Thanks a lot!
ws, cellrange = next(wb.defined_names["snw"].destinations)
# one independent sublist per row in cellrange; note that [[]] * n would
# alias a single list n times, so every append would show up in all rows
snw = [[] for _ in range(len(wb[ws][cellrange]))]
for row in range(len(wb[ws][cellrange])):
    print(row)
    for col in range(len(wb[ws][cellrange][row])):
        snw[row].append(wb[ws][cellrange][row][col].value)
print(snw)
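Two things are worth flagging in the snippet above. First, `wb[ws][cellrange]` is re-sliced on every loop iteration, which is exactly the pattern that makes read-only mode crawl; binding it to a variable once before the loops avoids that. Second, the tempting shortcut `[[]] * n` for preallocating the rows is a trap: it repeats a reference to one and the same inner list n times. A quick pure-Python demonstration of that aliasing:

```python
# [[]] * 3 repeats a reference to ONE inner list three times
aliased = [[]] * 3
aliased[0].append("x")
print(aliased)       # the append shows up in every "row"

# a list comprehension builds three independent inner lists
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)   # only the first row is touched
```

So even once the runtime is fixed, double-check the contents of `snw`: with `[[]] * n` every row would end up holding the concatenation of all rows.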