Detecting formula errors in a worksheet

105 views
Skip to first unread message

David Orme

unread,
Jan 11, 2022, 8:45:52 AM1/11/22
to openpyxl-users
Hi,

I'm using openpyxl to validate the contents of a workbook and one of the things I'd like to be able to do is warn users about errors in formulae. So for example if G22 contains: '=notafunction()`  then Excel displays '#NAME?'. 

I'm upgrading from `xlrd`, which had a specific enum for the cell type (xlrd.XL_CELL_ERROR). Is there a way to detect these in openpyxl?

Cheers,
David

Charlie Clark

unread,
Jan 11, 2022, 9:37:46 AM1/11/22
to openpyxl-users
On 11 Jan 2022, at 14:45, David Orme wrote:

> Hi,
>
> I'm using openpyxl to validate the contents of a workbook and one of the
>
> things I'd like to be able to do is warn users about errors in formulae. So
>
> for example if G22 contains: '=notafunction()\` then Excel displays
>
> '#NAME?'.
>
> I'm upgrading from \`xlrd\`, which had a specific enum for the cell type (xlrd.XL\_CELL\_ERROR).
>
> Is there a way to detect these in openpyxl?

You can check the cell's data_type attribute, "e" corresponds to an error. You can also use the tokeniser to check whether formulae are formally valid and also check the formulae names using the utils package.

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226

David Orme

unread,
Jan 11, 2022, 9:43:10 AM1/11/22
to openpyxl-users
Many thanks - that's great. I'd found a bit more info and it seems that the data_type 'e' is only set when the workbook has data_only=True? Otherwise, all formulae have data_type 'f'.

I've also been using iter_rows with value_only=True - sort of assuming that this is likely to be faster and lower memory than the default. That of course loses data_type, so if I want to stick with that then it comes down to matching cell values against the small set of possible error codes? I'm not sure whether the performance cost of that is likely to be greater than using values_only=False!

David

Charlie Clark

unread,
Jan 11, 2022, 9:57:31 AM1/11/22
to openpyxl-users
On 11 Jan 2022, at 15:43, David Orme wrote:

> Many thanks - that's great. I'd found a bit more info and it seems that the
>
> data\_type 'e' is only set when the workbook has data\_only=True? Otherwise,
>
> all formulae have data\_type 'f'.

Welcome to the wonderful world of insufficient typing. Whether something is an error or not can only be determined at runtime. For openpyxl, the value is either a formula or it's something else. This is actually very important disambiguation.

> I've also been using iter\_rows with value\_only=True - sort of assuming that
>
> this is likely to be faster and lower memory than the default.

Only in read-only mode does it make much difference.

> That of course loses data\_type, so if I want to stick with that then it comes down
>
> to matching cell values against the small set of possible error codes? I'm
>
> not sure whether the performance cost of that is likely to be greater than
>
> using values\_only=False!

Depends on the size of the worksheets, but it sounds to me like something you don't need to worry about (premature optimisation). I wouldn't advise you to rely on the data type for formulae, but if you want you can just have two copies of the file in read-only mode. Better still is to use something like PyCell, which uses our tokeniser, to validate formulae directly.

David Orme

unread,
Jan 11, 2022, 10:02:53 AM1/11/22
to openpyxl-users

Welcome to the wonderful world of insufficient typing. Whether something is an error or not can only be determined at runtime. For openpyxl, the value is either a formula or it's something else. This is actually very important disambiguation.

Ha - very true.
 
Depends on the size of the worksheets, but it sounds to me like something you don't need to worry about (premature optimisation). I wouldn't advise you to rely on the data type for formulae, but if you want you can just have two copies of the file in read-only mode. Better still is to use something like PyCell, which uses our tokeniser, to validate formulae directly.

I am using read_only =True. 90% of the files I want to validate are small but some of them are large (lots of Mb) and I want to keep the memory down as much as possible. I am prone to premature optimisation though. Thanks very much for the explanation and design advice.

David
Reply all
Reply to author
Forward
0 new messages