On 11 Jan 2022, at 15:43, David Orme wrote:
> Many thanks - that's great. I'd found a bit more info and it seems that the
>
> data\_type 'e' is only set when the workbook has data\_only=True? Otherwise,
>
> all formulae have data\_type 'f'.
Welcome to the wonderful world of insufficient typing. Whether something is an error or not can only be determined at runtime. For openpyxl, the value is either a formula or it's something else. This is actually very important disambiguation.
> I've also been using iter\_rows with value\_only=True - sort of assuming that
>
> this is likely to be faster and lower memory than the default.
Only in read-only mode does it make much difference.
> That of course loses data\_type, so if I want to stick with that then it comes down
>
> to matching cell values against the small set of possible error codes? I'm
>
> not sure whether the performance cost of that is likely to be greater than
>
> using values\_only=False!
Depends on the size of the worksheets, but it sounds to me like something you don't need to worry about (premature optimisation). I wouldn't advise you to rely on the data type for formulae, but if you want you can just have two copies of the file in read-only mode. Better still is to use something like PyCell, which uses our tokeniser, to validate formulae directly.