I have to return some excel with data entered through the web by the users.
To do that, I crafted this helper function:
from defusedcsv import csv
from openpyxl.cell.cell import Cell
from typing import Any
def safe_set_cell_value(cell: Cell, value: Any) -> None:
"""This helper is used to prevent CSV injection attacks by forcing the
value to be a string if it contains any special characters.
The library defusedcsv prepends a single quote to any string that seems nasty.
If we note that defusedcsv has modified the string,
we force the cell to be a string.
This will prevent Excel from interpreting the string as a formula.
e.g. `=3+2` will be not be interpreted as a formula, but displayed as a string.
"""
if not isinstance(value, str):
# We don't want to do anything with non-string
# values, so we just set the value and return.
cell.value = value
return
value = value.strip()
cell.value = value
if csv._escape(value) != value:
# If we see that defusedcsv would modify the value, we force the cell to be a string.
cell.data_type = "s"
It works as expected: when I open the file, it does not evaluate the formulas.
Is there any better way to do that?
On 18 Feb 2025, at 12:41, Alessandro Pisa wrote:
It works as expected: when I open the file, it does not evaluate the formulas.
Is there any better way to do that?
What is it exactly that you're trying to do? Could you perhaps provide a couple of rows of sample data?
Openpyxl itself uses the heuristic that if value.startswith("=")
it should be a formula. You can easily reverse this by setting the data type to "s".
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
On 18 Feb 2025, at 12:41, Alessandro Pisa wrote:
It works as expected: when I open the file, it does not evaluate the formulas.
Is there any better way to do that?
What is it exactly that you're trying to do? Could you perhaps provide a couple of rows of sample data?
Openpyxl itself uses the heuristic that if value.startswith("=") it should be a formula. You can easily reverse this by setting the data type to "s".
On 18 Feb 2025, at 22:27, Alessandro Pisa wrote:
I have a web application.
It gathers data from the users.
It aggregates the data in an excel file generated with openpyxl.
If the data are not parsed, an injection attack can happen.See:
Microsoft's own approach to this is to disable certain function for files that come from the network, ie. a block-everything approach.
Openpyxl itself uses the heuristic that if value.startswith("=") it should
be a formula. You can easily reverse this by setting the data type to "s".Thanks, I got that.
I also saw that if the cell is a CellRichText one, data_type is not set
to "f".
IMO, it would be better to set the data_type to "f" if and only if value is
something marked on purpose as a formula.
Well, you could use the tokeniser and/or PyCel to analyse the function and and see whether it uses functions you'd like to block. But, if you're really worried about user input, then the best solution is to make it impossible to enter functions entirely. This should ideally be done before you start adding data to the worksheet and you can choose an alternative value or to set the cell to an error. Otherwise simply set the data_type to "s".
row = [value if not value.startswith("=") else None for value in row]
Microsoft's own approach to this is to disable certain function for files that come from the network, ie. a block-everything approach.
Well, you could use the tokeniser and/or PyCel to analyse the function and and see whether it uses functions you'd like to block. But, if you're really worried about user input, then the best solution is to make it impossible to enter functions entirely. This should ideally be done before you start adding data to the worksheet and you can choose an alternative value or to set the cell to an error. Otherwise simply set the data_type to "s".
row = [value if not value.startswith("=") else None for value in row]