Avoid injections attacks

146 views
Skip to first unread message

Alessandro Pisa

unread,
Feb 18, 2025, 6:42:10 AMFeb 18
to openpyx...@googlegroups.com

I have to return some excel with data entered through the web by the users.

To do that, I crafted this helper function:

from defusedcsv import csv
from openpyxl.cell.cell import Cell
from typing import Any

def safe_set_cell_value(cell: Cell, value: Any) -> None:
    """This helper is used to prevent CSV injection attacks by forcing the
    value to be a string if it contains any special characters.

    The library defusedcsv prepends a single quote to any string that seems nasty.

    If we note that defusedcsv has modified the string,
    we force the cell to be a string.

    This will prevent Excel from interpreting the string as a formula.

    e.g. `=3+2` will be not be interpreted as a formula, but displayed as a string.
    """
    if not isinstance(value, str):
        # We don't want to do anything with non-string
        # values, so we just set the value and return.
        cell.value = value
        return

    value = value.strip()
    cell.value = value

    if csv._escape(value) != value:
        # If we see that defusedcsv would modify the value, we force the cell to be a string.
        cell.data_type = "s"

It works as expected: when I open the file, it does not evaluate the formulas.

Is there any better way to do that?

--

Charlie Clark

unread,
Feb 18, 2025, 6:51:23 AMFeb 18
to openpyx...@googlegroups.com

On 18 Feb 2025, at 12:41, Alessandro Pisa wrote:

It works as expected: when I open the file, it does not evaluate the formulas.

Is there any better way to do that?

What is it exactly that you're trying to do? Could you perhaps provide a couple of rows of sample data?

Openpyxl itself uses the heuristic that if value.startswith("=") it should be a formula. You can easily reverse this by setting the data type to "s".

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226

Alessandro Pisa

unread,
Feb 18, 2025, 4:27:28 PMFeb 18
to openpyxl-users
Il giorno martedì 18 febbraio 2025 alle 12:51:23 UTC+1 charli...@clark-consulting.eu ha scritto:

On 18 Feb 2025, at 12:41, Alessandro Pisa wrote:

It works as expected: when I open the file, it does not evaluate the formulas.

Is there any better way to do that?

What is it exactly that you're trying to do? Could you perhaps provide a couple of rows of sample data?

I have a web application.
It gathers data from the users.
It aggregates the data in an excel file generated with openpyxl.
If the data are not parsed, an injection attack can happen.

See:


Openpyxl itself uses the heuristic that if value.startswith("=") it should be a formula. You can easily reverse this by setting the data type to "s".

Thanks, I got that.
I also saw that if the cell is a CellRichText one, data_type is not set to "f".
IMO, it would be better to set the data_type to "f" if and only if value is something marked on purpose as a formula.

Charlie Clark

unread,
Feb 19, 2025, 6:52:51 AMFeb 19
to openpyxl-users

On 18 Feb 2025, at 22:27, Alessandro Pisa wrote:

I have a web application.
It gathers data from the users.
It aggregates the data in an excel file generated with openpyxl.
If the data are not parsed, an injection attack can happen.

See:

Microsoft's own approach to this is to disable certain function for files that come from the network, ie. a block-everything approach.

Openpyxl itself uses the heuristic that if value.startswith("=") it should
be a formula. You can easily reverse this by setting the data type to "s".

Thanks, I got that.
I also saw that if the cell is a CellRichText one, data_type is not set
to "f".
IMO, it would be better to set the data_type to "f" if and only if value is
something marked on purpose as a formula.

Well, you could use the tokeniser and/or PyCel to analyse the function and and see whether it uses functions you'd like to block. But, if you're really worried about user input, then the best solution is to make it impossible to enter functions entirely. This should ideally be done before you start adding data to the worksheet and you can choose an alternative value or to set the cell to an error. Otherwise simply set the data_type to "s".

row = [value if not value.startswith("=") else None for value in row]

Alessandro Pisa

unread,
Feb 19, 2025, 8:25:52 AMFeb 19
to openpyx...@googlegroups.com
On Wed, 19 Feb 2025 at 12:52, Charlie Clark wrote:

Microsoft's own approach to this is to disable certain function for files that come from the network, ie. a block-everything approach.

Well, you could use the tokeniser and/or PyCel to analyse the function and and see whether it uses functions you'd like to block. But, if you're really worried about user input, then the best solution is to make it impossible to enter functions entirely. This should ideally be done before you start adding data to the worksheet and you can choose an alternative value or to set the cell to an error. Otherwise simply set the data_type to "s".

row = [value if not value.startswith("=") else None for value in row]

Thanks for your insights!
Very much appreciated :)
Reply all
Reply to author
Forward
0 new messages