#36823: Data loss using `bulk_create()` on Django 5.2 due to Postgres `UNNEST` and
explicit cast truncating
-------------------------------------+-------------------------------------
Reporter: James Beith | Type: Bug
Status: new | Component: Database
| layer (models, ORM)
Version: 5.2 | Severity: Normal
Keywords: postgres bulk | Triage Stage:
create unnest | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
In Django 5.2, `bulk_create()` can use a faster strategy that inserts rows
via `UNNEST` of typed arrays ([
https://code.djangoproject.com/ticket/35936
feature ticket]). That strategy typically involves explicitly casting the
arrays to the destination column types (e.g. `varchar(8)[]`). So if we've
a model field `models.CharField(max_length=8)` then PostgreSQL explicitly
casts to `varchar(n)` which is a **silent truncate** to length `n`.
Previously, in Django 5.1 which doesn't use `UNNEST`, if we use
`bulk_create()` passing an instance that has a value longer than 8
characters (e.g. `"AAAABBBBC"`) for the field
`models.CharField(max_length=8)` we'd get a `DataError` raised and no
database row persisted. On Django 5.2, the value is truncated to
`"AAAABBBB"` and the row is persisted in the database.
This truncation behaviour is a [
https://www.postgresql.org/docs/current
/datatype-character.html#DATATYPE-CHARACTER desired SQL behaviour]:
> However, if one explicitly casts a value to character varying(n) or
character(n), then an over-length value will be truncated to n characters
without raising an error. (This too is required by the SQL standard.)
Further examples can be [
https://dbfiddle.uk/cdkWx9O5 seen here].
--
Ticket URL: <
https://code.djangoproject.com/ticket/36823>
Django <
https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.