Are there use cases for storing null bytes in CharField/TextField?

Tim Graham

May 15, 2017, 11:54:35 AM
to Django developers (Contributions to Django itself)
Does anyone know of a use case for using null bytes in CharField/TextField?

psycopg2 2.7+ raises ValueError("A string literal cannot contain NUL (0x00) characters.") when trying to save null bytes [0], and this exception is unhandled in Django, which allows malicious form submissions to cause a crash [1]. With psycopg2 < 2.7, there is no exception and null bytes are silently truncated by PostgreSQL. Other databases that I tested (SQLite, MySQL, Oracle) allow saving null bytes. This creates possible cross-database compatibility problems when moving data from those databases to PostgreSQL, e.g. [2].

I propose to have CharField and TextField strip null bytes from the value either a) only on PostgreSQL or b) on all databases. Please indicate your preference or suggest another solution.

[0] https://github.com/psycopg/psycopg2/issues/420
[1] https://code.djangoproject.com/ticket/28201 - Saving a Char/TextField with psycopg2 2.7+ raises ValueError: A string literal cannot contain NUL (0x00) characters is unhandled
[2] https://code.djangoproject.com/ticket/28117 - loaddata raises ValueError with psycopg2 backend when data contains null bytes
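
For illustration, a minimal sketch of what stripping would mean for a value before it reaches the database. The helper name strip_null_bytes is hypothetical and not part of Django; under option (a) it would only be applied when the backend is PostgreSQL, under option (b) on every backend.

```python
def strip_null_bytes(value):
    """Return value with NUL (0x00) characters removed.

    Hypothetical helper: non-string values (None, numbers) pass through untouched.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


print(repr(strip_null_bytes("abc\x00def")))  # 'abcdef'
```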

Adam Johnson

May 15, 2017, 12:12:25 PM
to django-d...@googlegroups.com
The problem with (a) - data with null bytes in strings from other databases can't be loaded into PG, as per #28117.

The problem with (b) - data currently in databases in the wild will be modified upon save 😱

(b) is incredibly destructive and could break an unknown number of applications whilst (a) doesn't affect anyone until they try to migrate null-byte-strings into PG. I vote for (a), or (c) add form-level validation to (Char/Text)Field that null bytes aren't in the submitted string (for all databases) and error when trying to save them on PG.
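
A rough sketch of what option (c) could look like as a validator. The names validate_no_null_bytes and the stand-in ValidationError class are assumptions made so the example runs without Django; a real implementation would raise django.core.exceptions.ValidationError from a form field validator.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError (assumption for this sketch)."""


def validate_no_null_bytes(value):
    """Reject strings containing NUL (0x00) rather than silently altering them."""
    if isinstance(value, str) and "\x00" in value:
        raise ValidationError("Null characters are not allowed.")
    return value
```

The submitted value is never modified: it is either accepted unchanged or rejected with an error.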


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/9897126d-b6ef-48f1-9f19-96ed98ce10e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Adam

Michael Manfre

May 15, 2017, 12:14:27 PM
to Django developers (Contributions to Django itself)
I imagine we won't hear of a use case until after the change happens, and I'm somewhat strongly opposed to stripping potentially valid data from all databases because of a limitation of one. I'd be in favor of loaddata checking for null bytes and complaining when the backend doesn't support that feature.
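
Roughly what such a check could look like, as a sketch only. It assumes fixture records are already deserialized into dicts with "model", "pk" and "fields" keys; the function names and the backend_supports_null_bytes flag are hypothetical, not existing loaddata or backend API.

```python
def find_null_byte_fields(records):
    """Yield (model, pk, field_name) for every string field containing NUL."""
    for record in records:
        for name, value in record.get("fields", {}).items():
            if isinstance(value, str) and "\x00" in value:
                yield record.get("model"), record.get("pk"), name


def check_fixture(records, backend_supports_null_bytes):
    """Complain up front if the target backend cannot store NUL characters."""
    if backend_supports_null_bytes:
        return
    offenders = list(find_null_byte_fields(records))
    if offenders:
        raise ValueError(f"Fixture contains null bytes in: {offenders}")
```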

Regards,
Michael Manfre

Luke Plant

May 15, 2017, 12:18:38 PM
to django-d...@googlegroups.com

I agree with Adam: we should never silently change submitted data at the model layer. My preference would be (c), a form-level validation error that prevents saving.

Luke

Tim Chase

May 15, 2017, 1:31:19 PM
to Tim Graham, Django developers (Contributions to Django itself)
On 2017-05-15 08:54, Tim Graham wrote:
> Does anyone know of a use case for using null bytes in
> CharField/TextField?

Is this not what BinaryField is for? It would seem to me that
attempting to store binary NULL bytes in a CharField/TextField should
result in an error condition.

-tkc



Claude Paroz

May 15, 2017, 2:24:24 PM
to Django developers (Contributions to Django itself)
I also think that this should be handled at the serialization level (form fields and the (de)serialization framework).

Claude

Jani Tiainen

May 16, 2017, 5:11:38 AM
to django-d...@googlegroups.com

Hi,

I would guess that one could use a null byte to denote an "empty field" in Oracle, for example. (I recall seeing such a convention in one of our non-Django apps.) That was to overcome the limitation that Oracle doesn't have a real concept of an empty string, so we stored a single null byte to mark it.
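
A small sketch of that kind of convention, as I understand it. The names EMPTY_SENTINEL, to_db and from_db are made up for illustration and do not come from the app mentioned above.

```python
# Hypothetical sentinel convention: a single NUL stands in for the empty string,
# so that "" and a missing value (None) stay distinguishable in the database.
EMPTY_SENTINEL = "\x00"


def to_db(value):
    """Encode the empty string as the sentinel before writing."""
    return EMPTY_SENTINEL if value == "" else value


def from_db(value):
    """Decode the sentinel back to the empty string after reading."""
    return "" if value == EMPTY_SENTINEL else value
```

Stripping null bytes on save would turn the sentinel back into an empty string and defeat the convention.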

-- 
Jani Tiainen

Tim Graham

May 19, 2017, 9:35:28 AM
to Django developers (Contributions to Django itself)
If CharField/TextField raise a form validation error when null bytes are in the input, are users going to be able to understand that error and fix it? I'm not sure if it's a probable case, but I'm thinking of a non-technical user who copy/pastes some text that includes a null byte.

Perhaps a "strip_null_bytes" model field option that defaults to True would be reasonable. That could be passed to the form field to toggle whether or not that validation happens. Actually, three possible behaviors might be needed: silently strip null bytes, allow null bytes (an invalid option when using PostgreSQL), or prohibit null bytes.
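
To make the three behaviors concrete, here is a sketch under assumed names. NullBytePolicy and apply_null_byte_policy are hypothetical and only illustrate the idea; they are not a proposed Django API.

```python
from enum import Enum


class NullBytePolicy(Enum):
    STRIP = "strip"        # silently remove null bytes
    ALLOW = "allow"        # keep them (not usable with PostgreSQL)
    PROHIBIT = "prohibit"  # reject the value


def apply_null_byte_policy(value, policy):
    """Apply one of the three behaviors to a string value."""
    if "\x00" not in value:
        return value
    if policy is NullBytePolicy.STRIP:
        return value.replace("\x00", "")
    if policy is NullBytePolicy.PROHIBIT:
        raise ValueError("Null characters are not allowed.")
    return value  # NullBytePolicy.ALLOW
```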

Tim Graham

May 29, 2017, 3:15:17 PM
to Django developers (Contributions to Django itself)
A reply from Luke came only to me:
----

Is it even possible for null bytes to be copied into a text field and for them to be submitted? The original report uses JavaScript to get them in there - https://code.djangoproject.com/ticket/28201

These pages suggest it is going to be really hard for users to enter 0x00 accidentally:

- http://stackoverflow.com/questions/6961208/how-to-input-a-null-character-into-a-web-form

- https://superuser.com/questions/946533/is-there-any-way-to-copy-null-bytes-ascii-0x00-to-the-clipboard-on-windows

If we are talking about this kind of rare condition, I think a slightly obscure error message is fine. Defaulting to stripping any data would be a bad idea IMO. For the case where a backend is currently handling and storing 0x00 chars, we should try hard not to break that.

Luke
----

I guess we'll proceed with the form field / serialization enhancements.

Jon Dufresne

May 31, 2017, 8:27:49 PM
to django-d...@googlegroups.com, Tim Graham
The null byte is also a valid Unicode code point [0].

I guess I'm a bit surprised that a valid code point can't be stored in a PostgreSQL text column. This does appear to be documented for the chr(int) string function [1], although without justification.

> The NULL (0) character is not allowed because text data types cannot store such bytes.

I'm curious about the reasoning behind PostgreSQL's decision to prohibit this code point. If anyone has additional information to share on their reasons, please pass it along.
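
As a quick illustration of the code point claim, Python treats U+0000 as an ordinary character inside a str. This only shows the Unicode side and says nothing about PostgreSQL's storage.

```python
import unicodedata

nul = "\x00"

print(ord(nul))                   # 0
print(unicodedata.category(nul))  # Cc (control character)
print(nul.encode("utf-8"))        # b'\x00'
print(len("a\x00b"))              # 3, the NUL does not terminate the string
```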


Cheers

Tim Graham

Jun 2, 2017, 12:09:54 PM
to Django developers (Contributions to Django itself)
I found a PostgreSQL bug report requesting removal of the restriction. Here's the final reply:

Franklin Schmidt wrote:
> I agree that storing 0x00 in a UTF8 string is weird, but I am
> converting a huge database to postgres, and in a huge database, weird
> things happen.  Using bytea for a text field just because one in a
> million records has a 0x00 doesn't make sense to me.  I did hack
> around it in my conversion code to remove the 0x00 but I expect that
> anyone else who tries converting a big database to postgres will also
> confront this issue.

That's the right solution. If you have 0x00 bytes in your text fields, 
you're much better off cleaning them away anyway, than trying to work 
around them.
-Heikki Linnakangas


I also found a possibly related discussion about supporting \u0000 in JSON values [1]. PostgreSQL tried to support it but had to remove that support because it caused ambiguity.
