{{{
>>> from django.utils.safestring import SafeBytes, SafeText
>>> from django.utils.encoding import force_text
>>> type(force_text(SafeText('')))
django.utils.safestring.SafeText
>>> type(force_text(SafeBytes(b'')))
str
}}}
This causes byte strings run through `mark_safe` and rendered in a
template to be incorrectly escaped.
{{{
>>> from django.template import Template, Context
>>> from django.utils.safestring import mark_safe
>>> Template('{{ x }}').render(Context({'x': mark_safe(b'&')}))
'&amp;'
>>> Template('{{ x }}').render(Context({'x': mark_safe('&')}))
'&'
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28121>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Comment (by Tim Graham):
Could you give a use case where the current behavior causes a problem? Is
it certain that the text version of an arbitrary bytestring is also
safe?
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:1>
* Attachment "28121_1_8.patch" added.
Test and patch for 1.8
* Attachment "28121_1_10.patch" added.
Test and patch for 1.10
* Attachment "28121_1_11.patch" added.
Test and patch for 1.11
* Attachment "28121_master.patch" added.
Test and patch for master
Comment (by Thomas Achtemichuk):
Added some patches against various stable branches and master. Not sure of
the process for submitting PRs - is one per branch OK?
I also see that `SafeBytes` has been deprecated for internal use in 2.0, so
it is perhaps best to just ignore the patch against master.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:2>
* status: new => closed
* resolution: => wontfix
Comment:
Based on the
[https://docs.djangoproject.com/en/dev/internals/release-process/#supported-versions supported versions policy],
the patch doesn't seem to qualify for a backport to the stable branches, so
closing as wontfix since the issue isn't really applicable on master, which
supports Python 3 only.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:3>
Comment (by Thomas Achtemichuk):
Tim,
This came up when bootstrapping a SPA's template with the output of DRF's
`JSONRenderer` which produces utf-8 encoded json. Something like the
following:
{{{
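from django.shortcuts import render
from django.utils.safestring import mark_safe
from rest_framework.renderers import JSONRenderer


def app_home(request):
    return render(
        request,
        'app_base.html',
        # JSONRenderer().render() returns UTF-8 encoded bytes, not text.
        {'init_data': mark_safe(JSONRenderer().render(SomeSerializer.data))},
    )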
}}}
We're preparing to switch over to Python 3, and this bug has led to a
fairly extensive audit of everywhere we use `mark_safe` and pass values
into templates.
> Is it certain that the text version of an arbitrary bytestring is also
> safe?
If it isn't, then the way that `force_text` has behaved under PY2 for the
last 5+ years should be examined:
{{{
>>> type(force_text(SafeBytes(b'&')))
django.utils.safestring.SafeText
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:4>
* status: closed => new
* resolution: wontfix =>
Comment:
Tim,
Reopening as I didn't make clear in my initial report that the behavior
differs between PY3:
{{{
>>> type(force_text(SafeBytes(b'&')))
str
}}}
and PY2:
{{{
>>> type(force_text(SafeBytes(b'&')))
django.utils.safestring.SafeText
}}}
If this behavior is incorrect under PY2, let me know and I'll open another
ticket to address it. But it definitely seems one of the above is
incorrect.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:5>
Comment (by Thomas Achtemichuk):
Also, there is this fairly explicit comment in `force_text` that makes me
believe that the behavior under PY3 is wrong:
{{{
# Note: We use .decode() here, instead of six.text_type(s, encoding,
# errors), so that if s is a SafeBytes, it ends up being a
# SafeText at the end.
}}}
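The distinction that comment draws can be reproduced without Django. The sketch below is a toy model, not Django's code: `MarkedBytes` and `MarkedText` are hypothetical stand-ins for `SafeBytes` and `SafeText`, and it assumes that the Python 3 code path builds the text with the `str(s, encoding, errors)` constructor rather than by calling `.decode()`.

```python
class MarkedText(str):
    """Hypothetical stand-in for SafeText: a str subclass used as a marker."""


class MarkedBytes(bytes):
    """Hypothetical stand-in for SafeBytes."""

    def decode(self, encoding="utf-8", errors="strict"):
        # Decode as usual, then re-wrap so the marker survives.
        return MarkedText(super().decode(encoding, errors))


value = MarkedBytes(b"&")

# Calling the method goes through the override, so the marker is kept.
via_method = value.decode("utf-8", "strict")

# The str constructor decodes the underlying buffer directly and never
# calls the overridden decode(), so the marker is lost.
via_constructor = str(value, "utf-8", "strict")

print(type(via_method).__name__)       # MarkedText
print(type(via_constructor).__name__)  # str
```

Both calls produce the same characters; only the type, and with it the "safe" marker, differs.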
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:6>
* version: master => 1.11
Old description:
> Under Python 3 and Django 1.8.18, 1.9.13, 1.10.7, 1.11 and master, calling
> `force_text` on an instance of `SafeBytes` causes a `str` to be returned
> rather than an instance of `SafeText`.
>
> {{{
> >>> from django.utils.safestring import SafeBytes, SafeText
> >>> from django.utils.encoding import force_text
> >>> type(force_text(SafeText('')))
> django.utils.safestring.SafeText
> >>> type(force_text(SafeBytes(b'')))
> str
> }}}
>
> This causes byte strings run through `mark_safe` and rendered in a
> template to be incorrectly escaped.
>
> {{{
> >>> from django.template import Template, Context
> >>> from django.utils.safestring import mark_safe
> >>> Template('{{ x }}').render(Context({'x': mark_safe(b'&')}))
> '&amp;'
> >>> Template('{{ x }}').render(Context({'x': mark_safe('&')}))
> '&'
> }}}
New description:
Under Python 3 and Django 1.8.18, 1.9.13, 1.10.7, 1.11 and master, calling
`force_text` on an instance of `SafeBytes` causes a `str` to be returned
rather than an instance of `SafeText`.
{{{
>>> from django.utils.safestring import SafeBytes, SafeText
>>> from django.utils.encoding import force_text
>>> type(force_text(SafeText('')))
django.utils.safestring.SafeText
>>> type(force_text(SafeBytes(b'')))
str
}}}
This causes byte strings run through `mark_safe` and rendered in a
template to be incorrectly escaped.
{{{
>>> from django.template import Template, Context
>>> from django.utils.safestring import mark_safe
>>> Template('{{ x }}').render(Context({'x': mark_safe(b'&')}))
'&amp;'
>>> Template('{{ x }}').render(Context({'x': mark_safe('&')}))
'&'
}}}
Edit: This behavior differs from the same code run under PY2:
{{{
>>> type(force_text(SafeBytes(b'&')))
django.utils.safestring.SafeText
}}}
It also disagrees with the comment in `force_text`:
{{{
# Note: We use .decode() here, instead of six.text_type(s, encoding,
# errors), so that if s is a SafeBytes, it ends up being a
# SafeText at the end.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:7>
Comment (by Tim Graham):
Even so, I don't think the patch would qualify for a backport based on the
[https://docs.djangoproject.com/en/dev/internals/release-process/#supported-versions supported versions policy],
as the behavior has existed as long as Django has supported Python 3,
correct?
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:8>
Comment (by Thomas Achtemichuk):
It could be argued that this satisfies both "Functionality bug in
newly-introduced features" (the feature being PY3 support) and "Regressions
from older versions of Django." Since the 1.0 release a decade ago, when
variable auto-escaping was added, `force_text` and `force_unicode` before
it have always passed "safe" bytestrings through as "safe" unicode
strings. I guess the question is, according to the "rule of thumb" in the
supported versions policy:
Had this been discovered in the lead-up to the 1.6 release (PY3 support),
would the different behavior between PY2 and PY3 have been a release
blocker?
I'd assume that the goal of all that hard work was to have Django function
identically under PY2 and PY3, and that any difference in behavior would
have been a blocker. As someone who is doing a bunch of that same hard
work right now in my own codebase, that change in behavior causing a unit
test (and entire app) to fail is definitely a blocker for me.
The other consideration would be: "Would changing this behavior under PY3
//now// break anything in existing codebases?"
To which my answer would be: If one has code that relies on auto-escaping
a bytestring //explicitly// passed through `mark_safe`, and only under
PY3... That's not the type of code worth supporting instead of fixing
inconsistent behavior between python versions.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:9>
Comment (by Claude Paroz):
About the initial use case: considering an HTML template should be
basically text, not bytes, what about decoding your UTF-8 encoded stream
before passing it to `mark_safe`?
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:10>
Comment (by Thomas Achtemichuk):
Claude, yes, that's what I've done to work around this.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:11>
* cc: tom@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:12>
Comment (by Aymeric Augustin):
Hrm. I realize I have no idea what `SafeBytes` are.
If you don't know the charset of the document in which you're going to
interpolate these bytes, you have no idea what unicode codepoints they'll
map to and you cannot make any guarantees about their safety in an HTML
context.
It would be tempting to say "they're in DEFAULT_CHARSET", but that's too
fragile for a security-critical feature. They could still be interpolated
into something in another charset.
IMO the only way to fix this is to remove `SafeBytes`. I can't see a way
to define it in a way that makes sense from a security perspective, short
of annotating them with a charset, but then we've reinvented text strings.
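A small standard-library sketch of that concern. The byte string is a made-up example, and `html.escape` merely stands in for template auto-escaping.

```python
import html

# Made-up payload: plain ASCII bytes with no HTML metacharacters.
payload = b"+ADw-script+AD4-"

# Read as UTF-8, escaping has nothing to do: the text looks harmless.
as_utf8 = payload.decode("utf-8")
print(html.escape(as_utf8))  # +ADw-script+AD4-

# Read as UTF-7, the very same bytes are markup.
as_utf7 = payload.decode("utf-7")
print(as_utf7)  # <script>
```

The bytes alone do not determine whether the content is safe; the charset they are eventually interpreted in does.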
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:13>
Comment (by Aymeric Augustin):
In any case, the Python 3 behavior seems correct to me, the Python 2
behavior seems dubious from a security perspective.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:14>
* status: new => closed
* resolution: => wontfix
Comment:
I completely agree with Aymeric, there is no such thing as `SafeBytes`. It
has already almost disappeared on master anyway.
Tom, what you call a workaround is probably the right thing to do.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:15>
Comment (by Jon Dufresne):
I agree that using `SafeBytes` here is incorrect.
As the type is no longer used internally and only kept for reusable apps
supporting Python 2, should the class be formally deprecated with warnings
and docs? If so, I don't mind making the necessary changes.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:16>
Comment (by Aymeric Augustin):
Yes, I think we should deprecate SafeBytes and related bits of code, if
any.
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:17>
Comment (by Tim Graham):
Removing `SafeBytes` is included in #27753, "Cleanups when no supported
version of Django supports Python 2 anymore".
--
Ticket URL: <https://code.djangoproject.com/ticket/28121#comment:18>