[Python-ideas] Draft PEP on string interpolation

Mike Miller

Aug 20, 2015, 7:10:44 PM

to python...@python.org

The ground seems to be settling on the issue, so I have tried my hand at a grand
unified pep for string interpolation.

I originally started writing thinking I would fight arbitrary expressions,
though agreeing they would be very useful. In my research however, I discovered
that they've become an industry standard of sorts. So, I pivoted and started
thinking of mitigation strategies to reduce their downsides instead.

There's still plenty to do and details to iron out, I'd appreciate your help.
If this PEP doesn't stick I hope fragments of it can be useful for others.

https://bitbucket.org/mixmastamyk/docs/src/default/pep/pep-05XX.rst

(Pls excuse the inline links, I've not moved them to the footer yet.)

-Mike
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

MRAB

Aug 20, 2015, 8:28:34 PM

to python...@python.org

On 2015-08-21 00:10, Mike Miller wrote:
> The ground seems to be settling on the issue, so I have tried my hand at a grand
> unified pep for string interpolation.
>
> I originally started writing thinking I would fight arbitrary expressions,
> though agreeing they would be very useful. In my research however, I discovered
> that they've become an industry standard of sorts. So, I pivoted and started
> thinking of mitigation strategies to reduce their downsides instead.
>
> There's still plenty to do and details to iron out, I'd appreciate your help.
> If this PEP doesn't stick I hope fragments of it can be useful for others.
>
> https://bitbucket.org/mixmastamyk/docs/src/default/pep/pep-05XX.rst
>
> (Pls excuse the inline links, I've not moved them to the footer yet.)
>

In the "Composition with other prefixes" section, I don't like how
f'...' uses {} syntax but fb'...' uses % syntax.

In the "Environment Access" section, I would've expected f'Home folder:
${HOME}' to do ''.join(['Home folder: $', HOME.__format__('')]).

BTW, in the "Reference Implementation(s)" section, it's "its work", not
"it's work".

Guido van Rossum

Aug 20, 2015, 8:46:15 PM

to Mike Miller, python...@python.org

Can you give a brief discussion of how your version differs from PEP 498? So far I've found:

- lots of language summarizing the discussion following PEPs 498 and 501
- %(name)s in byte strings (which I think is abominable)
- t prefix for translated strings
- some optional ideas (which I'm skipping for now)

Am I missing something?

Second, do you have a proposal for marking translatable strings that should be extracted by pygettext but not interpolated in the spot where they occur? (This is the N_(...) format from the pygettext docs.)

--

--Guido van Rossum (python.org/~guido)

Mike Miller

Aug 20, 2015, 8:58:18 PM

to gu...@python.org, python...@python.org

I found the b'' idea on a recent message here between you and Nick I think, it
seemed interesting. It's gone now, as well as the typo, thanks MRAB.

The summary is, it is a superset of PEP 498 with i18n integrated, additional
background to inform the arbitrary expression decision, and security policy
specified for those who will be concerned about it.

As for deferring interpolation, I suppose I'd recommend existing N_... syntax
until I've had a bit more time to think about it.

-Mike

On 08/20/2015 05:39 PM, Guido van Rossum wrote:
> Can you give a brief discussion of how your version differs from PEP 498? So far
> I've found:
>
> - lots of language summarizing the discussion following PEPs 498 and 501
> - %(name)s in byte strings (which I think is abominable)
> - t prefix for translated strings
> - some optional ideas (which I'm skipping for now)
>
> Am I missing something?
>
> Second, do you have a proposal for marking translatable strings that should be
> extracted by pygettext but not interpolated in the spot where they occur? (This
> is the N_(...) format from the pygettext docs.)
>

Mike Miller

Aug 20, 2015, 10:39:48 PM

to gu...@python.org, python...@python.org

OK, not exactly; it was this one, which I may have misunderstood:

https://mail.python.org/pipermail/python-ideas/2015-August/035347.html

On 08/20/2015 05:57 PM, Mike Miller wrote:
> I found the b'' idea on a recent message here between you and Nick I think, it
> seemed interesting. It's gone now, as well as the typo, thanks MRAB.

Mike Miller

Aug 20, 2015, 10:47:32 PM

to gu...@python.org, python...@python.org

I'm guessing there's not much else we can do here but use another string prefix.
To stay consistent, I'd expect a slight modification to:

Nt'Hello {name}.'

However, if we were determined to reduce the number of prefixes, an alternative
might be to put a flag inside the string. My first thought is of django/jinja
templating comments:

t'{# deferred=1 #}Hello {name}.'

How does that sound?

-Mike

On 08/20/2015 05:39 PM, Guido van Rossum wrote:
> Second, do you have a proposal for marking translatable strings that should be
> extracted by pygettext but not interpolated in the spot where they occur?
> (This is the N_(...) format from the pygettext docs.)

Guido van Rossum

Aug 20, 2015, 10:53:27 PM

to Mike Miller, python...@python.org

Yeah, I think Nick meant that as a way of implementing the "formatting mini-language" for bytes, given that bytes don't have __format__ or format. But using %(name)s for the *syntax* in bytes was never on the table. I think we're better off not supporting this type of string interpolation for bytes at all.

On Thu, Aug 20, 2015 at 7:39 PM, Mike Miller <python...@mgmiller.net> wrote:

> OK, not exactly; it was this one, which I may have misunderstood:
>
> https://mail.python.org/pipermail/python-ideas/2015-August/035347.html
>
> On 08/20/2015 05:57 PM, Mike Miller wrote:
>> I found the b'' idea on a recent message here between you and Nick I think, it
>> seemed interesting. It's gone now, as well as the typo, thanks MRAB.

Nick Coghlan

Aug 21, 2015, 2:40:50 AM

to Guido van Rossum, python...@python.org

On 21 August 2015 at 12:52, Guido van Rossum <gu...@python.org> wrote:
> Yeah, I think Nick meant that as a way of implementing the "formatting
> mini-language" for bytes, given that bytes don't have __format__ or format.
> But using %(name)s for the *syntax* in bytes was never on the table. I think
> we're better off not supporting this type of string interpolation for bytes
> at all.

Yeah, I'm OK with doing this as a text-only thing - while printf-style
formatting is certainly useful, binary data is still often best
approached as a serialisation problem more so than as an interpolation
one.

I really like Mike's language survey in his draft, and the main thing
I'd highlight in relation to that is that the interpolation syntax
used in JavaScript (with the leading "$" for substitution expressions)
is essentially the same as that used in PEPs 215, 292 & 501 (with the
key difference being to make the braces optional when leaving them out
is unambiguous).

One key pragmatic benefit of that is that I expect the number of folks
needing to context switch between JavaScript code and Python code will
vastly outstrip the number of folks context switching between C# and
Python.

One key compatibility benefit of that particular syntax is that it
interoperates much better with the "{{ global_variable }}"
substitution used for Mozilla's l20n templating (http://l20n.org/).
That makes it more compatible with the similar syntax used for Django
and Jinja2 variable substitution, and the "{% %}" syntax used for
Django and Jinja2 blocks.

However, those latter examples *do* highlight a "What could possibly
go wrong?" question we need to ensure we ask, which is how we want to
address the likelihood of folks writing things like:

myquery = i"SELECT $column FROM $table;"
mycommand = i"cat $filename"
mypage = i"<html><body>$content</body></html>"

It's the opposite of the "interpolating untrusted strings that may
contain arbitrary expressions" problem - what happens when the
variables being *substituted* are untrusted? It's easy to say "don't
do that", but if doing the right thing incurs all the repetition
currently involved in calling str.format, we're going to see a *lot*
of people doing the wrong thing. At that point, the JavaScript
backticks-with-arbitrary-named-callable solution starts looking very
attractive:

myquery = sql`SELECT $column FROM $table;`
mycommand = sh`cat $filename`
mypage = html`<html><body>$content</body></html>`

At that point, internationalisation could just be:

translated = _`This $value and this $other_value are interpolated
after translation lookup`

From an implementation perspective, that could be a matter of:

* adding a new "__interpolate__" magic method with a suitable signature
* changing the builtin "format" to implement __interpolate__ as str.format
* adding an "interpolator" builtin decorator that just did:

def interpolator(f):
    f.__interpolate__ = f.__call__
    return f

Regards,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
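
The "untrusted substituted values" problem described above can be reproduced today with existing tools. The following is a minimal sketch, not code from the thread: it assumes an in-memory SQLite database, a made-up `students` table, and `string.Template` as a stand-in for the proposed i"..." syntax.

```python
import sqlite3
from string import Template

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students VALUES ('Alice')")
conn.execute("INSERT INTO students VALUES ('Bob')")

# Hypothetical attacker-controlled value.
untrusted = "nobody' OR '1'='1"

# Naive interpolation: the value becomes part of the SQL text itself,
# so the injected OR clause matches every row.
naive_query = Template(
    "SELECT name FROM students WHERE name = '$name'"
).substitute(name=untrusted)
naive_rows = conn.execute(naive_query).fetchall()

# Parameter binding: the driver keeps the value separate from the SQL text,
# so it is compared as one literal string and matches nothing.
safe_rows = conn.execute(
    "SELECT name FROM students WHERE name = ?", (untrusted,)
).fetchall()

print(len(naive_rows), len(safe_rows))
```

The point of the custom-interpolator idea is to make the second form as convenient to write as the first.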

Nathaniel Smith

Aug 21, 2015, 7:07:17 AM

to Nick Coghlan, python...@python.org

On Aug 20, 2015 23:40, "Nick Coghlan" <ncog...@gmail.com> wrote:
>
[...]

> myquery = i"SELECT $column FROM $table;"
> mycommand = i"cat $filename"
> mypage = i"<html><body>$content</body></html>"
>
> It's the opposite of the "interpolating untrusted strings that may
> contain arbitrary expressions" problem - what happens when the
> variables being *substituted* are untrusted? It's easy to say "don't
> do that", but if doing the right thing incurs all the repetition
> currently involved in calling str.format, we're going to see a *lot*
> of people doing the wrong thing. At that point, the JavaScript
> backticks-with-arbitrary-named-callable solution starts looking very
> attractive:
>
> myquery = sql`SELECT $column FROM $table;`
> mycommand = sh`cat $filename`
> mypage = html`<html><body>$content</body></html>`

Surely if using backticks we would drop the ugly prefix syntax and just make it a function call?

myquery = sql(`SELECT $column FROM $table;`)

etc., where `...` returns an object with the string and substitution info inside it.

I can certainly appreciate the argument that safe quoting for string interpolation deserves as much attention at the language level in 2015 as buffer overflow checking deserved back in the day.

Taking that problem seriously though is perhaps an argument against even having a trivial string version, because if it's legal then people will still write

do_sql("SELECT $column FROM $table;")

instead and the only way to get them to consistently use delayed (safe) evaluation would be to constantly educate and audit, which is the opposite of good design for security and exactly the problem we have now. Really what we want from this perspective is that it should be *harder* to get it wrong than to get it right.

Maybe simple no-quoting interpolation should be spelled

str(`hello $planet`)

(or substitute favorite prefix tag if allergic to backticks), so you have to explicitly specify a quoting syntax even if only to say that you want the null syntax.

Alternatively I guess it would be enough if interfaces like our hypothetical sql(...) simply refused to accept raw strings and required delayed interpolation objects only, even for static/constant queries. But I'm unconvinced that this would happen, given the number of preexisting APIs that already accept strings, and the need to continue supporting pre-3.6 versions of python.

-n
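
One way to picture the "refuse raw strings" idea is sketched below. Everything here is hypothetical: `Deferred` stands in for whatever object the backtick (or prefix) syntax would produce, `sql()` is the imagined API, and the quoting rule is a toy, not what a real database driver does.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Deferred:
    """Hypothetical delayed-interpolation object: template text plus values."""
    template: str
    values: dict


def sql(query):
    """Hypothetical API that accepts only Deferred objects, never plain str."""
    if not isinstance(query, Deferred):
        raise TypeError(
            "sql() requires a deferred interpolation object, not "
            + type(query).__name__
        )
    # Toy quoting for illustration only: double any single quotes and wrap
    # the value in quotes. Real code should use the driver's parameter binding.
    quoted = {
        key: "'" + str(value).replace("'", "''") + "'"
        for key, value in query.values.items()
    }
    return Template(query.template).substitute(quoted)


query = Deferred("SELECT * FROM t WHERE name = $name;", {"name": "O'Brien"})
print(sql(query))

try:
    sql("SELECT * FROM t WHERE name = 'O'Brien';")
except TypeError as exc:
    print("rejected:", exc)
```

As the message notes, the difficulty is not writing such a check but getting existing string-accepting APIs to adopt it.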

Nick Coghlan

Aug 21, 2015, 8:06:40 AM

to Nathaniel Smith, python...@python.org

On 21 August 2015 at 21:06, Nathaniel Smith <n...@pobox.com> wrote:
> On Aug 20, 2015 23:40, "Nick Coghlan" <ncog...@gmail.com> wrote:
>>
> [...]
>> myquery = i"SELECT $column FROM $table;"
>> mycommand = i"cat $filename"
>> mypage = i"<html><body>$content</body></html>"
>>
>> It's the opposite of the "interpolating untrusted strings that may
>> contain arbitrary expressions" problem - what happens when the
>> variables being *substituted* are untrusted? It's easy to say "don't
>> do that", but if doing the right thing incurs all the repetition
>> currently involved in calling str.format, we're going to see a *lot*
>> of people doing the wrong thing. At that point, the JavaScript
>> backticks-with-arbitrary-named-callable solution starts looking very
>> attractive:
>>
>> myquery = sql`SELECT $column FROM $table;`
>> mycommand = sh`cat $filename`
>> mypage = html`<html><body>$content</body></html>`
>
> Surely if using backticks we would drop the ugly prefix syntax and just make
> it a function call?

Not really, no, as `obj` already means repr(obj) in Python 2, and we
can't silently make it do something else in Python 3 (although we can
break it noisily and thus strongly encourage folks to switch to using
the builtin instead).

The attractiveness of "little bobby tables" [1] vulnerabilities with
an interpolation syntax that *doesn't* support custom interpolation
engines has switched me from being mildly interested in the idea of
good support for SQL, shell command and HTML generation to considering
it a necessary capability, though.

Cheers,
Nick.

[1] https://xkcd.com/327/

Eric V. Smith

Aug 21, 2015, 12:35:43 PM

to python...@python.org

The various string interpolation proposals are conflating two things:

1: extracting the expressions from the source string, and evaluating
them in the correct context, and

2: taking the source string and the evaluated values, and building the
resulting string.

The problem is that in #1, the compiler has to be in on what's going on.
That's because this problem can't be solved with normal function calls.
So if normal function calls can't do it, what choices do we have? Either
syntax, or special function names known to the compiler. I think syntax
is clearly the right choice here.

The only syntax changes that anyone has come up with so far are string
prefixes, maybe suffixes, and back-ticks (ick). Of those, prefixes make
the most sense. I'm interested in other suggestions, though. (Since I
wrote this, I see Barry's import-based approach, but it's similar:
instructions to the compiler.)

Yuri's proposal was to implement #1 by having _any_ string prefix
trigger the compiler to get involved to extract the source string and
then compute the values. Then for #2, he invoked normal function calls,
derived from the string prefix. He also loosened the restriction that
strings would be the result: because any function could be invoked with
the source string and the values, that function could return anything.

If you really want string interpolation to be extensible to domains such
as SQL and HTML, then I think an approach like Yuri's is the only way to
do it: some syntax to tell the compiler to treat a string differently,
coupled with some user-specifiable function that gets called to do the
real work, and no need for the result to be a string.

Eric.
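
The two-step split can be imitated at runtime as a rough sketch. This is an illustration, not any proposal's implementation: it uses `string.Formatter().parse` to find "{...}" fields and `eval` against an explicit namespace where the real feature would have the compiler do the extraction and evaluation. The function names are made up.

```python
import html
import string


def extract_and_evaluate(template, namespace):
    """Step 1: pull the expressions out of the template and evaluate them."""
    parts = []
    for literal, expr, _spec, _conv in string.Formatter().parse(template):
        if literal:
            parts.append(("text", literal))
        if expr is not None:
            # eval() stands in for compiler support; it is used here only
            # because plain function calls cannot see the caller's scope.
            parts.append(("value", eval(expr, {}, namespace)))
    return parts


def build_str(parts):
    """Step 2, plain flavour: concatenate text and values unchanged."""
    return "".join(str(item) for _kind, item in parts)


def build_html(parts):
    """Step 2, HTML flavour: escape the values but not the literal text."""
    return "".join(
        html.escape(str(item)) if kind == "value" else item
        for kind, item in parts
    )


parts = extract_and_evaluate(
    "Hello {name}, {n + 1} items", {"name": "<b>World</b>", "n": 2}
)
print(build_str(parts))
print(build_html(parts))
```

Swapping `build_str` for `build_html` changes only step 2, which is the extensibility argument made above.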

Chris Rebert

Aug 21, 2015, 1:48:27 PM

to Nick Coghlan, python...@python.org

On Thu, Aug 20, 2015 at 11:40 PM, Nick Coghlan <ncog...@gmail.com> wrote:
> On 21 August 2015 at 12:52, Guido van Rossum <gu...@python.org> wrote:

<snip>

> It's the opposite of the "interpolating untrusted strings that may
> contain arbitrary expressions" problem - what happens when the
> variables being *substituted* are untrusted? It's easy to say "don't
> do that", but if doing the right thing incurs all the repetition
> currently involved in calling str.format, we're going to see a *lot*
> of people doing the wrong thing. At that point, the JavaScript
> backticks-with-arbitrary-named-callable solution starts looking very
> attractive:
>
> myquery = sql`SELECT $column FROM $table;`
> mycommand = sh`cat $filename`
> mypage = html`<html><body>$content</body></html>`
>
> At that point, internationalisation could just be:
>
> translated = _`This $value and this $other_value are interpolated
> after translation lookup`

The problem with such syntax is that Guido already long ago ruled out
using backticks for anything in Python 3:
"""
No more backticks.

Backticks (`) will no longer be used as shorthand for repr -- but that
doesn't mean they are available for other uses. Even ignoring the
backwards compatibility confusion, the character itself causes too
many problems (in some fonts, on some keyboards, when typesetting a
book, etc).
""" -- https://www.python.org/dev/peps/pep-3099/

Regards,
Chris
--
http://chrisrebert.com

Mike Miller

Aug 21, 2015, 3:52:34 PM

to Eric V. Smith, python...@python.org, Nick Coghlan

Yes, we were discussing these custom prefixes in Yuri's thread yesterday, but
Guido dropped a big -1 there. However, you Eric and Nick make some compelling
arguments in favor of them; they do solve several of our outstanding issues.

Would he be able to be persuaded to change his mind?

-Mike

(Note: I edited out the backticks aspect from below, don't think it will be
possible or desired, as Chris R. demonstrated in this thread.)

On 08/21/2015 09:35 AM, Eric V. Smith wrote:
>> On 08/21/2015 07:49 AM, Nick Coghlan wrote:
>>>> of people doing the wrong thing. At that point, the JavaScript

>>>> arbitrary-named-callable solution starts looking very
>>>> attractive:
>>>>
>>>> myquery = sql"SELECT $column FROM $table;"

>>> ....

>
> If you really want string interpolation to be extensible to domains such
> as SQL and HTML, then I think an approach like Yuri's is the only way to
> do it: some syntax to tell the compiler to treat a string differently,

Nick Coghlan

Aug 21, 2015, 6:59:51 PM

to Mike Miller, Eric V. Smith, python...@python.org

On 22 August 2015 at 05:52, Mike Miller <python...@mgmiller.net> wrote:
> Yes, we were discussing these custom prefixes in Yuri's thread yesterday,
> but Guido dropped a big -1 there. However, you Eric and Nick make some
> compelling arguments in favor of them; they do solve several of our
> outstanding issues.
>
> Would he be able to be persuaded to change his mind?

It's also worth reiterating my concept of using "!" to introduce the
arbitrary "magic happens here" prefixes. That is, you'd write them
like this:

myquery = !sql"SELECT $column FROM $table;"
mycommand = !sh"cat $filename"
mypage = !html"<html><body>$content</body></html>"

I'd previously suggested a syntax along those lines for full compile
time AST manipulation where the compiler also had to be made aware of
the prefix names somehow, but I think the proposals that have evolved
around f-strings make it possible to instead resolve the named
reference at runtime, while still having the compiler handle the
subexpression extraction and evaluation.

Regards,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Mike Miller

Aug 21, 2015, 7:22:33 PM

to gu...@python.org, Barry Warsaw, python...@python.org

The more I think about it, trying to push str.format() (or similar) syntax into
i18n is just too much. We'll need a different prefix anyway because of the
compile/runtime differences so why not stick with string.Template formatting?

It fits the use case perfectly, and requires little additional work.

Trying to get consistency with str.format() syntax creates security, policy,
docs, and tools requirements. Instead, we could accept a small inconsistency
(in the grand scheme of things) in order to use the right tool for the job::

# powerful formatting
f'Folder {folder}'

# simple translation
t'Hello $name.' # or i'', _'', etc
dt'Hello $name.' # deferred translation

I'd like to update my draft to reflect this change, unless anyone has objections.

-Mike

MRAB

Aug 21, 2015, 7:33:43 PM

to python...@python.org

On 2015-08-22 00:21, Mike Miller wrote:
> The more I think about it, trying to push str.format() (or similar) syntax into
> i18n is just too much. We'll need a different prefix anyway because of the
> compile/runtime differences so why not stick with string.Template formatting?
>
> It fits the use case perfectly, and requires little additional work.
>
> Trying to get consistency with str.format() syntax creates security, policy,
> docs, and tools requirements. Instead, we could accept a small inconsistency
> (in the grand scheme of things) in order to use the right tool for the job::
>
> # powerful formatting
> f'Folder {folder}'
>
> # simple translation
> t'Hello $name.' # or i'', _'', etc
> dt'Hello $name.' # deferred translation
>
> I'd like to update my draft to reflect this change, unless anyone has objections.
>

f-strings use {} syntax, so I'd prefer t-string to use {} syntax too,
unless you're happy to explain it to all those users who'll be asking
why f-strings use {} but t-strings use $. (I might even be one of them!
:-))

Yury Selivanov

Aug 21, 2015, 7:36:18 PM

to python...@python.org

On 2015-08-21 6:59 PM, Nick Coghlan wrote:
> It's also worth reiterating my concept of using "!" to introduce the
> arbitrary "magic happens here" prefixes. That is, you'd write them
> like this:
>
> myquery = !sql"SELECT $column FROM $table;"
> mycommand = !sh"cat $filename"
> mypage = !html"<html><body>$content</body></html>"

I too like the macros concept, especially how it's implemented
in Rust. Your examples would look like:

myquery = sql!"SELECT $column FROM $table;"
mycommand = sh! "cat $filename"

and it'd be possible to do even more:

v = vec! [1, 2, 3]
debug!("error {error code}")

To implement macros we'll have to introduce another import
step -- macros expansion, during which Python would resolve
macros names and evaluate them, storing the transformation
result in pyc files and creating new (updated) code objects.

All in all, I don't think that all the extra complexity
required to have full macros support is worth it. Template
Strings would be a great alternative, much easier to
implement.

Yury

Mike Miller

Aug 21, 2015, 8:02:18 PM

to MRAB, python...@python.org

On 08/21/2015 04:33 PM, MRAB wrote:
> f-strings use {} syntax, so I'd prefer t-string to use {} syntax too,
> unless you're happy to explain it to all those users who'll be asking
> why f-strings use {} but t-strings use $. (I might even be one of them!
> :-))

Hi, the reason is above, it fits the use case perfectly. Most devs (that know
nothing about i18n) will not know it exists.

However, imagine the opposite situation...

We'll have to explain to non-technical translators how to use a new syntax,
which they'll sometimes make mistakes with.

We'll have to explain and lecture devs and translators frequently not to use
format specifiers and arbitrary expressions, etc.

We'll have to document policy and write tools to make sure advanced features
don't get used in translating. The checks may not be done at compile time, so
those who don't use linters won't be helped.

All to avoid $,${} in favor of {}.

Certainly either strategy could be done, but one is easier, doesn't put i18n
needs at a disadvantage, and won't affect the typical developer.

I agree neither choice is perfect.

-Mike
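
The difference in what a template author can reach with each syntax can be seen with the existing library tools. This is a small sketch with a made-up `User` class and values; it shows `str.format` and `string.Template` as they work today, not the proposed t'' prefix.

```python
from string import Template


class User:
    """Made-up object holding one public and one sensitive attribute."""

    def __init__(self, name, api_key):
        self.name = name
        self.api_key = api_key


user = User("Ada", "s3cret")

# str.format syntax allows attribute access inside the field, so a template
# author (for example a translator) can reach data the developer never
# intended to expose.
leaky = "Hello {user.api_key}".format(user=user)

# string.Template only substitutes plain names; ".api_key" is left as
# literal text and only the value passed for "user" appears.
limited = Template("Hello $user.api_key").substitute(user=user.name)

print(leaky)
print(limited)
```

This is the kind of feature that would need policy, documentation and linting if the {} syntax were used for translated strings.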

Mike Miller

Aug 21, 2015, 8:19:35 PM

to MRAB, python...@python.org

And also implement str.format_safe_substitute().

Wes Turner

Aug 21, 2015, 9:07:34 PM

to Nick Coghlan, Eric V. Smith, Python-Ideas

On Aug 21, 2015 5:59 PM, "Nick Coghlan" <ncog...@gmail.com> wrote:
>
> On 22 August 2015 at 05:52, Mike Miller <python...@mgmiller.net> wrote:
> > Yes, we were discussing these custom prefixes in Yuri's thread yesterday,
> > but Guido dropped a big -1 there. However, you Eric and Nick make some
> > compelling arguments in favor of them; they do solve several of our
> > outstanding issues.
> >
> > Would he be able to be persuaded to change his mind?
>
> It's also worth reiterating my concept of using "!" to introduce the
> arbitrary "magic happens here" prefixes. That is, you'd write them
> like this:
>
> myquery = !sql"SELECT $column FROM $table;"
> mycommand = !sh"cat $filename"
> mypage = !html"<html><body>$content</body></html>"
>
> I'd previously suggested a syntax along those lines for full compile
> time AST manipulation where the compiler also had to be made aware of
> the prefix names somehow, but I think the proposals that have evolved
> around f-strings make it possible to instead resolve the named
> reference at runtime, while still having the compiler handle the
> subexpression extraction and evaluation.

So, str subclasses with _repr_sql_ functions that sometimes serialize and translate differently based on ~threadlocals for SQL variant, lang, charset; and a new syntax for str.format(**globals()+locals())?

Barry Warsaw

Aug 21, 2015, 9:39:33 PM

to Mike Miller, python...@python.org

On Aug 21, 2015, at 04:21 PM, Mike Miller wrote:

>The more I think about it, trying to push str.format() (or similar) syntax
>into i18n is just too much. We'll need a different prefix anyway because of
>the compile/runtime differences so why not stick with string.Template
>formatting?
>
>It fits the use case perfectly, and requires little additional work.

The main annoyance with string.Template based approaches is the same as
str.format() -- the requirement to use sys._getframe() to access the
interpolation values. I think this was one of the main reasons to propose new
syntax, since the compiler can parse the interpolation string and arrange for
the values to be composed into a substitution dictionary without having to do
ugly locals/globals references. I think this is also Guido's main gripe about
the current function-based implementations.

I still wish we could solve this more limited problem, but I don't see a way
around that without adding syntax, and if you're going to do that, then I
think most people want to go down the whole PEP 498/501 road.

Cheers,
-Barry
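
For readers unfamiliar with the sys._getframe() approach mentioned above, here is a minimal sketch of what function-based implementations typically do. The helper name `interpolate` is made up, and `sys._getframe` is a CPython implementation detail rather than a guaranteed language feature.

```python
import sys
from string import Template


def interpolate(template):
    """Fill $names from the caller's globals and locals via sys._getframe()."""
    frame = sys._getframe(1)  # the caller's frame
    namespace = dict(frame.f_globals)
    namespace.update(frame.f_locals)  # locals shadow globals
    return Template(template).safe_substitute(namespace)


def greet():
    name = "World"
    return interpolate("Hello $name.")


print(greet())  # Hello World.
```

The frame inspection is exactly the "ugly locals/globals reference" that compiler-assisted syntax would make unnecessary.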

Mike Miller

Aug 22, 2015, 1:52:52 AM

to Barry Warsaw, python...@python.org

Hi,

I'm not sure that's the case any more. After reading the threads here this week,
I see numerous difficulties with trying to reconcile both use cases, and I
didn't get the feeling anyone has an elegant solution to them.

We could implement f'' and (i'', aka t'') using either syntax of course, parsing
variables from the string, but choosing translation with str.format() seems to
cause several more issues than (string.Template() and a bit of inconsistency does).

Which syntax would you rather have for translation? (Knowing that you might
give a different answer for standard interpolation.)

-Mike

Nick Coghlan

Aug 22, 2015, 6:09:53 AM

to Mike Miller, Barry Warsaw, python...@python.org

On 22 August 2015 at 15:52, Mike Miller <python...@mgmiller.net> wrote:
> Hi,
>
> I'm not sure that's the case any more, after reading the threads here this
> week there are numerous difficulties with trying to reconcile both use
> cases, and didn't get the feeling anyone has an elegant solution to them.
>
> We could implement f'' and (i'', aka t'') using either syntax of course,
> parsing variables from the string, but choosing translation with
> str.format() seems to cause several more issues than (string.Template() and
> a bit of inconsistency does).
>
> Which syntax would you rather have for translation? (Knowing that you might
> give a different answer for standard interpolation.)

I just pushed a major rewrite of PEP 501 based on the discussions
since the initial version of that and PEP 498 went online:
https://www.python.org/dev/peps/pep-0501/

It switches to using a magic method and explicitly named interpolator
in interpolation expressions, with "!str" being the interpolator
reference for default string formatting. From a motivation
perspective, while i18n remains a consideration, more easily
addressing the risk of code injection attacks against naive use of
string interpolation when generating database queries, shell commands
or HTML pages now provides a stronger motivation for making the
interpolation semantics extensible.

Writing a custom interpolator (including for i18n) becomes as simple as doing:

@interpolator
def my_interpolator(raw_template, parsed_fields, field_values):
    ...

While using it then looks like:

result = !my_interpolator "This has $values $mixed into it"

(Similar to yield, it is proposed that interpolation expressions would
require parentheses when embedded inside a larger expression)

Cheers,

Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
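
To make the interpolator signature above more concrete, here is a hedged sketch of an HTML-escaping interpolator. The layout of `parsed_fields` is an assumption made for this example (a list of `(leading_text, field_expr)` pairs, with `field_expr` set to `None` for trailing text); the PEP draft may define it differently. Since the `!name "..."` syntax does not exist, the call the compiler would generate is written out by hand.

```python
import html


def interpolator(f):
    # The decorator as described earlier in the thread.
    f.__interpolate__ = f.__call__
    return f


@interpolator
def html_interpolator(raw_template, parsed_fields, field_values):
    # Assumed layout: parsed_fields is [(leading_text, field_expr), ...],
    # with field_expr None for trailing literal text.
    out = []
    values = iter(field_values)
    for leading_text, field_expr in parsed_fields:
        out.append(leading_text)
        if field_expr is not None:
            out.append(html.escape(str(next(values))))
    return "".join(out)


content = "<script>alert(1)</script>"

# Hand-written equivalent of: !html_interpolator "<p>$content</p>"
result = html_interpolator.__interpolate__(
    "<p>$content</p>",
    [("<p>", "content"), ("</p>", None)],
    [content],
)
print(result)
```

The literal markup passes through untouched while the substituted value is escaped.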

Guido van Rossum

Aug 22, 2015, 12:17:52 PM

to Nick Coghlan, Barry Warsaw, python...@python.org

1. That's an entirely different proposal, you're just reusing the PEP number.

2. Have I died and gone to Perl?

Nick Coghlan

Aug 22, 2015, 4:37:01 PM

to Guido van Rossum, Barry Warsaw, python...@python.org

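On 23 August 2015 at 02:16, Guido van Rossum <gu...@python.org> wrote:
> 1. That's an entirely different proposal, you're just reusing the PEP
> number.

It's aiming to solve the same basic problem though, which is the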
aspect I consider most important when tackling a design question. The
discussions following the posting of my first draft highlighted some
real limitations of my original design both at a semantic level and at
a motivational level, so I changed it in place rather than introducing
yet another PEP on the same topic (Mike Miller's draft PEP was an
excellent synthesis, but there's no way he could account for the fact
that 501 was still only a first draft).

> 2. Have I died and gone to Perl?

That's my question in relation to PEP 498 - it seems to introduce lots
of line noise for people to learn to read for little to no benefit (my
perspective is heavily influenced by the fact that most of the code I
write myself these days consists of network API calls + logging
messages + UI template rendering, with only very occasional direct
calls to str.format that use anything more complicated than "{}" or
"{!r}" as the substitution field).

As a result, I'd be a lot more comfortable with PEP 498 if it had more
examples of potential practical use cases, akin to the examples
section from PEP 343 for context managers.

While the second draft of PEP 501 is even more line-noisy than PEP 498
due to the use of both "!" and "$", it at least generalises the
underlying semantics of compiler-assisted interpolation to apply to
additional use cases like logging, i18n (including compatibility
with Mozilla's l20n syntax), safe SQL interpolation, safe shell
command interpolation, HTML template rendering, etc.

For the third draft, I'll take another pass at the surface syntax - I
like the currently proposed semantics, but agree the current spelling
is overly sigil heavy.

Regards,

Guido van Rossum

Aug 22, 2015, 6:51:27 PM

to Nick Coghlan, Barry Warsaw, python...@python.org

On Sat, Aug 22, 2015 at 1:36 PM, Nick Coghlan <ncog...@gmail.com> wrote:

> On 23 August 2015 at 02:16, Guido van Rossum <gu...@python.org> wrote:
>> On Sat, Aug 22, 2015 at 3:09 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>>> (Similar to yield, it is proposed that interpolation expressions would
>>> require parentheses when embedded inside a larger expression)
>>
>> 1. That's an entirely different proposal, you're just reusing the PEP
>> number.
>
> It's aiming to solve the same basic problem though, which is the
> aspect I consider most important when tackling a design question. The
> discussions following the posting of my first draft highlighted some
> real limitations of my original design both at a semantic level and at
> a motivational level, so I changed it in place rather than introducing
> yet another PEP on the same topic (Mike Miller's draft PEP was an
> excellent synthesis, but there's no way he could account for the fact
> that 501 was still only a first draft).

Yeah, it's not unheard of for PEP authors to pivot after listening to feedback. :-)

OTOH this topic is rich enough that I have no problem spending a few more PEP numbers on it. If Mike asks for a PEP number I am not going to withhold it.

> 2. Have I died and gone to Perl?

That's my question in relation to PEP 498 - it seems to introduce lots
of line noise for people to learn to read for little to no benefit (my
perspective is heavily influenced by the fact that most of the code I
write myself these days consists of network API calls + logging
messages + UI template rendering, with only very occasional direct
calls to str.format that use anything more complicated than "{}" or
"{!r}" as the substitution field).

As a result, I'd be a lot more comfortable with PEP 498 if it had more
examples of potential practical use cases, akin to the examples
section from PEP 343 for context managers.

Since you accept "!r", you must be asking about the motivation for including ":spec", right? That's inherited from PEP 3101. For myself, I know that the most common use of format specs is to limit the number of digits printed for floating point numbers, e.g.

t0 = time.time()

chop_onions(n)

t1 = time.time()

print("Chopped %d onions in %.3f seconds." % (n, t1-t0))

Or, using PEP 3101,

print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))

Using PEP 498 I can write this as

print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")

But in PEP 498 without :spec, I'd have to find some other way of formatting t1-t0, and none of the alternatives look pretty. (Anything that requires introducing a temporary variable feels particularly ugly to me.)
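
To make the comparison easy to try, here is a minimal runnable sketch of the three spellings side by side. chop_onions() is a made-up stand-in (the post does not define it), and the last form assumes an interpreter that implements the f-string syntax proposed in PEP 498.

```python
import time


def chop_onions(n):
    # Hypothetical stand-in workload; the original post does not define it.
    return sum(i * i for i in range(n))


n = 1000
t0 = time.time()
chop_onions(n)
t1 = time.time()

# printf-style formatting
percent_style = "Chopped %d onions in %.3f seconds." % (n, t1 - t0)
# PEP 3101 str.format()
format_style = "Chopped {} onions in {:.3f} seconds.".format(n, t1 - t0)
# PEP 498 f-string: the expression and its format spec sit in the literal
fstring_style = f"Chopped {n} onions in {t1 - t0:.3f} seconds."

print(fstring_style)
```

All three produce the same text; the only difference is where the values and the ".3f" spec are written.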

While the second draft of PEP 501 is even more line-noisy than PEP 498
due to the use of both "!" and "$", it at least generalises the
underlying semantics of compiler-assisted interpolation to apply to
additional use cases like logging, i18n (including compatibility
with Mozilla's l20n syntax), safe SQL interpolation, safe shell
command interpolation, HTML template rendering, etc.

That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is simple -- it does not provide a way for a dynamically generated string to access values in the current scope (and it does this by not supporting dynamically generated strings). For most domains you mention, safety is much more complex, and in fact mostly orthogonal -- code injection attacks rely on the value of the interpolated variables, so PEP 498's "safety" does not help at all. I18n safety may be the exception -- the scenario is an untrustworthy translator who adds an interpolation that references a variable whose content is deemed sensitive, perhaps a database key.

For the third draft, I'll take another pass at the surface syntax - I
like the currently proposed semantics, but agree the current spelling
is overly sigil heavy.

Good luck.

Nick Coghlan

Aug 22, 2015, 9:38:01 PM8/22/15

to Guido van Rossum, Barry Warsaw, python...@python.org

On 23 August 2015 at 08:50, Guido van Rossum <gu...@python.org> wrote:
> OTOH this topic is rich enough that I have no problem spending a few more
> PEP numbers on it. If Mike asks for a PEP number I am not going to withhold
> it.

Aye, agreed - at the very least, we want to preserve his survey of
interpolation in other languages, as I found that to be an incredibly
valuable contribution.

>> > 2. Have I died and gone to Perl?
>>
>> That's my question in relation to PEP 498 - it seems to introduce lots
>> of line noise for people to learn to read for little to no benefit (my
>> perspective is heavily influenced by the fact that most of the code I
>> write myself these days consists of network API calls + logging
>> messages + UI template rendering, with only very occasional direct
>> calls to str.format that use anything more complicated than "{}" or
>> "{!r}" as the substitution field).
>>
>> As a result, I'd be a lot more comfortable with PEP 498 if it had more
>> examples of potential practical use cases, akin to the examples
>> section from PEP 343 for context managers.
>
> Since you accept "!r", you must be asking about the motivation for including
> ":spec", right?

Sorry, I wasn't clear - PEP 501 also retains the field formatting
capabilities, and is hence strictly "noisier" than PEP 498 (especially
the ! prefix version of the syntax). It's just that it solves enough
*other* problems for it to seem worth the cost to me. When the benefit
is "str.format is prettier, all other forms of interpolation remain
repetitively verbose", it seems a very invasive change just to
replace:

print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))

with:

print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")

>> While the second draft of PEP 501 is even more line-noisy than PEP 498
>> due to the use of both "!" and "$", it at least generalises the
>> underlying semantics of compiler-assisted interpolation to apply to
>> additional use cases like logging, i18n (including compatibility
>> with Mozilla's l20n syntax), safe SQL interpolation, safe shell
>> command interpolation, HTML template rendering, etc.
>
>
> That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is
> simple -- it does not provide a way for a dynamically generated string to
> access values in the current scope (and it does this by not supporting
> dynamically generated strings). For most domains you mention, safety is much
> more complex, and in fact mostly orthogonal -- code injection attacks rely
> on the value of the interpolated variables, so PEP 498's "safety" does not
> help at all.

Right, but that's where I came to the conclusion that the lack of
arbitrary interpolation support ends up making PEP 498 actively
dangerous, as string interpolation based substitution ends up being so
much prettier than doing things right. Compare:

os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])

Even in that simple case, the two unsafe approaches are much nicer to
read, and as the command line gets more complex, the safe version gets
harder and harder to read relative to the unsafe ones.
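
A minimal sketch of why the string forms are the risky ones, assuming a hostile filename value invented for the illustration. Nothing is executed here; shlex.split() only approximates how a POSIX shell would tokenise the line (a real shell would additionally treat ";" as a command separator).

```python
import shlex

# Hypothetical attacker-controlled value, chosen for the illustration.
filename = "notes.txt; rm -rf ~"

# Unsafe: the value is pasted into the command line verbatim.
unsafe = f"echo {filename}"

# Safe: the value stays a single argv entry and no shell parses it.
safe_argv = ["echo", filename]

# If a shell string is unavoidable, quote every interpolated value.
quoted = f"echo {shlex.quote(filename)}"

print(shlex.split(unsafe))   # the value has broken up into several words
print(shlex.split(quoted))   # the value survives as one argument
```

The quoted form round-trips to the same argv as the list form, which is the property the pretty-looking unsafe forms lack.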

With the latest PEP 501 draft (which switched the proposed syntax and
semantics to behave more like a traditional binary operator), we could
make invoking a subprocess *safely* look like:

subprocess.call $"echo $filename"

However, I'm now coming full circle back to the idea of making this a
string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving
the triple of the raw template string, the parsed fields, and the
expression values), and put the default rendering in the resulting
object's *__str__* method.

That description is probably as clear as mud, though, so back to the
PEP I go! :)

Guido van Rossum

Aug 23, 2015, 12:10:33 AM8/23/15

to Nick Coghlan, Barry Warsaw, python...@python.org

On Sat, Aug 22, 2015 at 6:37 PM, Nick Coghlan <ncog...@gmail.com> wrote:

On 23 August 2015 at 08:50, Guido van Rossum <gu...@python.org> wrote:
> OTOH this topic is rich enough that I have no problem spending a few more
> PEP numbers on it. If Mike asks for a PEP number I am not going to withhold
> it.

Aye, agreed - at the very least, we want to preserve his survey of
interpolation in other languages, as I found that to be an incredibly
valuable contribution.

For that he should just update the Wikipedia page on the topic. Or maybe write the PEP, and then update Wikipedia, using the PEP as the [needed citation]. :-)

>> > 2. Have I died and gone to Perl?
>>
>> That's my question in relation to PEP 498 - it seems to introduce lots
>> of line noise for people to learn to read for little to no benefit (my
>> perspective is heavily influenced by the fact that most of the code I
>> write myself these days consists of network API calls + logging
>> messages + UI template rendering, with only very occasional direct
>> calls to str.format that use anything more complicated than "{}" or
>> "{!r}" as the substitution field).
>>
>> As a result, I'd be a lot more comfortable with PEP 498 if it had more
>> examples of potential practical use cases, akin to the examples
>> section from PEP 343 for context managers.
>
> Since you accept "!r", you must be asking about the motivation for including
> ":spec", right?

Sorry, I wasn't clear - PEP 501 also retains the field formatting
capabilities, and is hence strictly "noisier" than PEP 498 (especially
the ! prefix version of the syntax). It's just that it solves enough
*other* problems for it to seem worth the cost to me.

Wow. "PEP 498 seems to introduce a lot of line noise" was a rather broken way to say that...

When the benefit
is "str.format is prettier, all other forms of interpolation remain
repetitively verbose",

Who says that, and what does it mean?

it seems a very invasive change just to
replace:

print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))

with:

print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")

But only people who are politically correct about it use str.format(). Everyone else (and the logging module :-) still uses %.

>> While the second draft of PEP 501 is even more line-noisy than PEP 498
>> due to the use of both "!" and "$", it at least generalises the
>> underlying semantics of compiler-assisted interpolation to apply to
>> additional use cases like logging, i18n (including compatibility
>> with Mozilla's l20n syntax), safe SQL interpolation, safe shell
>> command interpolation, HTML template rendering, etc.
>
>
> That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is
> simple -- it does not provide a way for a dynamically generated string to
> access values in the current scope (and it does this by not supporting
> dynamically generated strings). For most domains you mention, safety is much
> more complex, and in fact mostly orthogonal -- code injection attacks rely
> on the value of the interpolated variables, so PEP 498's "safety" does not
> help at all.

Right, but that's where I came to the conclusion that the lack of
arbitrary interpolation support ends up making PEP 498 actively
dangerous, as string interpolation based substitution ends up being so
much prettier than doing things right. Compare:

os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])

Even in that simple case, the two unsafe approaches are much nicer to
read, and as the command line gets more complex, the safe version gets
harder and harder to read relative to the unsafe ones.

That reasoning is perverse, and feels disingenuous.

With the latest PEP 501 draft (which switched the proposed syntax and
semantics to behave more like a traditional binary operator), we could
make invoking a subprocess *safely* look like:

subprocess.call $"echo $filename"

Which reminds me of your one-time attempts to make call parentheses optional, so we could have print be a function and yet be able to write

print x, y

However, I'm now coming full circle back to the idea of making this a
string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving
the triple of the raw template string, the parsed fields, and the
expression values), and put the default rendering in the resulting
object's *__str__* method.

That's a clever idea. But I expect it will make interpolation much less convenient, because every recipient will have to call str(). The elegance of PEP 498 is that the recipient doesn't have to do or know anything special, because the result is *just* a string object.

That description is probably as clear as mud, though, so back to the
PEP I go! :)

I recommend taking a break first. Or maybe sample the recent activity in datetime-sig instead. :-)

Nick Coghlan

Aug 23, 2015, 12:11:04 AM8/23/15

to Guido van Rossum, Barry Warsaw, python...@python.org

On 23 August 2015 at 11:37, Nick Coghlan <ncog...@gmail.com> wrote:
> However, I'm now coming full circle back to the idea of making this a
> string prefix, so that would instead look like:
>
> subprocess.call($"echo $filename")
>
> The trick would be to make interpolation lazy *by default* (preserving
> the triple of the raw template string, the parsed fields, and the
> expression values), and put the default rendering in the resulting
> object's *__str__* method.

Indeed, after working through this latest change, I ended up back
where I started from a syntactic perspective, with a proposal for
i(nterpolated)-strings rather than f(ormatted)-strings:
https://www.python.org/dev/peps/pep-0501/

With appropriate modifications to subprocess.call, the proposal would
then enable us to write a *safe* shell command interpolation as:

subprocess.call(i"echo $filename")

The essential change relative to PEP 498 is to make it so that i"echo
$filename" doesn't produce a string directly. Rather, it would produce
an interpolation template as a first class object, holding:

* a reference to the raw template
* a compile time constant tuple-of-tuples describing the parsed fields
* a tuple with the calculated field values

The default rendering semantics would then live in
types.InterpolationTemplate.__str__, rather than being applied
implicitly at the point of definition.
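
A rough sketch of the object described above, written as ordinary Python so it can be tried without compiler support. The class below is a hand-rolled stand-in, not the PEP's types.InterpolationTemplate; the attribute names and the (leading_text, expression_text) field layout are assumptions made for the illustration.

```python
class InterpolationTemplate:
    """Hypothetical stand-in for the object an i-string might produce."""

    def __init__(self, raw_template, parsed_fields, field_values):
        self.raw_template = raw_template    # the template exactly as written
        self.parsed_fields = parsed_fields  # (leading_text, expression_text) pairs
        self.field_values = field_values    # values computed at definition time

    def __str__(self):
        # Default rendering: interleave literal text with formatted values.
        parts = []
        values = iter(self.field_values)
        for leading_text, expression_text in self.parsed_fields:
            parts.append(leading_text)
            if expression_text is not None:
                parts.append(format(next(values)))
        return "".join(parts)


# What a compiler might build for i"echo $filename" (constructed by hand here).
filename = "notes.txt"
template = InterpolationTemplate(
    "echo $filename",
    (("echo ", "filename"),),
    (filename,),
)
print(str(template))
```

Because rendering only happens in __str__, a consumer that receives the object can still inspect the raw template and the values separately before deciding how to combine them.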

The same underlying approach could also be used with the "{}"
substitution field syntax proposed in PEP 498, but I have some
concrete reasons for continuing to prefer $-based substitution:

* it makes the case of interpolating in single variables with str() as
simple as possible
* it's consistent with JavaScript/ES6 and Python+JavaScript is a more
common combination than Python+C#
* other common templating formats (including Django, Jinja2 and
Mozilla's l20n) collide on "{" and "}", but not on "$"
* it allows the raw template string to be more readily extracted and
used in an i18n message catalog

Regards,
Nick.

P.S. I vaguely recall seeing questions/suggestions along these lines
in the previous discussion threads, but don't recall the details. If
anyone did make a suggestion like this, please let me know, and I can
add your name to the Acknowledgements section.

Nick Coghlan

Aug 23, 2015, 12:58:16 AM8/23/15

to Guido van Rossum, Barry Warsaw, python...@python.org

On 23 August 2015 at 14:09, Guido van Rossum <gu...@python.org> wrote:
> On Sat, Aug 22, 2015 at 6:37 PM, Nick Coghlan <ncog...@gmail.com> wrote:
>> Right, but that's where I came to the conclusion that the lack of
>> arbitrary interpolation support ends up making PEP 498 actively
>> dangerous, as string interpolation based substitution ends up being so
>> much prettier than doing things right. Compare:
>>
>> os.system(f"echo {filename}")
>> subprocess.call(f"echo {filename}")
>> subprocess.call(["echo", filename])
>>
>> Even in that simple case, the two unsafe approaches are much nicer to
>> read, and as the command line gets more complex, the safe version gets
>> harder and harder to read relative to the unsafe ones.
>
>
> That reasoning is perverse, and feels disingenuous.

Yeah, professional paranoia produces a weird way of looking at the world :)

The key change in my thinking relative to a couple of years ago has
been that it's no longer the things that throw surprising exceptions
that cause me the most concern, but rather those that *appear* to
work, but are actually hiding a dangerous latent defect. These are the
situations where a developer (or reviewer) has to "just know" that the
apparently obvious way to do something is actually problematic, and
that's where we get security vulnerabilities.

Having the preferred interpolation syntax produce a non-string object
by default provides us with the opportunity to consider on an
interface by interface basis whether we want to:

* require callers to prerender interpolation templates (the default)
* implicitly render interpolation templates with the default renderer
(by calling str on the input, which many APIs do already)
* define and use a custom renderer for interpolation templates

This wouldn't prevent folks from doing the wrong thing -
os.system(str(i"echo $filename")) is just as dangerous from a code
injection perspective as os.system(f"echo {filename}"). The difference
lies in the appearance of the *fixed* code, where
subprocess.call(i"echo $filename") would be just as readable as the
os.system version, while f-strings don't help with any case that
requires a custom renderer in order to do the right thing.

That way, when a security linter picks up a problematic call like
os.system(str(i"echo $filename")), the solution it suggests can be
just as easy to read as the original.
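
To illustrate the "custom renderer" option, here is a small sketch under stated assumptions: the Template tuple and both render functions are hypothetical, and shlex.quote() stands in for whatever quoting a real subprocess integration would apply.

```python
import shlex
from collections import namedtuple

# Hypothetical lazy template: raw text, parsed fields, already-computed values.
Template = namedtuple("Template", "raw_template parsed_fields field_values")


def render(template, convert=str):
    # Interleave literal text with converted values.
    out = []
    values = iter(template.field_values)
    for leading_text, expression_text in template.parsed_fields:
        out.append(leading_text)
        if expression_text is not None:
            out.append(convert(next(values)))
    return "".join(out)


def render_shell(template):
    # Custom renderer: every interpolated value is shell-quoted.
    return render(template, convert=lambda value: shlex.quote(str(value)))


filename = "notes.txt; rm -rf ~"  # hostile value, invented for the example
tmpl = Template("echo $filename", (("echo ", "filename"),), (filename,))

default_rendering = render(tmpl)       # same text an eager f-string would give
shell_rendering = render_shell(tmpl)   # value kept as a single shell word
print(default_rendering)
print(shell_rendering)
```

The caller's code is identical in both cases; only the API receiving the template decides which renderer applies.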

> Which reminds me of your one-time attempts to make call parentheses
> optional, so we could have print be a function and yet be able to write
>
> print x, y

Yeah, that comparison occurred to me as well. It's one of the reasons
I kept looking for a way to do custom interpolation using a normal
function call instead of needing a new binary operator :)

>> However, I'm now coming full circle back to the idea of making this a
>> string prefix, so that would instead look like:
>>
>> subprocess.call($"echo $filename")
>>
>> The trick would be to make interpolation lazy *by default* (preserving
>> the triple of the raw template string, the parsed fields, and the
>> expression values), and put the default rendering in the resulting
>> object's *__str__* method.
>
> That's a clever idea. But I expect it will make interpolation much less
> convenient, because every recipient will have to call str(). The elegance of
> PEP 498 is that the recipient doesn't have to do or know anything special,
> because the result is *just* a string object.

Right, although eager rendering with i-strings just involves calling
"str" at the point of definition.

Another alternative would be to combine the two ideas, and have
i-strings be an implementation detail of f-strings, with f"echo
$filename" being a highly optimised version of str(i"echo $filename")
that avoids the need for a builtin name lookup (modulo whichever
substitution field syntax you eventually choose).

>> That description is probably as clear as mud, though, so back to the
>> PEP I go! :)
>
> I recommend taking a break first.

Aye, having got PEP 501 back to a place where *I* like it again, I'll
leave it alone for a while. The pace of iteration this weekend was
because I kept discovering aspects I didn't like myself, and coming up
with related improvements.

> Or maybe sample the recent activity in
> datetime-sig instead. :-)

Minstrels (singing): Brave Sir Robin ran away, bravely ran away away,
when danger reared its ugly head, he bravely turned his tail and
fled... ;)

Cheers,
Nick.

P.S. For folks not familiar with that last reference:
http://www.montypython.net/scripts/bravesir.php :)

Akira Li

Aug 23, 2015, 5:29:26 AM8/23/15

to python...@python.org

Nick Coghlan <ncog...@gmail.com> writes:

> os.system(f"echo {filename}")
> subprocess.call(f"echo {filename}")
> subprocess.call(["echo", filename])
>
> Even in that simple case, the two unsafe approaches are much nicer to
> read, and as the command line gets more complex, the safe version gets
> harder and harder to read relative to the unsafe ones.

subprocess.call does not run the shell by default and therefore
subprocess.call(f"echo {filename}") will fail on POSIX (unless there is
an executable named echo<space>...).

If you meant shell=True then the right way is already the hard way:
pipes and redirections are more readable and less error-prone if written
using the shell syntax [1] (unless something like plumbum [2] is used)
and therefore people already might use the unsafe string formatting
without the corresponding shlex.quote() calls.
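
A quick way to see the first point, as a hedged sketch: call_or_none() is a helper invented for this example, and it assumes no executable literally named "echo hello" exists on PATH.

```python
import subprocess
import sys


def call_or_none(args, **kwargs):
    # Hypothetical helper: return the exit status, or None if the program
    # could not be started at all.
    try:
        return subprocess.call(args, **kwargs)
    except OSError:
        return None


# Without shell=True the whole string is taken as the program name.
as_string = call_or_none("echo hello")

# A list is passed through as argv; the current interpreter is used here
# so the example does not depend on any particular external command.
as_list = call_or_none([sys.executable, "-c", "pass"])

print(as_string, as_list)
```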

[1]
http://stackoverflow.com/questions/295459/how-do-i-use-subprocess-popen-to-connect-multiple-processes-by-pipes
[2] https://pypi.python.org/pypi/plumbum

Terry Reedy

Aug 23, 2015, 8:24:28 PM8/23/15

to python...@python.org

On 8/23/2015 12:09 AM, Nick Coghlan wrote:

> Indeed, after working through this latest change, I ended up back
> where I started from a syntactic perspective, with a proposal for
> i(nterpolated)-strings rather than f(ormatted)-strings:
> https://www.python.org/dev/peps/pep-0501/

As I understand the two proposals, the essential difference, glossing
over surface syntax, is this. Compiling f'<template>' would parse the
template to an inaccessible structure of existing type (tuple?) and
process it at runtime with unreplaceable code returning a string with
interpolations. Compiling i'<template>', in your latest revision, would
parse the template to an accessible structure of a new class. The new
class would have default code (in .__repr__) equivalent in result to the
f code. But additional methods or functions could return other strings
(or even non-strings). (Being able to access the structure for debugging
purposes might be helpful.) Is this basically it?

--
Terry Jan Reedy

Eric V. Smith

Aug 23, 2015, 8:34:58 PM8/23/15

to python...@python.org

At this point, I think PEPs 498 and 501 have converged, except for the
delayed string interpolation object (which I realize is important) and
how expressions are identified in the strings (which I consider less
important).

I think the string interpolation object is interesting. It's basically
what Petr Viktorin and Chris Angelico discussed and suggested here:
https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

My suggestion would be to add both f-strings (PEP 498) and i-strings (as
they're currently called in PEP 501), but with the exact same syntax to
identify and evaluate expressions. I don't particularly care what the
prefixes are. I'd add the plain f-strings first, then i-strings maybe
later. There are definitely some issues with delayed interpolation we
need to think about. An f-string would be shorthand for str(i-string).

I think it's hyperbolic to refer to f-strings as a new string formatting
language. With one small difference (detailed in PEP 498, and with zero
usage I could find in the stdlib outside of tests), f-strings are a
strict superset of str.format() strings (but not the arguments to
.format of course). I think f-strings are no more different from
str.format strings than PEP 501 i-strings are to string.Template strings.

From what I can tell in the stdlib and in the wild, str.format() has
hundreds or thousands of times more usage than string.Template. I
realize that the reasons are not necessarily related to the syntax of
the replacement strings, but you can't say most people aren't familiar
with str.format().

> That description is probably as clear as mud, though, so back to the
> PEP I go! :)

Thanks for PEP 501. Maybe I'll add delayed interpolation to PEP 498!

On a more serious note, I'm thinking of adding i-strings to my f-string
implementation. I have some ideas that the format_spec (the :.3f stuff)
could be used by the code that eventually does the string interpolation.
For example, sql(i-string) might want to interpret this expression using
__sql__, instead of how str(i-string) would use __format__. Then the
sql() machinery could look at the format_spec and pass it to the value's
__sql__ method.

For example:
sql(i'select {date:as_date} from {tablename}')

might call date.__sql__('as_date'), which would know how to cast to the
right datatype (this happens to me all the time).

This is one reason I'm thinking of ditching !s, !r, and !a, at least for
the first implementation of PEP 498: they're not needed, and are not
generally applicable if we add the hooks I'm considering into i-strings.
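
A rough sketch of the __sql__ hook idea, with everything spelled out by hand since i-strings do not exist: sql(), SqlDate, TableName and the list-of-parts input are all hypothetical, and the SQL produced is only illustrative.

```python
import datetime


class SqlDate(datetime.date):
    def __sql__(self, format_spec):
        # The format_spec from the template picks the SQL representation.
        if format_spec == "as_date":
            return "DATE '" + self.isoformat() + "'"
        raise ValueError(f"unsupported SQL format spec: {format_spec!r}")


class TableName:
    def __init__(self, name):
        self.name = name

    def __sql__(self, format_spec):
        # Identifiers get quoted as identifiers, never as string literals.
        return '"' + self.name.replace('"', '""') + '"'


def sql(parts):
    # parts mixes literal strings with (value, format_spec) pairs, roughly
    # what a parsed i'select {date:as_date} from {tablename}' could provide.
    rendered = []
    for part in parts:
        if isinstance(part, str):
            rendered.append(part)
        else:
            value, format_spec = part
            rendered.append(value.__sql__(format_spec))
    return "".join(rendered)


date = SqlDate(2015, 8, 23)
tablename = TableName("events")
query = sql(["select ", (date, "as_date"), " from ", (tablename, "")])
print(query)
```

The point of the sketch is only that the format_spec is handed to the value's own hook rather than to __format__.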

Eric.

Eric V. Smith

Aug 23, 2015, 8:37:26 PM8/23/15

to python...@python.org

On 08/23/2015 08:23 PM, Terry Reedy wrote:
> On 8/23/2015 12:09 AM, Nick Coghlan wrote:
>
>> Indeed, after working through this latest change, I ended up back
>> where I started from a syntactic perspective, with a proposal for
>> i(nterpolated)-strings rather than f(ormatted)-strings:
>> https://www.python.org/dev/peps/pep-0501/
>
> As I understand the two proposals, the essential difference, glossing
> over surface syntax, is this. Compiling f'<template>' would parse the
> template to an inaccessible structure of existing type (tuple?) and
> process it at runtime with unreplaceable code returning a string with
> interpolations. Compiling i'<template>', in your latest revision, would
> parse the template to an accessible structure of a new class. The new
> class would have default code (in .__repr__) equivalent in result to the
> f code. But additional methods or functions could return other strings
> (or even non-strings). (Being able to access the structure for debugging
> purposes might be helpful.) Is this basically it?
>

I think so. I just posted a longer version of this to python-ideas.

My current thinking is to add both f-strings and i-strings. The f-string
version, which would be much more common, would build in the str() call
around the i-string.

Eric.

Nathaniel Smith

Aug 23, 2015, 8:40:21 PM8/23/15

to Eric V. Smith, python...@python.org

On Sun, Aug 23, 2015 at 5:35 PM, Eric V. Smith <er...@trueblade.com> wrote:
> On a more serious note, I'm thinking of adding i-strings to my f-string
> implementation. I have some ideas that the format_spec (the :.3f stuff)
> could be used by the code that eventually does the string interpolation.
> For example, sql(i-string) might want to interpret this expression using
> __sql__, instead of how str(i-string) would use __format__. Then the
> sql() machinery could look at the format_spec and pass it to the value's
> __sql__ method.
>
> For example:
> sql(i'select {date:as_date} from {tablename}')
>
> might call date.__sql__('as_date'), which would know how to cast to the
> right datatype (this happens to me all the time).

Another use case would be when using an HTML-sensitive interpolater,
one would want a way to mark that one particular substitution-string
is already HTML-encoded and does not need further quoting.
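
As a minimal sketch of that use case: PreEscaped and render_html() below are names invented for the illustration, and html.escape() from the standard library does the default quoting for element text.

```python
import html


class PreEscaped:
    """Hypothetical marker for markup that must not be escaped again."""

    def __init__(self, markup):
        self.markup = markup


def render_html(pairs):
    # pairs holds (literal_text, value) tuples; a value of None means the
    # literal text has no substitution after it.
    out = []
    for literal_text, value in pairs:
        out.append(literal_text)
        if value is None:
            continue
        if isinstance(value, PreEscaped):
            out.append(value.markup)            # trusted, inserted as-is
        else:
            out.append(html.escape(str(value)))  # untrusted, escaped
    return "".join(out)


user_input = "<script>alert(1)</script>"
badge = PreEscaped("<b>admin</b>")
page = render_html([("<p>", user_input), (" ", badge), ("</p>", None)])
print(page)
```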

-n

--
Nathaniel J. Smith -- http://vorpus.org

Guido van Rossum

Aug 23, 2015, 9:14:12 PM8/23/15

to python...@python.org

I'm feeling pretty good about f-strings. They're pretty much a proven concept, combining .format() strings in Python, and expression interpolation in other languages.

But for i-strings, I think it would be good if we could gather more actual experience using them. Every potential use case brought up for these so far (translation, html/shell/sql quoting) feels like there's a lot of work needing to be done to see if the idea is actually viable there. It would be a shame if we added all the (considerable!) machinery for i-strings and all we got was yet another way to do it (https://xkcd.com/927/), without killing at least one competing approach (similar to the way .format() has failed to replace %).

It's tough to envision how we could gather more experience with i-strings *without* building them into the language, but I'm really hesitant to add them without more experience. (This is the "new on the job market" paradox. :-) Maybe they could be emulated using a function call that uses sys._getframe() under the covers? Or maybe it's possible to cook up an experiment using other syntax hooks? E.g. the coding hack used in pyxl (https://github.com/dropbox/pyxl).[1]
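
A minimal sketch of the sys._getframe() emulation, under some loud assumptions: interpolate() is a made-up name, it only resolves plain $name references via string.Template (no arbitrary expressions), it renders eagerly, and sys._getframe() is a CPython implementation detail.

```python
import string
import sys


def interpolate(template):
    # Look up names in the caller's scope: locals shadow globals.
    frame = sys._getframe(1)
    namespace = {**frame.f_globals, **frame.f_locals}
    return string.Template(template).substitute(namespace)


def demo():
    filename = "notes.txt"
    # No arguments are passed; the name is found in demo()'s local scope.
    return interpolate("echo $filename")


print(demo())
```

An experiment along these lines could return a lazy object instead of a string to get closer to the i-string semantics, at the cost of capturing the values by hand.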

Some specific thoughts:

- In HTML, there are multiple different ways that stuff needs to be quoted, depending on context, e.g. as element text, or as an attribute value, or inside <script></script>. My (limited) experience with pyxl at Dropbox also suggests that html often is constructed programmatically in multiple stages, so it's important to be able to include already-interpolated html fragments into another html block.

- In SQL the evaluation of $N is often built into the SQL parser.

- Honestly, subprocess.call(i'echo $filename') looks like it's referencing an environment variable, not a variable in the Python code.

[1] I am not endorsing pyxl -- its use is currently controversial at Dropbox. But its "coding: pyxl" hack is easily adapted for other syntax experiments (e.g. https://github.com/JukkaL/mypy/tree/master/mypy/codec).

Ron Adam

Aug 23, 2015, 9:25:17 PM8/23/15

to python...@python.org

On 08/23/2015 08:35 PM, Eric V. Smith wrote:
> Thanks for PEP 501. Maybe I'll add delayed interpolation to PEP 498!
>
> On a more serious note, I'm thinking of adding i-strings to my f-string
> implementation. I have some ideas that the format_spec (the :.3f stuff)
> could be used by the code that eventually does the string interpolation.
> For example, sql(i-string) might want to interpret this expression using
> __sql__, instead of how str(i-string) would use __format__. Then the
> sql() machinery could look at the format_spec and pass it to the value's
> __sql__ method.
>
> For example:
> sql(i'select {date:as_date} from {tablename}')
>
> might call date.__sql__('as_date'), which would know how to cast to the
> right datatype (this happens to me all the time).
>
> This is one reason I'm thinking of ditching !s, !r, and !a, at least for
> the first implementation of PEP 498: they're not needed, and are not
> generally applicable if we add the hooks I'm considering into i-strings.

In the .format() mini language is there a way to format an in place
literal value? (ok... need an example for this one.)

"{Name: {'John Doe':?<30} {'123-123-1234':?>13}\n".format()

What would '?' be?

In this case the values are given, but not formatted yet.

I was thinking this would allow interpolating the values, then
translating, and finally formatting the translated string. It seems
part of the problem is that the insertion of the values and formatting may be
tied too closely to each other. Field formatting and value formatting are
two separate things.

By separating them into two well defined steps, we may be able to do...

"{Name: {name:<30} {number:>13}\n".interpolate().translate().format()

And possibly a literal syntax for that could just be expanded to the
chained method calls. Probably 'i' and/or 'f' would do, but 't' for
translate seems like it may be nice.

And if someone wanted to they can still do each step separately by using
the methods explicitly.

Cheers,
Ron

Steven D'Aprano

Aug 23, 2015, 9:26:00 PM8/23/15

to python...@python.org

On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:

> I think the string interpolation object is interesting. It's basically
> what Petr Viktorin and Chris Angelico discussed and suggested here:
> https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

Are you sure that's the right URL? It seems only barely relevant to me.
It has Chris replying to Petr, but it's a vague suggestion of a "quantum
string interpolation" (Chris' words) with no details. He asks:

"How hard would this be to implement? Something that isn't a string,
retains all the necessary information, and then collapses to a string
when someone looks at it?"

I looked ahead a dozen or two posts, and can't see any further
discussion. Have I missed something?

--
Steve

Eric V. Smith

Aug 23, 2015, 9:31:50 PM8/23/15

to Steven D'Aprano, python...@python.org

> On Aug 23, 2015, at 9:24 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>
>> On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:
>>
>> I think the string interpolation object is interesting. It's basically
>> what Petr Viktorin and Chris Angelico discussed and suggested here:
>> https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.
>
> Are you sure that's the right URL? It seems only barely relevant to me.
> It has Chris replying to Petr, but it's a vague suggestion of a "quantum
> string interpolation" (Chris' words) with no details. He asks:
>
> "How hard would this be to implement? Something that isn't a string,
> retains all the necessary information, and then collapses to a string
> when someone looks at it?"
>
> I looked ahead a dozen or two posts, and can't see any further
> discussion. Have I missed something?

That's the right URL. I thought they were talking about the same thing. I even had a response written about it, saying it would always require str() for the simple use case. Then I accidentally deleted it before I sent it :(

Maybe I read too much into it.

Eric.

Nick Coghlan

Aug 23, 2015, 9:42:14 PM

to Eric V. Smith, python...@python.org

On 24 August 2015 at 10:35, Eric V. Smith <er...@trueblade.com> wrote:
> On 08/22/2015 09:37 PM, Nick Coghlan wrote:
>> The trick would be to make interpolation lazy *by default* (preserving
>> the triple of the raw template string, the parsed fields, and the
>> expression values), and put the default rendering in the resulting
>> object's *__str__* method.
>
> At this point, I think PEPs 498 and 501 have converged, except for the
> delayed string interpolation object (which I realize is important) and
> how expressions are identified in the strings (which I consider less
> important).
>
> I think the string interpolation object is interesting. It's basically
> what Petr Viktorin and Chris Angelico discussed and suggested here:
> https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

Aha, I thought I'd seen that idea go by in one of the threads, but I
didn't remember where :)

I'll add Petr and Chris to the acknowledgements section in 501.

> My suggestion would be to add both f-strings (PEP 498) and i-strings (as
> they're currently called in PEP 501), but with the exact same syntax to
> identify and evaluate expressions. I don't particularly care what the
> prefixes are. I'd add the plain f-strings first, then i-strings maybe
> later. There are definitely some issues with delayed interpolation we
> need to think about. An f-string would be shorthand for str(i-string).

+1, as this is the point of view I've come to as well.

> I think it's hyperbolic to refer to f-strings as a new string formatting
> language. With one small difference (detailed in PEP 498, and with zero
> usage I could find in the stdlib outside of tests), f-strings are a
> strict superset of str.format() strings (but not the arguments to
> .format of course). I think f-strings are no more different from
> str.format strings than PEP 501 i-strings are to string.Template strings.

Yeah, that's a fair criticism of my rhetoric, so I'll stop saying that.

> From what I can tell in the stdlib and in the wild, str.format() has
> hundreds or thousands of times more usage than string.Template. I
> realize that the reasons are not necessarily related to the syntax of
> the replacement strings, but you can't say most people aren't familiar
> with str.format().

Right, and I think we can actually make an example driven decision on
that front by looking at potential *target* formats for template
rendering. After all, one of the interesting discoveries we made in
having both str.__mod__ and str.format available is that %-formatting
is a great way to template str.format strings, and vice-versa, since
the meta-characters don't conflict, so you can minimise the escaping
needed.
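
To make the non-conflicting meta-characters concrete, here is a minimal sketch (the field names and values are invented for illustration) that uses %-formatting to build a str.format template without escaping a single brace:

```python
# Illustrative names only: build a str.format template with %-formatting.
# '%' and '{}' do not collide, so the braces need no escaping.
field_names = ("name", "price")
template = "%s: {%s} costs {%s:.2f}" % ("item", *field_names)

# The result is an ordinary str.format template.
rendered = template.format(name="tea", price=2.5)
```

Going the other way (using str.format to produce a %-template) works for the same reason.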

For use cases like writing object __repr__ methods, I don't think the
choice of $-substitution or {}-substitution matters - neither $ nor {}
are likely to appear in the desired output (except as part of
interpolated values), so escaping shouldn't be common regardless of
which we choose. (Side note: __repr__ and __str__ implementations are
likely worth highlighting as a good use case for the new syntax!)
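
As a rough illustration of that side note, using the {}-style f-string syntax from PEP 498 as it later shipped in Python 3.6 (the Point class is invented for the example):

```python
class Point:
    """Toy class, purely illustrative."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # !r keeps the repr of each attribute, so strings stay quoted
        return f"{type(self).__name__}(x={self.x!r}, y={self.y!r})"

    def __str__(self):
        return f"({self.x}, {self.y})"
```

Neither "$" nor braces appear in the literal parts of the output, so no escaping is needed with either substitution style.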

I think things get more interesting once we start talking about
interpolation targets other than "human readable text".

For example, one of the neat (/scary, depending on how you feel about
this kind of feature) things I realised in working on the latest draft
of PEP 501 is that you could use it to template *Python code*,
including eagerly bound references to objects in the current scope.
That is:

a = b + c

could instead be written as:

a = eval(str(i"$b + $c"))

That's not very interesting if all you do is immediately call eval()
on it, but it's a lot more interesting if you instead want to do
things like extract the AST, dispatch the operation for execution in
another process, etc. For example, you could use this capability to
build eagerly bound closures, which wouldn't see changes in name
bindings, but *would* see state changes in mutable objects.

With $-substitution, that "just works", as $ generally isn't
syntactically significant in Python code - it can only appear inside
strings (and potentially interpolation templates). With
{}-substitution, you'd have to double all the braces for dictionary
displays, dictionary comprehensions and set comprehensions. In example
form:

data = {k:v for k, v in source}

becomes:

data = eval(str(i"{k:v for k, v in $source}"))

rather than:

data = eval(f"{{k:v for k, v in {source}}}")

You hit a similar problem if you're targeting Django or Jinja2
templates, or any content that involves l20n style JavaScript
translation strings: the use of braces for substitution expressions in
the interpolation template conflicts with their use in the target
format.
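
The escaping difference can be tried today with the existing stdlib tools. This is only a sketch: str.format stands in for {}-substitution and string.Template for $-substitution, and the name "src" is made up for the example.

```python
from string import Template

source = [("a", 1), ("b", 2)]

# {}-substitution: the literal braces of the dict comprehension are doubled.
brace_code = "{{k: v for k, v in {src!r}}}".format(src=source)

# $-substitution: the braces are plain text, only $src is replaced.
dollar_code = Template("{k: v for k, v in $src}").substitute(src=repr(source))

# Both spellings generate the same Python source text.
data = eval(brace_code)
```

The same doubling would be needed for every brace in a Django or Jinja2 template generated this way.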

So far, the only target rendering environments I've come up with where
$-substitution would create a conflict are shell commands and
JavaScript localisation using Mozilla's l20n syntax, and in both of
those, I'd actually *want* the Python lookup to take precedence over
the target environment lookup (and doubling the prefix to "$$" for
target environment lookup seems quite reasonable when you actually do
want to do the name lookup in the target environment).

>> That description is probably as clear as mud, though, so back to the
>> PEP I go! :)
>
> Thanks for PEP 501. Maybe I'll add delayed interpolation to PEP 498!
>
> On a more serious note, I'm thinking of adding i-strings to my f-string
> implementation. I have some ideas that the format_spec (the :.3f stuff)
> could be used by the code that eventually does the string interpolation.
> For example, sql(i-string) might want to interpret this expression using
> __sql__, instead of how str(i-string) would use __format__. Then the
> sql() machinery could look at the format_spec and pass it to the value's
> __sql__ method.

Yeah, that's the key reason PEP 501 is careful to treat them as opaque
strings that it merely transports through to the renderer. The
*default* renderer would expect them to be str.format format
specifiers, but other renderers may either disallow them entirely, or
expect them to do something different.
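
A minimal sketch of a non-default renderer, to show what "opaque" means in practice. Everything here is an assumption for illustration: render_sql() and the "ident" format spec are invented, and string.Formatter().parse() from the stdlib stands in for the PEP 501 field parsing.

```python
from string import Formatter


def render_sql(template, values):
    """Hypothetical renderer: returns (query, params) using '?' placeholders."""
    parts, params = [], []
    for literal, field, spec, conversion in Formatter().parse(template):
        parts.append(literal)
        if field is None:
            continue
        value = values[field]
        if spec == "ident":
            # Renderer-specific meaning: quote the value as an SQL identifier.
            parts.append('"' + str(value).replace('"', '""') + '"')
        else:
            # Everything else becomes a bound parameter, never inline text.
            parts.append("?")
            params.append(value)
    return "".join(parts), params


query, params = render_sql(
    "select * from {table:ident} where day = {day}",
    {"table": "events", "day": "2015-08-24"},
)
```

The format spec never reaches str.format here; the renderer alone decides what it means.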

> For example:
> sql(i'select {date:as_date} from {tablename}')
>
> might call date.__sql__('as_date'), which would know how to cast to the
> right datatype (this happens to me all the time).
>
> This is one reason I'm thinking of ditching !s, !r, and !a, at least for
> the first implementation of PEP 498: they're not needed, and are not
> generally applicable if we add the hooks I'm considering into i-strings.

+1 from me. Given arbitrary expression support, it's both entirely
possible and more explicit to write the builtin calls directly (obj!a,
obj!r, obj!s -> ascii(obj), repr(obj), str(obj))
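
The equivalence is easy to check with str.format as it exists today (the sample value is arbitrary):

```python
value = "café"

# Conversion flags as supported by str.format...
with_conversions = "{0!a} {0!r} {0!s}".format(value)

# ...and the same result with explicit builtin calls.
with_builtins = "{} {} {}".format(ascii(value), repr(value), str(value))
```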

Regards,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Nick Coghlan

Aug 23, 2015, 9:49:55 PM

to Steven D'Aprano, python...@python.org

On 24 August 2015 at 11:24, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:
>
>> I think the string interpolation object is interesting. It's basically
>> what Petr Viktorin and Chris Angelico discussed and suggested here:
>> https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.
>
> Are you sure that's the right URL? It seems only barely relevant to me.
> It has Chris replying to Petr, but it's a vague suggestion of a "quantum
> string interpolation" (Chris' words) with no details. He asks:
>
> "How hard would this be to implement? Something that isn't a string,
> retains all the necessary information, and then collapses to a string
> when someone looks at it?"
>
> I looked ahead a dozen or two posts, and can't see any further
> discussion. Have I missed something?

That's the level of detail I remembered seeing, and it fairly
concisely describes PEP 501's types.InterpolationTemplate - it's an
object that isn't a string (it's an unrendered template that carries
with it all the information needed to render itself on demand) that
renders itself to a plain string when you look at it with str().

So the answer to Chris's initial "How hard would this be to
implement?" question turned out to be "Not very, once we thought
through the details" :)

Cheers,

Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Wes Turner

Aug 23, 2015, 10:32:24 PM

to Nick Coghlan, Eric V. Smith, python...@python.org

IIUC, to do this with SQL,

> sql(i'select {date:as_date} from {tablename}')

needs to be

['select ', unescaped(date, 'as_date'), ' from ', unescaped(tablename)]

so that e.g. sql_92(), sql_2011()

would know that 'select ' is presumably implicitly escaped

* https://en.wikipedia.org/wiki/SQL#Interoperability_and_standardization

* http://docs.sqlalchemy.org/en/rel_1_0/dialects/

* https://docs.djangoproject.com/en/1.7/ref/models/queries/#f-expressions "Django F-Expressions"

Wes Turner

Aug 24, 2015, 1:01:17 AM

to Nick Coghlan, Eric V. Smith, python...@python.org

For reference, the SQLAlchemy Expression API solves for (safer)
method-chaining, nesting *Python* expression API; or you can reuse a raw
SQL connection from a ConnectionPool.

Django F-Objects are relevant because they are deferred (and compiled in
context to the query context); similar to the objectives of a given SQL
syntax templating, parameterization, and serialization library.

Django Q-Objects are similar, in that an f-string is basically an iterator
of AND-ed expressions where AND means string concatenation.

Personally, I'd pretty much always just reflect the tables or map them out
and write SQLAlchemy Python expressions which are then compiled to a
particular dialect (and quoted appropriately, **avoiding CWE-89**,
surviving across table renames, managing migrations).

Is it sometimes faster to write SQL by hand?

* I'd write the [SQLAlchemy], serialize to SQL, [and modify]
  (because I should have namespaced Python table attrs for those attrs
  anyway, even if it requires table introspection and reflection at
  (every/pool) instantiation)
* you can always execute a query with a raw connection with an ORM
  (and then **refactor (REF) string-ified table and column names**)

Each ORM (and DBAPI) has parametrization settings (e.g. '%' or '?' or
configuration_setting) which should not collide with the f-string syntax.

* DBAPI v2.0

https://www.python.org/dev/peps/pep-0249/

* SQLite DBAPI

https://docs.python.org/2/library/sqlite3.html

https://docs.python.org/3/library/sqlite3.html
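
A small sketch of the non-collision with the stdlib sqlite3 driver (qmark paramstyle); the table and rows are invented for the example. Braces are resolved by Python first, and the "?" placeholders are left for the driver to bind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")
conn.executemany("insert into users values (?)", [("wendy",), ("jack",)])

# The table name comes from trusted code, never from user input:
# str.format does no quoting, so only the '?' values are injection-safe.
table = "users"
query = "select name from {} where name between ? and ?".format(table)

rows = conn.execute(query, ("m", "z")).fetchall()
conn.close()
```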

http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#conjunctions

>>> s = select([(users.c.fullname +
...               ", " + addresses.c.email_address).
...                label('title')]).\
...        where(users.c.id == addresses.c.user_id).\
...        where(users.c.name.between('m', 'z')).\
...        where(
...               or_(
...                  addresses.c.email_address.like('%@aol.com'),
...                  addresses.c.email_address.like('%@msn.com')
...               )
...        )
>>> conn.execute(s).fetchall() 
SELECT users.fullname || ? || addresses.email_address AS title
FROM users, addresses
WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND
(addresses.email_address LIKE ? OR addresses.email_address LIKE ?)
(', ', 'm', 'z', '%@aol.com', '%@msn.com')
[(u'Wendy Williams, we...@aol.com',)]

http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#using-textual-sql

>>> from sqlalchemy.sql import text
>>> s = text(
...     "SELECT users.fullname || ', ' || addresses.email_address AS title "
...         "FROM users, addresses "
...         "WHERE users.id = addresses.user_id "
...         "AND users.name BETWEEN :x AND :y "
...         "AND (addresses.email_address LIKE :e1 "
...             "OR addresses.email_address LIKE :e2)")
>>> conn.execute(s, x='m', y='z', e1='%@aol.com', e2='%@msn.com').fetchall()
[(u'Wendy Williams, we...@aol.com',)]

SQLAlchemy is not async-compatible

(besides, most drivers block);

it's debatable whether async would be faster, anyway:

https://bitbucket.org/zzzeek/sqlalchemy/issues/3414/asyncio-and-sqlalchemy

Petr Viktorin

Aug 24, 2015, 3:28:49 AM

to Steven D'Aprano, python-ideas

On Mon, Aug 24, 2015 at 3:24 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:
>
>> I think the string interpolation object is interesting. It's basically
>> what Petr Viktorin and Chris Angelico discussed and suggested here:
>> https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.
>
> Are you sure that's the right URL? It seems only barely relevant to me.
> It has Chris replying to Petr, but it's a vague suggestion of a "quantum
> string interpolation" (Chris' words) with no details. He asks:
>
> "How hard would this be to implement? Something that isn't a string,
> retains all the necessary information, and then collapses to a string
> when someone looks at it?"
>
> I looked ahead a dozen or two posts, and can't see any further
> discussion. Have I missed something?

Actually, it's I who missed something – replied from a phone, and sent
the reply to Chris only instead of to the list. And that killed
further discussion, it seems.
My answer was:

> Not too hard, but getting the exact semantics right could be tricky.
> It's probably something the language/stdlib should enable, rather than
> having it in the stdlib itself.

This seems roughly in line with what Guido was saying earlier. (Am I
misrepresenting your words, Guido?)

I thought a bit about what's bothering me with this idea, and I
realized I just don't like that "quantum effect" – collapsing when
something looks at a value.
All the parts up to that point sound OK, it's the str() that seems too
magical to me.

We could require a more explicit function, not just str(), to format the string:

>>> t0=1; t1=2; n=3
>>> template = i"Peeled {n} onions in {t1-t0:.2f}s"
>>> str(template)
types.InterpolationTemplate(template="Peeled {n} onions in
{t1-t0:.2f}s", fields=(('Peeled', 0, 'n', '', ''), ...), values=(3,
1))
>>> format_template(template) # (or make it a method?)
'Peeled 3 onions in 1.00s'

This no longer feels "too magic" to me, and it would allow some
experimentation before (if ever) InterpolationTemplate grows a more
convenient str().

Compared to f-strings, all this is doing is exposing the intermediate
structure. (What the "i" really stands for is "internal".)
Now f-strings would be just i-strings with a default formatter applied.

And, InterpolationTemplate should only allow attribute access (i.e. it
shouldn't be structseq). That way the internal structure can be
changed later, and the "old" attributes can be synthesized on access.

Mike Miller

Aug 24, 2015, 4:52:32 AM

to Nick Coghlan, python...@python.org

On 08/23/2015 06:41 PM, Nick Coghlan wrote:
> You hit a similar problem if you're targeting Django or Jinja2
> templates, or any content that involves l20n style JavaScript
> translation strings: the use of braces for substitution expressions in

Hi, this part I don't get, maybe because it's so late here. Why create
Django/Jinja2/l20n templates inside Python code using another templating
language (whether Template or .format)?

Those kinds of templates should be in dedicated text files, no?

-Mike

Nick Coghlan

Aug 24, 2015, 5:48:35 AM

to Mike Miller, python...@python.org

On 24 August 2015 at 18:51, Mike Miller <python...@mgmiller.net> wrote:
>
> On 08/23/2015 06:41 PM, Nick Coghlan wrote:
>>
>> You hit a similar problem if you're targeting Django or Jinja2
>> templates, or any content that involves l20n style JavaScript
>> translation strings: the use of braces for substitution expressions in
>
> Hi, this part I don't get, maybe because it's so late here. Why create
> Django/Jinja2/l20n templates inside Python code using another templating
> language (whether Template or .format)?
>
> Those kinds of templates should be in dedicated text files, no?

Think of meta-templating tools like cookie-cutter or DevAssistant (or
the project wizards in an IDE) - for those kinds of tools, "source
file formats" are actually output formats. Once you look at enough
different parts of the software development pipeline you find that
pretty much *every* input format is an output format for some other
tool :)

Cheers,

Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Nick Coghlan

Aug 24, 2015, 5:54:30 AM

to Petr Viktorin, python-ideas

On 24 August 2015 at 17:28, Petr Viktorin <enc...@gmail.com> wrote:
> I thought a bit about what's bothering me with this idea, and I
> realized I just don't like that "quantum effect" – collapsing when
> something looks at a value.
> All the parts up to that point sound OK, it's the str() that seems too
> magical to me.
>
> We could require a more explicit function, not just str(), to format the string:
>
> >>> t0=1; t1=2; n=3
> >>> template = i"Peeled {n} onions in {t1-t0:.2f}s"
> >>> str(template)
> types.InterpolationTemplate(template="Peeled {n} onions in
> {t1-t0:.2f}s", fields=(('Peeled', 0, 'n', '', ''), ...), values=(3,
> 1))
> >>> format_template(template) # (or make it a method?)
> 'Peeled 3 onions in 1.00s'
>
> This no longer feels "too magic" to me, and it would allow some
> experimentation before (if ever) InterpolationTemplate grows a more
> convenient str().

Another option would be to put the default rendering in __format__,
and let __str__ fall through to __repr__. That way str(template)
wouldn't render the template, but format(template) would.
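
A minimal sketch of that split, assuming a made-up LazyTemplate class as a stand-in for types.InterpolationTemplate (keyword arguments replace the real expression capture, and str.format does the default rendering):

```python
class LazyTemplate:
    """Hypothetical stand-in for PEP 501's types.InterpolationTemplate."""

    def __init__(self, raw_template, **values):
        self.raw_template = raw_template
        self.values = values

    def __repr__(self):
        return f"LazyTemplate({self.raw_template!r}, values={self.values!r})"

    # No __str__ is defined, so str() falls through to __repr__.

    def __format__(self, format_spec):
        # Default rendering happens only when format() is called.
        rendered = self.raw_template.format(**self.values)
        return format(rendered, format_spec)


template = LazyTemplate("Peeled {n} onions in {elapsed:.2f}s", n=3, elapsed=1)
```

With this layout, str(template) shows the unrendered object while format(template) produces the text, so the rendering step stays explicit.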

> Compared to f-strings, all this is doing is exposing the intermediate
> structure. (What the "i" really stands for is "internal".)
> Now f-strings would be just i-strings with a default formatter applied.
>
> And, InterpolationTemplate should only allow attribute access (i.e. it
> shouldn't be structseq). That way the internal structure can be
> changed later, and the "old" attributes can be synthesized on access.

Yeah, that's fair. I added the __iter__ to make some of the examples
prettier, but it probably isn't worth the loss of future flexibility.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Nick Coghlan

Aug 24, 2015, 6:14:46 AM

to Eric V. Smith, python...@python.org

On 24 August 2015 at 11:41, Nick Coghlan <ncog...@gmail.com> wrote:
> That's not very interesting if all you do is immediately call eval()
> on it, but it's a lot more interesting if you instead want to do
> things like extract the AST, dispatch the operation for execution in
> another process, etc. For example, you could use this capability to
> build eagerly bound closures, which wouldn't see changes in name
> bindings, but *would* see state changes in mutable objects.

Offering a nice early binding syntax is a question I've been pondering
for years (cf. PEPs 403 and 3150), so I'm intrigued by this question
of whether or not f-strings and i-strings might be able to deliver
those in a way that's more attractive than the current options.

This idea doesn't necessarily need deferred interpolation, so I'll use
the current PEP 498 f-string prefix and substitution expression
syntax. Consider the following function definition:

def defer(expr):
    return eval("lambda: (" + expr + ")")

We can use this today as a strange way of writing a lambda expression:

>>> f = defer("42")
>>> f
<function <lambda> at 0x7f1c0314eae8>
>>> f()
42

There's no reason to do that, of course - you'd just use an actual
lambda expression instead.

However, f-strings will make it possible for folks to write code like this:

callables = [defer(f"{i}") for i in range(10)]

"{i}" in that example isn't a one-element set, it's a substitution
expression that interpolates "str(i)" into the formatted string, which
is then evaluated by "defer" as if the template contained the literal
value of "i" at the time of interpolation, rather than being a lazy
reference to a closure variable. (If you were to get appropriately
creative with exec, you could even use a trick like this to define
multiline lambdas)
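
For comparison, a short sketch of the two binding behaviours, written with the f-string syntax as it later shipped in Python 3.6 (the variable names are only for the example):

```python
def defer(expr):
    return eval("lambda: (" + expr + ")")


# Ordinary lambdas close over the loop variable and see its final value.
late = [lambda: i for i in range(3)]

# defer() bakes the current value of i into the generated source text.
eager = [defer(f"{i}") for i in range(3)]

late_results = [f() for f in late]
eager_results = [f() for f in eager]
```

The eager version only works for values whose text form evaluates back to an equivalent object, which limits it to simple literals.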

Paul Moore

Aug 24, 2015, 7:36:13 AM

to Nick Coghlan, python-ideas

On 24 August 2015 at 10:53, Nick Coghlan <ncog...@gmail.com> wrote:
>> We could require a more explicit function, not just str(), to format the string:
>>
>> >>> t0=1; t1=2; n=3
>> >>> template = i"Peeled {n} onions in {t1-t0:.2f}s"
>> >>> str(template)
>> types.InterpolationTemplate(template="Peeled {n} onions in
>> {t1-t0:.2f}s", fields=(('Peeled', 0, 'n', '', ''), ...), values=(3,
>> 1))
>> >>> format_template(template) # (or make it a method?)
>> 'Peeled 3 onions in 1.00s'
>>
>> This no longer feels "too magic" to me, and it would allow some
>> experimentation before (if ever) InterpolationTemplate grows a more
>> convenient str().
>
> Another option would be to put the default rendering in __format__,
> and let __str__ fall through to __repr__. That way str(template)
> wouldn't render the template, but format(template) would.

I'm once again losing the thread of all the variations being proposed.

As a reality check, is the expectation that something like the
following will still be possible:

print(f"Iteration {n}: Duration {end-start} seconds")

This is as an improvement over the two current approaches:

print("Iteration {}: Duration {} seconds".format(n, end-start))
print("Iteration %s: Duration %s seconds" % (n, end-start))

because it's less verbose than the former, and less punctuation-heavy
(and old-fashioned ;-)) than the latter.

Explicit str() calls or temporary variables or anything like that are
no improvement over the current options. Of course they may offer more
advanced features, but let's not lose the 80% case for the sake of the
20% (that's actually more like 95-5, to be honest).

Paul

Steven D'Aprano

Aug 24, 2015, 8:01:03 AM

to python...@python.org

On Mon, Aug 24, 2015 at 08:14:21PM +1000, Nick Coghlan wrote:

> This idea doesn't necessarily need deferred interpolation, so I'll use
> the current PEP 498 f-string prefix and substitution expression
> syntax. Consider the following function definition:
>
> def defer(expr):
>     return eval("lambda: (" + expr + ")")
>
> We can use this today as a strange way of writing a lambda expression:
>
> >>> f = defer("42")
> >>> f
> <function <lambda> at 0x7f1c0314eae8>
> >>> f()
> 42
>
> There's no reason to do that, of course - you'd just use an actual
> lambda expression instead.

There's a problem with the idea of using eval to defer objects -- it
relies on your object having an eval'able representation. Try to defer()
the following list L:

L = []
L.append(L)
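
A quick sketch of what goes wrong: the repr of the recursive list is valid Python source, but it evaluates to a different object (the names "text" and "restored" are just for the example):

```python
L = []
L.append(L)

text = repr(L)         # the self-reference is abbreviated in the repr
restored = eval(text)  # '...' parses as the Ellipsis literal

# The round trip silently loses the recursion instead of raising an error.
round_trip_ok = restored[0] is restored
```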

But putting that aside...

> However, f-strings will make it possible for folks to write code like this:
>
> callables = [defer(f"{i}") for i in range(10)]

How is that different from this?

callables = [defer(str(i)) for i in range(10)]

If they are not the same, then what would this return?

[func() for func in callables]

I expect it to give [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Am I wrong?

> "{i}" in that example isn't a one-element set, it's a substitution
> expression that interpolates "str(i)" into the formatted string,

I understand that f-strings are evaluated at the time of, um, their
evaluation, so this would be equivalent to:

callables = [defer("0"), defer("1"), defer("2"), ..., defer("9")]

> which
> is then evaluated by "defer" as if the template contained the literal
> value of "i" at the time of interpolation, rather than being a lazy
> reference to a closure variable.

I'm completely lost. How would you get a closure variable here?

I mean, I know how to get a closure in general terms, e.g.:

[(lambda : i) for i in range(10)]

but I'm not seeing where you would get a closure *specifically* in
this situation with your defer function.

--
Steve

Eric V. Smith

Aug 24, 2015, 8:41:15 AM

to python...@python.org

On 08/24/2015 07:35 AM, Paul Moore wrote:
> I'm once again losing the thread of all the variations being proposed.
>
> As a reality check, is the expectation that something like the
> following will still be possible:
>
> print(f"Iteration {n}: Duration {end-start} seconds")

Yes, that's the PEP 498 proposal. I think (and this is just my opinion)
that if we do something more complicated, like the delayed interpolation
of i-strings, that we'd still keep f-strings.

And further, while internally we may rewrite f-strings to use the
i-string infrastructure, to the user they'd still look like the same
f-strings.

> Explicit str() calls or temporary variables or anything like that are
> no improvement over the current options. Of course they may offer more
> advanced features, but let's not lose the 80% case for the sake of the
> 20% (that's actually more like 95-5, to be honest).

Agreed.

Eric.

Petr Viktorin

Aug 24, 2015, 8:46:50 AM

to Eric V. Smith, python-ideas

On Mon, Aug 24, 2015 at 2:41 PM, Eric V. Smith <er...@trueblade.com> wrote:
> On 08/24/2015 07:35 AM, Paul Moore wrote:
>> I'm once again losing the thread of all the variations being proposed.
>>
>> As a reality check, is the expectation that something like the
>> following will still be possible:
>>
>> print(f"Iteration {n}: Duration {end-start} seconds")
>
> Yes, that's the PEP 498 proposal. I think (and this is just my opinion)
> that if we do something more complicated, like the delayed interpolation
> of i-strings, that we'd still keep f-strings.
>
> And further, while internally we may rewrite f-strings to use the
> i-string infrastructure, to the user they'd still look like the same
> f-strings.
>
>> Explicit str() calls or temporary variables or anything like that are
>> no improvement over the current options. Of course they may offer more
>> advanced features, but let's not lose the 80% case for the sake of the
>> 20% (that's actually more like 95-5, to be honest).
>
> Agreed.

Indeed. On the other hand, let's make reasonably sure that next year
we won't need yet another syntax for the 20%.

Paul Moore

Aug 24, 2015, 11:04:50 AM

to Eric V. Smith, Python-Ideas

On 24 August 2015 at 13:41, Eric V. Smith <er...@trueblade.com> wrote:
> On 08/24/2015 07:35 AM, Paul Moore wrote:
>> I'm once again losing the thread of all the variations being proposed.
>>
>> As a reality check, is the expectation that something like the
>> following will still be possible:
>>
>> print(f"Iteration {n}: Duration {end-start} seconds")
>
> Yes, that's the PEP 498 proposal. I think (and this is just my opinion)
> that if we do something more complicated, like the delayed interpolation
> of i-strings, that we'd still keep f-strings.

OK. That's my point, essentially - the discussion has drifted into
much more complex areas, with comments about how the wider-ranging
proposals cover the f-string case as a subset, and I just wanted to be
sure that there wasn't an implied "so we don't need f-strings any
more" in there. (Nick at one point spoke quite strongly against adding
multiple ways of doing the same thing).

Paul

Eric V. Smith

Aug 24, 2015, 11:14:25 AM

to gu...@python.org, python...@python.org

On 08/23/2015 09:13 PM, Guido van Rossum wrote:
> But for i-strings, I think it would be good if we could gather more
> actual experience using them. Every potential use case brought up for
> these so far (translation, html/shell/sql quoting) feels like there's a
> lot of work needing to be done to see if the idea is actually viable
> there. It would be a shame if we added all the (considerable!) machinery
> for i-strings and all we got was yet another way to do it
> (https://xkcd.com/927/), without killing at least one competing approach
> (similar to the way .format() has failed to replace %).
>
> It's tough to envision how we could gather more experience with
> i-strings *without* building them into the language, but I'm really
> hesitant to add them without more experience. (This is the "new on the
> job market" paradox. :-) Maybe they could be emulated using a function
> call that uses sys._getframe() under the covers? Or maybe it's possible
> to cook up an experiment using other syntax hooks? E.g. the coding hack
> used in pyxl (https://github.com/dropbox/pyxl).[1]

I hope you don't mind that I borrowed the keys to the time machine. I'm
using the implementation of _string.formatter_parser() that I added for
implementing string.Formatter:

---8<---------------------------------------------
import sys
import _string

class i:
    def __init__(self, s):
        self.s = s
        # namespaces of the caller, where the expressions are evaluated
        locals = sys._getframe(1).f_locals
        globals = sys._getframe(1).f_globals
        self.values = {}
        # evaluate the expressions
        for literal, expr, format_spec, conversion in \
                _string.formatter_parser(self.s):
            if expr:
                # eval() takes globals first, then locals
                value = eval(expr, globals, locals)
                self.values[expr] = value

    def __str__(self):
        result = []
        for literal, expr, format_spec, conversion in \
                _string.formatter_parser(self.s):
            result.append(literal)
            if expr:
                value = self.values[expr]
                result.append(value.__format__(format_spec))
        return ''.join(result)
---8<---------------------------------------------

So now, instead of i"x={x}", we say i("x={x}").

Let's use it with str:

>>> x = i('Version in caps {sys.version[0:7].upper()}')
>>> x
<__main__.i object at 0x7f1653311e90>
>>> str(x)
'Version in caps 3.6.0A0'

Cool. Now let's whip up a simple i18n example:

>>> def gettext(s):
...     # Our complicated string lookup
...     if s == 'My name is {name}, my dog is {dog}':
...         return 'Mi perro es {dog}, y mi nombre es {name}'
...     return s
...
>>> def _(istring):
...     result = []
...     # do the gettext lookup
...     s = gettext(istring.s)
...     # use the values from our original istring,
...     # but the literals and ordering from our
...     # looked-up string
...     for literal, expr, format_spec, conversion in \
...             _string.formatter_parser(s):
...         result.append(literal)
...         if expr is not None:
...             result.append(istring.values[expr])
...     return ''.join(result)
...
>>> name = 'Eric'
>>> dog = 'Misty'
>>> x = i('My name is {name}, my dog is {dog}')
>>> str(x)
'My name is Eric, my dog is Misty'
>>> _(x)
'Mi perro es Misty, y mi nombre es Eric'
>>>

That should be enough to play with i-strings in logging, sql, xml, etc.

Several things should be addressed: hiding the call to
_string.formatter_parser inside the 'i' class, for example. And of course
don't use sys._getframe. But the ideas are all there.

I can't swear that _string.formatter_parser will parse all known
expressions, since that's not what it was designed to do. It will likely
fail with expressions that contain strings and braces, for example. I
haven't really checked. But hey, what do you want for free?
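
A small probe of that limitation, using the public string.Formatter().parse() wrapper around the same parser. The helper name parsed_fields is invented, and the exact split of the second template is not asserted here, only that it differs from the intended expression:

```python
from string import Formatter


def parsed_fields(template):
    """Return the (expression, format_spec) pairs the parser sees."""
    return [(field, spec)
            for _, field, spec, _ in Formatter().parse(template)
            if field is not None]


# A plain expression followed by a format spec is split where expected.
simple = parsed_fields("{end - start:.2f}")

# A ':' inside a string literal is taken as the start of the format spec,
# so the expression is cut short.
tricky = parsed_fields("{':'.join(parts)}")
```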

With a slight tweak, this code even works with 2.7: replace
"_string.formatter_parser" with "str._formatter_parser". Unfortunately,
2.7 will then only support very simple expressions. Oh, well.

Enjoy!

Eric.

Eric V. Smith

Aug 24, 2015, 11:54:59 AM

to gu...@python.org, python...@python.org

I should have added: this is for i-strings that look like PEP 498's
f-strings. I'm not trying to jump to conclusions about the syntax: I'm
just trying to reuse some code, and making i-strings and f-strings look
like str.format strings allows me to reuse lots of infrastructure (as I
hope can be seen from this example).

For the final version, we can choose whatever syntax makes sense. I
would argue for i"Value={value}" (same for f-strings), but if we decide
to make it something else, I'll live with the decision.

Eric.

Nikolaus Rath

Aug 24, 2015, 12:30:44 PM

to python...@python.org

On Aug 23 2015, Nick Coghlan <ncoghlan-Re5JQE...@public.gmane.org> wrote:

> On 23 August 2015 at 11:37, Nick Coghlan <ncoghlan-Re5JQE...@public.gmane.org> wrote:
>> However, I'm now coming full circle back to the idea of making this a
>> string prefix, so that would instead look like:
>>
>> subprocess.call($"echo $filename")
>>

>> The trick would be to make interpolation lazy *by default* (preserving
>> the triple of the raw template string, the parsed fields, and the
>> expression values), and put the default rendering in the resulting
>> object's *__str__* method.
>

> Indeed, after working through this latest change, I ended up back
> where I started from a syntactic perspective, with a proposal for
> i(nterpolated)-strings rather than f(ormatted)-strings:
> https://www.python.org/dev/peps/pep-0501/
>

> With appropriate modifications to subprocess.call, the proposal would
> then enable us to write a *safe* shell command interpolation as:
>
> subprocess.call(i"echo $filename")

I like the idea, but *please* stop using this example. It's just
terrible. Firstly, subprocess.call defaults to shell=False, so this
wouldn't even work. Secondly, subprocess.call(['echo', filename]) looks
orders of magnitude cleaner. Thirdly, your i-string wouldn't even know
how to quote because it doesn't know what shell you are using.
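
A small sketch of the second point, passing the command as a list of arguments so that no shell and no quoting is involved. To stay platform independent it uses sys.executable as a stand-in for echo, and capture_output assumes Python 3.7 or later. The hostile filename is invented for the example.

```python
import subprocess
import sys

filename = "somefile; echo pwned"

# List form: every element reaches the program as exactly one argument,
# and no shell ever sees the string, so the ';' has no special meaning.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```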

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Eric V. Smith

Aug 24, 2015, 1:10:08 PM

to python...@python.org

And because I can't leave well enough alone, here's an improved version.
It includes a little logging example, plus an implementation of
f-strings. Again, using f("") instead of f"".

It might only work with the hg tip (what will be 3.6). I don't have a
3.5 around to test it with. It won't work with 3.3 due to changes in
_string.formatter_parser. It's possible simpler expressions might work,
but I'm not well motivated to try it out.

Eric.

istring.py

Barry Warsaw

Aug 24, 2015, 1:12:56 PM

to python...@python.org

On Aug 21, 2015, at 10:52 PM, Mike Miller wrote:

>Which syntax would you rather have for translation? (Knowing that you might
>give a different answer for standard interpolation.)

For i18n, $-strings (aka PEP 292, string.Template) is by far the best choice.
Translators are very familiar with the syntax, having used it now for many
years (and not just in a Python context), and it's very difficult for
non-technical folks to get wrong.

I don't see any advantages to springing yet another i18n interpolation syntax
on translators, and I definitely don't see the advantage of introducing a
*second* i18n syntax to translators of Python programs.

If that means PEP 498/501 isn't appropriate for Python i18n, so be it. What
we have now works, even if its implementation requires the use of some
frowned-upon APIs, and the use of function syntax for marking and invocation.
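
For readers who have not used PEP 292 templates, a minimal sketch of what works today with string.Template. The sentences are only illustrative; a real catalog lookup (gettext, flufl.i18n) is left out.

```python
from string import Template

source = Template('My name is $name, my dog is $dog')
print(source.substitute(name='Eric', dog='Misty'))

# A translated catalog entry is free to reorder the placeholders.
translated = Template('Mi perro es $dog, y mi nombre es $name')
print(translated.substitute(name='Eric', dog='Misty'))

# safe_substitute leaves unknown placeholders alone instead of raising KeyError.
partial = Template('Hello $who').safe_substitute()
print(partial)
```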

Cheers,
-Barry

Guido van Rossum

Aug 24, 2015, 1:39:35 PM

to Barry Warsaw, Python-Ideas

That's fair, and I'm glad we have this clear position on the table.

I cannot accept $ interpolation in the language definition. I also don't want PEP 498 and 501 to use different interpolation syntaxes. So to me, this means that i18n is off the table as a motivation for PEP 501 (it never was on the table for 498), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation).

Wes Turner

Aug 24, 2015, 2:15:02 PM

to Guido van Rossum, Barry Warsaw, Python-Ideas

On Aug 24, 2015 12:39 PM, "Guido van Rossum" <gu...@python.org> wrote:

> (...), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation).

f('select {date} from {tablename}')
~=
['select ', UnescapedStr(date), ' from ', UnescapedStr(tablename)]

* UnescapedUntranslatedSoencodedStr
* _repr_shell
    * quote or not?
* _repr_html
    * charset, encoding
* _repr_sql
    * WHERE x LIKE '%\%%'

>
> --
> --Guido van Rossum (python.org/~guido)
>

Petr Viktorin

Aug 24, 2015, 2:16:34 PM

to Guido van Rossum, Barry Warsaw, Python-Ideas

The $ syntax might be a requirement for Barry, but it's definitely not
required for translations at large.
I agree that it *is* hard to introduce a new marker syntax in a
project, since any change in a string will generally require
re-translation in all languages. For flufl.i18n, $ is definitely best.
But it might not be best for new projects/libraries.
Translators can get familiar with lots of things; the projects I
helped translate used %1 (Qt/KDE) or %s (C/printf).

Many Python projects (e.g. Django [0]) use "%(name)s" markers, where
translators often leave off the "s". The brace syntax would be a big
improvement.

[0] https://github.com/django/django/blob/master/django/conf/locale/en/LC_MESSAGES/django.po
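
A quick sketch of the failure mode with "%(name)s" markers, using a hypothetical helper percent_ok to test whether a translated string is still a usable %-format. The sample strings are invented.

```python
def percent_ok(template, values):
    """Return True if template can be %-formatted with values (hypothetical helper)."""
    try:
        template % values
    except (ValueError, TypeError, KeyError):
        return False
    return True

values = {'name': 'Eric'}

print('Hello %(name)s' % values)        # marker as the programmer wrote it
print(percent_ok('Hola %(name)s', values))
print(percent_ok('Hola %(name)', values))   # translator dropped the "s"

# The brace form has no trailing type letter to lose.
print('Hola {name}'.format(**values))
```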

Mike Miller

Aug 24, 2015, 2:21:50 PM

to Nick Coghlan, python...@python.org

Ok thanks, I know someone out there is probably using templating to make
templating templates. But, we're getting out into the wilderness here. The
original use cases were shell scripts and "whipping up a quick string", which
I'd argue are more important.

Cheers,
-Mike

On 08/24/2015 02:48 AM, Nick Coghlan wrote:
> On 24 August 2015 at 18:51, Mike Miller <python...@mgmiller.net> wrote:

>> Hi, this part I don't get, maybe because it's so late here. Why create
>> Django/Jinja2/i20n templates inside Python code using another templating
>> language (whether Template or .format)?
>>
>> Those kind of templates should be in dedicated text files, no?
>
> Think of meta-templating tools like cookie-cutter or DevAssistant (or
> the project wizards in an IDE) - for those kinds of tools, "source
> file formats" are actually output formats. Once you look at enough
> different parts of the software development pipeline you find that
> pretty much *every* input format is an output format for some other
> tool :)
>
> Cheers,
> Nick.
>

Wes Turner

Aug 24, 2015, 2:32:02 PM

to Mike Miller, Python-Ideas

On Aug 24, 2015 1:21 PM, "Mike Miller" <python...@mgmiller.net> wrote:
>
> Ok thanks, I know someone out there is probably using templating to make templating templates. But, we're getting out into the wilderness here. The original use cases were shell scripts

Printf/str.format/str.__mod__/string concatenation are often
*dangerou;\n\s** in context to shell scripts (unless you're building a "para"+"meter" that will itself be quoted/escaped; or passing tuple cmds to eg subprocess.Popen);
which is why I would use pypi:sarge for Python 2.x+,3.x+ here.

Or yield a sequence of typed strings which can be contextually ANDed.

Barry Warsaw

Aug 24, 2015, 2:56:27 PM

to Guido van Rossum, Python-Ideas

On Aug 24, 2015, at 10:38 AM, Guido van Rossum wrote:

>I cannot accept $ interpolation in the language definition. I also don't
>want PEP 498 and 501 to use different interpolation syntaxes. So to me,
>this means that i18n is off the table as a motivation for PEP 501 (it never
>was on the table for 498), and Nick can focus on motivational examples from
>html/sql/shell code injection for PEP 501 (but only if he can live with the
>PEP 498 surface syntax for interpolation).

I agree with this. Ignoring i18n, str.format() syntax is greatly preferred
over old-school %-syntax IMO, so focusing 498/501 on being compatible with the
former makes a lot of sense. Hopefully we can continue to make %-syntax
obsolete, deprecated, or at least disfavored.

Cheers,
-Barry

Eric V. Smith

Aug 24, 2015, 4:10:33 PM

to python...@python.org

And here's an example with regex's, and a format_spec to say whether to
escape the text or not:

import re
def to_re(istring):
    # escape the value of the embedded expressions
    result = []
    for part in istring.parts():
        result.append(part.literal)
        if part.expr is not None:
            if part.format_spec == 'raw':
                result.append(part.value)
            else:
                result.append(re.escape(part.value))
    return re.compile(''.join(result))

delimiter = '+'
trailing_re = r'\S+'
regex = i(r'{delimiter}\d+{delimiter}{trailing_re:raw}')
print(to_re(regex))

If we did i-strings for real, that line would be:
regex = ri'{delimiter}\d+{delimiter}{trailing_re:raw}'

I'm not really sold on i-strings yet. But there's enough here for people
to play with.
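
To try the regex example without the attached istring.py, here is a self-contained sketch. The IString class and Part tuple are hypothetical stand-ins for the real i() object: they only support plain names looked up in an explicit dict, and they rely on string.Formatter().parse for the splitting.

```python
import re
from collections import namedtuple
from string import Formatter

Part = namedtuple('Part', 'literal expr format_spec value')

class IString:
    """Hypothetical stand-in for an i-string: a template plus a namespace."""

    def __init__(self, template, namespace):
        self.template = template
        self.namespace = namespace

    def parts(self):
        for literal, expr, spec, _conv in Formatter().parse(self.template):
            # Only plain names are supported here, not arbitrary expressions.
            value = None if expr is None else self.namespace[expr]
            yield Part(literal, expr, spec, value)

def to_re(istring):
    # Escape interpolated values unless the format_spec says 'raw'.
    result = []
    for part in istring.parts():
        result.append(part.literal)
        if part.expr is not None:
            if part.format_spec == 'raw':
                result.append(part.value)
            else:
                result.append(re.escape(part.value))
    return re.compile(''.join(result))

names = {'delimiter': '+', 'trailing_re': r'\S+'}
regex = to_re(IString(r'{delimiter}\d+{delimiter}{trailing_re:raw}', names))
print(regex.pattern)
print(bool(regex.fullmatch('+123+abc')))
```

One caveat with this approach: regex quantifiers such as \d{2,3} would have to be written with doubled braces, because single braces are taken as interpolation fields.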

Eric.

Nathaniel Smith

Aug 24, 2015, 4:44:30 PM

to Guido van Rossum, Barry Warsaw, Python-Ideas

On Mon, Aug 24, 2015 at 10:38 AM, Guido van Rossum <gu...@python.org> wrote:
> I cannot accept $ interpolation in the language definition. I also don't
> want PEP 498 and 501 to use different interpolation syntaxes. So to me, this
> means that i18n is off the table as a motivation for PEP 501 (it never was
> on the table for 498), and Nick can focus on motivational examples from
> html/sql/shell code injection for PEP 501 (but only if he can live with the
> PEP 498 surface syntax for interpolation).

From the early part of this discussion [1], I had the impression that
the goal was that eventually string interpolation would be on by
default for all strings, with PEP 498 intended as an intermediate step
towards that goal. Is that still true, or is the plan now that
interpolated strings will always require an explicit marker (like
'f')?

I ask because if they *do* require an explicit marker, then obviously
the best thing is for the syntax to match that of .format. But, if
this will be enabled for all strings in Python 3.something, then it
seems like we should be careful now to make sure that the syntax is
clearly distinct from that used for .format ("${...}" or "\{...}" or
...), because anything else creates nasty compatibility problems for
people trying to write format template strings that work on both old
and new Pythons.

(This is also assuming that f-string interpolation and the eventual
plain-old-string interpolation will use the same syntax, but that
seems like a highly desirable property to me..)

-n

[1] http://thread.gmane.org/gmane.comp.python.ideas/34980

--
Nathaniel J. Smith -- http://vorpus.org

Mike Miller

Aug 24, 2015, 4:58:13 PM

to python...@python.org

Hi, here's my latest idea, riffing on others' latest this weekend.

Let's call these e-strings (for expression), as it's easier to refer to the
letter of the proposals than three digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted
to an object instead (like i-string):

print(e'Hello {friend}, filename: {filename}.') # converts to ==>

print(estr('Hello {friend}, filename: {filename}.', friend=friend,
filename=filename))

An estr is a subclass of str, therefore able to do the nice things a string can
do. Rendering is deferred, and it also has a raw member, escape(), and
translate() methods:

class estr(str):
    # init: saves self.raw, args, kwargs for later
    # methods, ops render it
    # def escape(self, escape_func): # handles escaping
    # def translate(self, template, safe=True): # optional i18n support

To make it as simple as possible to use by end-developers, it 1) doesn't require
str() to be run explicitly, it renders itself when needed via its various
methods and operators. Look for .raw, if you need the original.

Also, 2) a bit of responsibility is pushed to stdlib/pypi. In a handful of
sensitive places, the object is checked beforehand and escaped when needed:

def os_system(command): # imagine os.system, subprocess, dbapi, etc.
    if isinstance(command, estr):
        command = command.escape(shlex.quote) # each chooses its own rules
    do_something(command)

This means a billion lines of code using e-strings won't have to care about
them, only a handful of places. What is easiest to type is now safe as well:

os.system(e'cat {filename}') # sleep easy

A translate method might be available also (though we may have given up on i18n
already), to provide a new raw string from a message catalog:

rendered = message.translate(translated_message) # fmt syntax TBD

This should enable the safety and features we'd like, without burdening the
everyday user. I've created a sample script, here is the output:

# consider: estr('Hello {friend}, filename: {filename}.')
friend: 'John'
filename: "somefile; rm -rf ~ 'foo' <html>"

original: Hello {friend}, filename: {filename}.
print(): Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

shell escape:
Hello John, filename: 'somefile; rm -rf ~ '"'"'foo'"'"' <html>'.
html escape:
Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
sql escape: Hello "John", filename: "somefile; rm -rf ~ 'foo' <html>".
logger DEBUG Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

upper+utf8: b"HELLO JOHN, FILENAME: SOMEFILE; RM -RF ~ 'FOO' <HTML>."
translated: Hola John, archivo: somefile; rm -rf ~ 'foo' <html>.
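
For anyone who wants to experiment, a minimal sketch of the estr idea follows. It is an assumption-laden stand-in, not the sample script: it renders eagerly in __new__ for simplicity (the proposal defers rendering), it only handles keyword values, and the attribute names raw and values are chosen for illustration.

```python
import html
import shlex

class estr(str):
    """Sketch: a str that remembers its template and interpolated values."""

    def __new__(cls, raw, **values):
        # The str value itself is the default, unescaped rendering.
        self = super().__new__(cls, raw.format(**values))
        self.raw = raw
        self.values = values
        return self

    def escape(self, escape_func):
        # Re-render with each value passed through the caller's escape function.
        escaped = {key: escape_func(str(val)) for key, val in self.values.items()}
        return self.raw.format(**escaped)

friend = 'John'
filename = "somefile; rm -rf ~ 'foo' <html>"
msg = estr('Hello {friend}, filename: {filename}.', friend=friend, filename=filename)

print(msg.raw)
print(msg)
print(msg.escape(shlex.quote))
print(msg.escape(html.escape))
print(msg.upper().encode('utf-8'))
```

Because the object is a real str, existing code that calls str methods keeps working, while a sensitive API can call escape() with its own rules.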

Anything I've missed?

-Mike

On 08/20/2015 04:10 PM, Mike Miller wrote:
> The ground seems to be settling on the issue, so I have tried my hand at a grand
> unified pep for string interpolation.

Nikolaus Rath

Aug 24, 2015, 5:29:15 PM

to python...@python.org

On Aug 24 2015, Mike Miller <python-ideas-9N9v...@public.gmane.org> wrote:
> Also, 2) a bit of responsibility is pushed to stdlib/pypi. In a
> handful of sensitive places, the object is checked beforehand and
> escaped when needed:
>
> def os_system(command): # imagine os.system, subprocess, dbapi, etc.
>     if isinstance(command, estr):
>         command = command.escape(shlex.quote) # each chooses its own rules
>     do_something(command)
>
> This means a billion lines of code using e-strings won't have to care
> about them, only a handful of places. What is easiest to type is now
> safe as well:
>
> os.system(e'cat {filename}') # sleep easy

*shudder*. After years of efforts to get people not to do this, you want
to change course by 180 degrees and start telling people this is ok if
they add an additional single character in front of the string?

This sounds like a very bad idea to me for many reasons:

- People will forget to type the 'e', and things will appear to work
  but be buggy.
- People will forget that they need the 'e' (and the same thing will
  happen, further reinforcing the thought that the e is not required)
- People will be confused because other languages don't have the 'e'
  (hmm. how do I do this in Perl? I guess I'll just drop the
  'e'... *check*, works, great!)
- People will assume that their my_custom_system() call also
  special-cases e strings and escapes them (which it won't).

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Wes Turner

Aug 24, 2015, 5:29:43 PM

to Mike Miller, python...@python.org

On Mon, Aug 24, 2015 at 3:57 PM, Mike Miller <python...@mgmiller.net> wrote:

> Hi, here's my latest idea, riffing on other's latest this weekend.
>
> Let's call these e-strings (for expression), as it's easier to refer to the letter of the proposals than three digit numbers.
>
> So, an e-string looks like an f-string, though at compile-time, it is converted to an object instead (like i-string):
>
> print(e'Hello {friend}, filename: {filename}.') # converts to ==>
>
> print(estr('Hello {friend}, filename: {filename}.', friend=friend,
> filename=filename))
>
> An estr is a subclass of str, therefore able to do the nice things a string can do. Rendering is deferred, and it also has a raw member, escape(), and translate() methods:
>
> class estr(str):
>     # init: saves self.raw, args, kwargs for later
>     # methods, ops render it
>     # def escape(self, escape_func): # handles escaping
>     # def translate(self, template, safe=True): # optional i18n support

* How do I overload/subclass [class estr()]?

* Does it always just read LC_ALL='utf8' (or where do I specify that global/thread/frame-local?)

* How do I escape_func?

Jinja2 uses MarkupSafe, with a class named Markup:

class Markup():
    def __html__()
    def __html_format__()

IPython can display objects with _repr_fmt_() callables, which TBH I prefer because it's not name mangled and so more easily testable. [3,4]

Existing IPython rich display methods [5,6,7,8]

_mime_map = dict(
    _repr_png_="image/png",
    _repr_jpeg_="image/jpeg",
    _repr_svg_="image/svg+xml",
    _repr_html_="text/html",
    _repr_json_="application/json",
    _repr_javascript_="application/javascript",
)

# _repr_latex_ = "text/latex"
# _repr_retina_ = "image/png"

Suggested IPython methods

- [ ] _repr_shell_

- [ ] single_quote_shell_escape

- [ ] double_quote_shell_escape

- [ ] _repr_sql_ (*NOTE: SQL variants, otherworldly-escaping dependency / newb errors)

[1] https://pypi.python.org/pypi/MarkupSafe

[2] https://github.com/mitsuhiko/markupsafe

[3] https://ipython.org/ipython-doc/dev/config/integrating.html

[4] https://ipython.org/ipython-doc/dev/config/integrating.html#rich-display

[5] https://github.com/ipython/ipython/blob/master/IPython/utils/capture.py

[6] https://github.com/ipython/ipython/blob/master/IPython/utils/tests/test_capture.py

[7] https://github.com/ipython/ipython/blob/master/IPython/core/display.py

[8] https://github.com/ipython/ipython/blob/master/IPython/core/tests/test_display.py

* IPython: _repr_fmt_()

* MarkupSafe: __html__()

Guido van Rossum

Aug 24, 2015, 5:52:14 PM

to Nathaniel Smith, Barry Warsaw, Python-Ideas

On Mon, Aug 24, 2015 at 1:44 PM, Nathaniel Smith <n...@pobox.com> wrote:

> From the early part of this discussion [1], I had the impression that
> the goal was that eventually string interpolation would be on by
> default for all strings, with PEP 498 intended as an intermediate step
> towards that goal. Is that still true, or is the plan now that
> interpolated strings will always require an explicit marker (like
> 'f')?

That was not received well, so I think it's dead.

> I ask because if they *do* require an explicit marker, then obviously
> the best thing is for the syntax to match that of .format. But, if
> this will be enabled for all strings in Python 3.something, then it
> seems like we should be careful now to make sure that the syntax is
> clearly distinct from that used for .format ("${...}" or "\{...}" or
> ...), because anything else creates nasty compatibility problems for
> people trying to write format template strings that work on both old
> and new Pythons.

Good point.

> (This is also assuming that f-string interpolation and the eventual
> plain-old-string interpolation will use the same syntax, but that
> seems like a highly desirable property to me..)
>
> -n
>
> [1] http://thread.gmane.org/gmane.comp.python.ideas/34980
>
> --
> Nathaniel J. Smith -- http://vorpus.org

Mike Miller

Aug 24, 2015, 5:55:14 PM

to Nikolaus Rath, python...@python.org

On 08/24/2015 02:28 PM, Nikolaus Rath wrote:
> *shudder*. After years of efforts to get people not to do this, you want
> to change course by 180 degrees and start telling people this is ok if
> they add an additional single character in front of the string?
>
> This sounds like very bad idea to me for many reasons:
>
> - People will forget to type the 'e', and things will appear to work
> but buggy.
> - People will forget that they need the 'e' (and the same thing will
> happen, further reinforcing the thought that the e is not required)
> - People will be confused because other languages don't have the 'e'
> (hmm. how do I do this in Perl? I guess I'll just drop the
> 'e'... *check*, works, great!)
> - People will assume that their my_custom_system() call also
> special-cases e strings and escape them (which it won't).
>

No: without the 'e' the variables will not be replaced, so the command line won't work.

The previous proposals ignored this altogether. A partial solution is better
than none, I think. I don't propose we document this as the recommended way,
anyway. subprocess.call('foo', shell=False) is that.

This is just a way to do the right thing in a number of common situations where
we can do it.

-Mike

Paul Moore

Aug 24, 2015, 5:55:27 PM

to Python-Ideas

On 24 August 2015 at 22:28, Nikolaus Rath <Niko...@rath.org> wrote:
>> os.system(e'cat {filename}') # sleep easy
>
> *shudder*. After years of efforts to get people not to do this, you want
> to change course by 180 degrees and start telling people this is ok if
> they add an additional single character in front of the string?
>
> This sounds like very bad idea to me for many reasons:
>
> - People will forget to type the 'e', and things will appear to work
> but buggy.
> - People will forget that they need the 'e' (and the same thing will
> happen, further reinforcing the thought that the e is not required)
> - People will be confused because other languages don't have the 'e'
> (hmm. how do I do this in Perl? I guess I'll just drop the
> 'e'... *check*, works, great!)
> - People will assume that their my_custom_system() call also
> special-cases e strings and escape them (which it won't).

Agreed. In a convenience library where it's absolutely clear that a
shell is involved (something like sarge or invoke) this is OK, but not
in the stdlib as the "official" way to call external programs.

Also:

- People will fail to understand the difference between e'...' and
f'...' and will use the wrong one when using os.system, and things
will work correctly but with security vulnerabilities.
- Teaching Python will be complicated by needing to explain why both
f'...' and e'...' exist, and what the difference is. Trying to do that
for beginners without baffling them with discussions of security
vulnerabilities will be challenging...

Paul

Mike Miller

Aug 24, 2015, 5:59:40 PM

to Wes Turner, python...@python.org

On 08/24/2015 02:29 PM, Wes Turner wrote:
>
> * How do I overload/subclass [class estr()]?

class wes_estr(estr):
    pass

> * Does it always just read LC_ALL='utf8' (or where do I specify that
> global/thread/frame-local?)

No, I just chose that in my script to show that it supports str functionality,
for example .encode('utf-8'); it is not otherwise related to estr.

I should post the script.

> * How do I escape_func?

You pass in a function that does the escaping.

> Jinja2 uses MarkupSafe, with a class named Markup:
>
> class Markup():
>     def __html__()
>     def __html_format__()

By letting the caller set the escaping rules via a passed function, estr does not
have to know anything about escaping, and is much simpler. Also the caller
could supply its own escaping rules.

-Mike

Mike Miller

Aug 24, 2015, 6:07:21 PM

to Paul Moore, Python-Ideas

On 08/24/2015 02:54 PM, Paul Moore wrote:
> Agreed. In a convenience library where it's absolutely clear that a
> shell is involved (something like sarge or invoke) this is OK, but not
> in the stdlib as the "official" way to call external programs.

Don't focus on os.system(), it could be any function, and not particularly
relevant, nor do I recommend this line as the official way.

Remember Nick Coghlan's statement that the "easy way should be the right way"?
That's what this is trying to accomplish.

> - People will fail to understand the difference between e'...' and
> f'...' and will use the wrong one when using os.system, and things
> will work correctly but with security vulnerabilities.

I don't recommend e'' and f'', only e'' at this moment.

-Mike

Wes Turner

Aug 24, 2015, 6:21:57 PM

to Mike Miller, Python-Ideas

On Mon, Aug 24, 2015 at 5:06 PM, Mike Miller <python...@mgmiller.net> wrote:

> On 08/24/2015 02:54 PM, Paul Moore wrote:
> > Agreed. In a convenience library where it's absolutely clear that a
> > shell is involved (something like sarge or invoke) this is OK, but not
> > in the stdlib as the "official" way to call external programs.
>
> Don't focus on os.system(), it could be any function, and not particularly
> relevant, nor do I recommend this line as the official way.
>
> Remember Nick Coghlan's statement that the "easy way should be the right way"?
> That's what this is trying to accomplish.
>
> > - People will fail to understand the difference between e'...' and
> > f'...' and will use the wrong one when using os.system, and things
> > will work correctly but with security vulnerabilities.
>
> I don't recommend e'' and f'', only e'' at this moment.

How would e strings prevent this:

In [1]: import subprocess
In [2]: subprocess.call('echo 1\necho 2', shell=True)
1
2
Out[2]: 0

In [3]: import sarge
In [4]: sarge.run('echo 1\necho 2')
1 echo 2
Out[4]: <sarge.Pipeline at 0x7f3e8185e790>

In [5]: sarge.shell_quote??
Signature: sarge.shell_quote(s)
Source:
def shell_quote(s):
    """
    Quote text so that it is safe for Posix command shells.

    For example, "*.py" would be converted to "'*.py'". If the text is
    considered safe it is returned unquoted.

    :param s: The value to quote
    :type s: str (or unicode on 2.x)
    :return: A safe version of the input, from the point of view of Posix
             command shells
    :rtype: The passed-in type
    """
    assert isinstance(s, string_types)
    if not s:
        result = "''"
    elif not UNSAFE.search(s):
        result = s
    else:
        result = "'%s'" % s.replace("'", r"'\''")
    return result
File: ~/.local/lib/python2.7/site-packages/sarge/__init__.py
Type: function

From a code review standpoint, my eyes are tired and I'd rather have more than 1 character to mistype (because of the Hamming distance between really all of the proposed single-letter string prefixes, and u'' and r'', and e'').

Mike Miller

Aug 24, 2015, 6:26:40 PM

to Wes Turner, Python-Ideas

In the given example it uses shlex.quote on each variable:

https://docs.python.org/dev/library/shlex.html#shlex.quote

Btw, no one has to use this form, it simply helps when someone does.
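
A small sketch of what quoting each variable does with the newline case. The payload is invented, and shlex.split is used here only to show how POSIX word-splitting rules tokenize the result, not to run anything.

```python
import shlex

payload = '1\necho 2'                      # value arriving from outside
command = 'echo ' + shlex.quote(payload)   # only the variable is quoted
print(command)

# The embedded newline stays inside one quoted argument instead of
# starting a second command.
tokens = shlex.split(command)
print(tokens)
```

This only covers values interpolated into the template. A newline written into the literal part of the command, as in the subprocess example, is not affected by quoting the variables.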

Mike Miller

Aug 24, 2015, 6:28:09 PM

to python...@python.org

Here's the example script to demonstrate:

https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_example.py

-Mike

Nathaniel Smith

Aug 24, 2015, 6:32:51 PM

to python...@python.org

On Mon, Aug 24, 2015 at 2:28 PM, Nikolaus Rath <Niko...@rath.org> wrote:
> On Aug 24 2015, Mike Miller <python-ideas-9N9v...@public.gmane.org> wrote:
>> Also, 2) a bit of responsibility is pushed to stdlib/pypi. In a
>> handful of sensitive places, the object is checked beforehand and
>> escaped when needed:
>>
>> def os_system(command): # imagine os.system, subprocess, dbapi, etc.
>>     if isinstance(command, estr):
>>         command = command.escape(shlex.quote) # each chooses its own rules
>>     do_something(command)
>>
>> This means a billion lines of code using e-strings won't have to care
>> about them, only a handful of places. What is easiest to type is now
>> safe as well:
>>
>> os.system(e'cat {filename}') # sleep easy
>
> *shudder*. After years of efforts to get people not to do this, you want
> to change course by 180 degrees and start telling people this is ok if
> they add an additional single character in front of the string?

The problem is that despite years of effort trying to get people not
to do things like this, it's still the case that if you look at, say,
MITRE's ranked list of the "top 25 most dangerous software errors":

https://cwe.mitre.org/top25/index.html

then numbers #1, #2, and #4 are improper quoting. (#3 is buffer overflows.)

Or if you look at the OWASP consensus list on the most critical web
application security risks ("based on 8 datasets from 7 firms that
specialize in application security, including 4 consulting companies
and 3 tool/SaaS vendors (1 static, 1 dynamic, and 1 with both). This
data spans over 500,000 vulnerabilities..."), then numbers #1 and #3
are improper quoting:

https://www.owasp.org/index.php/Top_10_2013-Top_10

I mean, it's great that the rise of languages like Python that have
easy range-checked string manipulation has knocked buffer overflows
out of the #1 spot, but... :-)

Guido is right that the nice thing about classic string interpolation
is that its use in many languages gives us tons of data about how it
works in practice. But one of the things that data tells us is that it
actually causes a lot of problems! Do we actually want to continue the
status quo, where one set of people keeps designing language features
to make it easier and easier to slap strings together, and then
another set of people spend increasing amounts of energy trying to
educate all the users about why they shouldn't actually use those
features? It wouldn't be the end of the world (that's why we call it
"the status quo" ;-)), and trying to design something new and better
is always difficult and risky, but this seems like a good moment to
think very hard about whether there's a better way.

(And possibly about whether that better way is something we could put
up on PyPI now while the 3.6 freeze is still a year out...)

-n

--
Nathaniel J. Smith -- http://vorpus.org

Nathaniel Smith

Aug 24, 2015, 6:37:33 PM

to Mike Miller, python...@python.org

On Mon, Aug 24, 2015 at 1:57 PM, Mike Miller <python...@mgmiller.net> wrote:
> Hi, here's my latest idea, riffing on other's latest this weekend.
>
> Let's call these e-strings (for expression), as it's easier to refer to the
> letter of the proposals than three digit numbers.
>
> So, an e-string looks like an f-string, though at compile-time, it is
> converted to an object instead (like i-string):
>
> print(e'Hello {friend}, filename: {filename}.') # converts to ==>
>
> print(estr('Hello {friend}, filename: {filename}.', friend=friend,
> filename=filename))
>
> An estr is a subclass of str, therefore able to do the nice things a string
> can do. Rendering is deferred, and it also has a raw member, escape(), and
> translate() methods:
>
> class estr(str):
>     # init: saves self.raw, args, kwargs for later
>     # methods, ops render it
>     # def escape(self, escape_func): # handles escaping
>     # def translate(self, template, safe=True): # optional i18n support
>
> To make it as simple as possible to use by end-developers, it 1) doesn't
> require str() to be run explicitly, it renders itself when needed via its
> various methods and operators. Look for .raw, if you need the original.

This is a really interesting idea.

You could potentially re-use PyUnicode_READY to do the default rendering.

Some things to think about:

- If I concatenate two e-string objects, or an e-string and a regular
string, or interpolate an e-string into an e-string, then what
happens?

- How problematic will it be that an e-string pins all the
interpolated objects in memory for its lifetime?

-n

--
Nathaniel J. Smith -- http://vorpus.org

Mike Miller

Aug 24, 2015, 6:45:50 PM

to Nathaniel Smith, python...@python.org

On 08/24/2015 03:37 PM, Nathaniel Smith wrote:
> - If I concatenate two e-string objects, or an e-string and a regular
> string, or interpolate an e-string into an e-string, then what
> happens?

In the example url I just posted, concatenation renders each string before
concatenation, then returns a regular string with both concatenated.

If interp into interp ((boggle)), when the passed one gets formatted, the
formatting operation will render it. Good test case.

> - How problematic will it be that an e-string pins all the
> interpolated objects in memory for its lifetime?

It will be an object holding a raw template string, and a number of variables.
In normal usage I don't suspect it to be a problem.

-Mike

Guido van Rossum

Aug 24, 2015, 6:46:41 PM

to Nathaniel Smith, python...@python.org

On Mon, Aug 24, 2015 at 3:32 PM, Nathaniel Smith <n...@pobox.com> wrote:

[...]

> I mean, it's great that the rise of languages like Python that have
> easy range-checked string manipulation has knocked buffer overflows
> out of the #1 spot, but... :-)
>
> Guido is right that the nice thing about classic string interpolation
> is that its use in many languages gives us tons of data about how it
> works in practice. But one of the things that data tells us is that it
> actually causes a lot of problems! Do we actually want to continue the
> status quo, where one set of people keep designing languages features
> to make it easier and easier to slap strings together, and then
> another set of people spend increasing amounts of energy trying to
> educate all the users about why they shouldn't actually use those
> features? It wouldn't be the end of the world (that's why we call it
> "the status quo" ;-)), and trying to design something new and better
> is always difficult and risky, but this seems like a good moment to
> think very hard about whether there's a better way.

Or maybe from the persistence of quoting bugs we could conclude that the ways people slap strings together have very little effect on this category of bugs?

(And possibly about whether that better way is something we could put
up on PyPI now while the 3.6 freeze is still a year out...)

Barry Warsaw

Aug 24, 2015, 8:20:42 PM8/24/15

to python...@python.org

On Aug 24, 2015, at 11:55 AM, Eric V. Smith wrote:

>I should have added: this is for i-strings that look like PEP 498's
>f-strings. I'm not trying to jump to conclusions about the syntax:

I remember something else about $-strings, based on Mailman's experience.
Originally we also used %(foo)s strings, but when that reached the breaking
point (and PEP 292 was implemented), we changed to $-strings. At that point
we had to provide an upgrade path for settings with the original %-strings.

It turns out to not be too difficult to translate between them. It would
probably not be difficult to translate from $foo to {foo} either, so with a
properly defined hook, the porcelain could use $-strings while all the
underlying machinery could still use {}-strings. It would probably have to be
roughly limited to simple name lookups with dot-chasing, and maybe it's not
worth it.
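
A rough sketch of the $foo to {foo} translation, limited to simple names with dot-chasing as suggested above. The function name dollar_to_brace and the regular expression are assumptions made for illustration; this is not Mailman's code.

```python
import re

# $$ is an escaped dollar; $name.attr and ${name.attr} are placeholders.
_DOLLAR = re.compile(
    r'\$(?:(?P<escaped>\$)'
    r'|(?P<named>[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)'
    r'|\{(?P<braced>[^}]+)\})'
)


def _escape_braces(text):
    # Literal braces must be doubled so str.format leaves them alone.
    return text.replace('{', '{{').replace('}', '}}')


def dollar_to_brace(template):
    """Convert a $-style template into a {}-style one (hypothetical helper)."""
    out = []
    pos = 0
    for match in _DOLLAR.finditer(template):
        out.append(_escape_braces(template[pos:match.start()]))
        if match.group('escaped'):
            out.append('$')
        else:
            out.append('{' + (match.group('named') or match.group('braced')) + '}')
        pos = match.end()
    out.append(_escape_braces(template[pos:]))
    return ''.join(out)


converted = dollar_to_brace('Hello $friend, file: ${filename}. Cost: $$5 {x}')
print(converted)
print(converted.format(friend='John', filename='a.txt'))
```

Note that a name immediately followed by a dot and another identifier is read as attribute access, so the ${...} form is needed where that is not intended.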

Cheers,
-Barry

Eric V. Smith

Aug 24, 2015, 9:50:49 PM8/24/15

to Andrew Barnert, python...@python.org

On 08/24/2015 07:55 PM, Andrew Barnert wrote:

> On Aug 24, 2015, at 08:14, Eric V. Smith <er...@trueblade.com> wrote:
>>
>>> On 08/23/2015 09:13 PM, Guido van Rossum wrote:
>>> But for i-strings, I think it would be good if we could gather more
>>> actual experience using them. Every potential use case brought up for
>>> these so far (translation, html/shell/sql quoting) feels like there's a
>>> lot of work needing to be done to see if the idea is actually viable
>>> there. It would be a shame if we added all the (considerable!) machinery
>>> for i-strings and all we got was yet another way to do it
>>> (https://xkcd.com/927/), without killing at least one competing approach
>>> (similar to the way .format() has failed to replace %).
>>>
>>> It's tough to envision how we could gather more experience with
>>> i-strings *without* building them into the language, but I'm really
>>> hesitant to add them without more experience. (This is the "new on the
>>> job market" paradox. :-) Maybe they could be emulated using a function
>>> call that uses sys._getframe() under the covers? Or maybe it's possible
>>> to cook up an experiment using other syntax hooks? E.g. the coding hack
>>> used in pyxl (https://github.com/dropbox/pyxl).[1]
>>
>>
>> I hope you don't mind that I borrowed the keys to the time machine. I'm
>> using the implementation of _string.formatter_parser() that I added for
>> implementing string.Formatter:
>

> Nifty! When I get a chance, I'll slap this together with an import hook using the untokenize hack, so I can actually play with i-strings (and f-strings) with the proposed syntax without needing a patch. If it looks good, I can write a real implementation that doesn't have all the untokenize problems, which could also eliminate the need for _getframe. (To make it backportable to 3.3/2.7 we'd still need to backport formatter_parser, right? But that still seems like something that could be done and posted on PyPI.)

I don't know what the untokenize problems are, so I'm not sure I can
help there.

I also don't think I'd base any real implementation on
_string.formatter_parser: it won't be terribly efficient. I've created a
project on bitbucket: https://bitbucket.org/ericvsmith/istring where I'm
playing with a "join" method and a callback interface, without ever
exposing the looping and parsing to the caller. I think that would be a
better interface than an iterator exposing the various parts of the string.

But, as Guido suggests above, it's all just an academic exercise to
understand how to best use i-strings. I suggest providing feedback on
their API before implementing anything more serious.
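
For anyone wanting to experiment along the lines quoted above, here is a minimal sketch of emulating an i-string with a function call that uses sys._getframe(). It relies on string.Formatter().parse(), the public wrapper around the parser mentioned earlier. The names IString and i() are hypothetical, and this is not the code in the istring repository.

```python
import string
import sys


class IString:
    """Hypothetical deferred template: keeps parsed parts, renders on demand."""

    def __init__(self, raw, parts):
        self.raw = raw
        self.parts = parts  # (literal, value, format_spec, conversion, has_field)

    def __str__(self):
        out = []
        for literal, value, spec, conversion, has_field in self.parts:
            out.append(literal)
            if has_field:
                if conversion == 'r':
                    value = repr(value)
                elif conversion == 's':
                    value = str(value)
                elif conversion == 'a':
                    value = ascii(value)
                out.append(format(value, spec))
        return ''.join(out)


def i(template):
    """Evaluate each {expression} in the caller's namespace."""
    frame = sys._getframe(1)  # the frame of whoever called i()
    parts = []
    for literal, expr, spec, conversion in string.Formatter().parse(template):
        has_field = expr is not None
        value = eval(expr, frame.f_globals, frame.f_locals) if has_field else None
        parts.append((literal, value, spec or '', conversion, has_field))
    return IString(template, parts)


def demo():
    n, start, end = 3, 1.0, 2.5
    return str(i('Iteration {n}: Duration {end-start:.1f} seconds'))


print(demo())
```

The expressions are evaluated when i() is called and only the formatting is deferred. Expressions containing ':' or '!' would confuse this simple parser, which is one of the limits of the emulation.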

Eric.

Nikolaus Rath

Aug 24, 2015, 10:06:02 PM8/24/15

to python...@python.org

On Aug 24 2015, Mike Miller <python-ideas-9N9v...@public.gmane.org> wrote:

> On 08/24/2015 02:28 PM, Nikolaus Rath wrote:
>> *shudder*. After years of efforts to get people not to do this, you want
>> to change course by 180 degrees and start telling people this is ok if
>> they add an additional single character in front of the string?
>>
>> This sounds like a very bad idea to me for many reasons:
>>
>> - People will forget to type the 'e', and things will appear to work
>> but buggy.
>> - People will forget that they need the 'e' (and the same thing will
>> happen, further reinforcing the thought that the e is not required)
>> - People will be confused because other languages don't have the 'e'
>> (hmm. how do I do this in Perl? I guess I'll just drop the
>> 'e'... *check*, works, great!)
>> - People will assume that their my_custom_system() call also
>> special-cases e strings and escape them (which it won't).
>>
>
> No, since the variables will not be replaced, therefore the
> command-line won't work.

How is that compatible with your statement that

> This means a billion lines of code using e-strings won't have to care
> about them, only a handful of places.

Either str(estr) performs interpolation (so billions of lines of code
don't have to change, and my custom system()-like call gets an
interpolated string as well until I change it to be estr-aware), or it
does not (and billions of lines of code will break when they
unexpectedly get an estr instead of a str).

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Ron Adam

Aug 24, 2015, 10:24:14 PM8/24/15

to python...@python.org

On 08/24/2015 06:45 PM, Mike Miller wrote:
>> - How problematic will it be that an e-string pins all the
>> interpolated objects in memory for its lifetime?
>
> It will be an object holding a raw template string, and a number of
> variables. In normal usage I don't suspect it to be a problem.

If an object's __str__ method could have an optional fmt='spec' argument,
then an estring could just hold strings, and not the object references.
That also prevents surprises if the object is mutated between the time
its estring is created and when the estring is used as a string. For
that matter it prevents an estring from printing one way at one time,
and another at another time.

I don't know if the formatting can be split like this... Where an object
is formatted to a string representation, and then that is formatted to a
field specification. The latter being things like width, fill, right,
center, and left. These are independent of the object and belong to
the string. Things like number of places and sign or to use leading or
trailing zeros are part of the object being converted to a string.

Cheers,
Ron

Mike Miller

Aug 24, 2015, 10:37:12 PM8/24/15

to python...@python.org

On 08/24/2015 07:05 PM, Nikolaus Rath wrote:
> How is that compatible with your statement that
>
>> This means a billion lines of code using e-strings won't have to care
>> about them, only a handful of places.
>
> Either str(estr) performs interpolation (so billions of lines of code
> don't have to change, and my custom system()-like call gets an
> interpolated string as well until I change it to be estr-aware), or it
> does not (and billions of lines of code will break when they
> unexpectedly get an estr instead of a str).
>

Not sure I understand... your system_like() call already accepts strings that
could be formatted?

The estr adds a protection (by escaping variables) that didn't exist in the
past. It is not removing any protections or best practices. It is therefore
safer than the f-string version, but you read additional protection as more
dangerous, perhaps because someone in the future might get lazy. Is that right?

But, people are already lazy (in a manner...), so it looks like a small win to me.

By "don't have to care" I don't mean we throw out best practices, only that
doing the right thing (rephrased as, not doing the wrong thing) becomes easier,
as Nick C. taught is a good idea in his PEP.

Any future docs certainly won't be shouting, "do this with os.system!!! It's
safe now!!" They will still direct to subprocess.call().
In fact I'm sorry I mentioned os.system at all, it's just a few hours ago
someone chewed out Nick C. for using subprocess.call() in his examples. ;)

-Mike

Eric V. Smith

Aug 24, 2015, 10:47:02 PM8/24/15

to Ron Adam, python...@python.org

> On Aug 24, 2015, at 10:23 PM, Ron Adam <ron...@gmail.com> wrote:
>
> On 08/24/2015 06:45 PM, Mike Miller wrote:
>>> - How problematic will it be that an e-string pins all the
>>> interpolated objects in memory for its lifetime?
>>
>> It will be an object holding a raw template string, and a number of
>> variables. In normal usage I don't suspect it to be a problem.
>
> If an object's __str__ method could have an optional fmt='spec' argument, then an estring could just hold strings, and not the object references. That also prevents surprises if the object is mutated between the time its estring is created and when the estring is used as a string. For that matter it prevents an estring from printing one way at one time, and another at another time.
>
> I don't know if the formatting can be split like this... Where an object is formatted to a string representation, and then that is formatted to a field specification. The latter being things like width, fill, right, center, and left. These are independent of the object and belong to the string. Things like number of places and sign or to use leading or trailing zeros are part of the object being converted to a string.

It's not possible. For examples, look at all of the number format options. How would you implement hex conversions? Or datetime %A?
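
To make the point concrete, here is a small illustration using only built-in formatting. The format spec is interpreted by each type's own __format__, so it cannot be applied after the object has already been reduced to a string. The variable names are arbitrary.

```python
import datetime

day = datetime.date(2015, 8, 24)

# The same format_spec slot means different things for different types.
print(format(255, 'x'))      # hex conversion, handled by int.__format__
print(format(day, '%A'))     # weekday name, handled by date.__format__

# Once the object has been reduced to a str, the type-specific spec is lost.
try:
    format(str(255), 'x')
except ValueError as exc:
    print('str cannot apply it:', exc)
```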

Eric.

Greg Ewing

Aug 25, 2015, 12:46:23 AM8/25/15

to python...@python.org

Eric V. Smith wrote:
> An f-string would be shorthand for str(i-string).

If I understand correctly, the point of i-strings would
be to make it easy to do things like sql argument
interpolation the right way.

But if sql(f-string) is still legal (as it seems like
it would have to be for quite a while to come, for
backwards compatibility) then the wrong way is still
just as easy as the right way, and no less obvious
(what do the letters "f" and "i" have to do with sql?).

So it seems to me that having both f-strings and
i-strings will just add a lot of complication and
confusion without really helping anything.
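
For readers less familiar with the "right way" referred to here, a short sketch with the standard sqlite3 module contrasts the two approaches. The table and the input value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.execute("INSERT INTO users VALUES ('alice')")

name = "x' OR '1'='1"   # hostile input

# Wrong way: the value is pasted into the SQL text, so it changes the query.
wrong = conn.execute(
    "SELECT name FROM users WHERE name = '{}'".format(name)
).fetchall()

# Right way: the value travels separately as a parameter.
right = conn.execute(
    'SELECT name FROM users WHERE name = ?', (name,)
).fetchall()

print(wrong)
print(right)
conn.close()
```

Both calls are equally easy to type, which is the concern raised above: an eagerly rendered string gives the database layer no way to tell the two apart.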

--
Greg

Nick Coghlan

Aug 25, 2015, 2:23:24 AM8/25/15

to Steven D'Aprano, python...@python.org

On 24 August 2015 at 22:00, Steven D'Aprano <st...@pearwood.info> wrote:
> I mean, I know how to get a closure in general terms, e.g.:
>
> [(lambda : i) for i in range(10)]
>
> but I'm not seeing where you would get a closure *specifically* in
> this situation with your defer function.

I was wrong when I thought you could do this trick with f-strings - you
need the delayed interpolation offered by PEP 501's i-strings in order
to access the original objects directly.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Nick Coghlan

Aug 25, 2015, 2:35:46 AM8/25/15

to Paul Moore, Eric V. Smith, Python-Ideas

On 25 August 2015 at 01:03, Paul Moore <p.f....@gmail.com> wrote:
> On 24 August 2015 at 13:41, Eric V. Smith <er...@trueblade.com> wrote:
>> On 08/24/2015 07:35 AM, Paul Moore wrote:
>>> I'm once again losing the thread of all the variations being proposed.
>>>
>>> As a reality check, is the expectation that something like the
>>> following will still be possible:
>>>
>>> print(f"Iteration {n}: Duration {end-start} seconds")
>>
>> Yes, that's the PEP 498 proposal. I think (and this is just my opinion)
>> that if we do something more complicated, like the delayed interpolation
>> of i-strings, that we'd still keep f-strings.
>
> OK. That's my point, essentially - the discussion has drifted into
> much more complex areas, with comments about how the wider-ranging
> proposals cover the f-string case as a subset, and I just wanted to be
> sure that there wasn't an implied "so we don't need f-strings any
> more" in there. (Nick at one point spoke quite strongly against adding
> multiple ways of doing the same thing).

That was before my proposed design converged on being a potential
implementation detail of Eric's, though :)

Now we have the option of adding types.InterpolationTemplate as an
implementation detail of f-strings, and then deciding *later* whether
we want to allow creating of interpolation templates with deferred
rendering.

In that regard, Guido suggested that I split PEP 501 into two
different PEPs, one for deferred rendering (which could be done as an
implementation detail of f-strings, with f"templated {text}" being
shorthand for format(i"templated {text}")), and another for
$-substitution over {}-substitution (which would be a competing
proposal for the surface syntax of the substitution expressions). I
think that's a good idea, so I'll do that some time this week (not
sure when, though)

Paul Moore

Aug 25, 2015, 4:29:46 AM8/25/15

to Mike Miller, Python-Ideas

On 24 August 2015 at 23:04, Mike Miller <pytho...@mgmiller.net> wrote:
> On 08/24/2015 02:54 PM, Paul Moore wrote:
>> Agreed. In a convenience library where it's absolutely clear that a
>> shell is involved (something like sarge or invoke) this is OK, but not
>> in the stdlib as the "official" way to call external programs.
>
> Hmm, don't focus on os.system(), it could be any function, and not
> particularly relevant, nor do I recommend this line as the official way.

Well, can you use an example that isn't misleading in its security
implications? Specifically, I assumed from your use of os.system that
you were proposing that the stdlib function (specifically in this
case, a function that we've been trying to deprecate in favour of more
secure alternatives for years) be updated to understand e-strings.

> Remember Nick Coghlan's statement that the "easy way should be the right
> way"? That's what this is trying to accomplish.

But the right way is not to use os.system, so I don't *want* it to be
easy. If you have a better example than running shell commands, please
explain. (If your example is running full-blown shell syntax, rather
than single commands, please give a more complicated example and we
can let the debate explode into one about portability of shell
constructs - but os.system to run a single command with a set of
arguments is *wrong* and subprocess.Popen was created to replace it
with a cross-platform, secure by default, solution).

>> - People will fail to understand the difference between e'...' and
>> f'...' and will use the wrong one when using os.system, and things
>> will work correctly but with security vulnerabilities.
>
> I don't recommend e'' and f'', only e'' at this moment.

Then I'm strongly against this. As I've stated on a number of
occasions, to me the crucial main use of any variation on this
proposal is

print(f"Iteration {n}: Duration {end-start}")

If your e-string proposal works for this (via some consequence of
implicitly calling str()) then it may still be on the cards - but the
need for explicit str() calls in pathlib is a source of frustration
there, so I'd like to be 100% sure that your proposal doesn't result
in a need for explicit str() calls anywhere before accepting that
e-strings can replace f-strings.

By the way, the terminology in this thread (e-strings, f-strings,
i-strings...) is dreadful. We need names that capture the essential
differences (I've already proposed "format strings" for f-strings).
Naming is important!

Eric V. Smith

Aug 25, 2015, 9:53:32 AM8/25/15

to python...@python.org

On 08/24/2015 06:37 PM, Nathaniel Smith wrote:
> On Mon, Aug 24, 2015 at 1:57 PM, Mike Miller <python...@mgmiller.net> wrote:
>> Hi, here's my latest idea, riffing on other's latest this weekend.
>>
>> Let's call these e-strings (for expression), as it's easier to refer to the
>> letter of the proposals than three digit numbers.
>>
>> So, an e-string looks like an f-string, though at compile-time, it is
>> converted to an object instead (like i-string):
>>
>>     print(e'Hello {friend}, filename: {filename}.') # converts to ==>
>>
>>     print(estr('Hello {friend}, filename: {filename}.', friend=friend,
>>                filename=filename))
>>
>> An estr is a subclass of str, therefore able to do the nice things a string
>> can do. Rendering is deferred, and it also has a raw member, escape(), and
>> translate() methods:
>>
>>     class estr(str):
>>         # init: saves self.raw, args, kwargs for later
>>         # methods, ops render it
>>         # def escape(self, escape_func): # handles escaping
>>         # def translate(self, template, safe=True): # optional i18n support
>>
>> To make it as simple as possible to use by end-developers, it 1) doesn't
>> require str() to be run explicitly, it renders itself when needed via its
>> various methods and operators. Look for .raw, if you need the original.
>
> This is a really interesting idea.
>
> You could potentially re-use PyUnicode_READY to do the default rendering.

I doubt you could get this to work, although feel free to prove me
wrong. I think you'll end up with the same decision Pathlib made (PEP
428): don't derive from str.
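
One way to see the difficulty is a small experiment. It assumes CPython's usual behaviour, where built-in string operations read the underlying str data directly instead of calling __str__. The class LazyStr is a hypothetical attempt, not the estr from the example script.

```python
class LazyStr(str):
    """Hypothetical str subclass that tries to defer rendering."""

    def __new__(cls, raw, **values):
        obj = super().__new__(cls, raw)   # the str payload is the raw template
        obj.values = values
        return obj

    def __str__(self):
        return str.format(self, **self.values)


s = LazyStr('Hello {friend}', friend='John')
print(str(s))          # rendered through __str__
print(''.join([s]))    # built from the underlying str data
print('> ' + s)        # likewise
```

Only the first call sees the rendered text. The other two see the raw template, so every method and operator would have to be overridden, and C-level consumers of the string would still bypass the overrides.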

> Some things to think about:
>
> - If I concatenate two e-string objects, or an e-string and a regular
> string, or interpolate an e-string into an e-string, then what
> happens?
>
> - How problematic will it be that an e-string pins all the
> interpolated objects in memory for its lifetime?

Well, it seems to work for logging, but those don't tend to stay around
very long. But this is one of the reasons to play with a sample
implementation, to understand these sorts of issues.

Eric.

Nikolaus Rath

Aug 25, 2015, 11:03:05 AM8/25/15

to python...@python.org

On Aug 24 2015, Mike Miller <python-ideas-9N9v...@public.gmane.org> wrote:

> On 08/24/2015 07:05 PM, Nikolaus Rath wrote:
>> How is that compatible with your statement that
>>
>>> This means a billion lines of code using e-strings won't have to care
>>> about them, only a handful of places.
>>
>> Either str(estr) performs interpolation (so billions of lines of code
>> don't have to change, and my custom system()-like call gets an
>> interpolated string as well until I change it to be estr-aware), or it
>> does not (and billions of lines of code will break when they
>> unexpectedly get an estr instead of a str).
>>
>
> Not sure I understand... your system_like() call already accepts
> strings that could be formatted?

I'm talking about someone who has implemented a function (for whatever
reason) that behaves like os.system(). Say something like this (probably
the calls are all wrong because I didn't look them up, but I trust
everyone knows what I mean):

    def nonblocking_system(cmd):
        if os.fork() == 0:
            os.execl('/bin/sh', 'sh', '-c', cmd)

With this function, people have to be really careful about injection
vulnerabilities - just like with os.system():

    os.system('rm %s' % file) # danger!
    nonblocking_system('rm %s' % file) # danger!

But now you're proposing that os.system() gets support for e-strings,
which are then properly quoted. Now we have this:

    os.system(e'rm {file}') # ok
    nonblocking_system(e'rm {file}') # you'd think it's ok, but it's not

I think this is a terrible situation, because you can never be quite
sure where an e-string is ok (because the function is prepared for it),
and where it will act just like a string.
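
For reference, a short sketch of what quoting has to achieve for such a command string today, using shlex.quote from the standard library. The filename value is borrowed from the example output elsewhere in this thread; nothing is executed here.

```python
import shlex

filename = "somefile; rm -rf ~ 'foo' <html>"

unsafe = 'rm %s' % filename               # a shell would see more than one command
safe = 'rm %s' % shlex.quote(filename)    # a shell would see a single argument

print(unsafe)
print(safe)

# shlex.split approximates how a POSIX shell would split the words.
print(shlex.split(safe) == ['rm', filename])

# With an argument list and no shell there is nothing to quote at all.
argv = ['rm', '--', filename]
print(argv)
```

A function like nonblocking_system() only gets the safe variant if either the caller or the function itself applies the quoting, which is exactly the uncertainty described above.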

> The estr adds a protection (by escaping variables) that didn't exist
> in the past. It is not removing any protections or best practices.

No, but it muddles the water as to what is good and what is bad
practice. 'rm {file}' has always been bad practice, but with e-strings
e'rm {file}' may or may not be bad practice, depending what you do with
it.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Mike Miller

Aug 25, 2015, 1:50:00 PM8/25/15

to python...@python.org

On 08/25/2015 08:02 AM, Nikolaus Rath wrote:
> No, but it muddles the water as to what is good and what is bad
> practice. 'rm {file}' has always been bad practice, but with e-strings
> e'rm {file}' may or may not be bad practice, depending what you do with
> it.

It would be bad practice since the function is deprecated, or just discouraged.

But, are you implying that the escaping could be bypassed? Would that be possible?

-Mike

Mike Miller

Aug 25, 2015, 2:02:59 PM8/25/15

to Paul Moore, Python-Ideas

On 08/25/2015 01:29 AM, Paul Moore wrote:
>> Remember Nick Coghlan's statement that the "easy way should be the right
>> way"? That's what this is trying to accomplish.
>
> But the right way is not to use os.system, so I don't *want* it to be

Ok, a few hours before someone complained to Nick that he was using
subprocess.call as an example when it didn't completely apply. So I moved to
the other alternative example that could be helped, os.system. I have no
particular love for it, and am not recommending it. It was just one function
out of many that needs input to be escaped as far as I was concerned.

I didn't foresee that the function would be focused on to the point of
derailing the idea. I suppose I'll try again if you'll bear with me.

> If your e-string proposal works for this (via some consequence of
> implicitly calling str()) then it may still be on the cards - but the
> need for explicit str() calls in pathlib is a source of frustration

In my original message (of this sub-thread) this is one of the main paragraphs:

> To make it as simple as possible to use by end-developers, it 1) doesn't require
> str() to be run explicitly, it renders itself when needed via its various
> methods and operators. Look for .raw, if you need the original.

Also if you check the example script at the bitbucket url, you'll see it is the
case, though I've not yet implemented every case.

-Mike

Mike Miller

Aug 25, 2015, 2:08:10 PM8/25/15

to Paul Moore, Python-Ideas

On 08/25/2015 01:29 AM, Paul Moore wrote:

> By the way, the terminology in this thread (e-strings, f-strings,
> i-strings...) is dreadful. We need names that capture the essential
> differences (I've already proposed "format strings" for f-strings).
> Naming is important!

Agreed, I have said the same in the context of the written PEPs, however in
informal conversation, I think f, i, and e are convenient shorthand for the
various ideas.

In my PEP draft you'll see no mention of -strings.

-Mike

Eric V. Smith

Aug 25, 2015, 2:36:27 PM8/25/15

to python...@python.org

In https://bitbucket.org/ericvsmith/istring, in i18n.py, I've added
the awesomely named convert_istring_format_to_dollar_format(). It also
checks that you've only used identifiers and not specified a
format_spec or a conversion character (exact specs TBD). I've not
implemented the reverse function. I imagine you'd convert to $ format
as part of extracting the strings from the source, do the translation,
then convert back as part of building the translation database.

It also shows how to implement _() with i-strings, including safe
substitution required by a bad translation.

I also have examples for logging and building up regex's from
i-strings. I'm mainly using this to investigate the best API for
i-strings. So far, I just have one method, join, that takes some
callbacks. It also lets you substitute alternate strings, as needed
for the _() examples.

But this is all just an experiment. I'm not sold at all on the concept
of i-strings (and even less so on the nearly equivalent e-strings).

Eric.


Nikolaus Rath

Aug 25, 2015, 2:40:49 PM8/25/15

to python...@python.org

On Aug 25 2015, Mike Miller <python-ideas-9N9v...@public.gmane.org> wrote:
> On 08/25/2015 08:02 AM, Nikolaus Rath wrote:
>> No, but it muddles the water as to what is good and what is bad
>> practice. 'rm {file}' has always been bad practice, but with e-strings
>> e'rm {file}' may or may not be bad practice, depending what you do with
>> it.
>
> It would be bad practice since the function is deprecated, or just
> discouraged.

What function?

> But, are you implying that the escaping could be bypassed? Would that
> be possible?

According to you, yes. Just look at your example:

| def os_system(command): # imagine os.system, subprocess, dbapi, etc.
|     if isinstance(command, estr):
|         command = command.escape(shlex.quote) # each chooses its own rules
|     do_something(command)

So any function that doesn't special-case estr will "bypass" the
escaping and pass it to its version of the do_something() function
without quoting.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Mike Miller

Aug 25, 2015, 2:54:36 PM8/25/15

to python...@python.org

On 08/25/2015 11:40 AM, Nikolaus Rath wrote:
> So any function that doesn't special-case estr will "bypass" the
> escaping and pass it to its version of the do_something() function
> without quoting.

Yes, system(command % dangerous) was dangerous and will still be. Confining
input to e-strings is probably not practical. That's a good point.

-Mike

Mike Miller

Aug 25, 2015, 3:07:32 PM8/25/15

to python...@python.org

TL;DR: (Version 2, hopefully more clear)

Let's discuss whether to make "doing the right thing as easy as doing the wrong
thing" a desired goal for string interpolation.

Details -- we could:

1) Automatically escape potentially dangerous input variables to sensitive
functions, or
2) Make developers do it the hard way, making them completely responsible
for safety, and always responsible.
(Knowing that often they don't).
3) Some combination of the two.

A trivial implementation of 1) is below. Instead of rendering the string
immediately, it is deferred until use, with template and parameters stashed
inside an object, allowing the receiver to specify escaping/quoting rules.

---------------------------------

Let's call these e-strings (for expression), as it's easier to refer to the
letter of the proposals than three digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted
to an object instead (like an i-string):

    print(e'Hello {friend}, filename: {filename}.') # converts to ==>

    print(estr('Hello {friend}, filename: {filename}.', friend=friend,
               filename=filename))

An estr is a subclass of str, therefore able to do the nice things a string can
do. Rendering is deferred until the variable is used, and it also has a .raw
member, escape(), and translate() methods:

    class estr(str):
        # init: saves self.raw, args, kwargs for later
        # methods, ops render it
        # def escape(self, escape_func): # handles escaping
        # def translate(self, template, safe=True): # optional i18n support

To make it as simple as possible to use by end-developers, it:

1) Doesn't require str() to be run explicitly, it renders itself when
   needed via its various methods and operators.
   Look for .raw, if you need the original. Also,

2) A bit of responsibility is pushed to stdlib/pypi. In a handful of
   sensitive places, the object is checked beforehand and escaped when
   needed:

    # imagine html, db, subprocess input etc.
    def sensitive_func_that_escapes_input(input):
        if isinstance(input, estr):
            input = input.escape(shlex.quote) # each chooses its own rules
        do_something(input)

This means numerous callers using e-strings won't have to do explicit escaping,
only a handful of callee libraries will--which is common with database apis, for
example. What is easiest to type is now safe as well::

    sensitive_func_that_escapes_input(e'user input: {input}') # sleep easy

This could enable the safety and features we'd like, without burdening the
everyday user. I've created a sample script to demonstrate at:

https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_example.py

Here is the output:

# consider: e'Hello {friend}, filename: {filename}.'
friend: 'John'
filename: "somefile; rm -rf ~ 'foo' <html>"

original: Hello {friend}, filename: {filename}.
w/ print(): Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

shell escape:
Hello John, filename: 'somefile; rm -rf ~ '"'"'foo'"'"' <html>'.
html escape:
Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
sql escape: Hello "John", filename: "somefile; rm -rf ~ 'foo' <html>".
logger DEBUG Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

upper+encode: b"HELLO JOHN, FILENAME: SOMEFILE; RM -RF ~ 'FOO' <HTML>."
translated?: Hola John, archivo: somefile; rm -rf ~ 'foo' <html>.

Is this automatic escaping desired? Or should we continue to make the
end-developer fully responsible for escaping input?
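
Since the e'' prefix does not exist, the idea can only be approximated today with an explicit constructor. The sketch below is an assumption-laden illustration, not the sample script linked above: EStr is a plain class rather than a str subclass, and the two receiver functions are hypothetical.

```python
import html
import shlex


class EStr:
    """Hypothetical deferred template holding the raw text and its values."""

    def __init__(self, raw, **values):
        self.raw = raw
        self.values = values

    def __str__(self):
        # Default rendering: no escaping at all.
        return self.raw.format(**self.values)

    def escape(self, escape_func):
        # Escape each interpolated value, but never the template text itself.
        escaped = {key: escape_func(str(value))
                   for key, value in self.values.items()}
        return self.raw.format(**escaped)


def shell_aware(command):
    """A receiver that special-cases EStr and applies its own quoting rules."""
    if isinstance(command, EStr):
        command = command.escape(shlex.quote)
    return str(command)


def shell_unaware(command):
    """A receiver that just treats its input as text."""
    return str(command)


msg = EStr('Hello {friend}, filename: {filename}.',
           friend='John', filename="somefile; rm -rf ~ 'foo' <html>")

print(shell_aware(msg))
print(shell_unaware(msg))
print(msg.escape(html.escape))
```

The unaware receiver gets the unescaped rendering, which is the case raised earlier in the thread: the protection applies only where the callee has been taught about the new type.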