OK, not exactly; it was this one, which I may have misunderstood:
https://mail.python.org/pipermail/python-ideas/2015-August/035347.html
On 08/20/2015 05:57 PM, Mike Miller wrote:
I found the b'' idea on a recent message here between you and Nick I think, it
seemed interesting. It's gone now, as well as the typo, thanks MRAB.
On Aug 20, 2015 23:40, "Nick Coghlan" <ncog...@gmail.com> wrote:
>
[...]
> myquery = i"SELECT $column FROM $table;"
> mycommand = i"cat $filename"
> mypage = i"<html><body>$content</body></html>"
>
> It's the opposite of the "interpolating untrusted strings that may
> contain arbitrary expressions" problem - what happens when the
> variables being *substituted* are untrusted? It's easy to say "don't
> do that", but if doing the right thing incurs all the repetition
> currently involved in calling str.format, we're going to see a *lot*
> of people doing the wrong thing. At that point, the JavaScript
> backticks-with-arbitrary-named-callable solution starts looking very
> attractive:
>
> myquery = sql`SELECT $column FROM $table;`
> mycommand = sh`cat $filename`
> mypage = html`<html><body>$content</body></html>`
Surely if using backticks we would drop the ugly prefix syntax and just make it a function call?
myquery = sql(`SELECT $column FROM $table;`)
etc., where `...` returns an object with the string and substitution info inside it.
I can certainly appreciate the argument that safe quoting for string interpolation deserves as much attention at the language level in 2015 as buffer overflow checking deserved back in the day.
Taking that problem seriously though is perhaps an argument against even having a trivial string version, because if it's legal then people will still write
do_sql("SELECT $column FROM $table;")
instead and the only way to get them to consistently use delayed (safe) evaluation would be to constantly educate and audit, which is the opposite of good design for security and exactly the problem we have now. Really what we want from this perspective is that it should be *harder* to get it wrong than to get it right.
Maybe simple no-quoting interpolation should be spelled
str(`hello $planet`)
(or substitute favorite prefix tag if allergic to backticks), so you have to explicitly specify a quoting syntax even if only to say that you want the null syntax.
Alternatively I guess it would be enough if interfaces like our hypothetical sql(...) simply refused to accept raw strings and required delayed interpolation objects only, even for static/constant queries. But I'm unconvinced that this would happen, given the number of preexisting APIs that already accept strings, and the need to continue supporting pre-3.6 versions of python.
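A minimal sketch of that last alternative, assuming a hypothetical `Interpolation` class standing in for whatever delayed-interpolation object the language would produce. Neither `Interpolation` nor this `sql()` exists anywhere; the names are made up for illustration only.

```python
class Interpolation:
    """Hypothetical stand-in for a delayed interpolation object."""

    def __init__(self, template, **values):
        self.template = template   # raw text with {name} fields
        self.values = values       # values stay separate from the text


def sql(query):
    """Hypothetical API that accepts only delayed interpolation objects."""
    if isinstance(query, str):
        raise TypeError("sql() requires an Interpolation, not a plain str")
    if not isinstance(query, Interpolation):
        raise TypeError("unsupported query type: %r" % type(query))
    # A real implementation would turn the fields into bound parameters here.
    return query.template, query.values


accepted = sql(Interpolation("SELECT name FROM users WHERE id = {uid}", uid=42))

try:
    sql("SELECT name FROM users WHERE id = 42")   # static, but still a raw str
    rejected = False
except TypeError:
    rejected = True
```

Even the constant query is refused, which is the point: the API never has to guess whether a plain string was assembled safely.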
-n
On Aug 21, 2015 5:59 PM, "Nick Coghlan" <ncog...@gmail.com> wrote:
>
> On 22 August 2015 at 05:52, Mike Miller <python...@mgmiller.net> wrote:
> > Yes, we were discussing these custom prefixes in Yuri's thread yesterday,
> > but Guido dropped a big -1 there. However, you Eric and Nick make some
> > compelling arguments in favor of them; they do solve several of our
> > outstanding issues.
> >
> > Would he be able to be persuaded to change his mind?
>
> It's also worth reiterating my concept of using "!" to introduce the
> arbitrary "magic happens here" prefixes. That is, you'd write them
> like this:
>
> myquery = !sql"SELECT $column FROM $table;"
> mycommand = !sh"cat $filename"
> mypage = !html"<html><body>$content</body></html>"
>
> I'd previously suggested a syntax along those lines for full compile
> time AST manipulation where the compiler also had to be made aware of
> the prefix names somehow, but I think the proposals that have evolved
> around f-strings make it possible to instead resolve the named
> reference at runtime, while still having the compiler handle the
> subexpression extraction and evaluation.
So, str subclasses with _repr_sql_ functions that sometimes serialize and translate differently based on ~threadlocals for SQL variant, lang, charset; and a new syntax for str.format(**globals()+locals())?
On 23 August 2015 at 02:16, Guido van Rossum <gu...@python.org> wrote:
> On Sat, Aug 22, 2015 at 3:09 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>> (Similar to yield, it is proposed that interpolation expressions would
>> require parentheses when embedded inside a larger expression)
>
>
> 1. That's an entirely different proposal, you're just reusing the PEP
> number.
It's aiming to solve the same basic problem though, which is the
aspect I consider most important when tackling a design question. The
discussions following the posting of my first draft highlighted some
real limitations of my original design both at a semantic level and at
a motivational level, so I changed it in place rather than introducing
yet another PEP on the same topic (Mike Miller's draft PEP was an
excellent synthesis, but there's no way he could account for the fact
that 501 was still only a first draft).
> 2. Have I died and gone to Perl?
That's my question in relation to PEP 498 - it seems to introduce lots
of line noise for people to learn to read for little to no benefit (my
perspective is heavily influenced by the fact that most of the code I
write myself these days consists of network API calls + logging
messages + UI template rendering, with only very occasional direct
calls to str.format that use anything more complicated than "{}" or
"{!r}" as the substitution field).
As a result, I'd be a lot more comfortable with PEP 498 if it had more
examples of potential practical use cases, akin to the examples
section from PEP 343 for context managers.
While the second draft of PEP 501 is even more line-noisy than PEP 498
due to the use of both "!" and "$", it at least generalises the
underlying semantics of compiler-assisted interpolation to apply to
additional use cases like logging, i18n (including compatibility
with Mozilla's l20n syntax), safe SQL interpolation, safe shell
command interpolation, HTML template rendering, etc.
For the third draft, I'll take another pass at the surface syntax - I
like the currently proposed semantics, but agree the current spelling
is overly sigil heavy.
On 23 August 2015 at 08:50, Guido van Rossum <gu...@python.org> wrote:
> OTOH this topic is rich enough that I have no problem spending a few more
> PEP numbers on it. If Mike asks for a PEP number I am not going to withhold
> it.
Aye, agreed - at the very least, we want to preserve his survey of
interpolation in other languages, as I found that to be an incredibly
valuable contribution.
>> > 2. Have I died and gone to Perl?
>>
>> That's my question in relation to PEP 498 - it seems to introduce lots
>> of line noise for people to learn to read for little to no benefit (my
>> perspective is heavily influenced by the fact that most of the code I
>> write myself these days consists of network API calls + logging
>> messages + UI template rendering, with only very occasional direct
>> calls to str.format that use anything more complicated than "{}" or
>> "{!r}" as the substitution field).
>>
>> As a result, I'd be a lot more comfortable with PEP 498 if it had more
>> examples of potential practical use cases, akin to the examples
>> section from PEP 343 for context managers.
>
> Since you accept "!r", you must be asking about the motivation for including
> ":spec", right?
Sorry, I wasn't clear - PEP 501 also retains the field formatting
capabilities, and is hence strictly "noisier" than PEP 498 (especially
the ! prefix version of the syntax). It's just that it solves enough
*other* problems for it to seem worth the cost to me.
When the benefit is "str.format is prettier, all other forms of
interpolation remain repetitively verbose", it seems a very invasive
change just to replace:
    print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))
with:
    print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")
>> While the second draft of PEP 501 is even more line-noisy than PEP 498
>> due to the use of both "!" and "$", it at least generalises the
>> underlying semantics of compiler-assisted interpolation to apply to
>> additional use cases like logging, i18n (including compatibility
>> with Mozilla's l20n syntax), safe SQL interpolation, safe shell
>> command interpolation, HTML template rendering, etc.
>
>
> That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is
> simple -- it does not provide a way for a dynamically generated string to
> access values in the current scope (and it does this by not supporting
> dynamically generated strings). For most domains you mention, safety is much
> more complex, and in fact mostly orthogonal -- code injection attacks rely
> on the value of the interpolated variables, so PEP 498's "safety" does not
> help at all.
Right, but that's where I came to the conclusion that the lack of
arbitrary interpolation support ends up making PEP 498 actively
dangerous, as string interpolation based substitution ends up being so
much prettier than doing things right. Compare:
os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])
Even in that simple case, the two unsafe approaches are much nicer to
read, and as the command line gets more complex, the safe version gets
harder and harder to read relative to the unsafe ones.
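To make the comparison concrete, here is a small sketch that only builds the commands and never runs them. The hostile filename is invented for illustration, and `shlex.split` is used as a rough approximation of how a POSIX shell would tokenize each string (a real shell would additionally treat `;` as a command separator).

```python
import shlex

filename = "notes.txt; rm -rf ~"   # hostile value chosen for illustration

# Unsafe: the value is pasted into text that a shell will parse.
unsafe_command = f"echo {filename}"

# Safe, list form: no shell is involved, each item is exactly one argument.
safe_argv = ["echo", filename]

# Safe, string form: quote the value before it reaches a shell.
quoted_command = f"echo {shlex.quote(filename)}"

# Approximate the shell's view of each command string.
unsafe_tokens = shlex.split(unsafe_command)
quoted_tokens = shlex.split(quoted_command)
```

The quoted string tokenizes back to the same two arguments as the list form, while the unquoted one falls apart into several words.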
With the latest PEP 501 draft (which switched the proposed syntax and
semantics to behave more like a traditional binary operator), we could
make invoking a subprocess *safely* look like:
subprocess.call $"echo $filename"
However, I'm now coming full circle back to the idea of making this a
string prefix, so that would instead look like:
subprocess.call($"echo $filename")
The trick would be to make interpolation lazy *by default* (preserving
the triple of the raw template string, the parsed fields, and the
expression values), and put the default rendering in the resulting
object's *__str__* method.
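A rough sketch of what such a lazy object might look like, under several assumptions: the class name and constructor are invented here, `{...}` fields are used instead of `$` so that `string.Formatter` can do the parsing, `eval()` stands in for the compiler evaluating each subexpression, and `!r`-style conversions are ignored.

```python
from string import Formatter


class InterpolationTemplate:
    """Hypothetical lazy template: keeps the raw string, fields and values."""

    def __init__(self, raw, namespace):
        self.raw = raw
        # Each parsed field is (literal_text, expression, format_spec, conversion).
        self.fields = tuple(Formatter().parse(raw))
        # eval() is a stand-in for compile-time subexpression evaluation.
        self.values = tuple(
            eval(expr, {}, namespace) for _, expr, _, _ in self.fields if expr
        )

    def __str__(self):
        # Default rendering: format each value and join it with the literals.
        out, values = [], iter(self.values)
        for literal, expr, spec, _conversion in self.fields:
            out.append(literal)
            if expr:
                out.append(format(next(values), spec or ""))
        return "".join(out)


template = InterpolationTemplate(
    "Chopped {n} onions in {t1-t0:.3f} seconds.",
    {"n": 3, "t0": 1.0, "t1": 2.5},
)
rendered = str(template)
```

A consumer such as a logging or SQL API could inspect `raw`, `fields` and `values` directly and never call `str()` at all.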
That description is probably as clear as mud, though, so back to the
PEP I go! :)
>>> s = select([(users.c.fullname +
...               ", " + addresses.c.email_address).
...               label('title')]).\
...        where(users.c.id == addresses.c.user_id).\
...        where(users.c.name.between('m', 'z')).\
...        where(
...               or_(
...                  addresses.c.email_address.like('%@aol.com'),
...                  addresses.c.email_address.like('%@msn.com')
...               )
...        )
>>> conn.execute(s).fetchall()
SELECT users.fullname || ? || addresses.email_address AS title
FROM users, addresses
WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND
(addresses.email_address LIKE ? OR addresses.email_address LIKE ?)
(', ', 'm', 'z', '%@aol.com', '%@msn.com')
[(u'Wendy Williams, we...@aol.com',)]
>>> from sqlalchemy.sql import text
>>> s = text(
...     "SELECT users.fullname || ', ' || addresses.email_address AS title "
...         "FROM users, addresses "
...         "WHERE users.id = addresses.user_id "
...         "AND users.name BETWEEN :x AND :y "
...         "AND (addresses.email_address LIKE :e1 "
...             "OR addresses.email_address LIKE :e2)")
>>> conn.execute(s, x='m', y='z', e1='%@aol.com', e2='%@msn.com').fetchall()
[(u'Wendy Williams, we...@aol.com',)]
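The same bound-parameter idea can be sketched with only the standard library's sqlite3 module. The table, row and hostile value below are invented for illustration; the contrast is between pasting a value into the SQL text and passing it as a named parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'wendy')")

hostile = "x' OR '1'='1"   # value an attacker might supply

# Unsafe: the value becomes part of the SQL text itself.
unsafe_sql = "SELECT id FROM users WHERE name = '%s'" % hostile
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value travels separately as a bound parameter.
safe_rows = conn.execute(
    "SELECT id FROM users WHERE name = :name", {"name": hostile}
).fetchall()
```

The interpolated query matches every row because the injected `OR` clause is parsed as SQL, while the parameterized query matches none.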
Actually, it's I who missed something – replied from a phone, and sent
the reply to Chris only instead of to the list. And that killed
further discussion, it seems.
My answer was:
> Not too hard, but getting the exact semantics right could be tricky.
> It's probably something the language/stdlib should enable, rather than
> having it in the stdlib itself.
This seems roughly in line with what Guido was saying earlier. (Am I
misrepresenting your words, Guido?)
I thought a bit about what's bothering me with this idea, and I
realized I just don't like that "quantum effect" – collapsing when
something looks at a value.
All the parts up to that point sound OK, it's the str() that seems too
magical to me.
We could require a more explicit function, not just str(), to format the string:
>>> t0=1; t1=2; n=3
>>> template = i"Peeled {n} onions in {t1-t0:.2f}s"
>>> str(template)
types.InterpolationTemplate(template="Peeled {n} onions in
{t1-t0:.2f}s", fields=(('Peeled', 0, 'n', '', ''), ...), values=(3,
1))
>>> format_template(template) # (or make it a method?)
'Peeled 3 onions in 1.00s'
This no longer feels "too magic" to me, and it would allow some
experimentation before (if ever) InterpolationTemplate grows a more
convenient str().
Compared to f-strings, all this is doing is exposing the intermediate
structure. (What the "i" really stands for is "internal".)
Now f-strings would be just i-strings with a default formatter applied.
And, InterpolationTemplate should only allow attribute access (i.e. it
shouldn't be structseq). That way the internal structure can be
changed later, and the "old" attributes can be synthetized on access.
Another option would be to put the default rendering in __format__,
and let __str__ fall through to __repr__. That way str(template)
wouldn't render the template, but format(template) would.
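A small sketch of that option, assuming a made-up `Template` class that takes its values as keyword arguments. Because no `__str__` is defined, `str()` falls through to `__repr__`, and only `format()` renders.

```python
class Template:
    """Hypothetical template whose rendering lives in __format__ only."""

    def __init__(self, raw, **values):
        self.raw = raw
        self.values = values

    def __repr__(self):
        return "Template(%r, values=%r)" % (self.raw, self.values)

    # No __str__ is defined, so str() falls through to __repr__.

    def __format__(self, spec):
        rendered = self.raw.format(**self.values)
        return format(rendered, spec)


template = Template("Peeled {n} onions", n=3)
shown = str(template)         # the unrendered representation
rendered = format(template)   # the rendered text
```

One consequence worth noting: `"{}".format(template)` would also render, since str.format goes through `__format__`.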
> Compared to f-strings, all this is doing is exposing the intermediate
> structure. (What the "i" really stands for is "internal".)
> Now f-strings would be just i-strings with a default formatter applied.
>
> And, InterpolationTemplate should only allow attribute access (i.e. it
> shouldn't be structseq). That way the internal structure can be
> changed later, and the "old" attributes can be synthetized on access.
Yeah, that's fair. I added the __iter__ to make some of the examples
prettier, but it probably isn't worth the loss of future flexibility.
Cheers,
Nick.
--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
I like the idea, but *please* stop using this example. It's just
terrible. Firstly, subprocess.call defaults to shell=False, so this
wouldn't even work. Secondly, subprocess.call(['echo', filename]) looks
orders of magnitude cleaner. Thirdly, your i-string wouldn't even know
how to quote because it doesn't know what shell you are using.
Best,
-Nikolaus
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«
On Aug 24, 2015 12:39 PM, "Guido van Rossum" <gu...@python.org> wrote:
> (...), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation).
f('select {date} from {tablename}')
~=
['select ', UnescapedStr(date), 'from ', UnescapedStr(tablename)]
* UnescapedUntranslatedSoencodedStr
* _repr_shell
* quote or not?
* _repr_html
* charset, encoding
* _repr_sql
* WHERE x LIKE '%\%%'
>
> --
> --Guido van Rossum (python.org/~guido)
>
On Aug 24, 2015 1:21 PM, "Mike Miller" <python...@mgmiller.net> wrote:
>
> Ok thanks, I know someone out there is probably using templating to make templating templates. But, we're getting out into the wilderness here. The original use cases were shell scripts
Printf/str.format/str.__mod__/string concatenation are often
*dangerou;\n\s** in context to shell scripts (unless you're building a "para"+"meter" that will itself be quoted/escaped; or passing tuple cmds to eg subprocess.Popen);
which is why I would use pypi:sarge for Python 2.x+,3.x+ here.
Or yield a sequence of typed strings which can be contextually ANDed.
*shudder*. After years of efforts to get people not to do this, you want
to change course by 180 degrees and start telling people this is ok if
they add an additional single character in front of the string?
This sounds like a very bad idea to me for many reasons:
- People will forget to type the 'e', and things will appear to work
but be buggy.
- People will forget that they need the 'e' (and the same thing will
happen, further reinforcing the thought that the e is not required)
- People will be confused because other languages don't have the 'e'
(hmm. how do I do this in Perl? I guess I'll just drop the
'e'... *check*, works, great!)
- People will assume that their my_custom_system() call also
special-cases e strings and escape them (which it won't).
Best,
-Nikolaus
Hi, here's my latest idea, riffing on others' latest this weekend.
Let's call these e-strings (for expression), as it's easier to refer to the letter of the proposals than three digit numbers.
So, an e-string looks like an f-string, though at compile-time, it is converted to an object instead (like i-string):
print(e'Hello {friend}, filename: {filename}.') # converts to ==>
print(estr('Hello {friend}, filename: {filename}.', friend=friend,
filename=filename))
An estr is a subclass of str, therefore able to do the nice things a string can do. Rendering is deferred, and it also has a raw member, escape(), and translate() methods:
class estr(str):
    # init: saves self.raw, args, kwargs for later
    # methods, ops render it
    # def escape(self, escape_func):  # handles escaping
    # def translate(self, template, safe=True):  # optional i18n support
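For readers who want something runnable, here is one possible reading of that outline. It is a simplification and not the proposal itself: the plain rendering is computed eagerly in `__new__` rather than deferred, only keyword values are handled, and `translate()` is left out.

```python
import html


class estr(str):
    """Sketch of the proposed estr; the details here are assumptions."""

    def __new__(cls, raw, **kwargs):
        # The str value is the plain rendering, so str operations just work.
        self = super().__new__(cls, raw.format(**kwargs))
        self.raw = raw          # unrendered template
        self.kwargs = kwargs    # values, kept for later escaping
        return self

    def escape(self, escape_func):
        # Re-render with every value passed through escape_func first.
        escaped = {k: escape_func(str(v)) for k, v in self.kwargs.items()}
        return self.raw.format(**escaped)


friend = "<b>Bob</b>"
message = estr("Hello {friend}.", friend=friend)
plain = str(message)
escaped = message.escape(html.escape)
```

Used as an ordinary string the value is unescaped; only a consumer that knows to call `escape()` gets the protected form.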
class Markup():
    def __html__()
    def __html_format__()
_mime_map = dict(
    _repr_png_="image/png",
    _repr_jpeg_="image/jpeg",
    _repr_svg_="image/svg+xml",
    _repr_html_="text/html",
    _repr_json_="application/json",
    _repr_javascript_="application/javascript",
)
# _repr_latex_ = "text/latex"
# _repr_retina_ = "image/png"
From the early part of this discussion [1], I had the impression that
the goal was that eventually string interpolation would be on by
default for all strings, with PEP 498 intended as an intermediate step
towards that goal. Is that still true, or is the plan now that
interpolated strings will always require an explicit marker (like
'f')?
I ask because if they *do* require an explicit marker, then obviously
the best thing is for the syntax to match that of .format. But, if
this will be enabled for all strings in Python 3.something, then it
seems like we should be careful now to make sure that the syntax is
clearly distinct from that used for .format ("${...}" or "\{...}" or
...), because anything else creates nasty compatibility problems for
people trying to write format template strings that work on both old
and new Pythons.
(This is also assuming that f-string interpolation and the eventual
plain-old-string interpolation will use the same syntax, but that
seems like a highly desirable property to me..)
-n
[1] http://thread.gmane.org/gmane.comp.python.ideas/34980
On 08/24/2015 02:54 PM, Paul Moore wrote:
> Agreed. In a convenience library where it's absolutely clear that a
> shell is involved (something like sarge or invoke) this is OK, but not
> in the stdlib as the "official" way to call external programs.
Don't focus on os.system(); it could be any function, and which one is not
particularly relevant, nor do I recommend this line as the official way.
Remember Nick Coghlan's statement that the "easy way should be the right way"?
That's what this is trying to accomplish.
> - People will fail to understand the difference between e'...' and
> f'...' and will use the wrong one when using os.system, and things
> will work correctly but with security vulnerabilities.
I don't recommend e'' and f'', only e'' at this moment.
In [1]: import subprocess
In [2]: subprocess.call('echo 1\necho 2', shell=True)
1
2
Out[2]: 0
In [3]: import sarge
In [4]: sarge.run('echo 1\necho 2')
1 echo 2
Out[4]: <sarge.Pipeline at 0x7f3e8185e790>
In [5]: sarge.shell_quote??
Signature: sarge.shell_quote(s)
Source:
def shell_quote(s):
    """
    Quote text so that it is safe for Posix command shells.

    For example, "*.py" would be converted to "'*.py'". If the text is
    considered safe it is returned unquoted.

    :param s: The value to quote
    :type s: str (or unicode on 2.x)
    :return: A safe version of the input, from the point of view of Posix
             command shells
    :rtype: The passed-in type
    """
    assert isinstance(s, string_types)
    if not s:
        result = "''"
    elif not UNSAFE.search(s):
        result = s
    else:
        result = "'%s'" % s.replace("'", r"'\''")
    return result
File: ~/.local/lib/python2.7/site-packages/sarge/__init__.py
Type: function
I mean, it's great that the rise of languages like Python that have
easy range-checked string manipulation has knocked buffer overflows
out of the #1 spot, but... :-)
Guido is right that the nice thing about classic string interpolation
is that its use in many languages gives us tons of data about how it
works in practice. But one of the things that data tells us is that it
actually causes a lot of problems! Do we actually want to continue the
status quo, where one set of people keep designing language features
to make it easier and easier to slap strings together, and then
another set of people spend increasing amounts of energy trying to
educate all the users about why they shouldn't actually use those
features? It wouldn't be the end of the world (that's why we call it
"the status quo" ;-)), and trying to design something new and better
is always difficult and risky, but this seems like a good moment to
think very hard about whether there's a better way.
(And possibly about whether that better way is something we could put
up on PyPI now while the 3.6 freeze is still a year out...)
How is that compatible with your statement that
> This means a billion lines of code using e-strings won't have to care
> about them, only a handful of places.
Either str(estr) performs interpolation (so billions of lines of code
don't have to change, and my custom system()-like call gets an
interpolated string as well until I change it to be estr-aware), or it
does not (and billions of lines of code will break when they
unexpectedly get an estr instead of a str).
Best,
-Nikolaus
I'm talking about someone who has implemented a function (for whatever
reason) that behaves like os.system(). Say something like this (probably
the calls are all wrong because I didn't look them up, but I trust
everyone knows what I mean):
def nonblocking_system(cmd):
    if os.fork() == 0:
        os.execl('/bin/sh', 'sh', '-c', cmd)
With this function, people have to be really careful about injection
vulnerabilities - just like with os.system():
os.system('rm %s' % file) # danger!
nonblocking_system('rm %s' % file) # danger!
But now you're proposing that os.system() gets support for e-strings,
which are then properly quoted. Now we have this:
os.system(e'rm {file}') # ok
nonblocking_system(e'rm {file}') # you'd think it's ok, but it's not
I think this is a terrible situation, because you can never be quite
sure where an e-string is ok (because the function is prepared for it),
and where it will act just like a string.
> The estr adds a protection (by escaping variables) that didn't exist
> in the past. It is not removing any protections or best practices.
No, but it muddles the water as to what is good and what is bad
practice. 'rm {file}' has always been bad practice, but with e-strings
e'rm {file}' may or may not be bad practice, depending what you do with
it.
Best,
-Nikolaus
What function?
> But, are you implying that the escaping could be bypassed? Would that
> be possible?
According to you, yes. Just look at your example:
| def os_system(command):  # imagine os.system, subprocess, dbapi, etc.
|     if isinstance(command, estr):
|         command = command.escape(shlex.quote)  # each chooses its own rules
|     do_something(command)
So any function that doesn't special-case estr will "bypass" the
escaping and pass it to its version of the do_something() function
without quoting.
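The objection can be illustrated with a toy version of the idea. Everything below is an assumption made for the demonstration: the `estr` class is a simplified sketch (it renders eagerly), and both "system" functions merely return the text they would hand to a shell instead of executing anything.

```python
import shlex


class estr(str):
    """Simplified sketch of an e-string; not the actual proposal."""

    def __new__(cls, raw, **kwargs):
        self = super().__new__(cls, raw.format(**kwargs))
        self.raw = raw
        self.kwargs = kwargs
        return self

    def escape(self, escape_func):
        escaped = {k: escape_func(str(v)) for k, v in self.kwargs.items()}
        return self.raw.format(**escaped)


def aware_system(command):
    # Special-cases estr, as in the quoted os_system() example.
    if isinstance(command, estr):
        command = command.escape(shlex.quote)
    return str(command)   # stands in for passing the text to a shell


def unaware_system(command):
    # Written without knowledge of estr: treats it as an ordinary str.
    return str(command)


file = "a.txt; echo pwned"   # hostile value chosen for illustration
cmd = estr("rm {file}", file=file)
aware = aware_system(cmd)
unaware = unaware_system(cmd)
```

The same `cmd` object is quoted in one call and unquoted in the other, and nothing at the call site shows which behaviour applies.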
Best,
-Nikolaus