
RFC PEP candidate: q'<delim>'quoted<delim> ?


Bengt Richter

Mar 3, 2002, 4:50:58 AM
Problem: How to put quotes around an arbitrary program text?

Obviously a program may contain quoted material using all the
defined string quoting methods (and this new method as well),
so the problem is defining delimiters that won't occur in the text.

I propose using a variation of the MIME multipart delimiter idea:

The leading delimiter for this new quoted string will be written
just like a raw string, except using 'q' in place of 'r'.
The interpretation of '\' escapes within q'xxx' will be identical to r'xxx'.

Immediately following the trailing quote(s) of the q'xxx' (or q'''xxx''') string,
the quoted content starts. '\' characters will be scanned as just
another literal character, and have no escape effect in this content.
The content continues until the defined raw delimiter q-string occurs,
i.e., the xxx part of q'xxx' or q"""xxx""" etc.

Thus the whole value of the q'delim'...delim string representation is exactly
the characters between the delimiters. E.g., you could write

assert q'<-=delim=->'content here<-=delim=-> == 'content here' # this would be true

without getting an error.

Note the lack of quotes around the final delimiter string, since it itself is
the final delimiter. This can also be used to solve the final unescaped
backslash problem for quoting windows paths:

q'|'c:\foo\bar\|

Also note nestability, assuming you guarantee unique delimiter strings
(which you could do by generating a guid if paranoid, or if you had access
to the whole string before outputting the delimiter, you could e.g. use an MD5
hash of the content in hex as delimiter):

q'
C7593104AF1DE7534F169D5EF1579BF6
'q'|'c:\foo\bar\|
C7593104AF1DE7534F169D5EF1579BF6

(Note two EOLs in the delimiter besides the MD5,
so the quoted content itself has no EOL in it here)
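
A minimal sketch of the delimiter choice described above, in present-day Python. The helper name pick_delimiter is hypothetical, and since q-strings are only a proposal, the wrapped text is built as an ordinary string purely to show the layout. It assumes the content is text that can be encoded as UTF-8.

```python
import hashlib
import uuid


def pick_delimiter(content):
    """Return a string that does not occur anywhere in content.

    Hypothetical helper: tries the upper-case MD5 hex digest of the
    content first, then random UUID hex strings until one is absent.
    """
    candidate = hashlib.md5(content.encode("utf-8")).hexdigest().upper()
    while candidate in content:
        candidate = uuid.uuid4().hex.upper()
    return candidate


payload = r"q'|'c:\foo\bar\|"
delim = pick_delimiter(payload)

# Layout from the proposal: q'<EOL>delim<EOL>' + payload + <EOL>delim<EOL>
wrapped = "q'\n" + delim + "\n'" + payload + "\n" + delim + "\n"
print(wrapped)
```

A digest of the content is very unlikely to occur inside that content, but the loop checks anyway instead of relying on that conjecture.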

To make this line oriented quoting a little cleaner
(i.e., no funny ' at the beginning of the first quoted line),
one could use Q'xxx'z123 to mean q'xxxz'123
I.e., include the first raw character in the otherwise
super-quoted content as part of the delimiter, and start
the content with the next character. Below is assumed a single
character EOL.

Thus you can include an EOL at both ends of the delimiter, and

assert Q'
C7593104AF1DE7534F169D5EF1579BF6'
q'|'c:\foo\bar\|
C7593104AF1DE7534F169D5EF1579BF6
== q'~'q'|'c:\foo\bar\|~

would not fail (we can write the last, since we know the content).
Of course, knowing the content here, you could also wrap with

r"q'|'c:\foo\bar\|"

A null q delimiter could be defined to imply delimiting by the end of the file or
other representation container. I.e., q''<-- content up to EOF -->

Escapes are recognized according to raw string rules inside the quotes of the
q delimiter string, so the delimiter itself does have a final backslash problem,
but that shouldn't be too hard to live with, since there's no such problem for the
quoted payload.

A q string could theoretically allow putting unescaped arbitrary binary data in
a source file, though many editors would have problems dealing with it. Even so,
there might be some use for that. E.g., a binary .gif file could easily be
converted to an importable file binding a symbol to the gif data as a string.
Just prefix symbol = q'' as in

symbol = q''<binary .gif data><EOF>.

Or you could delimit on both ends of course.

BTW, this will provide a better raw data encapsulation mechanism than XML has ;-)

(XML's <![CDATA[ ... ]]> construct can't nest, because of the fixed delimiters --
though maybe they have a fix by now. I haven't looked for the most recent spec).

I don't know how hard q'xxx'...xxx would be to implement, but I would think a relatively
minor variant of raw string processing would do it (at the usual end of the string, continue
instead of wrapping up, and look for the string collected so far as the trailing
delimiter of a new string starting with the next character).

The Q'xxx'z...xxxz variant isn't essential. Just a way to put an EOL at the z
for aesthetics.
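
The scanning rule can be sketched outside the tokenizer as a plain function. This is only an illustration under stated assumptions: parse_q_string is a hypothetical name, the delimiter definition is taken to end at the first matching quote (raw-string backslash handling inside the delimiter is ignored), and an empty delimiter means "up to the end of the source".

```python
def parse_q_string(source, pos=0):
    """Parse q'<delim>'payload<delim> (or the Q variant) at source[pos].

    Returns (payload, index just past the closing delimiter).
    """
    prefix = source[pos:pos + 1]
    if prefix not in ("q", "Q"):
        raise ValueError("expected a q or Q prefix")
    pos += 1
    # Accept the usual quote styles, trying triple quotes first.
    for quote in ("'''", '"""', "'", '"'):
        if source.startswith(quote, pos):
            break
    else:
        raise ValueError("expected a quote after the prefix")
    delim_start = pos + len(quote)
    delim_end = source.find(quote, delim_start)
    if delim_end == -1:
        raise ValueError("unterminated delimiter definition")
    delim = source[delim_start:delim_end]
    payload_start = delim_end + len(quote)
    if prefix == "Q":
        # Q'xxx'z... means q'xxxz'...: absorb one more character.
        delim += source[payload_start:payload_start + 1]
        payload_start += 1
    if not delim:
        # Null delimiter: the payload runs to the end of the source.
        return source[payload_start:], len(source)
    payload_end = source.find(delim, payload_start)
    if payload_end == -1:
        raise ValueError("closing delimiter not found")
    return source[payload_start:payload_end], payload_end + len(delim)


value, end = parse_q_string("q'<-=delim=->'content here<-=delim=->")
print(value)
```

The returned index lets a caller resume scanning right after the closing delimiter, which is what a tokenizer would need in order to continue with the rest of the line.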

Regards,
Bengt Richter

Roman Suzi

Mar 3, 2002, 6:56:45 AM
On 3 Mar 2002, Bengt Richter wrote:

>Problem: How to put quotes around an arbitrary program text?

Have it in a separate file.

>Obviously a program may contain quoted material using all the
>defined string quoting methods (and this new method as well),
>so the problem is defining delimiters that won't occur in the text.
>
>I propose using a variation of the MIME multipart delimiter idea:
>

>Note the lack of quotes around the final delimiter string, since it itself is
>the final delimiter. This can also be used to solve the final unescaped
>backslash problem for quoting windows paths:
>
> q'|'c:\foo\bar\|

Making Python as gibberish as Perl is. And all that only to
have Windows path be written without double-\
AFAIK, latest Windows also use / for delimiting dirs.
So this is non-problem.

>Regards,
>Bengt Richter
>
>

Sincerely yours, Roman Suzi
--
_/ Russia _/ Karelia _/ Petrozavodsk _/ r...@onego.ru _/
_/ Sunday, March 03, 2002 _/ Powered by Linux RedHat 6.2 _/
_/ "Why build a wall round a cemetery when no-one wants to get in?" _/


Bengt Richter

Mar 3, 2002, 7:28:42 PM
On Sun, 3 Mar 2002 14:56:45 +0300 (MSK), Roman Suzi <r...@onego.ru> wrote:

>On 3 Mar 2002, Bengt Richter wrote:
>
>>Problem: How to put quotes around an arbitrary program text?
>
>Have it in a separate file.
>

That's fine in a lot of cases, but does require reliable presence
of both files. It's not so convenient if you are trying to write
e.g. a program generator and want to write out snippets containing
mixed triple quoted doc strings and data etc. Or if you are
writing copied example snips as part of dynamic HTML from CGI.

You can say, "Well adopt a consistent methodology, so you don't
try to quote triple quotes with quotes of the same kind."

Ok. But IMO it's easier to think up a safe delimiter for yourself
and use that to quote. Yes, Perlophobes can point to
$s = <<XXX ;
"""But I'd say convenient precedents were invented for a reason."""
XXX

s= Q'XXX'
"""This would make it as easy in Python."""
XXX

s = '''"""This works a lot of the time, so use it when you prefer ;-)"""\n'''
s = r'''"""Or avoid the \n like this ;-)"""
'''

and, with a simple copy&paste, I have the above safely in a string:

snippet = Q'===the stuff I want==='
Ok. But IMO it's easier to think up a safe delimiter for yourself
and use that to quote. Yes, Perlophobes can point to
$s = <<XXX ;
"""But I'd say convenient precedents were invented for a reason."""
XXX

s= Q'XXX'
"""This would make it as easy in Python."""
XXX

s = '''"""This works a lot of the time, so use it when you prefer ;-)"""\n'''
s = r'''"""Or avoid the \n like this ;-)"""
'''
===the stuff I want===


Sure, I can copy it and make a separate file, and write some code to read the file
into my snippet string, but would you actually prefer to do it that way? Ok, I could
write a class to systematize it. Maybe I will, when I want to use file-stored snippets,
but I'd like both options ;-) And I guess I'd like to have Q' as well as q' ;-)

>>Obviously a program may contain quoted material using all the
>>defined string quoting methods (and this new method as well),
>>so the problem is defining delimiters that won't occur in the text.
>>
>>I propose using a variation of the MIME multipart delimiter idea:
>>
>>Note the lack of quotes around the final delimiter string, since it itself is
>>the final delimiter. This can also be used to solve the final unescaped
                                ^^^^
>>backslash problem for quoting windows paths:
>>
>> q'|'c:\foo\bar\|
>
>Making Python as gibberish as Perl is. And all that only to
>have Windows path be written without double-\

Not 'only'. I said 'also' ;-) Perhaps my choice of '|' delimiter triggered
your 'gibberish as Perl' detector?

I could have written

q'###'c:\foo\bar\###
or
q'[quoting delimiter]'c:\foo\bar\[quoting delimiter]

just as well for this one.

>AFAIK, latest Windows also use / for delimiting dirs.
>So this is non-problem.

Sure, you have identified a context-dependent non-problem ;-)

Regards,
Bengt Richter

Jeff Shannon

Mar 5, 2002, 3:00:56 PM

Bengt Richter wrote:

> >Making Python as gibberish as Perl is. And all that only to
> >have Windows path be written without double-\
> Not 'only'. I said 'also' ;-) Perhaps my choice of '|' delimiter triggered
> your 'gibberish as Perl' detector?
>
> I could have written
>
> q'###'c:\foo\bar\###
> or
> q'[quoting delimiter]'c:\foo\bar\[quoting delimiter]
>
> just as well for this one.

I still don't like it. It's very difficult for me to see at a glance, what's part of
the string and what is part of the delimiter, especially with the leading delimiter
being quoted and the trailing one not quoted. It looks unbalanced, it looks ungainly,
and, to me, it just plain looks ugly. I'd expect the above example to be equivalent to
"'c:\\foo\\bar\\", with a leading mismatched single-quote... Your earlier example of
cutting & pasting left me totally confused until I spent a minute sorting through it.

At least to me, this seems totally unclear and totally nonintuitive. It vastly
multiplies the possibilities for writing hard-to-read code, while providing a real
benefit in relatively few situations. Considering how often triple-quotes appear in
text files, I can only imagine this being needed when trying to programmatically
generate Python code, which doesn't seem to be a terribly common task. I'd prefer to
find an alternate solution for the specific case of that task, which doesn't require
changing the core language and creating so much potential for ugly code in every other
task that can be done with Python.

Jeff Shannon
Technician/Programmer
Credit International


Bengt Richter

Mar 7, 2002, 5:14:13 AM
On Tue, 05 Mar 2002 12:00:56 -0800, Jeff Shannon <je...@ccvcorp.com> wrote:

>
>
>Bengt Richter wrote:
>
>> >Making Python as gibberish as Perl is. And all that only to
>> >have Windows path be written without double-\
>> Not 'only'. I said 'also' ;-) Perhaps my choice of '|' delimiter triggered
>> your 'gibberish as Perl' detector?
>>
>> I could have written
>>
>> q'###'c:\foo\bar\###
>> or
>> q'[quoting delimiter]'c:\foo\bar\[quoting delimiter]
>>
>> just as well for this one.
>
>I still don't like it. It's very difficult for me to see at a glance, what's part of
>the string and what is part of the delimiter, especially with the leading delimiter
>being quoted and the trailing one not quoted. It looks unbalanced, it looks ungainly,
>and, to me, it just plain looks ugly. I'd expect the above example to be equivalent to
>"'c:\\foo\\bar\\", with a leading mismatched single-quote... Your earlier example of

I understand the reaction, and I had considered defining the delimiter with the quotes
included. Would you prefer the following?

q'###'c:\foo\bar\'###'
or

q'__quoting delimiter (incl quotes)__'c:\foo\bar\'__quoting delimiter (incl quotes)__'

But then I had to wonder whether using alternative quotes should imply the identical
usage at both ends w.r.t. the quote marks. I.e.,
q'###'c:\foo\bar\'###'

q"###"c:\foo\bar\"###"

q'''###'''c:\foo\bar\'''###'''

q"""###"""c:\foo\bar\"""###"""

All that is very cluttered and ugly. The major use of q' would probably actually be
in the Q' variation, and you could do the above pretty cleanly as:

s = Q'
###'
c:\foo\bar\
###

Note that the delimiter is '\n###\n' in the above, so there is no \n in the
quoted string. I think this would be an easy pattern to use. To quote large unknown
things, you just choose something safe in place of ###, and if you don't want to
clip off the last \n, use Q'###' with no \n in front of the ###.

Note that you could use Q'"""' in place of the leading """ in existing code, to
allow you to put the first line of quoted text on the next line, without getting
a leading \n. I.e.,

s = Q'"""'
First line.
...
Last line.
"""
# this comment immediately follows the quoting delimiter

is equivalent (assuming """ quotes ok) to

s = r"""First line
...
Last line.
"""# this comment immediately follows the quoting delimiter

(i.e., you have to account for the delimiter actually being '"""\n' -- cf. M' below)


>cutting & pasting left me totally confused until I spent a minute sorting through it.
>

A minute seems not too bad ;-) I.e., you wouldn't have to re-think it to use it as
a pattern for arbitrary quoting, I don't think. I only used the example text because
it was a heavy mix of quoting that you could not quote with triple or double quotes.
I thought q' and Q' to be pretty straightforward, once the syntax is grasped.

Can you think of a better way to quote an arbitrary sequence of characters within
a program text?

>At least to me, this seems totally unclear and totally nonintuitive. It vastly

How does it seem if you go along the steps I took?:

1. You have an arbitrary sequence of characters that are to be the value of a string.
2. The sequence may contain both ''' and """ and may even end with \ and it must be unchanged.
3. (2) Means you need a different delimiter than " or ' or """ or '''.
4. Using a string as a delimiter (like MIME or <<XYZ in Perl & shells etc) seems viable, whereas
no fixed delimiters can nest without counting and symmetry rules (which contradicts the
definition of unrestricted text), so yet another thing like XML's <![CDATA[ ... ]]> won't do.
5. Python has a way to define a string, but not a way to indicate that it should
be used as a delimiter.
6. (5) suggests using a Python string to define the delimiter string
7. (5) suggests that the string-as-delimiter needs to be distinguished from others
8. raw and unicode strings use a quote prefix to distinguish themselves from others
9. (5)+(8) suggests using an alternate prefix to define string-as-delimiter: I chose q and Q for quote
10. The actual content string must start somewhere after the delimiter string is defined
11. The obvious place for (10) is the next character after the final quote of the _delimiter_ string.
12. Using an otherwise ordinary python raw string as a delimiter, means the quotes are not included
13. (12) means the postfixed delimiter does not have quotes around it, unless you alter the delimiter
string definition rules to include them.
14. There is an ugliness in using triple quotes to quote multiple lines of text with no leading
empty line, since = """the text of the first line
doesn't line up with the text of the following lines.
"""
15. I thought of Q' to allow lining up all quoted text lines in a block by using the first character
following Q'xxx' as the last character of the delimiter (thus using it up and allowing the real
quoted text to start on the next line). Alternatively, we could replace the second ' after Q
and delimit the delimiter-string with ' on the front and \n at the end, including neither.
Perhaps that would be cleaner for ordinary multiline quotes (and change the prefix to M)? E.g.,

print M'XXX
First line.
Second line.
XXX


This would let you do the ugly windows path string even more cleanly:
s = M'
###
c:\foo\bar\
###

It's getting cleaner looking, don't you think?
(The delimiter is '\n###' above, from source "M'\n###\n<content>\n###" ).
(delim-string delimiters)-->                   ^     ^^

>multiplies the possibilities for writing hard-to-read code, while providing a real

I don't buy this, unless someone is being perverse in using it, or hasn't thought
of really clean examples yet ;-).

The point is to make the few places where it IS needed simple and clean. The point
of posting it for discussion is to tease out better alternatives and/or usage patterns
that really address the problem(s) without declaring it/them a/() non-problem(s).
C'mon, how is the print M'XXX example above hard to digest? ;-)

M'"""
This could be a doc-string with
no leading \n and looking as a
block the way it would print.
"""

>benefit in relatively few situations. Considering how often triple-quotes appear in
>text files, I can only imagine this being needed when trying to programmatically
>generate Python code, which doesn't seem to be a terribly common task. I'd prefer to
>find an alternate solution for the specific case of that task, which doesn't require
>changing the core language and creating so much potential for ugly code in every other

I don't see where an added capability, which does not alter interpretation of existing
code, has "potential for ugly code" -- unless someone is abusing the capability.

>task that can be done with Python.
>

How would you handle the problem as stated (cf. 1-15 above)?

I appreciate the comments. I think they have led me to better examples, and possible
variations on the original idea. If someone has better variations or alternative
better solutions, I'd like to hear them. Thank you.

Regards,
Bengt Richter

Chris Gonnerman

Mar 7, 2002, 9:09:52 AM
Ugly. You state that code like this:

def somefunc():
    astring = """this is the first line
this is the second"""

... is ugly. Fine, but IMHO so is the proposed syntax.

Recall always that the charm of Python is its simplicity. Even with the
new language features added in 2.1 and 2.2, a "bear of little brain" can
remember all the rules more or less at the same time.

I am against pasting and patching one after another new language construct,
syntax sweetener, etc. to Python because it detracts from that core
simplicity each and every time.

This is why I am against incorporating the "$varname" interpolation operator.
At least, IMHO it is not ugly.

There seem to be a never-ending stream of new language feature proposals.
Praise Guido they don't all get incorporated!

(Don't take this personally. I've had my share of similar ideas; we all
think we are like Guido and can improve the language. Learning to trust his
language-design skills was hard, but rewarding.)

Terry Reedy

Mar 7, 2002, 1:46:03 PM

"Bengt Richter" <bo...@oz.net> wrote in message
news:a67ehl$uhi$0...@216.39.172.122...

> How does it seem if you go along the steps I took?:
>
> 1. You have an arbitrary sequence of characters that are to be
> the value of a string.
> 2. The sequence may contain both ''' and """ and may even end
> with \ and it must be unchanged.
> 3. (2) Means you need a different delimiter than " or ' or """ or '''.

Since (3) does not follow from (2), it seems like this proposal is
unnecessary. Strings read from a file (this includes raw_input() from
users) already meet desiderata 1 and 2.

>>> s=raw_input('Hi: ')
Hi: Python delimiters include ', ", ''', """, and \
>>> print s
Python delimiters include ', ", ''', """, and \

In the extremely rare case in which one really *needs* all 4 string
delimiters within a single string and the string *must* get its value
from a programmer-written string literal rather than a file, one can
use octal escapes (or \\ for \).

>>> print """Python's string delimiters are ', ", ''', and \042\042\042."""
Python's string delimiters are ', ", ''', and """.
>>> print '''Python's string delimiters are ', ", \047\047\047, and """.'''
Python's string delimiters are ', ", ''', and """.
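
For readers on Python 3, where print is a function, the same octal-escape technique might be written as below. This is only a restatement of the session above; the variable names s1 and s2 are illustrative.

```python
# \042 is the octal escape for a double quote, \047 for a single quote.
s1 = """Python's string delimiters are ', ", ''', and \042\042\042."""
s2 = '''Python's string delimiters are ', ", \047\047\047, and """.'''

print(s1)
print(s1 == s2)
```

Both literals produce the same value; only the quote style that had to be escaped differs.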

Terry J. Reedy

Bengt Richter

Mar 7, 2002, 7:09:45 PM
On Thu, 7 Mar 2002 08:09:52 -0600, "Chris Gonnerman" <chris.g...@newcenturycomputers.net> wrote:

>Ugly. You state that code like this:
>
>def somefunc():
> astring = """this is the first line
>this is the second"""
>
>... is ugly. Fine, but IMHO so is the proposed syntax.
>
>Recall always that the charm of Python is its simplicity. Even with the
>new language features added in 2.1 and 2.2, a "bear of little brain" can
>remember all the rules more or less at the same time.
>

Well, I think the charm is that one may use a very lean and clean subset of
the full set of features, but that need not imply that the full set must be limited.

Python already has plenty to boggle a "bear of little brain" in meta-classes
etc., so I don't think a feature that can safely be ignored should be criticized
on the basis of its taxing little bears' brains ;-)

>I am against pasting and patching one after another new language construct,
>syntax sweetener, etc. to Python because it detracts from that core
>simplicity
>each and every time.
>

I really agree. But I think the ability to __quote__ arbitrary raw text in source
is a __missing capability__. It's not something you can already do, that I'm just
proposing a sugar coating for. NB: I'm not saying you can't __define__ arbitrary string
content by writing suitable escape sequences etc. That's not the problem. The problem is
safely delimiting arbitrary __existing__ text __within__ a source file, without changing
it and without having to inspect it or make iffy assumptions about its content.

I think, e.g., that "for i in 5: ..." is *more* sugary, because you already have the
ability to do it with "for i in xrange(5): ..."

>This is why I am against incorporating the "$varname" interpolation
>operator.
>At least, IMHO it is not ugly.
>

But things like this you can often build using classes and subclassing. I would rather
extend the building capability (as has been done greatly with 2.2) than add pre-built gizmos.

>There seem to be a never-ending stream of new language feature proposals.
>Praise Guido they don't all get incorporated!
>

Agreed.


>(Don't take this personally. I've had my share of similar ideas; we all
>think we are like Guido and can improve the language. Learning to trust his
>language-design skills was hard, but rewarding.)

Well, once this has boiled down to a clear proposal, I'd like to hear his pronouncement.
Thus far, I don't seem to have succeeded in communicating the fact that Python can't
do now what it could using some implementation of this idea. It's not a new idea. It exists
in other contexts, so it's not a personal pet invention. It's a matter of whether Guido
will think the base functionality is desirable. If he does, he'll likely think of better
syntax for it. I would guess he invented triple quotes as an easy and convenient way to
deal with quoting a mix of single quotes. This just takes the next step, recognizing
that another fixed sequence (or two) of characters as delimiters will not do it.

Regards,
Bengt Richter

Bengt Richter

Mar 7, 2002, 7:36:39 PM
On Thu, 07 Mar 2002 18:46:03 GMT, "Terry Reedy" <tej...@yahoo.com> wrote:

>
>"Bengt Richter" <bo...@oz.net> wrote in message
>news:a67ehl$uhi$0...@216.39.172.122...
>> How does it seem if you go along the steps I took?:
>>
>> 1. You have an arbitrary sequence of characters that are to be
>> the value of a string.
>> 2. The sequence may contain both ''' and """ and may even end
>> with \ and it must be unchanged.
>> 3. (2) Means you need a different delimiter than " or ' or """ or '''.
>
>Since (3) does not follow from (2), it seems like this proposal is
>unnecessary. Strings read from a file (this includes raw_input() from
>users) already meet desiderata 1 and 2.
>

Sorry, I wasn't making clear an unspoken condition I really had in mind ;-)
I think (3) does follow (2) using the added explicit condition that the
text quotation is to be part of the source text of a program or module,
and the quoted text must not need to be changed or restricted to do it.

>>>> s=raw_input('Hi: ')
>Hi: Python delimiters include ', ", ''', """, and \
>>>> print s
>Python delimiters include ', ", ''', """, and \
>

This violates my unspoken condition, but certainly i/o can get you whatever.

> In the extremely rare case in which one really *needs* all 4 string
>delimiters within a single string and the string *must* get its value
>from a programmer-written string literal rather than a file, one can
>use octal escapes (or \\ for \).
>

This also violates the unspoken condition ;-)
The problem to solve is not constructing an arbitrary-valued string
by writing an escaped sequence adhering to current python string syntax.
That can be done, as you show.

The problem is to do it by using an existing source of text, without
changing the text, and without using an external file to delimit it.

The simplest practical example is wanting to use a paste operation
to insert arbitrary text into a program source without having to inspect
the text or modify it, yet be able to use it to define a string
with the exact raw text value.

Show me how to do that, and I'll be convinced there is no need for
string-delimited quotation.

>>>> print """Python's string delimiters are ', ", ''', and \042\042\042."""
>Python's string delimiters are ', ", ''', and """.
>>>> print '''Python's string delimiters are ', ", \047\047\047, and """.'''
>Python's string delimiters are ', ", ''', and """.
>

Sorry for not being clear. I guess that's what this discussion process
is for. It's clarifying the issues for me too ;-)

Regards,
Bengt Richter

Greg Ewing

Mar 7, 2002, 9:54:29 PM
Bengt Richter wrote:
>
> The problem is to do it by using an existing source of text, without
> changing the text, and without using an external file to delimit it.

Even something like q'delim' doesn't allow you to
easily use completely arbitrary text, because you
still have to pick *some* string that doesn't occur
in the text. Although you don't have to modify the
text, you do have to inspect it in order to choose
a suitable delimiter.

I have another idea that doesn't suffer from that
problem.

def string my_string:
    This is a free-form string constant. Its value consists
    of all the text at this indentation level, verbatim,
    with the indentation stripped off. It can contain
    ', ", ''', """, \ or any other characters.
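
The indentation-stripping half of this idea can be approximated with the standard library's textwrap.dedent, shown here as a rough sketch. It still relies on an ordinary triple-quoted literal, so it does not address the delimiter problem under discussion; the variable names are illustrative.

```python
import textwrap

# The backslash after the opening quotes suppresses the leading newline.
block = """\
    This is a free-form string constant. Its value consists
    of all the text at this indentation level, verbatim,
    with the indentation stripped off.
"""

# dedent removes the whitespace prefix common to all lines.
my_string = textwrap.dedent(block)
print(my_string)
```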

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
To get my email address, please visit my web page:
http://www.cosc.canterbury.ac.nz/~greg

Terry Reedy

Mar 7, 2002, 11:51:39 PM

"Bengt Richter" <bo...@oz.net> wrote in message
news:a6912n$jnt$0...@216.39.172.122...

> Sorry, I wasn't making clear an unspoken condition I really had in mind ;-)
> I think (3) does follow (2) using the added explicit condition that the
> text quotation is to be part of the source text of a program or module,
> and the quoted text must not need to be changed or restricted to do it.

With the added conditions, the logic works much better.

...


> The simplest practical example is wanting to use a paste operation
> to insert arbitrary text into a program source without having to inspect
> the text or modify it, yet be able to use it to define a string
> with the exact raw text value.

As Greg Ewing noted, you still have to scan the text with your method,
so why not scan for triple quotes - which are *extremely* rare in any
text except Python code and *quite* easy to spot (speaking for
myself). One can always use the interpreter to check also. Write
s='''\
<paste text here>
'''
and run or paste into interpreter. If there is a ''' that one missed,
there will almost certainly be a SyntaxError reported, with the line
number. One could also follow the definition of s with 'print
s[-100:]' to see if the entire quotation got included in the assignment.

For myself, I can hardly imagine wanting to incorporate gobs of
someone else's text into my code. If I did, I would want to look at
it and/or isolate it in a separate module where it could be tested as
above. I would also feel free to modify it with escape codes if
necessary.

Terry J. Reedy

Joshua Macy

Mar 7, 2002, 11:57:10 PM
Greg Ewing wrote:

> I have another idea that doesn't suffer from that
> problem.
>
> def string my_string:
> This is a free-form string constant. Its value consists
> of all the text at this indentation level, verbatim,
> with the indentation stripped off. It can contain
> ', ", ''', """, \ or any other characters.
>
>


But then you have to indent everything in the text to that level, which
isn't quite the same as cut-and-paste arbitrary text any more. Granted,
it's not hard to do with a good editor, but neither is escaping
whatever triple quotes you might find in the text (even Notepad can do
that).

For that matter, the PEPs to make Python more unforgiving of what's
contained in strings (only valid declared encodings or binary as octal
escape sequences) seem to be advancing rapidly, so schemes for stuffing
more arbitrary and unexamined junk into the source code seem to be
heading in the wrong direction.

I guess I don't really see the motivation for the proposed feature. If
triple quoting isn't sufficient, I'd take that as a sign that the
material should be a separate data file...it's not like Python makes it
hard to read an external file into a string.

Joshua

Bengt Richter

Mar 8, 2002, 1:28:37 AM
On Fri, 08 Mar 2002 15:54:29 +1300, Greg Ewing <gr...@cosc.canterbury.ac.nz> wrote:

>Bengt Richter wrote:
>>
>> The problem is to do it by using an existing source of text, without
>> changing the text, and without using an external file to delimit it.
>
>Even something like q'delim' doesn't allow you to
>easily use completely arbitrary text, because you
>still have to pick *some* string that doesn't occur
>in the text. Although you don't have to modify the
>text, you do have to inspect it in order to choose
>a suitable delimiter.

Unless you choose one with vanishingly small probability
of being included, like a generated guid. A smart editor
could do this for you. You could also generate a unique
hash that couldn't (conjecture ;-) be part of the hashed
string -- but that would be a form of scanning, and you'd
have to look ahead through the entire text before
you wrote the initial delimiter.

>
>I have another idea that doesn't suffer from that
>problem.
>
> def string my_string:
> This is a free-form string constant. Its value consists
> of all the text at this indentation level, verbatim,
> with the indentation stripped off. It can contain
> ', ", ''', """, \ or any other characters.
>

I like it, though it doesn't meet the no-changes-to-the-text requirement.

I particularly like the indent-stripping for doc-strings. Since it's
a minor variation on string parsing, I think I would tend to want to
use a variant-id prefix character instead of a special def. E.g., using your strings:

def foo():
    d"
        This is a free-form string constant. Its value consists
        of all the text at this indentation level, verbatim,
        with the indentation stripped off. It can contain
        ', ", ''', """, \ or any other characters.

        Doc-string indentation would look nicer in code sources :)
    pass

Regards,
Bengt Richter

Terry Reedy

Mar 8, 2002, 9:45:35 AM

"Bengt Richter" <bo...@oz.net> wrote in message
news:a69lml$b2v$0...@216.39.172.122...

> On Fri, 08 Mar 2002 15:54:29 +1300, Greg Ewing <gr...@cosc.canterbury.ac.nz> wrote:
> >Even something like q'delim' doesn't allow you to
> >easily use completely arbitrary text, because you
> >still have to pick *some* string that doesn't occur
> >in the text. Although you don't have to modify the
> >text, you do have to inspect it in order to choose
> >a suitable delimiter.

> Unless you choose one with vanishingly small probability
> of being included, like a generated guid.

This was Guido's intention when he chose triple quotes and I think he
did excellently well. I don't believe I had even seen """ and quite
possibly not ''' ever before learning Python. I do know that I found
""" to be quite jarring, as if I had never seen it before.

> A smart editor could do this for you.

A smart editor could also scan for triple quotes in pasted text and
select one that was not found as the enclosing quotes and do the octal
quoting fixup if both were.
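
The editor behaviour suggested here can be sketched as a small function. This is an assumption-laden illustration: quote_for_source is a hypothetical helper, it uses backslash escapes rather than octal escapes for the fixup, and it is meant for ordinary text without NUL characters or carriage returns.

```python
import ast


def quote_for_source(text):
    """Return Python source for a triple-quoted literal whose value is text."""
    # Double the backslashes first so they stay literal in a non-raw string.
    body = text.replace("\\", "\\\\")
    if "'''" not in text and not text.endswith("'"):
        quote = "'''"
    elif '"""' not in text and not text.endswith('"'):
        quote = '"""'
    else:
        # Neither style is safe as-is: escape every double quote.
        quote = '"""'
        body = body.replace('"', '\\"')
    return quote + body + quote


sample = "Python delimiters include ', \", ''', \"\"\", and \\"
literal = quote_for_source(sample)

# Round-trip check: evaluating the generated literal gives back the text.
print(ast.literal_eval(literal) == sample)
```

Note that this modifies the pasted text's representation in the source file, which is exactly the trade-off being debated in the thread.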

Terry J. Reedy

Roman Suzi

Mar 8, 2002, 1:13:48 PM
On 4 Mar 2002, Bengt Richter wrote:

>On Sun, 3 Mar 2002 14:56:45 +0300 (MSK), Roman Suzi <r...@onego.ru> wrote:
>
>>On 3 Mar 2002, Bengt Richter wrote:
>>
>>>Problem: How to put quotes around an arbitrary program text?
>>
>>Have it in a separate file.
>>
>That's fine in a lot of cases, but does require reliable presence
>of both files. It's not so convenient if you are trying to write
>e.g. a program generator and want to write out snippets containing
>mixed triple quoted doc strings and data etc. Or if you are
>writing copied example snips as part of dynamic HTML from CGI.

>===the stuff I want===


>
>
>Sure, I can copy it and make a separate file, and write some code to
>read the file into my snippet string, but would you actually prefer to
>do it that way? Ok, I could write a class to systematize it. Maybe I
>will, when I want to use file-stored snippets, but I'd like both options
>;-) And I guess I'd like to have Q' as well as q' ;-)

Well, maybe my first impressions were wrong. After all, the feature is
convenient. But you will probably need to coordinate it well
with encoding issues. What if the "dochere" encoding needs to be
different from the main program's? Is binary data allowed?
Otherwise the feature will become a grammatical disaster...

>>Making Python as gibberish as Perl is. And all that only to
>>have Windows path be written without double-\
>Not 'only'. I said 'also' ;-) Perhaps my choice of '|' delimiter triggered
>your 'gibberish as Perl' detector?

;-) Maybe. I wonder why Perl novices do not know about "dochere"
capabilities of Perl.

Sincerely yours, Roman Suzi
--
_/ Russia _/ Karelia _/ Petrozavodsk _/ r...@onego.ru _/

_/ Friday, March 08, 2002 _/ Powered by Linux RedHat 6.2 _/
_/ "People are always available for work in the past tense." _/


Bengt Richter

Mar 8, 2002, 4:39:56 PM
On Fri, 08 Mar 2002 14:45:35 GMT, "Terry Reedy" <tej...@yahoo.com> wrote:

>
>"Bengt Richter" <bo...@oz.net> wrote in message
>news:a69lml$b2v$0...@216.39.172.122...
>> On Fri, 08 Mar 2002 15:54:29 +1300, Greg Ewing
><gr...@cosc.canterbury.ac.nz> wrote:
>> >Even something like q'delim' doesn't allow you to
>> >easily use completely arbitrary text, because you
>> >still have to pick *some* string that doesn't occur
>> >in the text. Although you don't have to modify the
>> >text, you do have to inspect it in order to choose
>> >a suitable delimiter.
>
>> Unless you choose one with vanishingly small probability
>> of being included, like a generated guid.
>
>This was Guido's intention when he chose triple quotes and I think he
>did excellently well. I don't believe I had even seen """ and quite
>possibly not ''' ever before learning Python. I do know that I found
>""" to be quite jarring, as if I had never seen it before.
>

Of course, now that Python sources abound, the probability of
encountering triple quotes is no longer vanishingly small, so
what will Guido do next, if he finds a motivation to do at the next
level what he did with triple quotes?

>> A smart editor could do this for you.
>
>A smart editor could also scan for triple quotes in pasted text and
>select one that was not found as the enclosing quotes and do the octal
>quoting fixup if both were.
>

True, but on second thought, you'd more likely use a separate
utility to generate a GUID, since it has to go look for NICs etc., and
just paste it into the editor. VC++ comes with such a utility.

My thought of pasting arbitrary binary octets in a quoted context actually
potentially involves multiple encodings, though: The encoding of the source,
the encoding used by the editor capturing and displaying the source, the
encoding used by the editor for the python source, the encoding for its
display, the encoding used for python internal representation, and the
encoding used for python interactive display, to name a few. (I'd guess MvL
has thought more deeply on that than I ;-)

The thing that occurs to me is that pasting into a raw-string context might
involve a contradiction, even with current Python r'strings':

Suppose what you pasted contained a single ^G ('\x07'), what would happen
when it came time to render it on the screen? You couldn't do the normal escapes (e.g. \x07),
because in the context of r'...' that would be four characters, not one.
So how would '\x07' (single character) be represented if it were part of pasted text?

I guess I ought to try eval("r'\x07'") and see what happens ;-)

>>> eval("r'\x07'")
'\x07'
>>> list(eval("r'\x07'"))
['\x07']
>>> list("r'\x07'")
['r', "'", '\x07', "'"]
>>> list(r'\x07')
['\\', 'x', '0', '7']

Is that correct? Shouldn't the expression r'\x07' return 4 characters
as it does if you list them with list(r'\x07')?

I.e., shouldn't list(eval("r'\x07'")) return the same as list(r'\x07'),
and shouldn't eval("r'\x07'") raise an illegal-representation (bad
raw-string syntax) exception?

Would someone explain this:

>>> eval("list(r'\x07')")
['\x07']
>>> list(r'\x07')
['\\', 'x', '0', '7']

(Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32)

also:
Python 1.5.2 (#1, May 28 2000, 18:04:10) [GCC egcs-2.91.66 19990314/Linux (egcs
- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> eval("list(r'\x07')")
['\007']
>>> list(r'\x07')
['\\', 'x', '0', '7']

Pretty consistent. What's the difference between direct interactive eval and programmed eval?
(I guess we have a new subject ;-)

Regards,
Bengt Richter


Bengt Richter

Mar 8, 2002, 6:29:03 PM
On Fri, 08 Mar 2002 04:57:10 GMT, Joshua Macy <l0819m0v...@sneakemail.com> wrote:

>Greg Ewing wrote:
>
>> I have another idea that doesn't suffer from that
>> problem.
>>
>> def string my_string:
>>     This is a free-form string constant. Its value consists
>>     of all the text at this indentation level, verbatim,
>>     with the indentation stripped off. It can contain
>>     ', ", ''', """, \ or any other characters.
>>
>>
>
>
>But then you have to indent everything in the text to that level, which
>isn't quite the same as cut-and-paste arbitrary text any more. Granted,
>it's not hard to do with a good editor, but neither is escaping
>whatever triple quotes you might find in the text (even Notepad can do
>that).
>
>For that matter, the PEPs to make Python more unforgiving of what's
>contained in strings (only valid declared encodings or binary as octal
>escape sequences) seem to be advancing rapidly, so schemes for stuffing
>more arbitrary and unexamined junk into the source code seem to be
>heading in the wrong direction.
>

This is an interesting aspect. My example of pasting text into a current source
really should take into account the encoding of both the text to be pasted and the
destination context. I.e., ISTM a conversion to whatever the current internal representation
of the destination was should be done (which might raise an exception), followed by
conversion for display purposes (which might also raise an exception in the context
of a raw string, where unprintable characters can't be used! See my other post for
some weirdness re current r'strings').


>I guess I don't really see the motivation for the proposed feature. If
>triple quoting isn't sufficient, I'd take that as a sign that the
>material should be a separate data file...it's not like Python makes it
>hard to read an external file into a string.
>

No, but they are hard to see when you are reading code that just does i/o,
and if you have dozens of snippets, that would be a lot of files. Of course,
you could put them in a multipart MIME formatted file with, ahem, delimiters
between the pieces ;-)

Bengt Richter

Mar 8, 2002, 6:37:37 PM

I'd prefer to have a quick look and choose an easy delimiter and be done,
instead of all that testing to see if there were tricky \''' sequences
and whatnot, and then fussing with the content, which diff could then
no longer see as equivalent to the original.

Regards,
Bengt Richter

Gustavo Cordova

Mar 8, 2002, 6:45:46 PM
I like the idea. If only as a method to keep all the
data pertaining to a program --images, even-- in a
single file. I hate having to lug around lots of
icons and images just for a single program.

icon = q'###END#OF#ICON###'
aklkflj as.asd.a.s
... lotsa data ...
lkjasd flaskdflakals ak
###END#OF#ICON###

But then, I'm weird that way.

Kinda like a CDATA section in a program. I like it.

But I'm not gonna die without it.
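
For comparison, a single-file program can already carry binary data today if the data is stored as base64 text inside an ordinary triple-quoted string. That is a different technique from the proposed q'...' literal, and it does change the bytes as they appear in the source. A minimal sketch, where `to_source_literal` is a hypothetical helper name:

```python
import base64


def to_source_literal(name, data):
    """Return Python source that rebuilds the bytes in data under name."""
    # The base64 alphabet contains no quotes or backslashes, so the encoded
    # text can sit inside a triple-quoted string without any escaping.
    encoded = base64.encodebytes(data).decode("ascii")
    return '%s = base64.b64decode("""\n%s""")\n' % (name, encoded)


if __name__ == "__main__":
    payload = bytes(range(256))          # stand-in for an icon file's bytes
    source = to_source_literal("icon", payload)
    namespace = {"base64": base64}
    exec(source, namespace)              # what importing the module would do
    print(namespace["icon"] == payload)
```

The cost is roughly a third more characters than the raw data, and the data is no longer readable in the source.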

AND before someone else utters "save everything into
a BSD db file and distribute that", don't; it's a
dumb idea.

I'd rather distribute an XML file with lotsa items and
CDATA sections for binary or quoted data.

Hmmm... that doesn't seem like such a bad idea.
Except having to lug around an XML parser, and
all the whathaveyou that it takes. :-(

Hmmm... miniXML is for that...

Please disregard everything I said.

tgif :-)

-gus

Bengt Richter

Mar 8, 2002, 8:36:37 PM
On Fri, 8 Mar 2002 21:13:48 +0300 (MSK), Roman Suzi <r...@onego.ru> wrote:

>On 4 Mar 2002, Bengt Richter wrote:
>
>>On Sun, 3 Mar 2002 14:56:45 +0300 (MSK), Roman Suzi <r...@onego.ru> wrote:
>>
>>>On 3 Mar 2002, Bengt Richter wrote:
>>>
>>>>Problem: How to put quotes around an arbitrary program text?
>>>
>>>Have it in a separate file.
>>>
>>That's fine in a lot of cases, but does require reliable presence
>>of both files. It's not so convenient if you are trying to write
>>e.g. a program generator and want to write out snippets containing
>>mixed triple quoted doc strings and data etc. Or if you are
>>writing copied example snips as part of dynamic HTML from CGI.
>
>>===the stuff I want===
>>
>>
>>Sure, I can copy it and make a separate file, and write some code to
>>read the file into my snippet string, but would you actually prefer to
>>do it that way? Ok, I could write a class to systematize it. Maybe I
>>will, when I want to use file-stored snippets, but I'd like both options
>>;-) And I guess I'd like to have Q' as well as q' ;-)
>
>Well, maybe my first impressions were wrong. After all, the feature is
>convenient. But you will probably need to coordinate it well
>with encoding issues. What if the "dochere" encoding needs to be
>different from the main program's? Is binary data allowed?
>Otherwise the feature will become a grammatical disaster...
>

Thanks for the first glimmer of positive feedback ;-)
But you put your finger on an interesting aspect, which is important
whether this particular quoting mechanism exists or not. Cf. my other post
in this thread re eval("r'\x07'") etc.

It's funny, but "raw" strings are less likely to represent binary data
than ordinary strings. You can't re-render a raw string containing binary data
as a raw string not containing binary data, whereas you can with an ordinary
string, since escapes are available. The normal "raw" string is actually
usually representing a source string, without interpreting escapes, so it
itself can't represent control characters within the normal source alphabet,
it can only represent representations of control/unprintable characters.
So the "raw" name is misleading in a way.

If you pasted binary (i.e., encoded as uninterpreted octets) anywhere into a
Python source encoded as Latin-1, presumably the octets would go 1:1 into
the source, but when you saw them on the screen, they would appear according
to the screen font, yet when re-rendered, would appear escaped (inside strings,
that is; elsewhere they would be syntax errors, except maybe in comments?), as in:

>>> '^G','\x07',r'^G'
('\x07', '\x07', '\x07')

... where I typed Ctrl-G binary data into the source where the screen rendered ^G.

Presumably, if the source encoding were UTF-8, pasting octets would change
them to UTF-8. However, interpreting the UTF-8 source representation of
an octet-string (o'...' ?) would generate the original binary octet
sequence as the value of the internal data representation at run time. I think ;-)

>>>Making Python as gibberish as Perl is. And all that only to
>>>have Windows path be written without double-\
>>Not 'only'. I said 'also' ;-) Perhaps my choice of '|' delimiter triggered
>>your 'gibberish as Perl' detector?
>
>;-) Maybe. I wonder why Perl novices do not know about "dochere"
>capabilities of Perl.
>

I don't know. It's not that prominent in the camel book, but I found it ;-)

Regards,
Bengt Richter

Greg Ewing

Mar 10, 2002, 10:41:56 PM
Joshua Macy wrote:
>
> But then you have to indent everything in the text to that level, which
> isn't quite the same as cut-and-paste arbitrary text any more.

True, but to me, that's a feature. It annoys me
that triple-quoted strings can't be made to fit
in with the indentation structure of the rest of
the program.

> If
> triple quoting isn't sufficient, I'd take that as a sign that the
> material should be a separate data file...

One problem with that is the hassle of making sure
that the separate file remains with the Python
code and is accessible from it. Maybe there should
be a function which will search for an arbitrary
file using the same rules as are used for finding
modules during import.
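
A rough sketch of such a function, assuming that "the same rules" simply means walking the directories on sys.path in order. `find_data_file` is a hypothetical name, and zip archives or other import hooks are not handled:

```python
import os
import sys


def find_data_file(name, path=None):
    """Return the first path/name that exists as a file, or None."""
    search_path = sys.path if path is None else path
    for directory in search_path:
        # An empty entry on sys.path stands for the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_data_file("snippets.txt"))
```

The standard library's `pkgutil.get_data` covers the related case of a data file stored inside a package.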

Greg Ewing

Mar 10, 2002, 10:31:45 PM
Bengt Richter wrote:
>
> Would someone explain this:
>
> >>> eval("list(r'\x07')")
> ['\x07']
> >>> list(r'\x07')
> ['\\', 'x', '0', '7']

The string literal you're passing to eval is
non-raw, so the \x07 gets interpreted before
eval even sees it. Try this instead:

>>> eval(r"list(r'\x07')")
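
The difference can be checked directly. The sketch below is written for a current Python 3 rather than the 1.5.2 and 2.2 interpreters used in this thread; the variable names are only for illustration.

```python
# Ordinary outer literal: \x07 becomes one BEL character before eval()
# ever sees the text, so the inner raw string contains a single character.
cooked = "list(r'\x07')"

# Raw outer literal: eval() receives backslash, x, 0, 7, and the inner
# raw string keeps all four characters.
raw = r"list(r'\x07')"

if __name__ == "__main__":
    print(len(cooked), len(raw))
    print(eval(cooked))
    print(eval(raw))
```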

Bengt Richter

Mar 10, 2002, 10:58:38 PM
On Mon, 11 Mar 2002 16:31:45 +1300, Greg Ewing <gr...@cosc.canterbury.ac.nz> wrote:

>Bengt Richter wrote:
>>
>> Would someone explain this:
>>
>> >>> eval("list(r'\x07')")
>> ['\x07']
>> >>> list(r'\x07')
>> ['\\', 'x', '0', '7']
>
>The string literal you're passing to eval is
>non-raw, so the \x07 gets interpreted before
>eval even sees it. Try this instead:
>
> >>> eval(r"list(r'\x07')")
>

Aaugh, touché & d'oh. That's embarrassing ;-/

Regards,
Bengt Richter


Bengt Richter

Mar 11, 2002, 1:43:42 AM

Especially since the original motivation was:
+---


|So how would '\x07' (single character) be represented if it were part of pasted text?
|
|I guess I ought to try eval("r'\x07'") and see what happens ;-)

+---

I.e., I was _intending_ to get \x07 interpreted, to simulate pasting a single control
character into a raw string. In another post I just did it directly:

+---


| >>> '^G','\x07',r'^G'
| ('\x07', '\x07', '\x07')
|
|... where I typed Ctrl-G binary data into the source where the screen rendered ^G.

+---

I just fooled myself looking at all those list results ;-/

Regards,
Bengt Richter

Jeff Shannon

Mar 12, 2002, 1:11:19 PM

Bengt Richter wrote:

> The problem to solve is not constructing an arbitrary-valued string
> by writing an escaped sequence adhering to current python string syntax.
> That can be done, as you show.
>
> The problem is to do it by using an existing source of text, without
> changing the text, and without using an external file to delimit it.

My take on all of this is:

1) Only when dealing with Python source code or raw binary data is it likely
that triple quotes will be found. Raw binary data is better expressed entirely
in quoted hex or octal form, anyhow, to make it clear that it *is* binary, so
this change is really only beneficial when quoting Python code.

2) Only when cutting and pasting large sections of Python code (small sections
are unlikely to have multiline quotes) to be used as string literals in other
Python code, does this quoting problem arise. One can read from a file, import
another module, read from a database, etc, etc, without having much worry here;
cutting and pasting seems the only case where quotation issues come into play.

3) The need to use arbitrary Python code as a string literal is... well...
rather abstruse. It is not something that comes up often, and (ISTM) is
confined to relatively specialized tasks. (I can't think of anything other than
that code generator you mentioned, and even there, I wouldn't think that
*arbitrary* code would be desirable...)

4) Python code that uses both ''' and """ within reasonably-sized stretches of
code is pretty rare.

5) Because of (4), most cases of cutting and pasting Python code into string
literals can be handled simply by choosing the style of triple quotes that the
source doesn't use. To be honest, your example of quoting a segment of code
that includes a quoted segment of code seems rather artificial and contrived; I
find it hard to imagine a case where this would be desirable, and (ISTM) odds
are that any such case would likely be served as well (or better) by some other
technique (such as the aforementioned data files, etc.)

6) In most cases where it *is* desirable to quote large sections of Python
code, you'd want to be reviewing and modifying and tidying the code *anyhow*.
During such tidying, it would not be difficult to normalize the quoted text to
use all the same type of triple quotes, or (if they're nested) change one type
to be represented as escaped hex. (Note again, as per (4), nested triple quotes
seem to be rare.)

7) As a result of all of this, there seem to be vanishingly few cases in which
quoting reasonably sized sections of text to use as string literals, can't be
accomplished through the use of standard triple quotes. Only when triple quotes
are nested within the source text, or when it's impractical to see *what* you're
quoting (and why are you quoting it if you don't know what it is?), would the
standard triple quotes be insufficient.

8) If we *had* this arbitrary-delimited quote, then many people would feel that
they should be used everywhere, even though triple quotes would often do the job
just as well (and far more clearly). For example, note how many people want to
use eval() and exec to accomplish things that can be done far more simply,
cleanly, and safely by using a standard module. I'm sure that the same would
happen with arbitrary delimiters. Not only that, but I can well imagine people
choosing "clever" delimiters that, while being unique enough to be parseable,
don't stand out visibly enough to make it easily apparent where the particular
quoted string ends. (As per your previous example, it took me over a minute to
figure out where those quotes ended; with triple quotes, it takes me about a
second. That's almost two orders of magnitude difference -- to me, that's
significant confusion.)

In short, I see the need for arbitrarily-delimited string literals to be a very
rare, and very specialized thing, and providing special core syntax to deal with
this rare special case is both unnecessary and likely to create lots of
opportunity for confusion. And (ISTM) there is only one of your conditions
which makes current Python syntax unsuitable for the usage you're envisioning --
the desire to not have to actually examine what you're cutting and pasting.
(Note that, if you're doing this a lot, it shouldn't be too hard to have a smart
editor examine pasted text and create a suitable string literal that can be
normally triple-quoted.) Python is intended to be simple and clean; this
proposal is (IMHO) neither.

Neil Hodgson

Mar 12, 2002, 3:56:34 PM
In my experience writing a lexer for Perl (for use in syntax styling
editors), 'here documents' and arbitrary quote sequences add much complexity
to a lexer. The full generality of stacked here documents is a real pain to
deal with:

print <<"foo", <<"foo"; # you can stack them
I said foo.
foo
I said bar.
foo

Neil


Jason Orendorff

Mar 12, 2002, 4:43:34 PM
Neil Hodgson wrote:
> In my experience writing a lexer for Perl (for use in syntax styling
> editors), 'here documents' and arbitrary quote sequences add much
> complexity to a lexer. [...]

(skeptical look) Are you really claiming to have written a lexer
for Perl? Is that even possible?

## Jason Orendorff http://www.jorendorff.com/

phil hunt

Mar 12, 2002, 5:18:09 PM
On 3 Mar 2002 09:50:58 GMT, Bengt Richter <bo...@oz.net> wrote:
>Problem: How to put quotes around an arbitrary program text?

I assume you mean here a *Python* program.

> assert q'<-=delim=->'content here<-=delim=-> == 'content here' # this would be true
>
>without getting an error.

Looks a bit complex. What if you have a quoted string inside a
quoted string, e.g, you want to quote this:

list = [1, q'|'string|, 3]

If you just choose the same delimiter the next time, it fails:

q'|'list = [1, q'|'string|, 3]|

as this would parse as the string "list = [1, q'" followed by a
syntax error. Of course, you could choose another delimiter, but
it'd be nice if this could be done automatically by a simple-minded
program.

Lisp is famous for being able to quote program text. The way it
manages it is by using (...); this is possible because any brackets
inside must match. Let's try this for Python, using {} instead of
(), to prevent it from looking like a function call:

list = [1, q{string}, 3]

and:

q{list = [1, q{string}, 3]}

This works. The reason it works is that the outer '{' only tries to
connect to the matching '}'. Inner braces are matched with
themselves; braces inside normal strings or comments are ignored.
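
The matching rule can be sketched with a depth counter. q{...} is only the syntax suggested in this post, so the sketch works on plain strings; `read_braced` is a hypothetical name, and the refinement that braces inside normal strings or comments are ignored is left out.

```python
def read_braced(source, start=0):
    """Return (content, end_index) for a q{...} literal at source[start:]."""
    if not source.startswith("q{", start):
        raise ValueError("no q{ at position %d" % start)
    depth = 0
    for i in range(start + 1, len(source)):
        ch = source[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # The outer brace has met its partner.
                return source[start + 2:i], i + 1
    raise ValueError("unbalanced braces")


if __name__ == "__main__":
    print(read_braced("q{list = [1, q{string}, 3]}"))
```

Content whose braces do not balance cannot be quoted this way: the scan runs off the end of the source and raises.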

>Note the lack of quotes around the final delimiter string, since it itself is
>the final delimiter. This can also be used to solve the final unescaped

>backslash problem for quoting windows paths:
>
> q'|'c:\foo\bar\|

"""c:\foo\bar\""" and r"c:\foo\bar\" work just as well.

The best solution, of course, is to not use Windows.


--
<"><"><"> Philip Hunt <ph...@comuno.freeserve.co.uk> <"><"><">
"I would guess that he really believes whatever is politically
advantageous for him to believe."
-- Alison Brooks, referring to Michael
Portillo, on soc.history.what-if

Neil Hodgson

Mar 12, 2002, 5:33:53 PM
Jason Orendorff:

> Neil Hodgson wrote:
> > In my experience writing a lexer for Perl (for use in syntax styling
> > editors), 'here documents' and arbitrary quote sequences add much
> > complexity to a lexer. [...]
>
> (skeptical look) Are you really claiming to have written a lexer
> for Perl? Is that even possible?

No. I started writing a lexer for Perl, but it is incomplete and is
likely to remain incomplete. The subset of Perl handled is sufficient for
syntax styling but is not sufficient for a language interpreter. The code is
available as part of Scintilla and is used in various IDEs. At one stage it
was used in Komodo, but ActiveState may have changed to using the Perl
interpreter's lexer. I wrote the first versions and then other contributors
extended the lexer into some of the more difficult pieces of Perl syntax.

The implementation of lexing and parsing in the Perl interpreter is a bit
messy, using parse level information to guide lexical analysis.

Neil

Clark C . Evans

Mar 12, 2002, 6:24:40 PM
On Sun, Mar 03, 2002 at 09:50:58AM +0000, Bengt Richter wrote:
| Problem: How to put quotes around an arbitrary program text?

I would suggest that this is done via indentation, like
YAML. With indentation you don't need to have an end
delimiter.
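
A minimal sketch of how an indentation-delimited block might be read, assuming the block runs until the first non-blank line indented no deeper than its header, and that spaces (not tabs) are used. `read_indented_block` is a hypothetical name, and this imitates the idea only, not YAML's actual rules.

```python
import textwrap


def read_indented_block(lines, header_index):
    """Return the dedented text of the block that follows a header line."""
    header = lines[header_index]
    base = len(header) - len(header.lstrip())
    block = []
    for line in lines[header_index + 1:]:
        indent = len(line) - len(line.lstrip())
        if line.strip() and indent <= base:
            break                      # first dedented line ends the block
        block.append(line)
    while block and not block[-1].strip():
        block.pop()                    # trailing blank lines are not content
    return textwrap.dedent("\n".join(block))


if __name__ == "__main__":
    source = [
        "def string my_string:",
        "    It can contain ', \", ''' or \\ freely.",
        "      Deeper indentation is kept.",
        "x = 1",
    ]
    print(read_indented_block(source, 0))
```

As written, the result never ends with an EOL; whether the final EOL belongs to the value would have to be decided by the syntax.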

Clark

Bengt Richter

Mar 12, 2002, 8:13:13 PM
On Tue, 12 Mar 2002 22:18:09 +0000, ph...@comuno.freeserve.co.uk (phil hunt) wrote:

>On 3 Mar 2002 09:50:58 GMT, Bengt Richter <bo...@oz.net> wrote:
>>Problem: How to put quotes around an arbitrary program text?
>
>I assume you mean here a *Python* program.

As the destination context, yes, but for the quoted matter, not necessarily.

>
>> assert q'<-=delim=->'content here<-=delim=-> == 'content here' # this would be true
>>
>>without getting an error.
>
>Looks a bit complex. What if you have a quoted string inside a
>quoted string, e.g, you want to quote this:
>
> list = [1, q'|'string|, 3]
>
>If you just choose the same delimiter the next time, it fails:

you wouldn't choose the same delimiter ;-) I.e., the main point of
using an (almost) arbitrary sequence of characters as a delimiter is
so you'll always have a choice of new delimiters if you need them to
wrap around text containing old ones.

>
> q'|'list = [1, q'|'string|, 3]|
>
>as this would parse as the string "list = [1, q'" followed by a
>syntax error. Of course, you could choose another delimiter, but
>it'd be nice if this could be done automatically by a simple-minded
>program.

That could be arranged.
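
A sketch of how a simple-minded program could do it. Since q'...' is only a proposal, the sketch works on plain strings; `choose_delimiter`, `q_quote` and `q_unquote` are hypothetical names, and the `<n>` delimiter shape is an arbitrary choice.

```python
import itertools


def choose_delimiter(text):
    """Return the first delimiter of the form <n> that is absent from text."""
    # "<n>" cannot overlap itself, so the end of the text can never combine
    # with the start of the closing delimiter to form an early match.
    for n in itertools.count():
        delim = "<%d>" % n
        if delim not in text:
            return delim


def q_quote(text):
    """Wrap text, unchanged, in the proposed q'delim'...delim form."""
    delim = choose_delimiter(text)
    return "q'%s'%s%s" % (delim, text, delim)


def q_unquote(literal):
    """Recover the content of a q'delim'...delim string."""
    if not literal.startswith("q'"):
        raise ValueError("not a q-string")
    close = literal.index("'", 2)          # end of the delimiter definition
    delim = literal[2:close]
    end = literal.index(delim, close + 1)  # first occurrence ends the content
    return literal[close + 1:end]


if __name__ == "__main__":
    inner = q_quote("string")
    outer = q_quote("list = [1, %s, 3]" % inner)
    print(outer)
    print(q_unquote(outer))
```

Each level of nesting picks the next unused number, so the quoted text itself is never modified.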


>
>Lisp is famous for being able to quote program text. The way it
>manages it is by using (...); this is possible because any brackets
>inside must match. Let's try this for Python, using {} instead of
>(), to prevent it from looking like a function call:
>
> list = [1, q{string}, 3]
>
>and:
>
> q{list = [1, q{string}, 3]}
>
>This works. The reason it works is that the outer '{' only tries to
>connect to the matching '}'. Inner braces are matched with
>themselves; braces inside normal strings or comments are ignored.
>

But text with matching braces or matching whatever doesn't qualify as
arbitrary content. E.g., you should be able to quote a broken lisp program
if you want to.

>>Note the lack of quotes around the final delimiter string, since it itself is
>>the final delimiter. This can also be used to solve the final unescaped
>>backslash problem for quoting windows paths:
>>
>> q'|'c:\foo\bar\|
>
>"""c:\foo\bar\""" and r"c:\foo\bar\" work just as well.
>

You didn't mean that ;-)

>>> """c:\foo\bar\"""
...
... (it's waiting for three unescaped quotes in a row)
... """
'c:\x0coo\x08ar"""\n\n(it\'s waiting for three unescaped quotes in a row)\n'
>>>
>>> r"c:\foo\bar\"
File "<string>", line 1
r"c:\foo\bar\"
^
SyntaxError: invalid token
>>>

BTW,

q'"""'c:\foo\bar\"""
and
q'"'c:\foo\bar\"

would work, if you wanted to use """ and " as q-delimiters,
because *all* escape characters are ignored in the content.
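
For comparison, the usual workaround in existing syntax is to put the final backslash in a separate, ordinary literal and let adjacent literals concatenate. A small check, written for a current Python 3; `is_valid_expression` is a hypothetical helper used only to show which spellings compile:

```python
def is_valid_expression(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


# The source text  r"c:\foo\bar\"  : the last backslash protects the quote.
bad = 'r"c:\\foo\\bar\\"'

# Adjacent literals are concatenated, so the final backslash can live in an
# ordinary string of its own.
path = r"c:\foo\bar" "\\"

if __name__ == "__main__":
    print(is_valid_expression(bad))
    print(path)
```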

>The best solution, of course, is to not use Windows.
>

You probably did mean that ;-)

Regards,
Bengt Richter

Christopher Barber

Mar 13, 2002, 10:20:19 AM
bo...@oz.net (Bengt Richter) writes:

> you wouldn't choose the same delimiter ;-) I.e., the main point of
> using an (almost) arbitrary sequence of characters as a delimiter is
> so you'll always have a choice of new delimiters if you need them to
> wrap around text containing old ones.

For comparison, in the Curl language, there is a concept of tagged verbatim
strings, which look like:

|<tag>"...stuff..."<tag>|

where <tag> can be any identifier (or nothing).

If the tag is an integer, it specifies the number of characters in the string:

|3"foo"3|

This is useful when you are generating code containing arbitrary string
literals.
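
To illustrate why a count helps a code generator, here is a Python sketch that reads a literal of the shape shown above. It only mimics that shape and makes no claim about Curl's real grammar; `read_counted` is a hypothetical name.

```python
import re

_HEADER = re.compile(r'\|(\d+)"')


def read_counted(source, pos=0):
    """Return (content, end_index) for a |N"..."N| literal at source[pos:]."""
    match = _HEADER.match(source, pos)
    if match is None:
        raise ValueError("no counted literal at position %d" % pos)
    count = int(match.group(1))
    body_start = match.end()
    body_end = body_start + count      # jump ahead without scanning content
    trailer = '"%d|' % count
    if not source.startswith(trailer, body_end):
        raise ValueError("count does not match the closing tag")
    return source[body_start:body_end], body_end + len(trailer)


if __name__ == "__main__":
    print(read_counted('|3"foo"3|'))
```

Because the reader skips ahead by the count, the content can contain quotes, bars, or anything else without escaping.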

There are corresponding tagged multi-line comments as well.

- Christopher

phil hunt

Mar 13, 2002, 9:39:29 AM
On 13 Mar 2002 01:13:13 GMT, Bengt Richter <bo...@oz.net> wrote:
>>>
>>> q'|'c:\foo\bar\|
>>
>>"""c:\foo\bar\""" and r"c:\foo\bar\" work just as well.
>>
>You didn't mean that ;-)
>
> >>> """c:\foo\bar\"""
> ...
> ... (it's waiting for three unescaped quotes in a row)
> ... """
> 'c:\x0coo\x08ar"""\n\n(it\'s waiting for three unescaped quotes in a row)\n'
> >>>
> >>> r"c:\foo\bar\"
> File "<string>", line 1
> r"c:\foo\bar\"
> ^
> SyntaxError: invalid token
> >>>
>
>>The best solution, of course, is to not use Windows.
>>
>You probably did mean that ;-)

Absolutely. It's the solution I use, and it's never failed me yet.

Trent Mick

Mar 13, 2002, 12:21:11 PM
[Neil Hodgson wrote]

> No. I started writing a lexer for Perl, but it is incomplete and is
> likely to remain incomplete. The subset of Perl handled is sufficient for
> syntax styling but is not sufficient for a language interpreter. The code is
> available as part of Scintilla and is used in various IDEs. At one stage it
> was used in Komodo, but ActiveState may have changed to using the Perl
> interpreter's lexer. I wrote the first versions and then other contributors
> extended the lexer into some of the more difficult pieces of Perl syntax.

Komodo (and Visual Perl sort of too) is still using LexPerl.cxx from the
scintilla project. I *believe* we have some additional bug fixes that should
come down the pipe when we get that in order.


Trent

--
Trent Mick
Tre...@ActiveState.com

Bengt Richter

Mar 13, 2002, 4:32:01 PM
On 13 Mar 2002 10:20:19 -0500, Christopher Barber <cba...@curl.com> wrote:

>bo...@oz.net (Bengt Richter) writes:
>
>> you wouldn't choose the same delimiter ;-) I.e., the main point of
>> using an (almost) arbitrary sequence of characters as a delimiter is
>> so you'll always have a choice of new delimiters if you need them to
>> wrap around text containing old ones.
>
>For comparison, in the Curl language, there is a concept of tagged verbatim
>strings, which look like:
>
> |<tag>"...stuff..."<tag>|
>
>where <tag> can be any identifier (or nothing).
>
>If the tag is an integer, it specifies the number of characters in the string:
>
> |3"foo"3|
>
>This is useful when you are generating code containing arbitrary string
>literals.
>

And for fast skips across the source by just a seek, if encoding is fixed.

>There are corresponding tagged multi-line comments as well.
>

I like the symmetry. And the count idea, which I've seen in other forms,
but not like that. I'll have to look at the multi-line format. Thanks.

Regards,
Bengt Richter

Bengt Richter

Mar 13, 2002, 4:55:11 PM

Unless you want the option of excluding the last EOL.
For line-oriented stuff it would be nice though, especially
for doc string readability. Actually a blank leading line
is not bad when you print a big doc string anyway...

Thanks for making me aware of YAML.

Regards,
Bengt Richter
