how to determine an 'open' string?

Sean 'Shaleh' Perry

May 16, 2002, 11:06:52 AM

On 16-May-2002 holger krekel wrote:
> hello,
>
> with my replacement rlcompleter module i'd like to
> have a *correct* check if a string is 'open'.
> examples:
>
> asd"""askdjalsdk # open
> aksdjasd # closed
> asjdkk"kajsd'''' # open
> "'asdasd" # closed
> """dontcountoneven" # open
>
> so i need a function which takes these strings as
> an argument and return 1 for 'open', 0 for a 'closed' string.
>

A really simple solution seems to be counting the number of each type of quote in the
string. But first you need to find all of the triple quotes.

for each quote type:
    count = find all triple quotes
    if count is even: closed

    count = find all normal quotes
    if count is even: closed

if not closed: open
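A minimal Python sketch of this counting idea, for illustration only (the name `naive_is_open` is hypothetical; triple quotes are counted and stripped before single quotes, as suggested above). Running it on holger's examples shows where counting breaks down: it already reports `"'asdasd"` as open, because the apostrophe inside the double-quoted string gets counted as well.

```python
def naive_is_open(s):
    """Return True if any kind of quote occurs an odd number of times.

    Hypothetical sketch of the counting approach: triple quotes are counted
    and removed first, then the remaining single-character quotes are counted.
    """
    for triple in ('"""', "'''"):
        if s.count(triple) % 2:  # str.count counts non-overlapping matches
            return True
        s = s.replace(triple, '')
    for single in ('"', "'"):
        if s.count(single) % 2:
            return True
    return False


print(naive_is_open('aksdjasd'))          # False: correct, closed
print(naive_is_open('asd"""askdjalsdk'))  # True: correct, open
print(naive_is_open('"\'asdasd"'))        # True: wrong, the string is closed
```

The counts ignore *where* each quote sits, so a quote character inside a string of the other kind is treated as if it opened or closed something.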

holger krekel

May 16, 2002, 10:40:07 AM

hello,

with my replacement rlcompleter module i'd like to
have a *correct* check if a string is 'open'.
examples:

asd"""askdjalsdk # open
aksdjasd # closed
asjdkk"kajsd'''' # open
"'asdasd" # closed
"""dontcountoneven" # open

so i need a function which takes these strings as
an argument and return 1 for 'open', 0 for a 'closed' string.

Any working ideas?

holger

Harvey Thomas

May 16, 2002, 11:59:00 AM

Holger wrote

I think this is OK

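import re

rex = re.compile('"""|\'\'\'|"|\'')

def quotecompleted(str):
    # returns 1 if a quote is left open at the end of str, 0 otherwise
    global rex
    f = rex.findall(str)   # all quote tokens, triple quotes matched first
    lf = len(f)
    if lf < 2: #the trivial cases
        return lf
    else:
        cmp = f[0]         # the quote that is currently open
        i = 1
        while i < lf:
            if cmp == f[i]:
                # cmp is closed by f[i]; the next token (if any) opens a new one
                if i + 1 == lf:
                    return 0
                else:
                    cmp = f[i + 1]
                    i += 2
            else:
                # a different kind of quote inside the open string: ignore it
                i += 1
        return 1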

HTH

Harvey


Sean 'Shaleh' Perry

May 16, 2002, 12:05:06 PM

>>
>> Seems a really simple solution is count the number of each type of quote in
>> the
>> string. But first you need to find all of the triple quotes.
>

> i thought along those lines, too, but couldn't get it correct easily.

>
>> for each quote type:
>> count = find all triple quotes
>> if count is even: closed
>

> for
> """''' askldjl'''
>
> this returns 'closed': wrong!
>

right, you need to handle ", then '.

import re

test_string = '"""\'\'\' askldjl\'\'\''

STATE_OPEN = 0
STATE_CLOSED = 1
state = STATE_CLOSED

dblqt_triple = re.compile(r'"""')
snglqt_triple = re.compile(r"'''")

# count every """ (search() only finds the first one); an odd count means open
count = len(dblqt_triple.findall(test_string))
if count % 2:
    state = STATE_OPEN
new_string = dblqt_triple.sub('', test_string)

count = len(snglqt_triple.findall(new_string))
if count % 2:
    state = STATE_OPEN
new_string = snglqt_triple.sub('', new_string)

and a similar pattern for checking normal quotes.

holger krekel

May 16, 2002, 11:16:37 AM

Sean 'Shaleh' Perry wrote:

>
> On 16-May-2002 holger krekel wrote:
> > hello,
> >
> > with my replacement rlcompleter module i'd like to
> > have a *correct* check if a string is 'open'.
> > examples:
> >
> > asd"""askdjalsdk # open
> > aksdjasd # closed
> > asjdkk"kajsd'''' # open
> > "'asdasd" # closed
> > """dontcountoneven" # open
> >
> > so i need a function which takes these strings as
> > an argument and return 1 for 'open', 0 for a 'closed' string.
> >
>

> Seems a really simple solution is count the number of each type of quote in the
> string. But first you need to find all of the triple quotes.

i thought along those lines, too, but couldn't get it correct easily.

> for each quote type:
> count = find all triple quotes
> if count is even: closed

for
"""''' askldjl'''

this returns 'closed': wrong!

holger

Sean 'Shaleh' Perry

May 16, 2002, 12:55:19 PM

On 16-May-2002 holger krekel wrote:

> Sean 'Shaleh' Perry wrote:
>> >
>> > what about
>> >
>> > '''"""'''argh"""'''"""
>> >
>> >:-)
>> >
>>
>> what about it? I count 3 single qt triples and 3 double qt triples. 3 % 2
>> is
>> not 0.
>
> but it should be 0. the string *is closed*.
>

heh, even I misparsed that when i read it (-: shucks. Guess you need a WHOLE
LOT MORE code (-: Sounds like you get to do some real parsing of the code to
see if it is sane.

holger krekel

May 16, 2002, 12:59:17 PM

Harvey Thomas wrote:
> > asd"""askdjalsdk # open
> > aksdjasd # closed
> > asjdkk"kajsd'''' # open
> > "'asdasd" # closed
> > """dontcountoneven" # open
> >

> > Any working ideas?
> >
> > holger
> >
> I think this is OK
>
> import re
>
> rex = re.compile('"""|\'\'\'|"|\'')
>
> def quotecompleted(str):
>     global rex
>     f = rex.findall(str)
>     lf = len(f)
>     if lf < 2: #the trivial cases
>         return lf
>     else:
>         cmp = f[0]
>         i = 1
>         while i < lf:
>             if cmp == f[i]:
>                 if i + 1 == lf:
>                     return 0
>                 else:
>                     cmp = f[i + 1]
>                     i += 2
>             else:
>                 i += 1
>         return 1

i think that's it. very nice!

i might try to shorten it a bit, though :-)

thanks,

holger

holger krekel

May 16, 2002, 12:50:40 PM

Sean 'Shaleh' Perry wrote:
> >
> > what about
> >
> > '''"""'''argh"""'''"""
> >
> >:-)
> >
>
> what about it? I count 3 single qt triples and 3 double qt triples. 3 % 2 is
> not 0.

but it should be 0. the string *is closed*.

holger

holger krekel

May 16, 2002, 12:33:09 PM

Sean 'Shaleh' Perry wrote:
>
> >>
> >> Seems a really simple solution is count the number of each type of quote in
> >> the
> >> string. But first you need to find all of the triple quotes.
> >
> > i thought along those lines, too, but couldn't get it correct easily.
> >
> >> for each quote type:
> >> count = find all triple quotes
> >> if count is even: closed
> >
> > for
> > """''' askldjl'''
> >
> > this returns 'closed': wrong!
> >
>

> right, you need to handle ", then '.
>
> import re
>
> test_string = '"""\'\'\' askldjl\'\'\''
>
> STATE_OPEN = 0
> STATE_CLOSED = 1
> state = STATE_OPEN
>
> dblqt_triple = re.compile(r'(""")')
> snglqt_triple = re.compile(r"(''')")
>
> m = dblqt_triple.search(test_string)
> count = len(m.groups())
> if count == 0 or (count % 2) == 0:
> state = STATE_CLOSED
> new_string = dblqt_triple.sub('', test_string)
>
> m = snglqt_triple.search(new_string)
> count = len(m.groups())
> if count == 0 or (count % 2) == 0:
> state = STATE_CLOSED
> new_string = snglqt_triple.sub('', new_string)

what about

'''"""'''argh"""'''"""

:-)

holger

Michael Hudson

May 16, 2002, 1:52:12 PM

holger krekel <py...@devel.trillke.net> writes:

> hello,
>
> with my replacement rlcompleter module i'd like to
> have a *correct* check if a string is 'open'.
> examples:

You're going to have fun with strings containing spaces, aren't you?

Cheers,
M.

--
Richard Gabriel was wrong: worse is not better, lying is better.
Languages and systems succeed in the marketplace to the extent that
their proponents lie about what they can do.
-- Tim Bradshaw, comp.lang.lisp

Skip Montanaro

May 16, 2002, 1:43:15 PM

holger> with my replacement rlcompleter module i'd like to
holger> have a *correct* check if a string is 'open'.

How about just trying to eval() the string? Assuming it begins with a
quotation mark or apostrophe, it should be safe to call eval(). Either it's
a complete string, in which case eval() is safe, or it's an open string and
you get a SyntaxError. You obviously don't eval stuff that starts with
other characters.

--
Skip Montanaro (sk...@pobox.com - http://www.mojam.com/)
"Excellant Written and Communications Skills required" - seen on chi.jobs

Brian Quinlan

May 16, 2002, 1:54:04 PM

Skip wrote:
> holger> with my replacement rlcompleter module i'd like to
> holger> have a *correct* check if a string is 'open'.
>
> How about just trying to eval() the string? Assuming it begins with a
> quotation mark or apostrophe it should be safe to call eval().

I don't think so. How about this string:

'Cya' + os.system('rm -rf /') + 'Later'

It starts and ends with an apostrophe but I wouldn't want to eval it.

Cheers,
Brian

holger krekel

May 16, 2002, 1:56:20 PM

Skip Montanaro wrote:
>
> holger> with my replacement rlcompleter module i'd like to
> holger> have a *correct* check if a string is 'open'.
>
> How about just trying to eval() the string? Assuming it begins with a
> quotation mark or apostrophe it should be safe to call eval(). Either it's
> a complete string in which case eval() is safe, or it's an open string and
> you get a SyntaxError. You obviously don't eval stuff that doesn't start
> with other characters.

i have to do this with arbitrary statements/expressions.
Harvey Thomas found a nice solution fit to the problem.

thanks,

holger

Bernhard Herzog

May 16, 2002, 2:43:30 PM

Skip Montanaro <sk...@pobox.com> writes:

> holger> with my replacement rlcompleter module i'd like to
> holger> have a *correct* check if a string is 'open'.
>
> How about just trying to eval() the string? Assuming it begins with a
> quotation mark or apostrophe it should be safe to call eval().

If you mean the builtin eval without any form of restricted execution,
you're not safe. Consider

s = """'', eval(<evilcode>)"""
eval(s)

where <evilcode> can do practically anything if chosen carefully. E.g.:

>>> s = """'', eval(compile("import os; os.system('ls')", "", "single"))"""
>>> eval(s)
build configure.in Lib Misc pyconfig.h.in
buildno CVS libpython2.1.a Modules python
config.cache Demo LICENSE Objects Python
config.h Doc Mac Parser README
config.log Grammar Makefile PC RISCOS
config.status Include Makefile.pre PCbuild setup.py
configure install-sh Makefile.pre.in PLAN.txt Tools
0
('', None)
>>>

Better:

>>> myglobals = {"__builtins__":{}}
>>> eval(s, myglobals, {})
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 0, in ?
NameError: name 'eval' is not defined
>>>

But then you don't know whether the string contains correct quotes...
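One way to keep the "just try it" idea without executing anything, not something proposed in this thread, is the standard library's `ast.literal_eval` (added in Python 2.6, well after this discussion), which accepts only literal syntax. A minimal sketch, with the helper name `classify_literal` made up for illustration; it only helps when the whole input is supposed to be a literal, so it does not cover holger's case of arbitrary statements and expressions.

```python
import ast


def classify_literal(text):
    """Classify `text` with ast.literal_eval instead of eval (sketch).

    literal_eval only accepts literal syntax, so payloads such as calls to
    eval or os.system are rejected with ValueError instead of being run.
    """
    try:
        ast.literal_eval(text)
    except SyntaxError:
        return 'syntax error'   # e.g. an unterminated string
    except ValueError:
        return 'not a literal'  # parses, but contains calls, names, ...
    return 'literal'


print(classify_literal('"abc"'))   # literal
print(classify_literal('"abc'))    # syntax error
print(classify_literal("'', eval(compile('import os', '', 'single'))"))
# not a literal
```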

Bernhard

--
Intevation GmbH http://intevation.de/
Sketch http://sketch.sourceforge.net/
MapIt! http://www.mapit.de/

Terry Reedy

May 16, 2002, 2:53:05 PM

"holger krekel" <py...@devel.trillke.net> wrote in message
news:mailman.1021560087...@python.org...

What does 'open' mean? A few examples do not a definition make. Write a
complete definition (one that applies to all strings) that *you*
regard as correct and think you are willing to live with. Then
translate it to code (probably the easier part).

Terry J. Reedy

Sean 'Shaleh' Perry

May 16, 2002, 12:36:17 PM

holger krekel

May 16, 2002, 2:18:30 PM

Michael Hudson wrote:
> holger krekel <py...@devel.trillke.net> writes:
>
> > hello,
> >
> > with my replacement rlcompleter module i'd like to
> > have a *correct* check if a string is 'open'.
> > examples:
>
> You're going to have fun with strings containing spaces aren't you?

huh? not that i know of :-)

holger

holger krekel

May 16, 2002, 3:22:37 PM

sorry, i was too implicit in saying that i refer
to python's own rules for the definition.
probably i thought that the reference to 'rlcompleter'
gave enough context.

the question, more exactly:

Given a string S, does S end in an unmatched or a matched
quotation (quotation marks being ', ", """, ''')?
The matching rules are those of our beloved python.

And no, the code was not exactly obvious. There are many ways how
you can do it wrongly :-)

Harvey Thomas has already solved this nicely.

holger

Skip Montanaro

May 16, 2002, 4:04:36 PM

>> How about just trying to eval() the string? Assuming it begins with
>> a quotation mark or apostrophe it should be safe to call eval().

Bernhard> If you mean the builtin eval without any form of restricted
Bernhard> execution, you're not safe.

Whoops! Thanks. Them dang paren-less tuples!

John La Rooy

May 16, 2002, 5:00:25 PM

On Thu, 16 May 2002 18:59:17 +0200
holger krekel <py...@devel.trillke.net> wrote:

>
> i think that's it. very nice!
>
> i might try to shorten it a bit, though :-)
>
> thanks,
>
> holger
>
>

is this short enough for you?

import re

def quoteopen(s):
    quot=re.compile("(?P<quot>\"\"\"|'''|\"|').*?(?P=quot)")
    s=quot.sub("",s)
    return "'" in s or '"' in s

John

holger krekel

May 16, 2002, 4:31:29 PM

Harvey Thomas wrote:
> import re
>
> rex = re.compile('"""|\'\'\'|"|\'')
>
> def quotecompleted(str):
>     global rex
>     f = rex.findall(str)
>     lf = len(f)
>     if lf < 2: #the trivial cases
> ...

i have come up with what seems to be the final version :-)

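import re
from functools import reduce  # builtin in Python 2; must be imported on Python 3

def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
    """ return the open quote at the end of text.
    if all string-quotes are matched, return the
    empty string. Based on ideas from Harvey Thomas.
    """
    # fold over the quote tokens: x (the open quote) survives any different
    # quote y; a y equal to x closes it (''); with nothing open, y opens
    rfunc = lambda x,y: x!=y and (x or y) or ''
    quotes = rex.findall(text)
    return quotes and reduce(rfunc,quotes) or ''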

regards and thanks for all suggestions,

holger

John La Rooy

May 16, 2002, 6:03:57 PM

Oh you changed the return value from 0 to 1 to the left over string ;o)
in that case...

import re

def quoteopen(s,quot=re.compile("(?P<quot>\"\"\"|'''|\"|').*?(?P=quot)")):
    return quot.sub("",s)

No one has really discussed why simply counting the quotes is wrong. The
reason is that while scanning the string from left to right, if you come
across a quote you have to ignore all the *other* types of quotes until
you find a matching one. That's what the x!=y in Holger's solution is
checking for, and the (?P<quot>...).*?(?P=quot) does in mine.

One thing missing here is handling of escaped quotes in the string.
Neither of the regexps here look for them. Holger said he wanted it to work
like python strings.

Only Holger knows ;o) Do you need to check for escaped quotes Holger?
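The left-to-right rule described above (once a quote opens, ignore the other kinds until the same kind closes it) can also be written as an explicit scanner. This is a hypothetical sketch, not code from the thread; `open_quote_scan` is an illustrative name. It additionally skips the character after a backslash inside a string, which is how Python decides where a string ends (raw strings included). It does not understand comments, so an apostrophe in a `#` comment would be misread.

```python
def open_quote_scan(text):
    """Return the quote left open at the end of `text`, or '' if none.

    Hypothetical sketch: scan left to right; once a quote opens, ignore every
    other kind of quote until the same kind closes it. Inside a string a
    backslash skips the following character. Comments are not handled.
    """
    quotes = ('"""', "'''", '"', "'")  # triple quotes must be tried first
    current = ''  # the quote that is currently open ('' means none)
    i = 0
    while i < len(text):
        if current:
            if text[i] == '\\':
                i += 2  # skip the escaped character
            elif text.startswith(current, i):
                i += len(current)
                current = ''
            else:
                i += 1
        else:
            for q in quotes:
                if text.startswith(q, i):
                    current = q
                    i += len(q)
                    break
            else:
                i += 1  # not a quote: move on
    return current


print(repr(open_quote_scan('"""a"a""')))  # '"""'  (still open)
print(repr(open_quote_scan('"a"""')))     # ''  ("a" followed by "")
```

Because the open quote is remembered explicitly, a `"` inside a `"""` string cannot pair up with a later `"`, which is the case the non-greedy regex gets wrong.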

holger krekel

May 16, 2002, 5:43:46 PM

To John La Rooy wrote:
> > is this short enough for you?
> >
> > import re
> >
> > def quoteopen(s):
> >     quot=re.compile("(?P<quot>\"\"\"|'''|\"|').*?(?P=quot)")
> >     s=quot.sub("",s)
> >     return "'" in s or '"' in s
>

> my other version also returns the 'open quote' but
> yours is shorter. you won :-)

NO! you lost :-/

it doesn't work because the rex tries too hard to match.
paste this to your interpreter...

quoteopen('"""a"a""')

and it will match in ".*?" pairs which yields
the wrong result.

> regexes often offer more than one might think...

especially more subtlety :-)

holger

holger krekel

May 16, 2002, 5:24:56 PM

my other version also returns the 'open quote' but
yours is shorter. you won :-)

regexes often offer more than one might think...

thanks,

holger

John La Rooy

May 16, 2002, 7:06:40 PM

bugger ;o)
we both lose :/
>>> open_quote('"a"""')
'"'

that should be closed, right? or am i misunderstanding the question?

if it should return anything that *isn't* quoted, like

q('A"quoted bit"B') --> 'AB'

might need more examples of return values, because "the way python treats quotes"
doesn't define that for you

John

back to the drawing board...

holger krekel

May 16, 2002, 6:21:01 PM

John La Rooy wrote:
> Oh you changed the return value from 0 to 1 to the left over string ;o)

huh? um, i don't think so. my open_quote returns
the open *quote* (or '') not the 'left over string'.

> import re
>
> def quoteopen(s,quot=re.compile("(?P<quot>\"\"\"|'''|\"|').*?(?P=quot)")):
>     return quot.sub("",s)

> Noone has really discussed why simply counting the quotes is wrong. The
> reason is that while scanning the string from left to right, if you come
> across a quote you have to ignore all the *other* types of quotes until
> you find a matching one. That's what the x!=y in Holger's solution is
> checking for, and the (?P<quot>...).*?(?P=quot) does in mine.

only the regex is too hungry. it eagerly matches """a"a"" although
we don't want it to.

> One thing that is missing here is if there is an escaped quote in the string.
> Neither of the regexps here look for them. Holger said he wanted it to work
> like python strings.
>
> Only Holger knows ;o) Do you need to check for escaped quotes Holger?

yes and i thought that open_quote does it correctly. not?
the reason i need normal python semantics is that it is
for a 'commandline-completion' module. So it should
e.g. work correctly for the regexes we are talking about :-)

> [me]

> >
> > def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
> >     """ return the open quote at the end of text.
> >     if all string-quotes are matched, return the
> >     empty string. Based on ideas from Harvey Thomas.
> >     """
> >     rfunc = lambda x,y: x!=y and (x or y) or ''
> >     quotes = rex.findall(text)
> >     return quotes and reduce(rfunc,quotes) or ''

holger

holger krekel

May 16, 2002, 8:31:10 PM

John La Rooy wrote:
> bugger ;o)
> we both lose :/
> >>> open_quote('"a"""')
> '"'
>
> that should be closed, right? or am i misunderstanding the question?

no you are not. it should be closed. i have a fix below.

> if should return anything that *isn't* quoted like
>
> q('A"quoted bit"B') --> 'AB'

i don't need this variation currently.

> might need more examples of return values, because "the way python treats quotes"
> doesn't define that for you

It does. Just enter a string at the interactive prompt and hit
return: if python prints the 'continuation prompt', you are inside a string.
Note that you can enter e.g. "" "" '' "" """""" """ because python automatically
concatenates these strings (no plus needed).
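The interactive-prompt test described above can be reproduced without running anything via the standard library's `codeop.compile_command`, which the `code` module's interactive console uses to decide whether more input is needed: it returns `None` for incomplete input, a code object for complete input, and raises `SyntaxError` for invalid input. A minimal sketch; the `prompt_state` helper is hypothetical. One caveat: at the prompt only an unterminated *triple*-quoted string gets the continuation prompt, while an unterminated `'...` or `"...` string is reported as a SyntaxError right away.

```python
import codeop


def prompt_state(text):
    """Classify `text` the way the interactive prompt would (sketch).

    'incomplete' -> the console would show the '...' continuation prompt
    'complete'   -> the console would run the input
    'error'      -> SyntaxError (e.g. an unterminated single-quoted string)
    Nothing is executed: compile_command only compiles.
    """
    try:
        code = codeop.compile_command(text)
    except (SyntaxError, ValueError, OverflowError):
        return 'error'
    return 'incomplete' if code is None else 'complete'


print(prompt_state('"""abc'))      # incomplete
print(prompt_state('x = "abc"'))   # complete
print(prompt_state('"abc'))        # error
```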

So here is the fixed version together with some test cases:

import re
from functools import reduce  # builtin in Python 2; must be imported on Python 3

def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
    """ return the open quote at the end of text.
    if all string-quotes are matched, return the
    empty string. thanks to Harvey Thomas&John La Rooy.
    """
    # x: the currently open quote ('' if none); y: the next quote token.
    # nothing open -> y opens a string; y starts with x -> x is closed;
    # otherwise y is a different quote inside the string and is ignored
    rfunc = lambda x,y: x=='' and y or not y.startswith(x) and x or ''
    quotes = rex.findall(text)
    return quotes and reduce(rfunc,quotes) or ''

assert(open_quote(r'''a''')=='')
assert(open_quote(r'''"''')=='"')
assert(open_quote(r'''\'''')=="'")
assert(open_quote(r'''"a"""''')=='')
assert(open_quote(r'''"""a"''')=='"""')
assert(open_quote(r'''"""a""""''')=='"')
assert(open_quote(r'''"a"b"a"''')=='')
assert(open_quote(r'''"""''""''"""''')=='')
assert(open_quote(r'''"""''""''"''')=='"""')
assert(open_quote(r'''r"""\"\"\"ad"""''')=='')
assert(open_quote(r'''r""\"ad"""''')=='')
assert(open_quote(r'''r""\""ad"""''')=='"""')

assert('''good night''')

holger

Michael Hudson

May 17, 2002, 4:51:24 AM

holger krekel <py...@devel.trillke.net> writes:

Because of the way readline works. It calls the completer function
with the word stem, so if the buffer looks like

>>> "a very long string
^
when you hit TAB the completer will get sent just "string" (I think).

Cheers,
M.

--
> say-hi-to-the-flying-pink-elephants-for-me-ly y'rs,
No way, the flying pink elephants are carrying MACHINE GUNS!
Aiiee!! Time for a kinder, gentler hallucinogen...
-- Barry Warsaw & Greg Ward, python-dev

holger krekel

May 17, 2002, 5:12:01 AM

Michael Hudson wrote:
> [me]
> > [you]

> > >
> > > You're going to have fun with strings containing spaces aren't you?
> >
> > huh? not that i know of :-)
>
> Because of the way readline works. It calls the completer function
> with the word stem, so if the buffer looks like
>
> >>> "a very long string
> ^
> when you hit TAB the completer will get sent just "string" (I think).

Actually what you get is everything from the last 'delimiter'.
Delimiters are set via readline.set_completer_delims(delims_str).

With my implementation completer_delims are set to the empty string.
i want to have the complete line up to the cursor. This is because
you cannot find any good delimiter set capable of helping you
with the python-expression.

cheers,

holger

Michael Hudson

May 17, 2002, 5:25:27 AM

holger krekel <py...@devel.trillke.net> writes:

> Michael Hudson wrote:
> > [me]
> > > [you]
> > > >
> > > > You're going to have fun with strings containing spaces aren't you?
> > >
> > > huh? not that i know of :-)
> >
> > Because of the way readline works. It calls the completer function
> > with the word stem, so if the buffer looks like
> >
> > >>> "a very long string
> > ^
> > when you hit TAB the completer will get sent just "string" (I think).
>
> Actually what you get is everything from the last 'delimiter'.
> Delimiters are set via readline.set_completer_delims(delims_str).

Ah yes. Does that actually work? I seem to remember a report of it
not working properly, but can't find it now...

Cheers,
M.

--
MAN: How can I tell that the past isn't a fiction designed to
account for the discrepancy between my immediate physical
sensations and my state of mind?
-- The Hitch-Hikers Guide to the Galaxy, Episode 12

Bernhard Herzog

May 17, 2002, 5:39:33 AM

holger krekel <py...@devel.trillke.net> writes:

> def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
>     """ return the open quote at the end of text.
>     if all string-quotes are matched, return the
>     empty string. thanks to Harvey Thomas&John La Rooy.
>     """
>     rfunc = lambda x,y: x=='' and y or not y.startswith(x) and x or ''
>     quotes = rex.findall(text)
>     return quotes and reduce(rfunc,quotes) or ''

This doesn't even look at backslashes in the string.

>>> open_quote("'\\'") # should return "'"
''

Since you're trying to parse python source code: why don't you try the
tokenize module?
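A minimal sketch of the tokenize route in modern Python 3, assuming that "open" means the tokenizer hits the end of input inside a string literal; `ends_in_open_string` is a hypothetical name. How an unterminated string is reported varies between Python versions (an unclosed triple-quoted string raises `TokenError`; an unclosed `'...` or `"...` string appears either as an `ERRORTOKEN` for the quote character or as a `TokenError`), so the sketch checks for all of these.

```python
import io
import tokenize


def ends_in_open_string(text):
    """Return True if `text` ends inside an unterminated string literal.

    Hypothetical helper built on the standard tokenize module. Depending on
    the Python version, an unclosed string shows up as a TokenError or as an
    ERRORTOKEN for the quote character, so both are checked.
    """
    readline = io.StringIO(text).readline
    try:
        for tok in tokenize.generate_tokens(readline):
            if tok.type == tokenize.ERRORTOKEN and tok.string[:1] in ('"', "'"):
                return True
    except tokenize.TokenError as err:
        # An unclosed bracket also raises TokenError ("... multi-line
        # statement"), so only string-related errors count as an open string.
        return 'string' in str(err.args[0])
    return False


print(ends_in_open_string('"""a"a""'))  # True
print(ends_in_open_string('"a"""'))     # False
```

Because the real tokenizer decides where each string ends, cases like `'"a"""'` (a string followed by an empty string) come out right without any hand-written matching rules.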

Bernhard Herzog

May 17, 2002, 5:43:24 AM

Skip Montanaro <sk...@pobox.com> writes:

> >> How about just trying to eval() the string? Assuming it begins with
> >> a quotation mark or apostrophe it should be safe to call eval().
>
> Bernhard> If you mean the builtin eval without any form of restricted
> Bernhard> execution, you're not safe.
>
> Whoops! Thanks. Them dang paren-less tuples!

I could have used any other expression that can start with a string
literal, like "" + something as in some other postings. The tuple syntax
had the big advantage that it doesn't matter what the part after the
comma evaluates to as long as it doesn't raise any exceptions.

holger krekel

May 17, 2002, 10:09:55 AM

Bernhard Herzog wrote:
> holger krekel <py...@devel.trillke.net> writes:
>
> > def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
> >     """ return the open quote at the end of text.
> >     if all string-quotes are matched, return the
> >     empty string. thanks to Harvey Thomas&John La Rooy.
> >     """
> >     rfunc = lambda x,y: x=='' and y or not y.startswith(x) and x or ''
> >     quotes = rex.findall(text)
> >     return quotes and reduce(rfunc,quotes) or ''
>
> This doesn't even look at backslashes in the string.
>
> >>> open_quote("'\\'") # should return "'"
> ''

i have a somewhat corrected version but ...

> Since you're trying to parse python source code: why don't you try the
> tokenize module?

this seems like a good idea! I didn't know that tokenize does this
(in other contexts like c++/java 'tokenize' was a more general thing not
directly capable of parsing quoted strings correctly etc.)

Preliminary tests show that python's tokenize is relatively easy to handle
although i already found some gotchas. E.g. tokenize produces for

print "

first the ERRORTOKEN ' ' and then the ERRORTOKEN '"'. (strange, not?)
I guess i have to understand 'tokenize' better to have it under full control.

i knew there must be a right way :-)

thanks again,

holger