Seems a really simple solution is to count the number of each type of quote in the
string. But first you need to find all of the triple quotes.
for each quote type:
    count = find all triple quotes
    if count is even: closed
    count = find all normal quotes
    if count is even: closed
    if not closed: open
with my replacement rlcompleter module i'd like to
have a *correct* check if a string is 'open'.
examples:
asd"""askdjalsdk # open
aksdjasd # closed
asjdkk"kajsd'''' # open
"'asdasd" # closed
"""dontcountoneven" # open
so i need a function which takes these strings as
an argument and returns 1 for 'open', 0 for a 'closed' string.
Any working ideas?
holger
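import re

rex = re.compile('"""|\'\'\'|"|\'')

def quotecompleted(s):
    # every quote token in order; triple quotes are tried first
    f = rex.findall(s)
    lf = len(f)
    if lf < 2:  # the trivial cases: no quote (0) or a single quote (1)
        return lf
    else:
        cmp = f[0]  # the quote that opened the current string
        i = 1
        while i < lf:
            if cmp == f[i]:  # matching close found
                if i + 1 == lf:
                    return 0
                else:
                    cmp = f[i + 1]  # the next token opens a new string
                    i += 2
            else:  # a different quote type inside the string: ignore it
                i += 1
        return 1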
HTH
Harvey
right, you need to handle ", then '.
import re

test_string = '"""\'\'\' askldjl\'\'\''

STATE_OPEN = 0
STATE_CLOSED = 1

dblqt_triple = re.compile(r'"""')
snglqt_triple = re.compile(r"'''")

# findall() returns every match; search() only finds the first one (or None),
# and m.groups() counts the groups in the pattern, not the matches.
count = len(dblqt_triple.findall(test_string))
dbl_open = count % 2 == 1
# drop the triple quotes so they are not counted again as normal quotes
new_string = dblqt_triple.sub('', test_string)

count = len(snglqt_triple.findall(new_string))
sngl_open = count % 2 == 1
new_string = snglqt_triple.sub('', new_string)

# closed only if every kind of quote was seen an even number of times
state = STATE_OPEN if (dbl_open or sngl_open) else STATE_CLOSED
and a similar pattern for checking normal quotes.
i thought along those lines, too, but couldn't get it correct easily.
> for each quote type:
> count = find all triple quotes
> if count is even: closed
for
"""''' askldjl'''
this returns 'closed': wrong!
holger
heh, even I misparsed that when i read it (-: shucks. Guess you need a WHOLE
LOT MORE code (-: Sounds like you get to do some real parsing of the code to
see if it is sane.
i think that's it. very nice!
i might try to shorten it a bit, though :-)
thanks,
holger
but it should be 0. the string *is closed*.
holger
what about
'''"""'''argh"""'''"""
:-)
holger
> hello,
>
> with my replacement rlcompleter module i'd like to
> have a *correct* check if a string is 'open'.
> examples:
You're going to have fun with strings containing spaces, aren't you?
Cheers,
M.
How about just trying to eval() the string? Assuming it begins with a
quotation mark or apostrophe, it should be safe to call eval(). Either it's
a complete string, in which case eval() is safe, or it's an open string and
you get a SyntaxError. You obviously don't eval stuff that starts
with other characters.
--
Skip Montanaro (sk...@pobox.com - http://www.mojam.com/)
I don't think so. How about this string:
'Cya' + os.system('rm -rf /') + 'Later'
It starts and ends with an apostrophe but I wouldn't want to eval it.
Cheers,
Brian
i have to do this with arbitrary statements/expressions.
Harvey Thomas found a nice solution that fits the problem.
thanks,
holger
> holger> with my replacement rlcompleter module i'd like to
> holger> have a *correct* check if a string is 'open'.
>
> How about just trying to eval() the string? Assuming it begins with a
> quotation mark or apostrophe it should be safe to call eval().
If you mean the builtin eval without any form of restricted execution,
you're not safe. Consider
s = """'', eval(<evilcode>)"""
eval(s)
where <evilcode> can do practically anything, if you choose it
carefully. E.g.:
>>> s = """'', eval(compile("import os; os.system('ls')", "", "single"))"""
>>> eval(s)
build configure.in Lib Misc pyconfig.h.in
buildno CVS libpython2.1.a Modules python
config.cache Demo LICENSE Objects Python
config.h Doc Mac Parser README
config.log Grammar Makefile PC RISCOS
config.status Include Makefile.pre PCbuild setup.py
configure install-sh Makefile.pre.in PLAN.txt Tools
0
('', None)
>>>
Better:
>>> myglobals = {"__builtins__":{}}
>>> eval(s, myglobals, {})
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 0, in ?
NameError: name 'eval' is not defined
>>>
But then you don't know whether the string contains correct quotes...
Bernhard
--
Intevation GmbH http://intevation.de/
Sketch http://sketch.sourceforge.net/
MapIt! http://www.mapit.de/
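For the narrower case Skip describes, where the input is supposed to be a single literal, a safer variant of the eval() idea is ast.literal_eval(), which only accepts literal syntax and never executes calls. This is a minimal sketch assuming Python 3; the helper name classify_literal is hypothetical. As Bernhard notes, it still can't tell you which quote is open, and it rejects the arbitrary statements holger needs; a SyntaxError also covers syntax errors other than an unterminated string.

```python
import ast

def classify_literal(source):
    """Hypothetical helper: classify *source* without executing any of it."""
    try:
        ast.literal_eval(source)
    except SyntaxError:
        return "syntax error"   # e.g. an unterminated string such as 'abc
    except ValueError:
        return "not a literal"  # parses, but contains names or calls
    return "literal"

print(classify_literal("'abc"))                                    # syntax error
print(classify_literal("'Cya' + os.system('rm -rf /') + 'Later'"))  # not a literal
print(classify_literal("'' 'a' \"b\""))                             # literal
```

Brian's and Bernhard's examples come back as "not a literal" instead of running anything.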
What does 'open' mean? A few examples do not a definition make. Write a
complete definition (one that applies to all strings) that *you*
regard as correct and think you are willing to live with. Then
translate to code (probably the easier part).
Terry J. Reedy
huh? not that i know of :-)
holger
sorry. i was too implicit in denoting that i refer
to the known python rules for the definition.
probably i thought that the reference to 'rlcompleter'
gave enough context.
the question, more exactly:
Given a string S, does S end in an unmatched or matched
quotation (the quotation marks being ', ", """ and ''')?
The matching rules are those of our beloved python.
And no, the code was not exactly obvious. There are many ways how
you can do it wrongly :-)
Harvey Thomas has already solved this nicely.
holger
Bernhard> If you mean the builtin eval without any form of restricted
Bernhard> execution, you're not safe.
Whoops! Thanks. Them dang paren-less tuples!
>
> i think that's it. very nice!
>
> i might try to shorten it a bit, though :-)
>
> thanks,
>
> holger
>
>
is this short enough for you?
import re

def quoteopen(s):
    quot = re.compile("(?P<quot>\"\"\"|'''|\"|').*?(?P=quot)")
    s = quot.sub("", s)
    return "'" in s or '"' in s
John
i have come up with what seems to be the final version :-)
import re
from functools import reduce  # a builtin in Python 2

def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
    """ return the open quote at the end of text.
    if all string-quotes are matched, return the
    empty string. Based on ideas from Harvey Thomas.
    """
    # x is the quote currently open ('' if none), y the next quote token
    rfunc = lambda x, y: x != y and (x or y) or ''
    quotes = rex.findall(text)
    return quotes and reduce(rfunc, quotes) or ''
regards and thanks for all suggestions,
holger
import re

def quoteopen(s, quot=re.compile("(?P<quot>\"\"\"|'''|\"|').*?(?P=quot)")):
    return quot.sub("", s)
No one has really discussed why simply counting the quotes is wrong. The
reason is that while scanning the string from left to right, if you come
across a quote you have to ignore all the *other* types of quotes until
you find a matching one. That's what the x!=y in Holger's solution is
checking for, and what the (?P<quot>...).*?(?P=quot) does in mine.
One thing missing here is handling of escaped quotes in the string.
Neither of the regexps here looks for them. Holger said he wanted it to work
like python strings.
Only Holger knows ;o) Do you need to check for escaped quotes Holger?
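A minimal sketch of the left-to-right scan described above, extended to skip backslash-escaped characters inside a string and to ignore quotes inside # comments. The function name scan_open_quote is hypothetical and the code assumes Python 3; it does not look at string prefixes and does not treat a newline inside a single-quoted string as an error.

```python
def scan_open_quote(text):
    """Hypothetical helper: return the quote still open at the end of text, or ''."""
    quote = ''              # delimiter of the string we are currently inside
    i, n = 0, len(text)
    while i < n:
        if quote:
            if text[i] == '\\':
                i += 2      # a backslash hides the next character, even a quote
            elif text.startswith(quote, i):
                i += len(quote)
                quote = ''  # found the matching delimiter: string closed
            else:
                i += 1      # any other character, including other quote types
        elif text[i] == '#':
            newline = text.find('\n', i)
            if newline == -1:
                break       # the comment runs to the end of the text
            i = newline + 1
        elif text[i] in '\'"':
            # prefer the triple form, as Python's tokenizer does
            quote = text[i] * 3 if text.startswith(text[i] * 3, i) else text[i]
            i += len(quote)
        else:
            i += 1
    return quote

print(repr(scan_open_quote('"""a"a""')))        # '"""'
print(repr(scan_open_quote('"a"""')))           # ''
print(repr(scan_open_quote("'\\'")))            # "'"
print(repr(scan_open_quote("x = 1  # don't")))  # ''
```

Because only the delimiter that opened the string can close it, the "ignore the other quote types" rule falls out of the state variable instead of needing a count.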
NO! you lost :-/
it doesn't work because the rex tries too hard to match.
paste this into your interpreter...
quoteopen('"""a"a""')
and it will match in ".*?" pairs, which yields
the wrong result.
> regexes often offer more than one might think...
especially more subtlety :-)
holger
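One way around the "too hungry" regex, sketched under the assumption that only the delimiter left open at the end matters: let each alternative either find its own closing delimiter or run to the end of the text (\Z). Since a triple-quote opener can always succeed that way, the engine never falls back to treating it as a shorter quote. The names OPEN_OR_CLOSED and regex_open_quote are hypothetical; backslash escapes are skipped via \\. and # comments are not handled.

```python
import re

# Each alternative consumes a complete string or, failing that, everything up
# to the end of the text (\Z), so a """ opener is never re-read as a " pair.
OPEN_OR_CLOSED = re.compile(
    r'(?P<q3>"""|\'\'\')(?:\\.|.)*?(?:(?P<c3>(?P=q3))|\Z)'
    r'|(?P<q1>["\'])(?:\\.|.)*?(?:(?P<c1>(?P=q1))|\Z)',
    re.DOTALL,
)

def regex_open_quote(text):
    """Hypothetical helper: return the quote left open at the end of text, or ''."""
    for m in OPEN_OR_CLOSED.finditer(text):
        # a missing closing group means the match ran to \Z: still open
        if m.group('q3') and m.group('c3') is None:
            return m.group('q3')
        if m.group('q1') and m.group('c1') is None:
            return m.group('q1')
    return ''

print(repr(regex_open_quote('"""a"a""')))  # '"""'
print(repr(regex_open_quote('"a"""')))     # ''
```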
my other version also returns the 'open quote' but
yours is shorter. you won :-)
regexes often offer more than one might think...
thanks,
holger
bugger ;o)
we both lose :/
>>> open_quote('"a"""')
'"'
that should be closed, right? or am i misunderstanding the question?
it should return anything that *isn't* quoted, like
q('A"quoted bit"B') --> 'AB'
might need more examples of return values, because "the way python treats quotes"
doesn't define that for you
John
back to the drawing board...
huh? um, i don't think so. my open_quote returns
the open *quote* (or '') not the 'left over string'.
> import re
>
> def quoteopen(s,quot=re.compile("(?P<quot>\"\"\"|'''|\"|').*?(?P=quot)")):
> return quot.sub("",s)
> No one has really discussed why simply counting the quotes is wrong. The
> reason is that while scanning the string from left to right, if you come
> across a quote you have to ignore all the *other* types of quotes until
> you find a matching one. That's what the x!=y in Holger's solution is
> checking for, and the (?P<quot>...).*?(?P=quot) does in mine.
only the regex is too hungry. it eagerly matches """a"a"" although
we don't want it to.
> One thing missing here is handling of escaped quotes in the string.
> Neither of the regexps here looks for them. Holger said he wanted it to work
> like python strings.
>
> Only Holger knows ;o) Do you need to check for escaped quotes Holger?
yes and i thought that open_quote does it correctly. not?
the reason i need normal python semantics is that it is
for a 'commandline-completion' module. So it should
e.g. work correctly for the regexes we are talking about :-)
> [me]
> >
> > def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
> > """ return the open quote at the end of text.
> > if all string-quotes are matched, return the
> > empty string. Based on ideas from Harvey Thomas.
> > """
> > rfunc = lambda x,y: x!=y and (x or y) or ''
> > quotes = rex.findall(text)
> > return quotes and reduce(rfunc,quotes) or ''
holger
no you are not. it should be closed. i have a fix below.
> it should return anything that *isn't* quoted, like
>
> q('A"quoted bit"B') --> 'AB'
i don't need this variation currently.
> might need more examples of return values, because "the way python treats quotes"
> doesn't define that for you
It does. Just enter a string at the interactive prompt and hit
return: if python prints the 'continuation prompt', you are inside a string.
Note that you can enter e.g. "" "" '' "" """""" """ because python automatically
concatenates these strings (no plus needed).
So here is the fixed version together with some test cases:
import re
from functools import reduce  # a builtin in Python 2

def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
    """ return the open quote at the end of text.
    if all string-quotes are matched, return the
    empty string. thanks to Harvey Thomas & John La Roy.
    """
    # x is the quote currently open ('' if none), y the next quote token:
    # open y if nothing is open, ignore y unless it starts with x, else close
    rfunc = lambda x, y: x == '' and y or not y.startswith(x) and x or ''
    quotes = rex.findall(text)
    return quotes and reduce(rfunc, quotes) or ''

assert(open_quote(r'''a''')=='')
assert(open_quote(r'''"''')=='"')
assert(open_quote(r'''\'''')=="'")
assert(open_quote(r'''"a"""''')=='')
assert(open_quote(r'''"""a"''')=='"""')
assert(open_quote(r'''"""a""""''')=='"')
assert(open_quote(r'''"a"b"a"''')=='')
assert(open_quote(r'''"""''""''"""''')=='')
assert(open_quote(r'''"""''""''"''')=='"""')
assert(open_quote(r'''r"""\"\"\"ad"""''')=='')
assert(open_quote(r'''r""\"ad"""''')=='')
assert(open_quote(r'''r""\""ad"""''')=='"""')
assert('''good night''')
holger
Because of the way readline works. It calls the completer function
with the word stem, so if the buffer looks like
>>> "a very long string
^
when you hit TAB the completer will get sent just "string" (I think).
Cheers,
M.
Actually what you get is everything from the last 'delimiter'.
Delimiters are set via readline.set_completer_delims(delims_str).
With my implementation, the completer delimiters are set to the empty string,
because i want to have the complete line up to the cursor: you cannot
find any good delimiter set capable of helping you
with a python expression.
cheers,
holger
> Michael Hudson wrote:
> > [me]
> > > [you]
> > > >
> > > > You're going to have fun with strings containing spaces, aren't you?
> > >
> > > huh? not that i know of :-)
> >
> > Because of the way readline works. It calls the completer function
> > with the word stem, so if the buffer looks like
> >
> > >>> "a very long string
> > ^
> > when you hit TAB the completer will get sent just "string" (I think).
>
> Actually what you get is everything from the last 'delimiter'.
> Delimiters are set via readline.set_completer_delims(delims_str).
Ah yes. Does that actually work? I seem to remember a report of it
not working properly, but can't find it now...
Cheers,
M.
> def open_quote(text, rex=re.compile('"""|\'\'\'|"|\'')):
> """ return the open quote at the end of text.
> if all string-quotes are matched, return the
> empty string. thanks to Harvey Thomas&John La Roy.
> """
> rfunc = lambda x,y: x=='' and y or not y.startswith(x) and x or ''
> quotes = rex.findall(text)
> return quotes and reduce(rfunc,quotes) or ''
This doesn't even look at backslashes in the string.
>>> open_quote("'\\'") # should return "'"
''
Since you're trying to parse python source code: why don't you try the
tokenize module?
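For complete source, the tokenize module already applies Python's string rules (prefixes, backslash escapes, triple quotes, implicit concatenation), which is exactly what the regex versions above get wrong. This sketch assumes Python 3, where generate_tokens() takes a readline callable returning str; the helper name string_tokens is hypothetical. How an unterminated string is reported (ERRORTOKEN tokens or a tokenize.TokenError) has varied between Python versions, so an open-quote check built on tokenize should be tested against the version you target.

```python
import io
import tokenize

def string_tokens(source):
    """Hypothetical helper: return the STRING tokens tokenize finds in source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.STRING]

src = 'x = "a\\"b" + r\'c\\\'d\' + """e"f"""\n'
for s in string_tokens(src):
    print(s)
# "a\"b"
# r'c\'d'
# """e"f"""
```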
> >> How about just trying to eval() the string? Assuming it begins with
> >> a quotation mark or apostrophe it should be safe to call eval().
>
> Bernhard> If you mean the builtin eval without any form of restricted
> Bernhard> execution, you're not safe.
>
> Whoops! Thanks. Them dang paren-less tuples!
I could have used any other expression that can start with a string
literal, like "" + something as in some other postings. The tuple syntax
had the big advantage that it doesn't matter what the part after the
comma evaluates to as long as it doesn't raise any exceptions.
i have a somewhat corrected version but ...
> Since you're trying to parse python source code: why don't you try the
> tokenize module?
this seems like a good idea! I didn't know that tokenize does this
(in other contexts like c++/java 'tokenize' was a more general thing not
directly capable of parsing quoted strings correctly etc.)
Preliminary tests show that python's tokenize is relatively easy to handle
although i already found some gotchas. E.g. tokenize produces for
print "
first the ERRORTOKEN ' ' and then the ERRORTOKEN '"'. (strange, not?)
I guess i have to understand 'tokenize' better to have it under full control.
i knew there must be a right way :-)
thanks again,
holger