eval to dict problems NEWB going crazy!

manstey

Jul 6, 2006, 6:34:32 AM

Hi,

I have a text file called a.txt:

# comments
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

I read it using this:

import codecs

filAnsMorph = codecs.open('a.txt', 'r', 'utf-8') # Initialise input file
dicAnsMorph = {}
for line in filAnsMorph:
    if line[0] != '#': # Get rid of comment lines
        x = eval(line)
        dicAnsMorph[x[0][1]] = x[1][1] # recid is key, parse dict is value

But it crashes every time on x = eval(line). Why is this? If I change
a.txt to:

# comments
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

it works fine. Why doesn't it work with multiple lines? It's driving me
crazy!

Thanks,
Matthew

Bruno Desthuilliers

Jul 6, 2006, 7:04:54 AM

try with:
x = eval(line.strip('\n'))

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in 'on...@xiludom.gro'.split('@')])"

manstey

Jul 6, 2006, 7:25:19 AM

That doesn't work. I just get an error:

    x = eval(line.strip('\n'))
  File "<string>", line 1
    [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

SyntaxError: unexpected EOF while parsing

Any other ideas?

Eric Deveaud

Jul 6, 2006, 7:36:43 AM

manstey wrote:
> That doesn't work. I just get an error:
>
> x = eval(line.strip('\n'))
> File "<string>", line 1
> [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>
> SyntaxError: unexpected EOF while parsing
>

Is the last line of your file empty?

What about this:

for line in filAnsMorph:
    # remove any leading and trailing whitespace, including the \n
    line = line.strip()

    # Get rid of comment lines
    if line.startswith('#'):
        continue
    # Get rid of blank lines
    if line == '':
        continue
    # do the job
    x = eval(line)

NB: by default strip() removes leading and trailing whitespace from the
target string, with whitespace defined as in string.whitespace,
'\t\n\x0b\x0c\r ' (for unicode strings, such as the ones codecs.open
returns, other Unicode whitespace is stripped as well).
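Here is the same strip-and-skip loop as a Python 3.3+ sketch, for comparison with the code above. It makes a few assumptions. The file contents are simulated with io.StringIO, using Windows "\r\n" line endings plus a trailing blank line so that both suspected problems are present. ast.literal_eval is swapped in for eval so that only literals are accepted. The name load_records is hypothetical.

```python
import ast
import io

def load_records(f):
    """Build {recId: parse_dict} from lines like [('recId', 3), ('parse', {...})]."""
    records = {}
    for line in f:
        line = line.strip()          # drops '\r', '\n' and other surrounding whitespace
        if not line or line.startswith('#'):
            continue                 # skip blank lines and comment lines
        x = ast.literal_eval(line)   # accepts only Python literals, unlike eval
        records[x[0][1]] = x[1][1]   # recId is the key, the parse dict is the value
    return records

# Simulated a.txt with Windows line endings and a trailing empty line.
sample = ("# comments\r\n"
          "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"
          "[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"
          "[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"
          "\r\n")

# newline='' keeps the '\r\n' endings intact, as a binary-mode reader would.
print(load_records(io.StringIO(sample, newline='')))
```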

Eric

Fredrik Lundh

Jul 6, 2006, 7:51:08 AM

to pytho...@python.org

"manstey" <man...@csu.edu.au> wrote:

> That doesn't work. I just get an error:
>
> x = eval(line.strip('\n'))
> File "<string>", line 1
> [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>
> SyntaxError: unexpected EOF while parsing
>
> any other ideas?

hint 1:

>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\n")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

hint 2:

>>> eval("")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 0

    ^
SyntaxError: unexpected EOF while parsing
>>> eval("\n")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1

    ^
SyntaxError: unexpected EOF while parsing
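Hint 2 still holds in Python 3. An empty or newline-only string is not an expression, so eval() raises SyntaxError. That is why a blank last line in a.txt is enough to crash the loop. Here is a minimal check, where raises_syntax_error is a hypothetical helper:

```python
def raises_syntax_error(text):
    """Return True if eval() rejects text with a SyntaxError."""
    try:
        eval(text)
    except SyntaxError:
        return True
    return False

# An empty string and a newline-only string are both rejected; a real record is not.
for text in ["", "\n", "[1, 2]\n"]:
    print(repr(text), raises_syntax_error(text))
```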

hint 3: adding a "print" statement *before* the offending line is often a good way
to figure out why something's not working. "repr()" is also a useful thing:

if line[0] != '#': # Get rid of comment lines
    print repr(line) # DEBUG: let's see what we're trying to evaluate
    x = eval(line)
    dicAnsMorph[x[0][1]] = x[1][1] # recid is key, parse dict is value
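The next sketch shows what hint 3 can reveal in this case. It is written for Python 3 and assumes the line was read without newline translation, so it still ends with "\r\n":

```python
# A line as it might arrive from a file with Windows line endings (assumption).
line = "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"

print(line)        # the trailing carriage return is not visible in ordinary output
print(repr(line))  # repr() spells the line ending out as the escapes \r\n
```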

</F>

Roel Schroeven

Jul 6, 2006, 8:00:58 AM

manstey wrote:

It looks like it's because of the trailing newline. When you read a file
like that, each line you get back still ends with its newline. You can
strip it, e.g. with rstrip, like so:

x = eval(line.rstrip('\n'))
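Whether rstrip('\n') is enough depends on the line endings. Assume a.txt was saved with Windows "\r\n" endings, which is the possibility Fredrik's "hint 1b" in this thread points to. Here is a quick Python 3 comparison of the two calls:

```python
line = "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"

only_lf = line.rstrip('\n')  # removes the '\n' but leaves the '\r' behind
all_ws = line.rstrip()       # removes all trailing whitespace, '\r' included

print(repr(only_lf))
print(repr(all_ws))
```

Calling rstrip() or strip() with no argument covers both Unix and Windows line endings.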

--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton

Roel Schroeven

Fredrik Lundh

Jul 6, 2006, 8:10:14 AM

to pytho...@python.org

> hint 1:

hint 1b:

>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\n")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1
    [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

    ^
SyntaxError: invalid syntax
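Why would a "\r\n" survive into the string at all? One likely explanation is that the codecs documentation says codecs.open() always opens the underlying file in binary mode, so no newline translation happens on reading. The Python 3 sketch below shows the difference using the built-in open() instead: newline='' disables translation, while the default mode translates. The temporary file and its contents are made up for the demo.

```python
import os
import tempfile

# Write a small file with Windows line endings (a hypothetical stand-in for a.txt).
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"# comments\r\n[('recId', 3)]\r\n")
    path = tmp.name

try:
    # newline='' turns translation off, like codecs.open: lines keep '\r\n'.
    with open(path, encoding='utf-8', newline='') as f:
        raw = f.readlines()
    # The default (universal newlines) turns '\r\n' into '\n'.
    with open(path, encoding='utf-8') as f:
        translated = f.readlines()
finally:
    os.remove(path)

print(raw)
print(translated)
```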

</F>

Steven D'Aprano

Jul 7, 2006, 10:52:32 AM

On Thu, 06 Jul 2006 03:34:32 -0700, manstey wrote:

> Hi,
>
> I have a text file called a.txt:
>
> # comments
> [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
> [('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
> [('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]
>
> I read it using this:
>
> filAnsMorph = codecs.open('a.txt', 'r', 'utf-8') # Initialise input
> file
> dicAnsMorph = {}
> for line in filAnsMorph:
> if line[0] != '#': # Get rid of comment lines
> x = eval(line)
> dicAnsMorph[x[0][1]] = x[1][1] # recid is key, parse dict is
> value
>
> But it crashes every time on x = eval(line). Why is this?

Some people have suggested that the solution is simply to remove the
newline from the end of the line; that is not enough if the line also
ends with a carriage return. Others have already pointed out one
possible solution.

I'd like to ask, why are you using eval in the first place?

The problem with eval is that it is simultaneously too finicky and too
powerful. It is finicky -- it has problems with lines ending with a
carriage return, empty lines, and probably other things. But it is also
too powerful. Your program wants a specific piece of data, but eval
will accept any string which is a valid Python expression. eval is quite
capable of giving you a dictionary, or an int, or just about anything --
and, depending on your code, you might not find out for a long time,
leading to hard-to-debug bugs.
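One way to catch a valid but wrong expression early is to check the shape of whatever comes back before using it. The following is a Python 3 sketch. check_record is a hypothetical helper, and the expected shape is read off the sample a.txt:

```python
def check_record(x):
    """Return (recId, parse_dict) if x looks like [('recId', int), ('parse', dict)].

    Anything else raises ValueError straight away instead of causing trouble later.
    """
    ok = (isinstance(x, list) and len(x) == 2
          and all(isinstance(item, tuple) and len(item) == 2 for item in x)
          and x[0][0] == 'recId' and isinstance(x[0][1], int)
          and x[1][0] == 'parse' and isinstance(x[1][1], dict))
    if not ok:
        raise ValueError("unexpected record: %r" % (x,))
    return x[0][1], x[1][1]

print(check_record([('recId', 3), ('parse', {'pos': 'np', 'gen': 'm'})]))
try:
    check_record({'recId': 3})  # a perfectly valid Python value, but the wrong shape
except ValueError as exc:
    print(exc)
```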

Is your data under your control? Could some malicious person inject data
into your file a.txt? If so, you should be aware of the security
implications:

# comment

[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]

# line injected by a malicious user
__import__('os').system('echo if I were bad I could do worse')

[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

Now, if the malicious user can only damage their own system, maybe you
don't care -- but the security hole is there. Are you sure that no
malicious third party, given *only* write permission to the file a.txt,
could compromise your entire system?
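For data like this, later Python versions (2.6 and up) offer ast.literal_eval, which accepts only literals such as strings, numbers, tuples, lists and dicts. The Python 3.3+ sketch below parses the genuine record and refuses the injected line. It relies only on the exception type, because the message text varies between versions.

```python
import ast

good = "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]"
evil = "__import__('os').system('echo if I were bad I could do worse')"

print(ast.literal_eval(good))  # the genuine record parses normally

try:
    ast.literal_eval(evil)     # a function call is not a literal, so nothing runs
except ValueError as exc:
    print("rejected:", exc)
```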

Personally, I would never use eval on any string I didn't write myself. If
I was thinking about evaluating a user-string, I would always write a
function to parse the string and accept only the specific sort of data I
expected. In your case, a quick-and-dirty untested function might be:

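def parse(s):
    """Parse string s, and return a two-item list like this:

    [tuple(string, integer), tuple(string, dict(string: string))]
    """

    def split_items(s):
        """Split s on commas that are not nested inside brackets.
        (Quick and dirty: commas inside quoted strings are not handled.)"""
        items, depth, start = [], 0, 0
        for i, c in enumerate(s):
            if c in "([{":
                depth += 1
            elif c in ")]}":
                depth -= 1
            elif c == "," and depth == 0:
                items.append(s[start:i])
                start = i + 1
        items.append(s[start:])
        return items

    def unquote(s):
        """Turn a quoted string literal such as 'np' or u'np' into np."""
        s = s.strip()
        if s.startswith("u"):
            s = s[1:]
        assert len(s) >= 2 and s[0] == s[-1] and s[0] in "'\""
        return s[1:-1]

    def parse_tuple(s):
        """Parse a tuple with two items exactly."""
        s = s.strip()
        assert s.startswith("(")
        assert s.endswith(")")
        a, b = split_items(s[1:-1])
        return (a.strip(), b.strip())

    def parse_dict(s):
        """Parse a dict with two items exactly."""
        s = s.strip()
        assert s.startswith("{")
        assert s.endswith("}")
        a, b = split_items(s[1:-1])
        key1, value1 = a.split(":", 1)
        key2, value2 = b.split(":", 1)
        return {unquote(key1): unquote(value1), unquote(key2): unquote(value2)}

    def parse_list(s):
        """Parse a list with two items exactly."""
        s = s.strip()
        assert s.startswith("[")
        assert s.endswith("]")
        a, b = split_items(s[1:-1])
        return [a.strip(), b.strip()]

    # Expected format is something like:
    # [tuple(string, integer), tuple(string, dict(string: string))]
    L = parse_list(s)
    T0 = parse_tuple(L[0])
    T1 = parse_tuple(L[1])
    T0 = (unquote(T0[0]), int(T0[1]))
    T1 = (unquote(T1[0]), parse_dict(T1[1]))
    return [T0, T1]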

That's a bit more work than eval, but I believe it is worth it.

--
Steven

Ant

Jul 7, 2006, 12:39:38 PM

> [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
> [('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
> # line injected by a malicious user
> "__import__('os').system('echo if I were bad I could do worse')"
> [('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

I'm curious, if you disabled import, could you make eval safe?

For example:

>>> eval("__import__('os').system('echo if I were bad I could do worse')")
if I were bad I could do worse
0
>>> eval("__import__('os').system('echo if I were bad I could do worse')", {'__import__': lambda x:None})
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 0, in ?
AttributeError: 'NoneType' object has no attribute 'system'

So, it seems to be possible to disable access to imports, but is this
enough? Are there other ways to access modules, or do damage via
built-in commands?

It seems that there must be a way to use eval safely, as there are
plenty of apps that embed python as a scripting language - and what's
the point of an eval function if it's impossible to use safely and you
have to write your own Python parser!!

Fredrik Lundh

Jul 7, 2006, 1:43:12 PM

to pytho...@python.org

Ant wrote:

> It seems that there must be a way to use eval safely, as there are
> plenty of apps that embed python as a scripting language - and what's
> the point of an eval function if impossible to use safely, and you have
> to write your own Python parser!!

embedding python != accepting scripts from anywhere.

</F>

Fredrik Lundh

Jul 7, 2006, 1:57:02 PM

to pytho...@python.org

Steven D'Aprano wrote:

> Personally, I would never use eval on any string I didn't write myself. If
> I was thinking about evaluating a user-string, I would always write a
> function to parse the string and accept only the specific sort of data I
> expected. In your case, a quick-and-dirty untested function might be:

for a more robust approach, you can use Python's tokenizer module,
together with the iterator-based approach described here:

http://online.effbot.org/2005_11_01_archive.htm#simple-parser-1

here's a (tested!) variant that handles lists and dictionaries as well:

import cStringIO, tokenize

def sequence(next, token, end):
    out = []
    token = next()
    while token[1] != end:
        out.append(atom(next, token))
        token = next()
        if token[1] == "," or token[1] == ":":
            token = next()
    return out

def atom(next, token):
    if token[1] == "(":
        return tuple(sequence(next, token, ")"))
    elif token[1] == "[":
        return sequence(next, token, "]")
    elif token[1] == "{":
        seq = sequence(next, token, "}")
        res = {}
        for i in range(0, len(seq), 2):
            res[seq[i]] = seq[i+1]
        return res
    elif token[0] in (tokenize.STRING, tokenize.NUMBER):
        return eval(token[1]) # safe use of eval!
    raise SyntaxError("malformed expression (%s)" % token[1])

def simple_eval(source):
    src = cStringIO.StringIO(source).readline
    src = tokenize.generate_tokens(src)
    # skip line breaks: NL inside brackets, NEWLINE after a line read from a file
    src = (token for token in src
           if token[0] not in (tokenize.NL, tokenize.NEWLINE))
    res = atom(src.next, src.next())
    if src.next()[0] != tokenize.ENDMARKER:
        raise SyntaxError("bogus data after expression")
    return res

(now waiting for paul to post the obligatory pyparsing example).
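For readers on Python 3, where cStringIO and the .next() method are gone, here is one possible port of the tokenizer approach above. It differs from the original in a few assumed ways:

- io.StringIO replaces cStringIO.
- Tokens are read through the generator's __next__ method.
- NEWLINE tokens are filtered alongside NL, because Python 3's tokenizer emits a final NEWLINE even when the source has no trailing newline.
- ast.literal_eval is used for the single STRING/NUMBER token instead of eval.

As in the original, negative numbers, True/False/None and comments are not handled.

```python
import ast
import io
import tokenize

def sequence(next_token, end):
    """Collect atoms up to the closing bracket `end`, skipping ',' and ':' separators."""
    out = []
    token = next_token()
    while token.string != end:
        out.append(atom(next_token, token))
        token = next_token()
        if token.string in (",", ":"):
            token = next_token()
    return out

def atom(next_token, token):
    """Build a tuple, list, dict, string or number starting at `token`."""
    if token.string == "(":
        return tuple(sequence(next_token, ")"))
    elif token.string == "[":
        return sequence(next_token, "]")
    elif token.string == "{":
        seq = sequence(next_token, "}")
        return {seq[i]: seq[i + 1] for i in range(0, len(seq), 2)}
    elif token.type in (tokenize.STRING, tokenize.NUMBER):
        return ast.literal_eval(token.string)  # a single literal token only
    raise SyntaxError("malformed expression (%s)" % token.string)

def simple_eval(source):
    readline = io.StringIO(source.strip()).readline
    tokens = tokenize.generate_tokens(readline)
    # Drop line-break tokens so the parser sees only the expression's own tokens.
    tokens = (t for t in tokens if t.type not in (tokenize.NL, tokenize.NEWLINE))
    next_token = tokens.__next__
    res = atom(next_token, next_token())
    if next_token().type != tokenize.ENDMARKER:
        raise SyntaxError("bogus data after expression")
    return res

print(simple_eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"))
```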

</F>

Steven D'Aprano

Jul 7, 2006, 8:42:41 PM

On Fri, 07 Jul 2006 19:57:02 +0200, Fredrik Lundh wrote:

> Steven D'Aprano wrote:
>
>> Personally, I would never use eval on any string I didn't write myself. If
>> I was thinking about evaluating a user-string, I would always write a
>> function to parse the string and accept only the specific sort of data I
>> expected. In your case, a quick-and-dirty untested function might be:
>
> for a more robust approach, you can use Python's tokenizer module,
> together with the iterator-based approach described here:
>
> http://online.effbot.org/2005_11_01_archive.htm#simple-parser-1
>
> here's a (tested!) variant that handles lists and dictionaries as well:

[snip code]

Thanks Fredrik, that's grand.

--
Steven.

Steven D'Aprano

Jul 7, 2006, 10:19:37 PM

On Fri, 07 Jul 2006 09:39:38 -0700, Ant wrote:

>
>> [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>> [('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
>> # line injected by a malicious user
>> "__import__('os').system('echo if I were bad I could do worse')"
>> [('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]
>
> I'm curious, if you disabled import, could you make eval safe?

Safer, but possibly not safe.

> For example:
>
>>>> eval("__import__('os').system('echo if I were bad I could do worse')")
> if I were bad I could do worse
> 0
>>>> eval("__import__('os').system('echo if I were bad I could do worse')", {'__import__': lambda x:None})
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "<string>", line 0, in ?
> AttributeError: 'NoneType' object has no attribute 'system'
>
> So, it seems to be possible to disable access to imports, but is this
> enough? Are there other ways to access modules, or do damage via
> built-in commands?

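Does your code already import os, and does the namespace you hand to eval
contain it (for instance, because you pass a copy of globals())? Then
there is no need for the import at all:

eval("os.system('echo BOOM!')", dict(globals(), __import__=lambda x: None))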

Or, we can do this:

bomb = """eval("__import__('os').system('echo BOOM!')", __builtins__)"""
eval(bomb, {'__import__': None})

The obvious response is to block eval:

eval(bomb, {'__import__': None, 'eval': None})

Does this make it safe now? I don't know -- I've hunted around for ten
minutes trying to break it, and haven't, but that might just mean I'm not
enough of a hacker or thinking deviously enough. Possibly eval() is more
limited, and therefore "safer", than exec, but I wouldn't want to risk
real data on that assumption.

Of course, this approach only protects against one class of attacks.
Suppose Evil J. Cracker has write access to your file, and is happy enough
with just a denial of service attack:

[('recId', 3), ('parse', {'pos': u'np'*1024**4, 'gen': u'm'})]

Do you have a couple of terabytes of free memory on your system?
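As it happens, ast.literal_eval (Python 2.6 and later) refuses this particular line, because multiplication is not part of a literal, so the huge string is never built. Below is a Python 3.3+ sketch. It is not a complete defence: the ast module documentation warns that sufficiently large or deeply nested input can still exhaust memory or the C stack.

```python
import ast

# The denial-of-service record from above, as it would appear in a.txt.
line = "[('recId', 3), ('parse', {'pos': u'np'*1024**4, 'gen': u'm'})]"

try:
    ast.literal_eval(line)  # the '*' is rejected while walking the syntax tree
except ValueError as exc:
    print("rejected without building the string:", type(exc).__name__)
```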

Of course, if your code is only going to be used by *trusted* users, then
you don't have to worry about malicious attacks. You do have to worry
about accidental bugs though. What if one of the lines is missing a
delimiter or otherwise malformed? The call to eval() will fail, and your
code will halt. Is that what you want, or is it better to skip over the
bad data and continue? (A try...except... block could be useful here.)
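The try...except idea can be sketched in Python 3 as follows. Lines that fail to parse, or that do not have the expected shape, are skipped and their line numbers reported rather than halting the program. The name load_skipping_bad is hypothetical, and ast.literal_eval stands in for eval.

```python
import ast

def load_skipping_bad(lines):
    """Return ({recId: parse_dict}, [numbers of lines that could not be used])."""
    records, bad = {}, []
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # blank lines and comments are not errors
        try:
            x = ast.literal_eval(line)
            records[x[0][1]] = x[1][1]  # recId is the key, the parse dict the value
        except (ValueError, SyntaxError, TypeError, IndexError, KeyError):
            bad.append(lineno)  # malformed or wrongly shaped: note it and keep going
    return records, bad

lines = [
    "# comments",
    "[('recId', 3), ('parse', {'pos': 'np', 'gen': 'm'})]",
    "[('recId', 5), ('parse', {'pos': 'np', 'gen'",   # a truncated, malformed line
    "[('recId', 7), ('parse', {'pos': 'np', 'gen': 'm'})]",
]
records, bad = load_skipping_bad(lines)
print(records)
print("skipped lines:", bad)
```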

Anyway, eval is a legitimate tool to use, although it is often over-kill
for the tasks people use it for. In the Original Poster's example, he
doesn't really want to evaluate an arbitrary Python expression, he wants
to evaluate a specific data structure.

> It seems that there must be a way to use eval safely,

"Must" does not mean "I wish there was".

> as there are
> plenty of apps that embed python as a scripting language -

As Fredrik points out, embedded Python isn't the same as running
untrusted code. The reality is, Python has not been designed for running
untrusted code safely. There was an attempt at a restricted-execution
module, but Guido decided to remove it -- see this thread here for his
reasoning:

http://mail.python.org/pipermail/python-dev/2002-December/031160.html

> and what's
> the point of an eval function if impossible to use safely, and you have
> to write your own Python parser!!

As for eval, it's a sledge-hammer. Sledge-hammers are legitimate tools,
for when you need one. eval is for evaluating arbitrary Python
expressions -- my rule of thumb (yours may be different) is that any time
I expect arbitrary data, eval is the right tool for the job, but if I
expect *specific* data, I use something else.

Imagine if the only way to get an integer was by calling eval on the
string -- I think we'd all agree that would be a bad move. Instead we have
a function which does nothing but convert strings (well, any object
really) to integers: int. It would be great if Python included tools to do
the same for dicts and lists, reducing the need for people to use a
sledge-hammer.
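Python 2.6 and later do ship something close: ast.literal_eval turns a string containing a Python literal (strings, numbers, tuples, lists, dicts and so on) into the corresponding object. It returns whichever kind of literal it finds, though. An int()-style converter that insists on one type therefore still takes a couple of lines. to_list below is a hypothetical helper, sketched in Python 3.3+:

```python
import ast

def to_list(s):
    """Like int(s), but for list literals: parse s and insist on getting a list."""
    value = ast.literal_eval(s.strip())
    if not isinstance(value, list):
        raise ValueError("expected a list literal, got %s" % type(value).__name__)
    return value

print(to_list("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\n"))
```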

Anyway, my point was that you, the developer, have to weigh up the costs
and benefits of eval over a custom parser. The benefit is that eval is
already there, built-in and debugged. The costs are that it can be
insecure, and that it doesn't give you fine control over what data you
parse or how forgiving the parser is.

After that, the decision is yours.

--
Steven.

Sion Arrowsmith

Jul 10, 2006, 10:14:56 AM

And also using eval (or exec or execfile) != accepting scripts from
anywhere. You've got to consider where the data can have come from
and what (broad) context it's being eval()'d in. Last time I did
something like this was with execfile for advanced configuration of
a server, and if a hostile party were in a position to inject
malicious code into *that* then subversion of our program would be
the least of anyone's concern.

--
\S -- si...@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
Here now comes Edward Bear down the ladder, back of his head, bump bump bump

Ant

Jul 10, 2006, 10:44:52 AM

> As Fredrik points out, embedded Python isn't the same as running
> untrusted code. The reality is, Python has not been designed for running
> untrusted code safely.

So how do Python apps typically embed Python? For example things like
Zope and idle are scripted using Python - presumably they restrict the
execution of the scripts to a restricted set of modules/objects - but
how is this done?

Perhaps idle doesn't require safety from untrusted code, but surely
Zope does. So there must be some way of executing arbitrary untrusted
code in an app within some kind of sandbox...

Fredrik Lundh

Jul 10, 2006, 12:19:48 PM

to pytho...@python.org

Ant wrote:

> So how do python app's typically embed python? For example things like
> Zope and idle are scripted using Python - presumably they restrict the
> execution of the scripts to a restricted set of modules/objects - but
> how is this done?

why? anyone capable of adding code to idle already has access to
everything that code can access...

> Perhaps idle doesn't require safety from untrusted code, but surely
> Zope does. So there must be some way of executing arbitrary untrusted
> code in an app within some kind of sandbox...

afaik, zope uses a custom parser.

</F>
