Understanding unmatched parenthesis in read-string


noahlz

Apr 29, 2013, 4:26:52 PM
to clo...@googlegroups.com
(Disclaimer: I post this aware that read-string is considered dangerous for untrusted code and having starred tools.reader)

I was writing some code using read-string and encountered the following (somewhat odd?) behavior:

Clojure 1.5.1
user=> (read-string "1000N(")
1000N
user=> (read-string "1000N)")
1000N
user=> (read-string "(1000N")
RuntimeException EOF while reading  clojure.lang.Util.runtimeException (Util.java:219)

user=> (read-string ")1000N")
RuntimeException Unmatched delimiter: )  clojure.lang.Util.runtimeException (Util.java:219)


So if the string ends with an unmatched ) or (, the preceding value is returned and the unmatched character is discarded. But if the string starts with an unmatched paren, read-string throws: EOF for ( and "Unmatched delimiter" for ), both as expected. I was a little surprised, as I expected the first two cases to throw some kind of RuntimeException as well.

What is the explanation for this behavior, if any, and where can I read more about the underlying theory of "correctly" handling this case? I'm aware that lexical parsing is a big topic. I'm just wondering what the ruling was here (if any) and looking for a jumping-off point for further reading. Pointers are also welcome if this was discussed elsewhere (searching "read-string unmatched paren" yielded nothing).

Thanks!

Weber, Martin S

Apr 29, 2013, 4:32:49 PM
to clo...@googlegroups.com
user=> (doc read-string)
-------------------------
clojure.core/read-string
([s])
  Reads *one* object from the string s
nil
(emphasis on *one* by me)

one object from ":a(" = :a; ":a)" = :a; "( … " = fail; ")…" = fail. (remember whitespace in front of a paren doesn't matter)

Have fun.

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

noahlz

Apr 29, 2013, 4:43:04 PM
to clo...@googlegroups.com, martin...@nist.gov
Understood, but what I was wondering is why the trailing parenthesis is discarded rather than being considered part of the "object" expression?

Ben Wolfson

Apr 29, 2013, 4:44:13 PM
to clo...@googlegroups.com, martin...@nist.gov
Because "1000N" is a complete expression, as you can verify with your REPL.





--
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry]

noahlz

Apr 29, 2013, 4:57:56 PM
to clo...@googlegroups.com, martin...@nist.gov
Ok. The parser reads a single complete expression and discards the rest. Once it hits a character that cannot be part of the current expression, it stops and does not care what follows.

I suppose I thought the parser would raise an error on detecting an unmatched parenthesis, but that's wrong.

Interestingly, when I try this at the repl (1.5.1) it errors as I expected (probably why I expected it in the first place):

user=> 1000N)
1000N

RuntimeException Unmatched delimiter: )  clojure.lang.Util.runtimeException (Util.java:219)


Of course, the repl doesn't use "read-string." So the next step in my journey is to investigate the source of clojure.main. But if someone wants to take the opportunity to ruin the surprise for me with a more detailed explanation / theory discussion, I'm open to it :)

Ben Wolfson

Apr 29, 2013, 5:02:15 PM
to clo...@googlegroups.com, martin...@nist.gov
But notice that this *also* returns 1000N, first. So the cases aren't that different: it managed to read a complete expression, and evaluated it. Then it tried to read *another* expression from the unconsumed input. The difference is only that read-string doesn't attempt to exhaust its input argument.
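
To make the difference concrete, here is a minimal sketch of a loop that keeps reading until the input is exhausted, which is roughly what a REPL does with its input stream. The name read-all is hypothetical and this is not the actual clojure.main code; it only assumes the standard three-argument arity of clojure.core/read.

```clojure
(import '(java.io PushbackReader StringReader))

(defn read-all
  "Hypothetical helper: reads every form in s, one after another.
  Unlike read-string, it keeps going until EOF, so leftover
  malformed input surfaces as a reader exception."
  [s]
  (let [rdr (PushbackReader. (StringReader. s))
        eof (Object.)]                      ; unique sentinel marking end of input
    (loop [forms []]
      (let [form (read rdr false eof)]      ; eof-error? false, so EOF returns the sentinel
        (if (identical? form eof)
          forms
          (recur (conj forms form)))))))

(read-all "1000N 2 (+ 1 2)")                ; all three forms, in order

;; The first read succeeds with 1000N; the second read hits the stray ")".
(try
  (read-all "1000N)")
  (catch RuntimeException e
    (.getMessage e)))                       ; the reader's error message, not a vector
```

read-string corresponds to a single iteration of this loop, which is why the trailing paren never gets a chance to cause an error.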

Cedric Greevey

Apr 29, 2013, 6:07:01 PM
to clo...@googlegroups.com
If you want to exhaust read-string's input argument, getting back a vector of all of the objects in the input and an error if any of them are syntactically invalid, just call (read-string (str "[" in-string "]")). This also deals with empty inputs in a non-blowing-up manner, returning an empty vector, which might allow uniform handling of the cases (empty? in-string) and (not (empty? in-string)) in some instances.
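
A small sketch of the bracket-wrapping idea described above, assuming the input is already trusted. The function name read-all-bracketed is hypothetical.

```clojure
(defn read-all-bracketed
  "Hypothetical helper: wraps in-string in [ ] so that read-string
  must consume the whole input as the contents of one vector."
  [in-string]
  (read-string (str "[" in-string "]")))

(read-all-bracketed "1 :a (+ 1 2)")   ; => [1 :a (+ 1 2)]
(read-all-bracketed "")               ; => []

;; A stray delimiter is now inside the vector, so the reader rejects it.
(try
  (read-all-bracketed "1000N)")
  (catch RuntimeException e
    :syntax-error))                   ; => :syntax-error
```

One caveat with this approach: if in-string ends in a line comment (; ...) with no trailing newline, the comment swallows the closing bracket and the read fails with EOF. Appending a newline before the "]" avoids that.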

noahlz

Apr 29, 2013, 6:21:49 PM
to clo...@googlegroups.com
Interestingly, my code already contains something like the following:

(let [expr (-> (str "(" input ")") read-string)] ...)

It felt wrong when I wrote this, but it seems I was on the right track? I'm guessing vectors are safer than lists for passing to eval?

 

Ben Wolfson

Apr 29, 2013, 6:23:53 PM
to clo...@googlegroups.com
On Mon, Apr 29, 2013 at 3:21 PM, noahlz <nzu...@gmail.com> wrote:
 I'm guessing vectors are safer than lists for passing to eval?

They're equally unsafe.
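
A short illustration of why the wrapper makes no difference, as a sketch with harmless stand-in forms (the var names are made up for the example). Whatever calls the input contains are executed once the forms reach eval, whether they were collected in a vector or a list.

```clojure
;; The same two forms, read once inside [ ] and once inside ( ).
(def as-vector (read-string "[(+ 1 2) (str \"a\" \"b\")]"))
(def as-list   (read-string "((+ 1 2) (str \"a\" \"b\"))"))

;; Evaluating a vector evaluates every element inside it.
(eval as-vector)       ; => [3 "ab"]

;; Evaluating the list's elements one by one runs exactly the same calls.
(map eval as-list)     ; => (3 "ab")
```

Replace (+ 1 2) with any side-effecting call and both versions will run it, so the choice between "[" and "(" is about convenience, not safety.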