*read-eval* defaulting to false

1,117 views
Skip to first unread message

Andy Fingerhut

unread,
Feb 1, 2013, 1:29:35 AM2/1/13
to cloju...@googlegroups.com
A new ticket has gone from non-existent to garnering the most votes in JIRA in about 12 hours. There is a link to the Clojure group discussion thread in the ticket:

http://dev.clojure.org/jira/browse/CLJ-1153

One of Chas's messages in that thread suggests that the currently attached patch isn't enough due to some subtleties of how functions are printed in core_print.clj.

Andy

Stuart Halloway

unread,
Feb 1, 2013, 8:23:15 AM2/1/13
to cloju...@googlegroups.com
I am in favor of changing the default, haven't looked at the implementation issues.

Please weigh in on shipping 1.5 so we can turn our attention to enhancements again.



Andy

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.



Chas Emerick

unread,
Feb 1, 2013, 8:25:58 AM2/1/13
to cloju...@googlegroups.com
It's quite a rabbit hole, really. I'll be whacking away at it again next week, but I've blown my budget for now.

1.5.0 is clearly well along (RC4, I think?), but I personally would be happy to have the final release held off to address *read-eval*'s default. I have no visibility into what drives release schedules, etc. (and I'm only tenuously aware of what a proper "fix" will require), but recent (regular) events elsewhere make me think that virtually no one will mind waiting for a release to be stamped final in order to address what is a widening arbitrary code execution vulnerability given the utility of and growing emphasis on using e.g. edn for services. Such questions don't generally percolate down to the language level (the Javabean Expression metasploit being a recent exception), but in our case, the language defines and implements the interchange format (as opposed to e.g. YAML libraries elsewhere).

In the end, I think it's important for us to collectively demonstrate (and have) a security-minded approach. I am happy to contribute in any way that is helpful in resolving this (including producing a comprehensive patch, if people are willing to be patient ;-).

Thanks,

- Chas

Andy Fingerhut

unread,
Feb 1, 2013, 8:34:08 AM2/1/13
to cloju...@googlegroups.com
Sorry if I gave the impression that I thought this new ticket needed to go into 1.5.  If it doesn't go into 1.5, I'd guess it will get attention for the next release after that.

I have no objection on shipping 1.5.0-RC4 as 1.5, but I don't have any production code at stake.  What little stuff I have is working well with 1.5.0-RC4.

Sean Corfield has mentioned that he tests his company's production code with latest master every so often to make sure nothing in it will break his code, and he's given it a thumbs up.  Can you solicit feedback from other companies with Clojure code in production (e.g. Prismatic)?

Thanks,
Andy

Paul Stadig

unread,
Feb 1, 2013, 8:52:48 AM2/1/13
to cloju...@googlegroups.com
I'll try running the Sonian code base against 1.5.0-RC4 and report back.

I think I'd be in favor of delaying a 1.5 final until there is some
resolution to the *read-eval* issue.

Chas, do you have an order of magnitude for the changes that are
required? one hour, one day, one week? Perhaps we can apply more
resources to it? I'd be willing to help.


Paul

Paul Stadig

unread,
Feb 1, 2013, 9:40:29 AM2/1/13
to cloju...@googlegroups.com
Our tests ran without any issues.

Paul

Chas Emerick

unread,
Feb 1, 2013, 9:54:15 AM2/1/13
to cloju...@googlegroups.com
I wouldn't recommend trusting any of my estimates, since I only see things dimly in Compiler.java. The best I could suggest, if pressed, would be 'days', not 'weeks'. Someone that's done more compiler hacking might be able to be more firm about it.

I honestly don't want to dig back into the semantics of enhancements vs. bugs/defects and how that affects Clojure release plans...but (there's always a 'but' ;-), I think the issue type (though not the priority) currently set on CLJ-1153 is definitionally correct, unless one thinks e.g. my risk assessment upthread and elsewhere is wildly exaggerated.

- Chas

Mike Drogalis

unread,
Feb 1, 2013, 10:34:01 AM2/1/13
to cloju...@googlegroups.com
I'm in favor of waiting to get *read-eval* squared away for 1.5. Here's why. Anyone who's already using edge versions of 1.5 is probably already aware of the risk since they would seem more active in the community. In addition, there's little difference between using 1.5 RC4 and what will be 1.5. So those people are probably content for a while longer.

Major releases bring "everyone else" into the scene. The time between major releases for Clojure is pretty lengthy. Do we want to wait that long before we introduce the fix?

Softaddicts

unread,
Feb 1, 2013, 11:37:23 AM2/1/13
to cloju...@googlegroups.com
Since 2009, we hopped over major releases 1.0 (?!?!), 1.2 and 1.4.

It takes a significant effort to retest our software (it runs 24/7 in an
hospital) and we decide to go in according to the new features in the new release
coming by versus the needs we want to address in the near future.

When we jumped from 1.1 to 1.3, it had 0 impact on our code (1.1), we merely stretched
out for the new features we needed like protocols which appeared in 1.2.
We had started dev in 1.2 but mostly to prototype code. 1.3 was the final target.

We are starting to work on migrating to 1.5 but it will not be in production
before late spring/early summer. We have a need for the new reducer library
and this is our primary motivation in moving forward.

We have to deal with timestamp literals and a few other issues from the 1.4
changes. These changes by themselves require a full test round and some simulation
time before moving this to prod. They impact our core.

Presently, we load code dynamically in several places and the semantic of read-eval is
essential to us whatever it will end up being. Postponing this is moves the problem
away for a year but it means another full round of tests. It's not something that looks
to us as an incremental hop.

If this can be fixed in the final 1.5 release, it would be welcomed even at the expense
of waiting a bit. Otherwise it will require another round of core software modifications
and full testing to jump to 1.6 (unlikely) or 1.7 in a 18 months or so.

We would rather take the hit now :)

Luc P.


> I'm in favor of waiting to get *read-eval* squared away for 1.5. Here's
> why. Anyone who's already using edge versions of 1.5 is probably already
> aware of the risk since they would seem more active in the community. In
> addition, there's little difference between using 1.5 RC4 and what will be
> 1.5. So those people are probably content for a while longer.
>
> Major releases bring "everyone else" into the scene. The time between major
> releases for Clojure is pretty lengthy. Do we want to wait that long before
> we introduce the fix?
>
>
> On Fri, Feb 1, 2013 at 9:54 AM, Chas Emerick <ch...@cemerick.com> wrote:
>
> > I wouldn't recommend trusting any of my estimates, since I only see things
> > dimly in Compiler.java. The best I could suggest, if pressed, would be
> > 'days', not 'weeks'. Someone that's done more compiler hacking might be
> > able to be more firm about it.
> >
> > I honestly don't want to dig back into the semantics of enhancements
Softaddicts<lprefo...@softaddicts.ca> sent by ibisMail from my ipad!

Alex Baranosky

unread,
Feb 1, 2013, 12:09:39 PM2/1/13
to cloju...@googlegroups.com
I can say with certainty that the dev team at Runa wouldn't mind waiting for this to be included in Clojure 1.5

Alex

Paul deGrandis

unread,
Feb 1, 2013, 1:21:32 PM2/1/13
to cloju...@googlegroups.com
I've tested all my deployments with the 1.5 RCs (some even depend on them).

I definitely don't mind waiting a little longer to get this rolled in.  Makes 1.5 that much more compelling - better performance AND better security.

Paul

Chas Emerick

unread,
Feb 1, 2013, 2:05:08 PM2/1/13
to cloju...@googlegroups.com
I have added a patch to CLJ-1153 that appears to address the *read-eval* problem:

http://dev.clojure.org/jira/browse/CLJ-1153?focusedCommentId=30523#comment-30523

on github: https://github.com/cemerick/clojure/commit/1f5c19c07443d2535ede4ff71d23b40c195d617f

artifact on Clojars: [com.cemerick/clojure "1.5.0-SNAPSHOT"]

It tests well for me, but needs to be exercised as much as possible. The Leiningen dependency above is 1.5.0-RC4 + the patch on the ticket. Note that you'll need to set your project's global :exclusions to [org.clojure/clojure] in order for the com.cemerick/clojure artifact to supersede it.

Also, if any Clojure compiler / asm experts are looking on, your eyeballs are needed. Take a look at what I've done, and comment on the issue or here (or on the regular Clojure list or in #clojure irc if you can't post here).

If no one raises any alarms after a while, I'll go post on the main Clojure ML and hope that a much broader audience will attempt to find (unexpected) breakage.

Cheers,

- Chas

Paul Stadig

unread,
Feb 1, 2013, 3:11:22 PM2/1/13
to cloju...@googlegroups.com
On Fri, Feb 1, 2013 at 2:05 PM, Chas Emerick <ch...@cemerick.com> wrote:
> Also, if any Clojure compiler / asm experts are looking on, your eyeballs are needed. Take a look at what I've done, and comment on the issue or here (or on the regular Clojure list or in #clojure irc if you can't post here).

Can't say I would qualify as an expert, but I looked over the changes
and they seem pretty clear.

I'm running our tests against the changes and it's definitely a
breaking change. We may be a strange use case, since we have an
encrypted storage format that we use that includes print-dup'ed
metadata. So we definitely have to change our code. Anywhere we are
binding *print-dup* to true to write output we have to bind
*read-eval* to true when reading back the input. Totally workable.

This will be good though, because I just ran our tests against
1.5.0-RC4, so I can verify how this one change effects our code.

Will report back later.


Paul

Allen Rohner

unread,
Feb 1, 2013, 3:24:55 PM2/1/13
to cloju...@googlegroups.com
At CircleCI (22k LoClojure), we've been running (alter-var-root *read-eval* (constantly false)), with only minor code changes required. 

+1 on blocking the release for this patch. If someone is held up, 1.5-RC4 is always available.  

Steve Miner

unread,
Feb 1, 2013, 3:25:09 PM2/1/13
to cloju...@googlegroups.com
FWIW, I tested [com.cemerick/clojure "1.5.0-SNAPSHOT"] with my main project and it worked fine. I don't depend on *read-eval* or *print-dup* so I'm not a great test case.

Thanks for working on this patch.

Paul Stadig

unread,
Feb 1, 2013, 3:45:43 PM2/1/13
to cloju...@googlegroups.com
After making a few small changes to explicitly bind *read-eval* to
true when we're reading in clojure data, all of our tests pass. We use
*print-dup* pretty heavily, and none of the data we *print-dup* and
*read-eval* is user generated. We may not even have to *print-dup*
anymore now that we have instant literals and such, but we have legacy
data that we'd have to read.

Anyway, our tests passed with Chas' patch. I'll add a note to the Jira
ticket. I won't say our tests are comprehensive, but we deal with JDBC
and lots of different datatypes (BigInts, strings, dates, and such).


Paul

On Fri, Feb 1, 2013 at 3:25 PM, Steve Miner <steve...@gmail.com> wrote:
> FWIW, I tested [com.cemerick/clojure "1.5.0-SNAPSHOT"] with my main project and it worked fine. I don't depend on *read-eval* or *print-dup* so I'm not a great test case.
>
> Thanks for working on this patch.
>

Jason Wolfe

unread,
Feb 2, 2013, 1:52:31 AM2/2/13
to cloju...@googlegroups.com
Hi all,

I just bumped us to 1.5.0-RC4 at Prismatic and ran our test suite.

By far the biggest issue we ran into is that the semantics of boxing seems to have changed in an unexpected way.

user> [(class (hash "test")) (class (System/identityHashCode "foo")) (class (rand-int 100))]
[java.lang.Integer java.lang.Integer java.lang.Integer]

which I believe used to return [Long Long Long].  This broke many of our tests, and I'd imagine will lead to further unexpected issues since this has been a particularly nasty issue to track down in the past and often manifests well downstream from the producer of a boxed Integer that was expected to be a Long.

I haven't been following things enough to know if this is an intended change, but unless there are big benefits to it I think we'd prefer the status quo.

Other than that, there were a few other minor test failures of the sanity-check variant:
 - SerialVersionID for persistent vector has changed.  Not unexpected, but something to note if people are using Java serialization for Clojure types (obviously not a great idea).
 - (array-map :a 1 :a 1) no longer throws an error.  Again not a big deal, we'll just have to manually check for this in the place we were relying on (and testing for) Clojure's built-in sanity check.

Happy to answer any questions or discuss the above further.  And we're excited for 1.5!

Cheers, 
Jason

Jason Wolfe

unread,
Feb 2, 2013, 2:03:11 AM2/2/13
to cloju...@googlegroups.com


On Friday, February 1, 2013 10:52:31 PM UTC-8, Jason Wolfe wrote:
Hi all,

I just bumped us to 1.5.0-RC4 at Prismatic and ran our test suite.

By far the biggest issue we ran into is that the semantics of boxing seems to have changed in an unexpected way.

user> [(class (hash "test")) (class (System/identityHashCode "foo")) (class (rand-int 100))]
[java.lang.Integer java.lang.Integer java.lang.Integer]

which I believe used to return [Long Long Long].  This broke many of our tests, and I'd imagine will lead to further unexpected issues since this has been a particularly nasty issue to track down in the past and often manifests well downstream from the producer of a boxed Integer that was expected to be a Long.

I haven't been following things enough to know if this is an intended change, but unless there are big benefits to it I think we'd prefer the status quo.

I just checked and I see this is also the behavior in 1.4, so I suppose this ship has sailed some time ago. (We're running a patched version of 1.3 currently).  Sorry for the noise...

Chas Emerick

unread,
Feb 2, 2013, 8:43:08 AM2/2/13
to cloju...@googlegroups.com
On Feb 2, 2013, at 2:03 AM, Jason Wolfe wrote:



On Friday, February 1, 2013 10:52:31 PM UTC-8, Jason Wolfe wrote:
Hi all,

I just bumped us to 1.5.0-RC4 at Prismatic and ran our test suite.

By far the biggest issue we ran into is that the semantics of boxing seems to have changed in an unexpected way.

user> [(class (hash "test")) (class (System/identityHashCode "foo")) (class (rand-int 100))]
[java.lang.Integer java.lang.Integer java.lang.Integer]

which I believe used to return [Long Long Long].  This broke many of our tests, and I'd imagine will lead to further unexpected issues since this has been a particularly nasty issue to track down in the past and often manifests well downstream from the producer of a boxed Integer that was expected to be a Long.

I haven't been following things enough to know if this is an intended change, but unless there are big benefits to it I think we'd prefer the status quo.

I just checked and I see this is also the behavior in 1.4, so I suppose this ship has sailed some time ago. (We're running a patched version of 1.3 currently).  Sorry for the noise...

Yeah, integers are no longer promoted when boxed.  Here's Rich's message from when that behaviour changed for 1.4:


I won't try to summarize the thread any further. :-)

- Chas

Andy Fingerhut

unread,
Feb 2, 2013, 10:38:20 AM2/2/13
to cloju...@googlegroups.com

On Feb 1, 2013, at 10:52 PM, Jason Wolfe wrote:

> Hi all,
>
> I just bumped us to 1.5.0-RC4 at Prismatic and ran our test suite.
>
> By far the biggest issue we ran into is that the semantics of boxing seems to have changed in an unexpected way.
>
> user> [(class (hash "test")) (class (System/identityHashCode "foo")) (class (rand-int 100))]
> [java.lang.Integer java.lang.Integer java.lang.Integer]
>
> which I believe used to return [Long Long Long]. This broke many of our tests, and I'd imagine will lead to further unexpected issues since this has been a particularly nasty issue to track down in the past and often manifests well downstream from the producer of a boxed Integer that was expected to be a Long.
>
> I haven't been following things enough to know if this is an intended change, but unless there are big benefits to it I think we'd prefer the status quo.
>
> Other than that, there were a few other minor test failures of the sanity-check variant:
> - SerialVersionID for persistent vector has changed. Not unexpected, but something to note if people are using Java serialization for Clojure types (obviously not a great idea).
> - (array-map :a 1 :a 1) no longer throws an error. Again not a big deal, we'll just have to manually check for this in the place we were relying on (and testing for) Clojure's built-in sanity check.

This change is new in 1.5, and wasn't in 1.4. Here is a wiki page with some discussion of the change, and a link to a discussion thread on the Clojure Google group that motivated it:

http://dev.clojure.org/display/design/Allow+duplicate+map+keys+and+set+elements

Current state is that a literal like {:a 1 :a 1} throws an exception, but constructors like array-map explicitly allow duplicates and work as if by repeated uses of assoc.

Marshall Bockrath-Vandegrift

unread,
Feb 2, 2013, 10:43:57 AM2/2/13
to cloju...@googlegroups.com
I started trying the patched Clojure 1.5.0 on some of our code at
Damballa, and unfortunately ran into problems relatively quickly. I
hadn't examined the patch itself prior to testing, and it wasn't
obvious to me that this change would default *read-eval* to false for
reading actual Clojure code during normal Clojure compilation. But
that appears to be what's happening. We've been using the read-eval
reader macro in code to inject literals for primitive/array types for
defining multimethods and protocols. For example:

(defmethod example-method #=(java.lang.Class/forName "[B")
...)

Under the patch, this throws the obvious exception for the obvious
reason. Is the intent to make *read-eval* false while performing
normal code-loading? And if so, is there an obvious work-around for
these scenarios?

-Marshall

On Fri, Feb 1, 2013 at 2:05 PM, Chas Emerick <ch...@cemerick.com> wrote:

Chas Emerick

unread,
Feb 2, 2013, 10:59:20 AM2/2/13
to cloju...@googlegroups.com
On Feb 2, 2013, at 10:43 AM, Marshall Bockrath-Vandegrift wrote:

> I started trying the patched Clojure 1.5.0 on some of our code at
> Damballa, and unfortunately ran into problems relatively quickly. I
> hadn't examined the patch itself prior to testing, and it wasn't
> obvious to me that this change would default *read-eval* to false for
> reading actual Clojure code during normal Clojure compilation. But
> that appears to be what's happening. We've been using the read-eval
> reader macro in code to inject literals for primitive/array types for
> defining multimethods and protocols. For example:
>
> (defmethod example-method #=(java.lang.Class/forName "[B")
> ...)
>
> Under the patch, this throws the obvious exception for the obvious
> reason. Is the intent to make *read-eval* false while performing
> normal code-loading? And if so, is there an obvious work-around for
> these scenarios?
>
> -Marshall

Unless someone has something really clever up their sleeve, this is inevitable. The power of eval is that it is equivalent to (and reuses) all of the machinery used for "normal" code loading.

But, you don't need #= for multimethods or protocols:

=> (defmulti foo type)
#'user/foo
=> (defmethod foo :default [x] (class x))
#<MultiFn clojure.lang.MultiFn@2b06c17b>
=> (defmethod foo
(java.lang.Class/forName "[B")
[x]
(class x))
#<MultiFn clojure.lang.MultiFn@2b06c17b>
=> (foo "hi")
java.lang.String
=> (foo (.getBytes "hi"))
[B

=> (defprotocol Bar
(bar [x]))
Bar
=> (extend (Class/forName "[B")
Bar
{:bar class})
nil
=> (bar (.getBytes "hi"))
[B

FWIW, I've never seen #= positively documented (i.e. it's always been discussed as "there's this thing, use it if you want, but you really shouldn't"). I don't believe it's needed to use any other feature of the language.

Thanks for testing! I hope the alternative usage above is suitable for you.

Cheers,

- Chas

Marshall Bockrath-Vandegrift

unread,
Feb 2, 2013, 11:34:09 AM2/2/13
to cloju...@googlegroups.com
On Sat, Feb 2, 2013 at 10:59 AM, Chas Emerick <ch...@cemerick.com> wrote:

> Unless someone has something really clever up their sleeve, this is inevitable. The power of eval is that it is equivalent to (and reuses) all of the machinery used for "normal" code loading.

What about just binding *read-eval* to true in clojure.core/load for
the scope of the call to clojure.lang.RT/load?

> But, you don't need #= for multimethods or protocols:

Oh, I see what happened. It *is* necessary for use in
extend-protocol, because that macro partitions its argument forms by
looking for non-lists. The only way to extend-protocol to a
primitive/array type is to use #=. The pattern was then just
duplicated in our code to places where it isn't strictly necessary.

> FWIW, I've never seen #= positively documented (i.e. it's always been discussed as "there's this thing, use it if you want, but you really shouldn't"). I don't believe it's needed to use any other feature of the language.

This is true, but I think there's value to a kind of "universal escape
hatch" for constructing forms intended for the compiler.

> Thanks for testing! I hope the alternative usage above is suitable for you.

Avoiding or extending extend-protocol isn't a big issue, but I would
like to explore the suggestion I proposed above.

Thanks!

-Marshall

Michał Marczyk

unread,
Feb 2, 2013, 12:10:16 PM2/2/13
to cloju...@googlegroups.com
On 2 February 2013 17:34, Marshall Bockrath-Vandegrift
<lla...@gmail.com> wrote:
> Oh, I see what happened. It *is* necessary for use in
> extend-protocol, because that macro partitions its argument forms by
> looking for non-lists. The only way to extend-protocol to a
> primitive/array type is to use #=. The pattern was then just
> duplicated in our code to places where it isn't strictly necessary.

extend-protocol expands to a sequence of calls to extend-type, which
itself expands to a sequence of calls to extend, which is a function.
extend-type and extend work fine for extending to types specified by
complex expressions:

user=> (defprotocol PFoo (foo [x]))
PFoo
user=> (extend Object PFoo {:foo (fn [x] :default)})
nil
user=> (extend (class (int-array 0)) PFoo {:foo (fn [x] :ints)})
nil
user=> (foo [])
:default
user=> (foo (int-array 0))
:ints
user=> (extend-type (class (double-array 0)) PFoo (foo [x] :doubles))
nil
user=> (foo (double-array 0))
:doubles

Cheers,
Michał

Marshall Bockrath-Vandegrift

unread,
Feb 2, 2013, 12:31:47 PM2/2/13
to cloju...@googlegroups.com
On Sat, Feb 2, 2013 at 12:10 PM, Michał Marczyk
<michal....@gmail.com> wrote:

> extend-protocol expands to a sequence of calls to extend-type, which
> itself expands to a sequence of calls to extend, which is a function.
> extend-type and extend work fine for extending to types specified by
> complex expressions:

Right, and thanks for clarifying. I certainly didn't mean to imply
otherwise -- just that specifically the extend-protocol macro
identifies type-forms by their non-list-ness, and thus cannot be used
with expressions evaluating to types without using #=.

-Marshall

Chas Emerick

unread,
Feb 2, 2013, 3:56:09 PM2/2/13
to cloju...@googlegroups.com
On Feb 2, 2013, at 11:34 AM, Marshall Bockrath-Vandegrift wrote:

> On Sat, Feb 2, 2013 at 10:59 AM, Chas Emerick <ch...@cemerick.com> wrote:
>
>> Unless someone has something really clever up their sleeve, this is inevitable. The power of eval is that it is equivalent to (and reuses) all of the machinery used for "normal" code loading.
>
> What about just binding *read-eval* to true in clojure.core/load for
> the scope of the call to clojure.lang.RT/load?

Perhaps binding *read-eval* within RT/load would maintain the current vulnerability, but perhaps not; I haven't evaluated that in particular. In either case, I don't think that complicating things with different defaults depending on code loading modalities (i.e. RT/load vs. (eval (read ...))) would do anyone any favors. For example, that would make it impossible to evaluate e.g. an extend-protocol form that contained #= in a REPL, or via e.g. Cmd+Shift+X in CCW or C-x C-e in SLIME.

The semantics of #= are fairly difficult to understand fully, which I presume is one reason it hasn't been widely announced/discussed/documented. Adding more special cases isn't the answer IMO.

>> FWIW, I've never seen #= positively documented (i.e. it's always been discussed as "there's this thing, use it if you want, but you really shouldn't"). I don't believe it's needed to use any other feature of the language.
>
> This is true, but I think there's value to a kind of "universal escape
> hatch" for constructing forms intended for the compiler.

Macros are that escape hatch, definitionally so. (e.g. extend-protocol could be made to accept strings to specify types that need to be resolved via Class/forName, as is currently done with type hints; I think that'd be a very welcome enhancement ticket, if it doesn't exist already.) In contrast, #= is a deformation of the general understanding of the respective responsibilities of `read` and `eval`; a complecting form, you might say.

Cheers,

- Chas

Stuart Halloway

unread,
Feb 2, 2013, 4:32:12 PM2/2/13
to cloju...@googlegroups.com
I am eager to see *read-eval* changed, but not at the expense of delaying 1.5.

Alpha:  feature changes
Beta:   bug fixes only
RC:     lockdown before release, regressions and critical bug fixes only

The sooner we release 1.5, the sooner we can have a 1.6 alpha with *read-eval* behavior changed.


Cosmin Stejerean

unread,
Feb 2, 2013, 5:01:35 PM2/2/13
to cloju...@googlegroups.com
This type of security fix shouldn't be delayed by whatever improvements would otherwise go into 1.6. If 1.5 goes out without it then 1.5.1 should go out immediately after with the *read-eval* fix.

--
Cosmin Stejerean
http://offbytwo.com

Chouser

unread,
Feb 2, 2013, 5:22:31 PM2/2/13
to cloju...@googlegroups.com
On Sat, Feb 2, 2013 at 3:56 PM, Chas Emerick <ch...@cemerick.com> wrote:
>
> On Feb 2, 2013, at 11:34 AM, Marshall Bockrath-Vandegrift wrote:
>
> > On Sat, Feb 2, 2013 at 10:59 AM, Chas Emerick <ch...@cemerick.com> wrote:
> >
> >> Unless someone has something really clever up their sleeve, this is inevitable. The power of eval is that it is equivalent to (and reuses) all of the machinery used for "normal" code loading.
> >
> > What about just binding *read-eval* to true in clojure.core/load for
> > the scope of the call to clojure.lang.RT/load?
>
> Perhaps binding *read-eval* within RT/load would maintain the current vulnerability, but perhaps not; I haven't evaluated that in particular. In either case, I don't think that complicating things with different defaults depending on code loading modalities (i.e. RT/load vs. (eval (read ...))) would do anyone any favors. For example, that would make it impossible to evaluate e.g. an extend-protocol form that contained #= in a REPL, or via e.g. Cmd+Shift+X in CCW or C-x C-e in SLIME.

RT/load already evals after it reads, so a literal #= in there during
read time grants no more power than it would already exercise. In
short, RT/load or the functions that use it (c.c/load, c.c/load-file,
for starters) should be documented as unsafe for untrusted input. At
that point I see no harm in having them default to *read-eval* true.

c.c/read and c.c/read-string, on the other hand, are natural to use
for data interchange, and I agree defaulting *read-eval* to false in
those cases would be better.

I don't understand what you were saying about #= at the REPL etc. If
read has *read-eval* defaulting to false and an IDE or REPL would like
to assume the developer's input is safe, it needs a way to bind
*read-eval* to true. Surely that's not impossible.

> The semantics of #= are fairly difficult to understand fully, which I presume is one reason it hasn't been widely announced/discussed/documented. Adding more special cases isn't the answer IMO.

The semantics of the expression inside #= may be unusual and
undocumented, but the behaviour of *read-eval* is simple and
documented. I think what we want to do here is adjust the default to
be the best balance between safety and convenience.

> >> FWIW, I've never seen #= positively documented (i.e. it's always been discussed as "there's this thing, use it if you want, but you really shouldn't"). I don't believe it's needed to use any other feature of the language.
> >
> > This is true, but I think there's value to a kind of "universal escape
> > hatch" for constructing forms intended for the compiler.
>
> Macros are that escape hatch, definitionally so. (e.g. extend-protocol could be made to accept strings to specify types that need to be resolved via Class/forName, as is currently done with type hints; I think that'd be a very welcome enhancement ticket, if it doesn't exist already.) In contrast, #= is a deformation of the general understanding of the respective responsibilities of `read` and `eval`; a complecting form, you might say.

Macros are an escape hatch only after you've gone through a complete
compile and eval. #= was added because it was needed -- you're
suggesting that tagged literals and other bits of new compiler code
would be enough to make the compiler know longer require #= at all?

If that's the case, maybe we should actually remove it completely. If
it's in there but seldom used, it seems likely to be a future security
problem waiting to happen.

Stuart Halloway

unread,
Feb 2, 2013, 5:28:04 PM2/2/13
to cloju...@googlegroups.com
While I agree with the desire to flip *read-eval*'s default, I see this as an enhancement, not an urgent security fix.  Apps can be secure today, by following the documentation.  It would be difficult to argue that this snuck up on the community and became urgent during the 1.5 RC period.

That said, it would be nice to make sure that the build box is configured to do point releases from a branch, so that there can be 1.5.1 and 1.6 releases in parallel if necessary.

Stu


Paul Stadig

unread,
Feb 2, 2013, 6:18:07 PM2/2/13
to cloju...@googlegroups.com
On Sat, Feb 2, 2013 at 5:22 PM, Chouser <cho...@n01se.net> wrote:
> RT/load already evals after it reads, so a literal #= in there during
> read time grants no more power than it would already exercise. In
> short, RT/load or the functions that use it (c.c/load, c.c/load-file,
> for starters) should be documented as unsafe for untrusted input. At
> that point I see no harm in having them default to *read-eval* true.
>
> c.c/read and c.c/read-string, on the other hand, are natural to use
> for data interchange, and I agree defaulting *read-eval* to false in
> those cases would be better.

Given that *read-eval* is dynamically bound, I'm unclear how this
would really work. Would load bind *read-eval* to true? So someone
calling load would not be able to effect that, because if they bind
*read-eval* before calling load, then load is push going to push its
own binding. I don't think you'd want to alter the var root, since
that wouldn't play nice with multiple threads.

Same problem for read-string binding *read-eval* to false. If someone
did want to effect the binding they could not bind *read-eval* before
calling read-string. I guess *read-eval* would really have to become
an argument to load/read-string.

> #= was added because it was needed -- you're
> suggesting that tagged literals and other bits of new compiler code
> would be enough to make the compiler know longer require #= at all?
>
> If that's the case, maybe we should actually remove it completely. If
> it's in there but seldom used, it seems likely to be a future security
> problem waiting to happen.

Obviously *read-eval* can be used to execute arbitrary code, but I've
mostly understood and seen *read-eval* used for reading in specific
data types (i.e. #=(Integer. "1") or
#=(clojure.lang.PersistentArrayMap/create {1 2, 3 4})), in which case
tagged literals could certainly replace that use case.


Paul

Nelson Morris

unread,
Feb 2, 2013, 6:47:31 PM2/2/13
to cloju...@googlegroups.com
On Sat, Feb 2, 2013 at 4:28 PM, Stuart Halloway
<stuart....@gmail.com> wrote:
> .. Apps can be secure today, by
> following the documentation. ...

I remember being surprised when #= was mentioned in a talk at conj
2011. I tried to find the documentation today. I opened a repl with
1.5.0-RC4 and found:

user> (clojure.repl/doc read)
-------------------------
clojure.core/read
([] [stream] [stream eof-error? eof-value] [stream eof-error?
eof-value recursive?])
Reads the next object from stream, which must be an instance of
java.io.PushbackReader or some derivee. stream defaults to the
current value of *in* .
nil
user> (clojure.repl/doc read-string)
-------------------------
clojure.core/read-string
([s])
Reads one object from the string s
nil

I don't see #= mentioned on http://clojure.org/reader.
http://clojure.org/evaluation does not mention it. The enumeration of
evaluation contexts *almost* seems to be implying it shouldn't happen
elsewhere.

I find it mentioned under (clojure.repl/doc *read-eval*) but that
presupposes I know read/read-string looks at *read-eval*.

Is there someplace I have missed where I could have learned that
reading has this behaviour?

Cosmin Stejerean

unread,
Feb 4, 2013, 8:05:54 AM2/4/13
to cloju...@googlegroups.com

On Sat, Feb 2, 2013 at 4:28 PM, Stuart Halloway <stuart....@gmail.com> wrote:
I see this as an enhancement, not an urgent security fix.  Apps can be secure today, by following the documentation.  It would be difficult to argue that this snuck up on the community and became urgent during the 1.5 RC period.

In light of EDN this issue would be urgent even if adequate warnings were present in the documentation. There are no useful warnings however anywhere in the documentation. Even the doc string of *read-eval*, should someone happen to run into it, doesn't make the security implications obvious.

Michael Fogus

unread,
Feb 4, 2013, 8:17:27 AM2/4/13
to cloju...@googlegroups.com
> It would be difficult to argue that this snuck
> up on the community and became urgent during the 1.5 RC period.

Someone please correct me if I'm wrong, but the issue is not that it
snuck up on the community, but that it could sneak up on people who do
not know about it and have no easy way to find out about it. I
personally could wait for a 1.5 release, but then again I have no code
that relies on *read-eval*, so maybe I'm a bit cavalier about it. I
could live with a 1.5 release with the clojure.org docs updated to
address *read-eval* (and in general, which I'd be willing to help
with) as well as the relevant docstrings.

:F

Chouser

unread,
Feb 4, 2013, 8:58:52 AM2/4/13
to cloju...@googlegroups.com
On Mon, Feb 4, 2013 at 8:17 AM, Michael Fogus <mef...@gmail.com> wrote:
> I could live with a 1.5 release with the clojure.org docs updated to
> address *read-eval* (and in general, which I'd be willing to help
> with) as well as the relevant docstrings.

That sounds like a pretty good compromise -- would the powers that be
accept a patch to the read and read-string docstrings for 1.5 to warn
about *read-eval* and possibly even mention that the default may
change in a future release? That would give us time to decide the best
way to fix it and get the fix into 1.6 (or 1.5.1).

--Chouser

Chas Emerick

unread,
Feb 4, 2013, 9:04:34 AM2/4/13
to cloju...@googlegroups.com
Right: no one on this list needs this fix (although, accidents that shouldn't be possible can happen). The point is that the 90% of Clojure devs that aren't aware of implementation details are currently exposed. It's true that they've been exposed now for years, but that will be small comfort to them when sites and services are compromised (if it hasn't happened already). That's on us, whatever the vagaries of the Clojure release process.

I'm sorry that I didn't push this issue earlier for 1.5, or for 1.4, or for 1.3. As I've said, my concentrated interest is effectively new, for better or worse. I'm also sorry that no one else pushed the issue in this or previous releases. But, the history of a vulnerability is irrelevant to whether it should be addressed. We have been all been collectively recommending "use Clojure data for easy interchange" for years, and the pitch of that recommendation has only been growing, especially with the canonicalization of 'edn', readers and printers for different languages cropping up, etc. Meanwhile, #=, *read-eval*, and their security implications are narrowly known. We're all aware of recent events elsewhere; maybe changing a couple of docstrings here or there will make us feel better somehow, but it really is no better than doing nothing.

Further on that, the fact that #= is undocumented has been cited as being a point in favor for changing the default of *read-eval*, i.e. the change would not be a breaking one from the perspective of documented behaviour. Presumably that would change for 1.6 once the docstrings and other documentation for 1.5 are updated to describe what are currently just implementation details.

- Chas

Phil Hagelberg

unread,
Feb 4, 2013, 12:20:40 PM2/4/13
to cloju...@googlegroups.com

Michael Fogus writes:

>> It would be difficult to argue that this snuck
>> up on the community and became urgent during the 1.5 RC period.
>
> Someone please correct me if I'm wrong, but the issue is not that it
> snuck up on the community, but that it could sneak up on people who do
> not know about it and have no easy way to find out about it.

What's changed is that we had three high-profile vulnerabilities from
the same problem applied to YAML last month, one of which resulted in a
compromise which could easily have cascaded into an attack on virtually
every Ruby codebase in existence.

http://www.kalzumeus.com/2013/01/31/what-the-rails-security-issue-means-for-your-startup/

We're likely to see copycat attackers probing other systems for similar
vulnerabilities. While it's true nothing has changed in Clojure itself,
we no longer have an excuse for shipping with a bad default.

-Phil

Rich Hickey

unread,
Feb 4, 2013, 10:20:28 PM2/4/13
to cloju...@googlegroups.com

On Feb 1, 2013, at 1:29 AM, Andy Fingerhut wrote:

> A new ticket has gone from non-existent to garnering the most votes in JIRA in about 12 hours. There is a link to the Clojure group discussion thread in the ticket:
>
> http://dev.clojure.org/jira/browse/CLJ-1153
>

RC6 is on its way through maven-land, and addresses this.

Relevant commits:

https://github.com/clojure/clojure/commit/974a64c06917861b7557fd73154254195b2de548
https://github.com/clojure/clojure/commit/38a129f2631a75ed999b52d8e0440f730b00da1f

I strongly advise everyone to try it ASAP.

Rich

Nelson Morris

unread,
Feb 4, 2013, 10:56:13 PM2/4/13
to cloju...@googlegroups.com
I've only begun testing it but I did notice:

tester$ java -cp
~/.m2/repository/org/clojure/clojure/1.5.0-RC6/clojure-1.5.0-RC6.jar
clojure.main
Clojure 1.5.0-RC6
user=> (defn h [] "hi")
#'user/h
user=> (h)
"hi"
user=> (read-string "#=(clojure.lang.Namespace/remove user)")
#<Namespace user>
user=> (h)
IllegalStateException Attempting to call unbound fn: #'user/h
clojure.lang.Var$Unbound.throwArity (Var.java:43

Mike Drogalis

unread,
Feb 4, 2013, 11:03:29 PM2/4/13
to cloju...@googlegroups.com
This might be a good opportunity to aggregate all the tests everyone is throwing at this particular piece of code and turn it into a solid test suite.


Cosmin Stejerean

unread,
Feb 5, 2013, 1:20:49 AM2/5/13
to cloju...@googlegroups.com
The current whitelist still provides some interesting avenues for exploitation, mostly that allowing static method calls on the classes in the whitelist (and any of their subclasses) is bound to allow calling static methods with dangerous side effects. For example:

* Namespace/remove as pointed out by Nelson
* Integer/getInteger and friends which can read system properties

While I was testing this I ran into the fact that it is possible to construct arbitrary Java classes using the record reader syntax. From there it's just a matter of finding an interesting constructor.

* FileOutputStream will empty out the given file.
* com.sun.jndi.dns.DnsClient will resolve the hostname of every element in an array.
* sun.net.www.http.HttpClient will open a socket connection in the constructor, with a configurable timeout to make things worse.

I'm sure there might be more interesting ones.


On Mon, Feb 4, 2013 at 9:20 PM, Rich Hickey <richh...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.


Nelson Morris

unread,
Feb 5, 2013, 1:45:50 AM2/5/13
to cloju...@googlegroups.com
Additionally I noticed:

1) #=(var x) resolved x. An example of where this could be problematic
is if I was expecting a map back and used code that boiled down to
((read-string ..) :x). Then an attacker could call one arg functions.

2) Equivalent issues as 1) exists for compiled functions, ie
#=(clojure.core$eval.)

3) #=(SomeClass.) uses RT.classForName. I think that exposes class
loading with static initializers.

4) Static methods of any descendant of the white list can be called.

5) Combining 1) or 2) and 4) allows arbitrary code execution using:
(read-string "#=(clojure.core$eval/applyToHelper #=(var
clojure.core/eval) ((println 'hi)))")
(read-string "#=(clojure.core$eval/applyToHelper
#=(clojure.core$eval.) ((println 'hi)))")

Chas Emerick

unread,
Feb 5, 2013, 5:31:39 AM2/5/13
to cloju...@googlegroups.com
On Feb 2, 2013, at 5:22 PM, Chouser wrote:

>>>> FWIW, I've never seen #= positively documented (i.e. it's always been discussed as "there's this thing, use it if you want, but you really shouldn't"). I don't believe it's needed to use any other feature of the language.
>>>
>>> This is true, but I think there's value to a kind of "universal escape
>>> hatch" for constructing forms intended for the compiler.
>>
>> Macros are that escape hatch, definitionally so. (e.g. extend-protocol could be made to accept strings to specify types that need to be resolved via Class/forName, as is currently done with type hints; I think that'd be a very welcome enhancement ticket, if it doesn't exist already.) In contrast, #= is a deformation of the general understanding of the respective responsibilities of `read` and `eval`; a complecting form, you might say.
>
> Macros are an escape hatch only after you've gone through a complete
> compile and eval. #= was added because it was needed -- you're
> suggesting that tagged literals and other bits of new compiler code
> would be enough to make the compiler know longer require #= at all?
>
> If that's the case, maybe we should actually remove it completely. If
> it's in there but seldom used, it seems likely to be a future security
> problem waiting to happen.

Sorry, missed this reply in the stream.

The patch I offered up does eliminate the use of #=. However, it does not eliminate the use of read-string (needed for tagged literals), though the strings produced during compilation default to using print-method, not print-dup. Perhaps there's an edge case that makes that approach faulty in some way, but it seems non-breaking, at least with my codebases (and others that have reported positively; I'm not sure where Baishampayan's reported issue stands). I'd love to have feedback on my patch from someone that knows the compiler and its history, if only as an educational process.

I don't recall the specific history of #= (I think my memory is somewhat clearer w.r.t. print-dup), but again, unless there's an edge case I'm missing that can't readily be implemented using asm, it seems to not be necessary. Convenient, yes, but it's going to be very hard to keep that convenience from metastasizing into userland vulnerabilities, viz. Nelson's evaluation earlier this morning of Rich's changes.

- Chas

Stuart Halloway

unread,
Feb 5, 2013, 7:10:47 AM2/5/13
to cloju...@googlegroups.com
Rich: Thanks for your detailed approach to this, and for representing the needs of those whose code would be broken by casual "fixes" to this issue.

Chas: many people use #= as a building block, in much the same way that Clojure uses it internally.  Your patch would break all of them.

Mike: adding tests is certainly a good idea.  Probably it will be most efficient to work through a list and make sure we are agree on what we are testing first. :-)

All: considering the boundaries of the whitelist is helpful.  Thank you. It would facilitate analysis if someone would compile a list of possible issues.

Cheers,
Stu


Paul Stadig

unread,
Feb 5, 2013, 7:11:09 AM2/5/13
to cloju...@googlegroups.com
On Tue, Feb 5, 2013 at 1:45 AM, Nelson Morris <nmo...@nelsonmorris.net> wrote:
> Additionally I noticed:
>
> 1) #=(var x) resolved x. An example of where this could be problematic
> is if I was expecting a map back and used code that boiled down to
> ((read-string ..) :x). Then an attacker could call one arg functions.
>
> 2) Equivalent issues as 1) exists for compiled functions, ie
> #=(clojure.core$eval.)
>
> 3) #=(SomeClass.) uses RT.classForName. I think that exposes class
> loading with static initializers.

As does calling a static method #=(SomeClass/foo) even though the
actual method call fails the static initializer for the class is still
run.

> 4) Static methods of any descendant of the white list can be called.
>
> 5) Combining 1) or 2) and 4) allows arbitrary code execution using:
> (read-string "#=(clojure.core$eval/applyToHelper #=(var
> clojure.core/eval) ((println 'hi)))")
> (read-string "#=(clojure.core$eval/applyToHelper
> #=(clojure.core$eval.) ((println 'hi)))")

Hmm. Is it possible that Chas' patch could work? It seemed to work for
me, but maybe there are some corner cases that I didn't test. We don't
work with records, so it's possible that there's an issue with how
records are printed and read.


Paul

Paul Stadig

unread,
Feb 5, 2013, 7:28:54 AM2/5/13
to cloju...@googlegroups.com
On Tue, Feb 5, 2013 at 7:10 AM, Stuart Halloway
<stuart....@gmail.com> wrote:
> Rich: Thanks for your detailed approach to this, and for representing the
> needs of those whose code would be broken by casual "fixes" to this issue.

I think this could be taken the wrong way. I'm sure by using scare
quotes and the word casual that you don't mean Chas' patch is
ill-considered.

> Chas: many people use #= as a building block, in much the same way that
> Clojure uses it internally. Your patch would break all of them.

People who are using it for non user generated data can still bind
*read-eval* to true (as we would have to do in our own code base at
Sonian), and if they are using it to read user generated data, then
their code should be broken as a courtesy to them.

> Mike: adding tests is certainly a good idea. Probably it will be most
> efficient to work through a list and make sure we are agree on what we are
> testing first. :-)
>
> All: considering the boundaries of the whitelist is helpful. Thank you. It
> would facilitate analysis if someone would compile a list of possible
> issues.

I think Nelson's analysis means that Rich's patch does not prevent
arbitrary code execution, which means we should probably try for
something else. Unless we want to disallow clojure.lang.Fn from the
whitelist, but I think Chas discovered in his work that this breaks
Clojure.


Paul

Rich Hickey

unread,
Feb 5, 2013, 8:07:33 AM2/5/13
to cloju...@googlegroups.com
beta7 is now on its way through the Maven pipes. We will stick with beta designations until this is sorted out.

I wrapped up too late to explain everything yesterday, and had things left to finish this morning.

Here are the guiding principles:

The reader is not a secure facility and will not become one anytime soon without serious effort. So, we are just trying to make best use of what we have, and prevent people from putting themselves at unnecessary risk.

While #= is undocumented, it is not unused, and dramatically altering it would be a significant breaking change for many.

Most important - ***as long as read-eval behavior is governed by a dynamic binding***, no one should ever rely upon the safety of the default binding (however we set it), since a surrounding context might have altered it. Thus - if you want a safe read on a particular source YOU must EXPLICITLY ensure it for that call.

Given the above, the recent work does the following:

Establishes a default mode for *read-eval* substantially less powerful than full eval, but covering most existing data oriented usage. This is not intended to be safe, just safer for those ignoring all of this.

Provides 'safe-' variants of read and read-string, and points to them in read-* documentation. These are easier to understand and use than binding is for newcomers.

Binding *read-eval* to false kills all read eval.

Non-record use of record ctor syntax requires *read-eval* == true

*read-eval* is true in load (obviously you trust the code you load)

The *read-whitelist* provides additional control short of granting full eval

An all-Clojure EDN reader for interop remains a welcome contrib. Note that the use of it will also be explicit.

Rich

Chas Emerick

unread,
Feb 5, 2013, 8:54:31 AM2/5/13
to cloju...@googlegroups.com
On Feb 5, 2013, at 8:07 AM, Rich Hickey wrote:

> Most important - ***as long as read-eval behavior is governed by a dynamic binding***, no one should ever rely upon the safety of the default binding (however we set it), since a surrounding context might have altered it. Thus - if you want a safe read on a particular source YOU must EXPLICITLY ensure it for that call.

This is a key point that has been missing from the discussion so far.

> Provides 'safe-' variants of read and read-string, and points to them in read-* documentation. These are easier to understand and use than binding is for newcomers.

Given such a bifurcation, and that *read-eval* is true in load, and that read/read-string semantics are changing due to the new :default mode, perhaps read-* should be safe, and the new fns should be danger-read and danger-read-string (or something less silly but equally obvious). The point being that newcomers aren't going to be looking for safe* variants (this has been raised recently elsewhere, including @ http://nedbatchelder.com/blog/201302/war_is_peace.html).

- Chas

Paul Stadig

unread,
Feb 5, 2013, 9:13:23 AM2/5/13
to cloju...@googlegroups.com
On Tue, Feb 5, 2013 at 8:07 AM, Rich Hickey <richh...@gmail.com> wrote:
> While #= is undocumented, it is not unused, and dramatically altering it would be a significant breaking change for many.

I would like to see more data on this point. I ran the Sonian tests
against Chas' patch which defaults to turning off *read-eval* and
there were minimal changes necessary. We have 22k lines of source and
20k lines of tests, and it requires adding 5 binding forms around the
places we are reading data. These sites are easy to find as well. You
just search for places where you call read-*. It seems to me it would
be a minimal effort for an important security fix.

> Most important - ***as long as read-eval behavior is governed by a dynamic binding***, no one should ever rely upon the safety of the default binding (however we set it), since a surrounding context might have altered it. Thus - if you want a safe read on a particular source YOU must EXPLICITLY ensure it for that call.

This is important advice and should be taken to heart. You are right
that even with safe defaults one should be prudent and explicitly bind
*read-eval* to false. However, this whole issue is about safe
defaults.

> The *read-whitelist* provides additional control short of granting full eval

The default value for *read-eval* + *read-whitelist* still grants full eval.

user=> *read-eval*
:default
user=> (read-string "#=(clojure.core$eval/applyToHelper #=(var
clojure.core/eval) ((println \"pwned\")))")
pwned
nil

The default for *read-eval* is still unsafe, but now the situation
around read-*, safe-read-*, load, and *read-eval* is more complicated,
which I think is worse for inexperienced Clojure developers.

As a final note, I checked how records print and read with Chas'
patch, and they seem fine in the simple case:

user=> *read-eval*
false
user=> (defrecord A [x y])
user.A
user=> (def a (A. 1 2))
#'user/a
user=> (= a (read-string (pr-str a)))
true

There may be edge cases for reading in records, but people can bind
*read-eval* to true in those.

Also the default *read-eval* value with Chas' patch is safe:

user=> (read-string "#=(clojure.core$eval/applyToHelper #=(var
clojure.core/eval) ((println \"pwned\")))")
RuntimeException EvalReader not allowed when *read-eval* is false.
clojure.lang.Util.runtimeException (Util.java:219)

FWIW, I am -1 on Rich's changes.


Paul

Andy Fingerhut

unread,
Feb 5, 2013, 10:36:13 AM2/5/13
to cloju...@googlegroups.com
I don't pretend to understand everything that might break if 1.5.0 is not backwards compatible, but I agree with Chas that it seems far preferable to make the default read / read-string be the ones that internally bind *read-eval* to false, and create new unsafe-read / unsafe-read-string that use whatever value is bound to *read-eval* at the time they are called.

People's existing code using read / read-string would become safe. If their code breaks because they rely on unsafe behavior, the 1.5 release notes point them at the unsafe-* versions and why they should be very careful before they consider using them.

Andy

Cosmin Stejerean

unread,
Feb 5, 2013, 10:38:59 AM2/5/13
to cloju...@googlegroups.com
While we can tighten up the whitelist, it is bound to be a losing game. I would feel much more comfortable with read being safe by default, and perhaps unsafe-read added for those that need it. 

People relying on on the current unsafe behavior of read will find out rather quickly and work around it. Someone missing the fact that read is dangerous however will likely never notice until their application is compromised.




--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.


Rich Hickey

unread,
Feb 5, 2013, 10:48:44 AM2/5/13
to cloju...@googlegroups.com

On Feb 5, 2013, at 9:13 AM, Paul Stadig wrote:

> On Tue, Feb 5, 2013 at 8:07 AM, Rich Hickey <richh...@gmail.com> wrote:
>> While #= is undocumented, it is not unused, and dramatically altering it would be a significant breaking change for many.
>
> I would like to see more data on this point. I ran the Sonian tests
> against Chas' patch which defaults to turning off *read-eval* and
> there were minimal changes necessary. We have 22k lines of source and
> 20k lines of tests, and it requires adding 5 binding forms around the
> places we are reading data. These sites are easy to find as well.

First, if you have to change your code, it's a breaking change. We don't have any breaking changes in 1.5 otherwise.

Second, while it might be 'easy' to find these sites, more nefarious are those cases where:

Programmer A goes and finds his sites that need read eval, establishes the binding.
Programmer B rests easy knowing his code that needs to be safe is safe with the defaults, does nothing.
C) Except - some of programmer B's calls are nested, or worse, later become nested, in Programmer A's eval contexts.

The more people need to do A, the more likely we'll see C.

A key point of this middle-ground default is that people will rarely if ever need to bind *read-eval* to true. That doesn't, and can't, eliminate the need for B to specify safety, but dramatically reduces the possibility of C. It also means that more experienced programmers can using binding *read-eval* false to establish even wider safe contexts than safe-read can supply, knowing e.g. read-oriented libraries are unlikely to establish bindings of their own.

> You
> just search for places where you call read-*. It seems to me it would
> be a minimal effort for an important security fix.
>

Iff you were getting an important security fix by changing the default - but you aren't:

>> Most important - ***as long as read-eval behavior is governed by a dynamic binding***, no one should ever rely upon the safety of the default binding (however we set it), since a surrounding context might have altered it. Thus - if you want a safe read on a particular source YOU must EXPLICITLY ensure it for that call.
>
> This is important advice and should be taken to heart.

It's not a matter of taking to heart. If true, the only route to safe programs is explicit safe-read when untrusted source.

> You are right
> that even with safe defaults one should be prudent and explicitly bind
> *read-eval* to false. However, this whole issue is about safe
> defaults.
>

No, it's not. It's about safe programs, which in this case will not fall out of 'safe' defaults. This entire discussion has been poisoned by presuming, and focusing on, an answer, rather than the problem.

>> The *read-whitelist* provides additional control short of granting full eval
>
> The default value for *read-eval* + *read-whitelist* still grants full eval.
>
> user=> *read-eval*
> :default
> user=> (read-string "#=(clojure.core$eval/applyToHelper #=(var
> clojure.core/eval) ((println \"pwned\")))")
> pwned
> nil
>

Fixed in beta8 - thanks for the report.

> The default for *read-eval* is still unsafe, but now the situation
> around read-*, safe-read-*, load, and *read-eval* is more complicated,
> which I think is worse for inexperienced Clojure developers.

Dynamic binding is complex. safe-read is not.

>
> There may be edge cases for reading in records, but people can bind
> *read-eval* to true in those.

I contest your presumption that a world in which binding *read-eval* true is easy, necessary and common, combined with inexperienced programmers not understanding binding and lulled into a sense of security due to the 'safe default', thereby failing to be explicitly safe, is a safer world. Defaults that can be made to lie are the absolute worst.

I encourage you all, as thought leaders in the Clojure community, to actually address the problem - unsafe reading, with a solution that actually addresses it - explicit safe reading, and to engage in the education and documentation effort.

Rich

Rich Hickey

unread,
Feb 5, 2013, 10:54:14 AM2/5/13
to cloju...@googlegroups.com
How does one make libraries then, that call read for you? Should there be safe and unsafe variants of every library? Will they compose? Dynamic binding exists precisely for this purpose. The bottom can't be something that hardwires a choice.

Rich

Chas Emerick

unread,
Feb 5, 2013, 11:11:06 AM2/5/13
to cloju...@googlegroups.com
This is unrelated to whether a choice should be hardwired or not; that is already manifest in master right now in safe-read and safe-read-string. My suggestion was to swap the naming for safer ergonomics, so that read and read-string would bind *read-eval* false, and e.g. dangerous-read and dangerous-read-string would not. The latter options would be "the bottom", and would clearly communicate the potential risks involved without hoping that people will carefully read, comprehend, and apply documentation.

- Chas

Tim McCormack

unread,
Feb 5, 2013, 11:13:48 AM2/5/13
to Clojure Dev
On Feb 5, 10:48 am, Rich Hickey <richhic...@gmail.com> wrote:
> No, it's not. It's about safe programs, which in this case will not fall out of 'safe' defaults. This entire discussion has been poisoned by presuming, and focusing on, an answer, rather than the problem.

I haven't seen a statement of the problem besides "*read-eval* is
dangerous". Can we agree on some behaviors, contracts, or invariants
that would hold true for the desired end-state? For instance:

* "(defn -main [v & args] (read-string v)) should not expose arbitrary
code eval."
* "Some dynamic var should exist which (unless overridden) can disable
#= in its scope."

- Tim McCormack

Paul Stadig

unread,
Feb 5, 2013, 11:19:38 AM2/5/13
to cloju...@googlegroups.com
On Tue, Feb 5, 2013 at 10:48 AM, Rich Hickey <richh...@gmail.com> wrote:
> while it might be 'easy' to find these sites, more nefarious are those cases where:
>
> Programmer A goes and finds his sites that need read eval, establishes the binding.
> Programmer B rests easy knowing his code that needs to be safe is safe with the defaults, does nothing.
> C) Except - some of programmer B's calls are nested, or worse, later become nested, in Programmer A's eval contexts.

I don't see how this is any different with the changes on master.
Programmer B is still resting easy because *read-eval* is bound to
:default with an approved white list. Programmer A still binds
*read-eval* to true. Situation C still results.

Dynamic bindings are not composable. This is the root of your example.
What may be the "bottom" of a call stack could soon be the middle, and
if you've already bound *read-eval* then can't compose.

> A key point of this middle-ground default is that people will rarely if ever need to bind *read-eval* to true. That doesn't, and can't, eliminate the need for B to specify safety, but dramatically reduces the possibility of C. It also means that more experienced programmers can using binding *read-eval* false to establish even wider safe contexts than safe-read can supply, knowing e.g. read-oriented libraries are unlikely to establish bindings of their own.

The changes on master are just as much about making a safer default
for *read-eval*, they just accomplish it by making *read-eval*
tri-state, adding a white list, and additional safe-read-* functions.
I agree with Cosmin that whitelists are a losing game.

This setup is more complex (more moving parts), which I think makes it
easier for an inexperienced Clojure developer to shoot himself in the
foot (if that's a goal).

> No, it's not. It's about safe programs, which in this case will not fall out of 'safe' defaults. This entire discussion has been poisoned by presuming, and focusing on, an answer, rather than the problem.

IMO, it *is* about safe defaults. You can't make people write safe
programs. You don't have control over that. You can, however,
establish 'safe' defaults and educate people how to write safe
programs.

>
>>> The *read-whitelist* provides additional control short of granting full eval
>>
>> The default value for *read-eval* + *read-whitelist* still grants full eval.
>>
>> user=> *read-eval*
>> :default
>> user=> (read-string "#=(clojure.core$eval/applyToHelper #=(var
>> clojure.core/eval) ((println \"pwned\")))")
>> pwned
>> nil
>>
>
> Fixed in beta8 - thanks for the report.

Thanks, but Nelson should get the credit for that. I just pasted his code.

> I encourage you all, as thought leaders in the Clojure community, to actually address the problem - unsafe reading, with a solution that actually addresses it - explicit safe reading, and to engage in the education and documentation effort.

Sure, We need to educate people, but I believe that is orthogonal to
having a safe default for *read-eval*. Bad situations can still result
if there's a 'safe' default for *read-eval*, but that doesn't justify
having an unsafe default.


Paul

Andy Fingerhut

unread,
Feb 5, 2013, 11:40:40 AM2/5/13
to cloju...@googlegroups.com
I hope I'm not poisoning the conversation. If so, I'm poisoning it by ignorance rather than malice, and my ignorance is hopefully still curable.

I can't control what library authors will do, but if we encourage them to make their default API's only use safe variants of read/read-string (with *read-eval* false) underneath the hood, then the safe variants will compose, and should avoid eval-related vulnerabilities individually as well as when composed.

If that makes the job of writing libraries that have unsafe variants in their APIs, and can be composed in various safe/unsafe combinations with other libraries, then I think that is a good thing. It should be harder to do that, and people creating such compositions should be sweating bullets on every line of code.

Andy

Chas Emerick

unread,
Feb 5, 2013, 12:03:13 PM2/5/13
to cloju...@googlegroups.com
On Feb 5, 2013, at 8:07 AM, Rich Hickey wrote:

> The reader is not a secure facility and will not become one anytime soon without serious effort. So, we are just trying to make best use of what we have, and prevent people from putting themselves at unnecessary risk.
>
> While #= is undocumented, it is not unused, and dramatically altering it would be a significant breaking change for many.

I wanted to quote these two bits and try to unpack them a bit, since I think the thread moved on past them too quickly.

I (and, I believe, many others) conceive of read and read-string as referentially transparent (and in the case of read-string, actually pure), in the Clojure mold. Those of us aware of the implementation details know that that is not the case, due to *read-eval* and #= (leaving aside for the moment what we inherit from the JVM in the form of class initialization and such).

To Tim McCormack's point of there being a lack of a clear problem statement, I'd offer "read is broadly side-effecting, and should not be". This is what my patch fundamentally aimed to accomplish, while leaving the backdoor of *read-eval* in place for people to sanely adapt existing codebases. Perhaps the patch fails in some way (I'm always open to feedback as to why and how it could be improved), but I think it's a valid objective that would definitively close the door on any reader-related vulnerabilities.

Unfortunately (IMO), it seems that that problem statement is not shared, and that the ship has permanently sailed w.r.t. read being side-effecting. I think it's worth exploring why it needs to be that way; if the reason is because "doing otherwise would break code" (that uses an undocumented facility) rather than the necessary changes being a bunch of work, then I think that that's a poor decision, especially for the broader community.

Finally, re: what's on master now as 1.5.0-beta8: the whitelist allows for static method invocations against Maps and Collections. So:

https://gist.github.com/cemerick/4715727

...perhaps contrived, perhaps not, but the sort of thing that slips past broad whitelists all day long.

Cheers,

- Chas

Rich Hickey

unread,
Feb 5, 2013, 12:05:13 PM2/5/13
to cloju...@googlegroups.com
There should be no sweating bullets. Safety of the application should be in application code, libraries should be agnostic and thus most reusable. A library usually has no idea what it is reading or where it is coming from, why should it be in charge of that decision? By being free of bindings and hardwirings, apps can use the knobs.

Imagine if libraries that printed hardwired *print-length*? Or handled their own error reporting and logging?

There is a reason these things are left to higher layers.

Rich

Kevin Downey

unread,
Feb 5, 2013, 12:20:12 PM2/5/13
to cloju...@googlegroups.com


On Tuesday, February 5, 2013 8:19:38 AM UTC-8, Paul Stadig wrote:
On Tue, Feb 5, 2013 at 10:48 AM, Rich Hickey <richh...@gmail.com> wrote:
> while it might be 'easy' to find these sites, more nefarious are those cases where:
>
> Programmer A goes and finds his sites that need read eval, establishes the binding.
> Programmer B rests easy knowing his code that needs to be safe is safe with the defaults, does nothing.
> C) Except - some of programmer B's calls are nested, or worse, later become nested, in Programmer A's eval contexts.

I don't see how this is any different with the changes on master.
Programmer B is still resting easy because *read-eval* is bound to
:default with an approved white list. Programmer A still binds
*read-eval* to true. Situation C still results.

I think that Rich's point with :default is because :default is still permissive that programmer A is unlikely to need to change the default, and hard code a binding "at the bottom" leaving programmer B the ability to control the behavior with binding from the top of the stack.
 

Nelson Morris

unread,
Feb 5, 2013, 5:11:35 PM2/5/13
to cloju...@googlegroups.com
>>>> The *read-whitelist* provides additional control short of granting full eval
>>>
>>> The default value for *read-eval* + *read-whitelist* still grants full eval.
>>>
>>> user=> *read-eval*
>>> :default
>>> user=> (read-string "#=(clojure.core$eval/applyToHelper #=(var
>>> clojure.core/eval) ((println \"pwned\")))")
>>> pwned
>>> nil
>>>
>>
>> Fixed in beta8 - thanks for the report.
>
> Thanks, but Nelson should get the credit for that. I just pasted his code.

New one for -beta8

Input:

(read-string "#=(clojure.lang.PersistentTreeSet/create
#=(clojure.core$apply.) (((do (println \"hi\") 3))
#=(clojure.core$eval.)))")

Output:

hi
hi
#{((do (println "hi") 3)) #<core$eval clojure.core$eval@46c5954b>}

-
Nelson Morris

Rich Hickey

unread,
Feb 5, 2013, 8:10:12 PM2/5/13
to cloju...@googlegroups.com
Beta9 drops the whitelist approach, and in fact all approaches around making the existing reader 'safe'. The existing reader is designed to empower developers to read code and leverage the platform, and can do all kinds of things you never want to be possible when accepting input from untrusted others. Some variant or mode of it which can no longer do those things is in fact a different reader. *read-eval* was an arbitrary and somewhat broken split point. Associating safety with its default value is spurious.

So Beta9 has read-edn and read-edn-string, based upon a reader that cannot do any of those things, and has no modes. It only reads the edn subset of Clojure data, and the edn spec [1] should help make it clear exactly what is accepted, in addition to ensuring compliance with the many other implementaitons. That reader can be subject to whatever hardening effort we want to apply, independent of, and without compromising, the flexibility and power of the existing reader.

The reader has been restored as it was, with all of the same defaults.

Code that accepts untrusted input should use read-edn. The docs for read and read-string now say as much.

With 1.5, most reading for interop, config files etc will use read-edn, and read will remain what it has always been, a power tool.

This is still work in progress, please evaluate.

Thanks,

Rich

[1] http://edn-format.org

Tom Hickey

unread,
Feb 5, 2013, 8:44:26 PM2/5/13
to cloju...@googlegroups.com
Shouldn't the calls to read in the first three arities of read-edn be self calls?

Mikera

unread,
Feb 5, 2013, 9:02:51 PM2/5/13
to cloju...@googlegroups.com
+1 

I like the split into different functions.

I'd still rather see "read" be the safe-by-default version, and require the user to ask for "read-code" or "read-unsafe" explicitly if they want *read-eval* style behaviour.

It's a social thing rather than a technical thing: Newcomers without awareness of the risks are much more likely to pick "read" by accident, especially given the fact that we are going to be shunting an increasingly large amount of data around using edn. People are human and don't always read docstrings with all the care and attention that they ideally should.

Basically, I'd much rather see existing code break than be unknowingly at risk (both for myself and others). Clojure isn't Java: we have a relatively small number of practitioners, a higher level of skill and enthusiasm and a relatively tiny installed code base. We should therefore be relatively more willing to break minor things (with deliberate care and foresight) if it means we can evolve towards a better design in the long run.

Mike Drogalis

unread,
Feb 5, 2013, 9:10:08 PM2/5/13
to cloju...@googlegroups.com
Very minor, but why does EdnReader import classes from java.lang? Lines 16-25 from commit 53201d8c38a984401ae9d70190cf1959902edf64


--

Rich Hickey

unread,
Feb 5, 2013, 9:28:38 PM2/5/13
to cloju...@googlegroups.com

On Feb 5, 2013, at 8:44 PM, Tom Hickey wrote:

> Shouldn't the calls to read in the first three arities of read-edn be self calls?
>

Doh - fix coming, thanks.

Rich Hickey

unread,
Feb 5, 2013, 9:36:03 PM2/5/13
to cloju...@googlegroups.com

On Feb 5, 2013, at 9:02 PM, Mikera wrote:

> +1
>
> I like the split into different functions.
>
> I'd still rather see "read" be the safe-by-default version, and require the user to ask for "read-code" or "read-unsafe" explicitly if they want *read-eval* style behaviour.
>
> It's a social thing rather than a technical thing: Newcomers without awareness of the risks are much more likely to pick "read" by accident, especially given the fact that we are going to be shunting an increasingly large amount of data around using edn. People are human and don't always read docstrings with all the care and attention that they ideally should.
>
> Basically, I'd much rather see existing code break than be unknowingly at risk (both for myself and others). Clojure isn't Java: we have a relatively small number of practitioners, a higher level of skill and enthusiasm and a relatively tiny installed code base. We should therefore be relatively more willing to break minor things (with deliberate care and foresight) if it means we can evolve towards a better design in the long run.

I'm sorry, I just don't think we should readily discard decades of tradition wherein "read" is the function a Lisp uses to read its code, the R in REPL etc.

Rich

Herwig Hochleitner

unread,
Feb 5, 2013, 9:46:47 PM2/5/13
to cloju...@googlegroups.com
Should read-edn take allowed reader tags as an argument? Because otherwise we would have the same problem with spuriously bound eval-able tags all over again, wouldn't we?

Opinion: In face of explicit safe versions, that will be promoted, I'm fine with the default staying what it is for now. I'd like to see *read-eval* scheduled for removal in clojure 2.0, maybe replaced by an #eval reader tag.


2013/2/6 Rich Hickey <richh...@gmail.com>

Mikera

unread,
Feb 5, 2013, 9:51:58 PM2/5/13
to cloju...@googlegroups.com
I'm not suggesting that "read" shouldn't read code in an "R" sense. 

I just think it shouldn't do any "E", particularly of arbitrary code. *read-eval* is a nasty little bit of complecting IMHO, and this is why we are having a tricky design discussion now.

But what do I know? Maybe I just haven't done enough Common Lisp to fully appreciate the usefulness of reader macros :-)

Rich Hickey

unread,
Feb 5, 2013, 10:00:22 PM2/5/13
to cloju...@googlegroups.com

On Feb 5, 2013, at 9:46 PM, Herwig Hochleitner wrote:

> Should read-edn take allowed reader tags as an argument? Because otherwise we would have the same problem with spuriously bound eval-able tags all over again, wouldn't we?

That came up on IRC. One option we have now is to give read-edn a different arity overloading than read, at the cost of more difficult porting. Most people don't leverage all of the signatures of read. I'm still not sure where tags would go.

That said, I don't think any read handlers should ever be something unsafe for reading. In any case, it's nothing like being able to send e.g. an arbitrary classname.

>
> Opinion: In face of explicit safe versions, that will be promoted, I'm fine with the default staying what it is for now. I'd like to see *read-eval* scheduled for removal in clojure 2.0, maybe replaced by an #eval reader tag.
>

By *read-eval* do you mean just the flag, or #= and its ilk (which, btw, includes #ajava.ClassName[args])?

Herwig Hochleitner

unread,
Feb 5, 2013, 10:00:53 PM2/5/13
to cloju...@googlegroups.com
Should read-edn take allowed reader tags as an argument? Because otherwise we would have the same problem with spuriously bound eval-able tags all over again, wouldn't we?

I'm sorry, looking at the code, I see now that EdnReader doesn't do reader tags yet.
That would be more of a feature request then ;-) 

Herwig Hochleitner

unread,
Feb 5, 2013, 10:21:54 PM2/5/13
to cloju...@googlegroups.com
2013/2/6 Rich Hickey <richh...@gmail.com>

> Should read-edn take allowed reader tags as an argument? Because otherwise we would have the same problem with spuriously bound eval-able tags all over again, wouldn't we?

That came up on IRC. One option we have now is to give read-edn a different arity overloading than read, at the cost of more difficult porting. Most people don't leverage all of the signatures of read. I'm still not sure where tags would go.

How about a default of safe reader tags like #uuid #date a.s.o., for the compatible arities to read.
clojure.core/default-reader-tags 
 
That said, I don't think any read handlers should ever be something unsafe for reading. In any case, it's nothing like being able to send e.g. an arbitrary classname.

They shouldn't, but every handler has to be audited before facing untrusted data, which I presumed to be your former point.
 
By *read-eval* do you mean just the flag, or #= and its ilk (which, btw, includes #ajava.ClassName[args])?

Resting on the case that a reader tag must be presumed eval until proven otherwise, my proposal was mainly about removing #=, or advertising it as #eval if desired.
For read-edn, the #classname[arg] case is subsumed by the resolution mechanism for reader tags and its security implications. 

In case #= is removed, the flag for the default reader might be better called *classes-as-reader-tags* or something. Are there others dangerous reader mechanisms like that?

Rich Hickey

unread,
Feb 5, 2013, 10:24:29 PM2/5/13
to cloju...@googlegroups.com

On Feb 5, 2013, at 9:51 PM, Mikera wrote:

> On Wednesday, 6 February 2013 10:36:03 UTC+8, Rich Hickey wrote:
>
> On Feb 5, 2013, at 9:02 PM, Mikera wrote:
>
> > +1
> >
> > I like the split into different functions.
> >
> > I'd still rather see "read" be the safe-by-default version, and require the user to ask for "read-code" or "read-unsafe" explicitly if they want *read-eval* style behaviour.
> >
> > It's a social thing rather than a technical thing: Newcomers without awareness of the risks are much more likely to pick "read" by accident, especially given the fact that we are going to be shunting an increasingly large amount of data around using edn. People are human and don't always read docstrings with all the care and attention that they ideally should.
> >
> > Basically, I'd much rather see existing code break than be unknowingly at risk (both for myself and others). Clojure isn't Java: we have a relatively small number of practitioners, a higher level of skill and enthusiasm and a relatively tiny installed code base. We should therefore be relatively more willing to break minor things (with deliberate care and foresight) if it means we can evolve towards a better design in the long run.
>
> I'm sorry, I just don't think we should readily discard decades of tradition wherein "read" is the function a Lisp uses to read its code, the R in REPL etc.
>
> Rich
>
> I'm not suggesting that "read" shouldn't read code in an "R" sense.
>

edn is a subset of code. So if limited to edn it won't read code.

> I just think it shouldn't do any "E", particularly of arbitrary code. *read-eval* is a nasty little bit of complecting IMHO, and this is why we are having a tricky design discussion now.
>

Many people are saying things like that, but the risks come from integrating Java (a complecting essential to Clojure's success :), and the fact that the namespaces are open and constructors call code, as do class initializers. Java data involves code. Older Lisps could use only lists, we can't and get the integration we value. For instance, programs that write programs often need to embed arbitrary unknown objects into code. I get the feeling a lot of people ready to change things are not leveraging many of Clojure's capabilities in this area.

I don't see it as tricky design at this point. People have been using read for something for which it is not suited. As soon as you stop trying to force it into that role (front-end to the internet), it's no longer a problem. The only thing that was tricky was trying to pretend something was other than it is.

Rich

Rich Hickey

unread,
Feb 5, 2013, 10:25:42 PM2/5/13
to cloju...@googlegroups.com
They will be re-enabled, hacked out in making EdnReader from LispReader.

Mikera

unread,
Feb 5, 2013, 10:29:59 PM2/5/13
to cloju...@googlegroups.com
FWIW, here's a constructive suggestion that might make sense for 1.5 given where we are right now

1. Recognise that doing "E" in "R" is a bad idea and plan to move to a non-evaling read in 1.6
2. Leave *read-eval* with it's current dangerous default (true), but deprecate it. 
3. Print a warning ("Deprecated! Dangerous!") if "read" is used with *read-eval* set to true. This will help people patch security holes in the short term and prepare for a move to 1.6
4. Existing code will still work

For the future 1.6 release, we can be much more clean in how we separate things out, e.g.:

read : Reads data only, no evaluation. Works for most "normal" Clojure code.
read-edn : Reads edn data, with whatever tagged value handling is chosen
read-clojure : Read with support for #=(...), other dangerous stuff that might be in general Clojure code. Power users / the Clojure REPL will use this.


Herwig Hochleitner

unread,
Feb 5, 2013, 10:42:32 PM2/5/13
to cloju...@googlegroups.com
FWIW, here's a constructive suggestion that might make sense for 1.5 given where we are right now

1. Recognise that doing "E" in "R" is a bad idea and plan to move to a non-evaling read in 1.6

I agree with Rich's point on this one: we won't get a non-evaling read just by removing #=
And by doing more than that, we would take away vital capabilities.
 
2. Leave *read-eval* with it's current dangerous default (true), but deprecate it.  
3. Print a warning ("Deprecated! Dangerous!") if "read" is used with *read-eval* set to true. This will help people patch security holes in the short term and prepare for a move to 1.6
4. Existing code will still work

At this point the conversation would be about a possible reason, other than security, to phase out *read-eval*.
Maybe a push towards more AOT compilation? ;-)
 
For the future 1.6 release, we can be much more clean in how we separate things out, e.g.:

read : Reads data only, no evaluation. Works for most "normal" Clojure code.
read-edn : Reads edn data, with whatever tagged value handling is chosen
read-clojure : Read with support for #=(...), other dangerous stuff that might be in general Clojure code. Power users / the Clojure REPL will use this.

In the name of sane defaults, I'll have to formally agree on this one. I must say though, that I like the secret super power, clojure's defaults grant my config files ;-)

Herwig Hochleitner

unread,
Feb 5, 2013, 10:59:05 PM2/5/13
to cloju...@googlegroups.com
2013/2/6 Mikera <mike_r_...@yahoo.co.uk>
For the future 1.6 release, we can be much more clean in how we separate things out, e.g.:

read : Reads data only, no evaluation. Works for most "normal" Clojure code.
read-edn : Reads edn data, with whatever tagged value handling is chosen
read-clojure : Read with support for #=(...), other dangerous stuff that might be in general Clojure code. Power users / the Clojure REPL will use this.

Hm .. on the other hand, read as proposed here, could be folded into the no-config arity of read-edn (which btw should include #arraymap and other stuff that can emerge while print-duping "normal" clojure code)
And since code accepting untrusted data has to be audited anyway, 'read-clojure might just aswell stay 'read.

Mikera

unread,
Feb 6, 2013, 12:28:08 AM2/6/13
to cloju...@googlegroups.com
On Wednesday, 6 February 2013 11:42:32 UTC+8, Bendlas wrote:
FWIW, here's a constructive suggestion that might make sense for 1.5 given where we are right now

1. Recognise that doing "E" in "R" is a bad idea and plan to move to a non-evaling read in 1.6

I agree with Rich's point on this one: we won't get a non-evaling read just by removing #=
And by doing more than that, we would take away vital capabilities.

If we went down this route, we would need to remove all eval-like activity from "read" in the long run, including arbitrary Java constructors and suchlike. Loading namespaces also falls into this category, and I guess some other stuff.

These are potentially just as dangerous as running #= at read time. Not saying they don't have a place, just that they shouldn't be executed by the standard reader, and that deferring this stuff to "eval" time is a much saner design. I'd expect the reader to do sane things with Java object constructors, like read "#my.crazy.Object["foo"]" and produce '(new my.crazy.Object "foo"). 

Ditto for #= - this could be executed at eval-time as well, although I recognise that there might be some subtle issues regarding the order of macro expansion that would need to be handled

My personal view is that the ideal design would be for "read" would be to produce a data structure that is a composition of known allowable types (lists, strings, numbers, vectors, sets, maps, symbols, keywords, metadata, *unevaluated* special macros/tagged literals), i.e. whatever we decide formally constitutes a valid Clojure s-expression (and hence a strict superset of stuff like edn).

If we do all this then:
1. read becomes safe for untrusted code
2. (eval (read x)) can be made to work as before, just with some stuff moved from read to eval
3. (read x) can be equal to (read (print (read x))) - a very nice property but can't be preserved if we allow arbitrary objects to creep into the output of the reader. 
4. We still get to keep all the interop with Java / tagged edn literals. We've just pushed it out of the core reader.

Sounds like the boat has probably sailed though..... so I'll shut up now unless our BDFL decides there is some actual potential in all this.
 
 
2. Leave *read-eval* with it's current dangerous default (true), but deprecate it.  
3. Print a warning ("Deprecated! Dangerous!") if "read" is used with *read-eval* set to true. This will help people patch security holes in the short term and prepare for a move to 1.6
4. Existing code will still work

At this point the conversation would be about a possible reason, other than security, to phase out *read-eval*.
Maybe a push towards more AOT compilation? ;-)

My main use case here is distributed code execution with remote clients. I'd love to know that the default "read" is 100% safe, can read arbitrary Clojure code and is ideally also guaranteed O(n) in the size of input. Code can then be read, analysed and transformed before determining that it is safe for execution (or transmission to the rest of a cluster, etc.)

Technically, I'd be equally happy with a "read-as-data" or some other "safe" reader function for this purpose. But I know that sooner or later some junior developer is going to make a mistake and use "read" instead, which is why I'd prefer that "read" should be the name of the reading function that is safe by default.

Note that "read-edn" doesn't cut it: I'd like to be able to read any syntactically valid Clojure code even if I subsequently decide that it is unsafe.
 
 
For the future 1.6 release, we can be much more clean in how we separate things out, e.g.:

read : Reads data only, no evaluation. Works for most "normal" Clojure code.
read-edn : Reads edn data, with whatever tagged value handling is chosen
read-clojure : Read with support for #=(...), other dangerous stuff that might be in general Clojure code. Power users / the Clojure REPL will use this.
On further reflection, I don't think you need a separate "read-clojure" if you are willing to push the dangerous stuff from "read" into "eval".
 

In the name of sane defaults, I'll have to formally agree on this one. I must say though, that I like the secret super power, clojure's defaults grant my config files ;-)

Wait until the evil villain also gets into your config files :-) 

Rich Hickey

unread,
Feb 6, 2013, 8:46:23 AM2/6/13
to cloju...@googlegroups.com

On Feb 6, 2013, at 12:28 AM, Mikera wrote:

> On Wednesday, 6 February 2013 11:42:32 UTC+8, Bendlas wrote:
> FWIW, here's a constructive suggestion that might make sense for 1.5 given where we are right now
>
> 1. Recognise that doing "E" in "R" is a bad idea and plan to move to a non-evaling read in 1.6
>
> I agree with Rich's point on this one: we won't get a non-evaling read just by removing #=
> And by doing more than that, we would take away vital capabilities.
>
> If we went down this route, we would need to remove all eval-like activity from "read" in the long run, including arbitrary Java constructors and suchlike. Loading namespaces also falls into this category, and I guess some other stuff.
>
> These are potentially just as dangerous as running #= at read time. Not saying they don't have a place, just that they shouldn't be executed by the standard reader, and that deferring this stuff to "eval" time is a much saner design. I'd expect the reader to do sane things with Java object constructors, like read "#my.crazy.Object["foo"]" and produce '(new my.crazy.Object "foo").
>

The above demonstrates you don't understand the point of read, and shouldn't be proposing how it ought to work.

Rich

Bronsa

unread,
Feb 6, 2013, 9:21:23 AM2/6/13
to cloju...@googlegroups.com
This looks fine to me, but for two (minor?) issues:

1) the lack of regex literals
While it's understandable why this is so (EDN is designed to be a data format, and regexes are not included because of the different implementations in different langauges) I find that this would limit the usefulness of read-edn in those use-cases where it would actually be used most: eg reading clojure data from clojure for configurations, where it's not unusual at all to find regex literals desiderable.

2) the different way read-edn handles symbol-termination
In the clojure reader @, `, ~, % are considered symbol-terminators, while in the EDN reader they are not.
This may lead to confusion and would make the EDN reader effectively not a subset of the clojure reader since they would read the same string in different ways
eg. (read-string "foo%bar") ;=> foo vs (read-edn-string "foo%bar") ;=>foo%bar

I'd like to see both of those two issues addressed, even though the current behaviour is so by design.

Thanks,
Nicola Mometto

2013/2/6 Rich Hickey <richh...@gmail.com>

Rich Hickey

unread,
Feb 7, 2013, 6:02:27 PM2/7/13
to cloju...@googlegroups.com
Beta11 on its way. System property can be used to set default value of *read-eval*:

user=> (doc *read-eval*)
-------------------------
clojure.core/*read-eval*
Defaults to true (or value specified by system property, see below)
***This setting implies that the full power of the reader is in play,
including syntax that can cause code to execute. It should never be
used with untrusted sources. See also: read-edn.***

When set to logical false in the thread-local binding,
the eval reader (#=) and record/type literal syntax are disabled in read/load.
Example (will fail): (binding [*read-eval* false] (read-string "#=(* 2 21)"))

The default binding can be controlled by the system property
'clojure.read.eval' System properties can be set on the command line
like this:

java -Dclojure.read.eval=false ...

The system property can also be set to 'unknown' via
-Dclojure.read.eval=unknown, in which case the default binding
is :unknown and all reads will fail in contexts where *read-eval*
has not been explicitly bound to either true or false. This setting
can be a useful diagnostic tool to ensure that all of your reads
occur in considered contexts. You can also accomplish this in a
particular scope by binding *read-eval* to :unknown

---------

Also tagged literals, new sigs for read-edn-*.

Rich

Mike Drogalis

unread,
Feb 7, 2013, 8:13:23 PM2/7/13
to cloju...@googlegroups.com
I like this route. I feel like it got the best of what everyone wanted. Well done, Rich.


Steve Miner

unread,
Feb 7, 2013, 9:53:00 PM2/7/13
to cloju...@googlegroups.com
Looking at beta11. Doc for read-edn says: " :readers - a map of tag symbols to data-reader functions to be considered before *default-data-readers*. When not supplied, only the *default-data-readers* will be used." That should be default-data-readers without the *earmuffs*.



Aaron Bedra

unread,
Feb 8, 2013, 12:00:20 AM2/8/13
to cloju...@googlegroups.com
Very happy with this. I think it has something for everyone and puts us in the right place. I especially like the unknown setting since it will allow us to audit our applications to find out where each instance of read is happening and allow for a decision to be made.

-Aaron

Rich Hickey

unread,
Feb 8, 2013, 8:16:21 AM2/8/13
to cloju...@googlegroups.com

On Feb 7, 2013, at 9:53 PM, Steve Miner wrote:

> Looking at beta11. Doc for read-edn says: " :readers - a map of tag symbols to data-reader functions to be considered before *default-data-readers*. When not supplied, only the *default-data-readers* will be used." That should be default-data-readers without the *earmuffs*.
>

Will fix - thanks.

Rich Hickey

unread,
Feb 8, 2013, 8:23:46 AM2/8/13
to cloju...@googlegroups.com

On Feb 6, 2013, at 9:21 AM, Bronsa wrote:

> This looks fine to me, but for two (minor?) issues:
>
> 1) the lack of regex literals
> While it's understandable why this is so (EDN is designed to be a data format, and regexes are not included because of the different implementations in different langauges) I find that this would limit the usefulness of read-edn in those use-cases where it would actually be used most: eg reading clojure data from clojure for configurations, where it's not unusual at all to find regex literals desiderable.
>
> 2) the different way read-edn handles symbol-termination
> In the clojure reader @, `, ~, % are considered symbol-terminators, while in the EDN reader they are not.
> This may lead to confusion and would make the EDN reader effectively not a subset of the clojure reader since they would read the same string in different ways
> eg. (read-string "foo%bar") ;=> foo vs (read-edn-string "foo%bar") ;=>foo%bar
>
> I'd like to see both of those two issues addressed, even though the current behaviour is so by design.

#2 is not the design, and beta11 fixes that. % should be a constituent character in Clojure, and @,`,~ are not allowed in edn symbols, and now not allowed by read-edn.

Thanks for the report,

Rich

Jean Niklas L'orange

unread,
Feb 8, 2013, 10:45:39 AM2/8/13
to cloju...@googlegroups.com
On 8 February 2013 06:00, Aaron Bedra <aaron...@gmail.com> wrote:
Very happy with this. I think it has something for everyone and puts us in the right place. 

I'm also very happy with the solution, and would like to thank Rich for finding it. It's not easy to satisfy every person in a community when a case like this pops up, yet here we are with a solution where everyone seem to be content with the changes.

--
Regards,
Jean Niklas L'orange

Tim McCormack

unread,
Feb 18, 2013, 9:00:44 PM2/18/13
to cloju...@googlegroups.com
The command-line eval feature is broken under -Dclojure.read.eval=unknown in beta13 and RC16 (not sure which is newer):

$ java -Dclojure.read.eval=unknown -jar ~/.m2/repository/org/clojure/clojure/1.5.0-RC16/clojure-1.5.0-RC16.jar -e "(+ 1 2)"
Exception in thread "main" java.lang.RuntimeException: Reading disallowed - *read-eval* bound to :unknown

 - timmc
Reply all
Reply to author
Forward
0 new messages