Pass through pre-constructed data structures?

Showing 1-13 of 13 messages
Pass through pre-constructed data structures? Peter Taoussanis 4/22/14 12:08 AM
Hi all, just noticed some behaviour that surprised me: 

`(set <preexisting-set>)` and `(vector <preexisting-vector>)` actually reconstruct their input rather than just passing them through.

Is there a reason we couldn't/shouldn't extend PersistentHashSet/PersistentVector to do so? Seems like this'd be a simple change that buys us some performance for nothing.

Thanks, cheers! :-)
Re: Pass through pre-constructed data structures? Alex Miller 4/22/14 7:56 AM
Seems like this would need to be a change in set and vector right? 


--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/d/optout.

Re: Pass through pre-constructed data structures? Peter Taoussanis 4/22/14 8:06 AM

Seems like this would need to be a change in set and vector right? 

Sure, okay - can I create an issue for this & submit a patch?

Re: Pass through pre-constructed data structures? gfredericks 4/22/14 8:23 AM
What would be the behavior when passed a specialized data structure? E.g., (set (sorted-set 1 2 3))?

Gary Fredericks
(803)-295-0195
frederi...@gmail.com
gfredericks.com


On Tue, Apr 22, 2014 at 10:06 AM, Peter Taoussanis <ptaou...@gmail.com> wrote:

Seems like this would need to be a change in set and vector right? 

Sure, okay - can I create an issue for this & submit a patch?

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/d/optout.

Re: Pass through pre-constructed data structures? Peter Taoussanis 4/22/14 8:56 AM

What would be the behavior when passed a specialized data structure? E.g., (set (sorted-set 1 2 3))?

I propose to only pass through types that would be unaffected by a rewrapping. So a PersistentTreeSet would get reconstructed as a HashSet in this case. I.e. no semantics would be changed by the optimization.
Re: Pass through pre-constructed data structures? Jean Niklas L'orange 4/22/14 9:51 AM
A better example is perhaps self-designed data structures, which extends PersistentSet or PersistentVector: I wouldn't be too surprised if someone out there has extended PersistentVector, and want `vec` to convert the specialized structure back to the standard PersistentVector.

I would thus suggest that this optimisation checks that the data structure passed in is exactly of the type PersistentSet/PersistentVector, and not a subtype of them.
--
Regards,
Jean Niklas L'orange
Re: Pass through pre-constructed data structures? Aphyr Null 4/24/14 12:26 PM
On Tuesday, April 22, 2014 9:51:16 AM UTC-7, Jean Niklas L'orange wrote:
I would thus suggest that this optimisation checks that the data structure passed in is exactly of the type PersistentSet/PersistentVector, and not a subtype of them.

Hmm, I was actually going to suggest the opposite; that wherever (vec? x) returns true, (vec x) should be exactly x. We have explicit constructors (hash-set) and (sorted-set), (vector), (array-map) and (hash-map), etc; so I think (set) should simply ensure that the thing is a set, and (apply hash-set x) or (into (hash-set) x) should be used to make a thing explicitly a hash set. Also means I don't have to keep writing (if (set? x) x (set x)). :)
Re: Pass through pre-constructed data structures? Alex Miller 4/24/14 12:57 PM
I think I'm more in agreement with Kyle that "set" only needs to guarantee that the return is an IPersistentSet, not that it's a PersistentHashSet. It would of course be interesting to know whether code exists that relies on this (instead of more properly calling hash-set).

Another benefit is that the alternate interpretation would require you to compare the type of coll to PersistentHashSet in the code and generally that's a good sign something we are not programming to abstractions.

The set impl could then be: 

(if (set? coll) coll (clojure.lang.PersistentHashSet/create (seq coll))))


--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/d/optout.

Re: Pass through pre-constructed data structures? Peter Taoussanis 4/26/14 12:53 AM
Hey Alex,
 
I think I'm more in agreement with Kyle that "set" only needs to guarantee that the return is an IPersistentSet, not that it's a PersistentHashSet.

Given that this is a core fn, my thinking was just to suggest the most conservative possible move. I can imagine that some folks may be incorrectly depending on `set` to return a PersistentHashSet specifically.

I'd be happy with a `set?` passthrough myself if that seems reasonable.

Have opened an issue with the relevant patches: http://dev.clojure.org/jira/browse/CLJ-1410

This my first Clojure submission, so please double-check for possible mistakes. Thanks a lot, cheers! :-)
Re: Pass through pre-constructed data structures? miner 4/26/14 7:20 AM
I'd like to see some benchmarks.  I have outsmarted myself in the past by optimizing special cases only to find out that the benefit was negligible and the cost was higher than expected for general usage.  Also, the JIT has surprised me -- it's smarter than I thought most of the time, and extra branches can just get in the way of better compilation.

It would also be interesting to know how often (set already-a-set) was called in some existing projects.  In my code, it would be rare, but that's just me.

Just a suggestion about the proposed patches in CLJ-1410: I would not change the doc strings.  I would like to leave some room for the implementation to do what it thinks is best without demanding that the result is identical to an existing set.  That's just a performance optimization, not a language guarantee.  IMHO, of course.

Steve Miner


Re: Pass through pre-constructed data structures? Peter Taoussanis 4/26/14 7:56 AM
Hi Steve, thanks for the input!

I'd like to see some benchmarks.  I have outsmarted myself in the past by optimizing special cases only to find out that the benefit was negligible and the cost was higher than expected for general usage.  

Set/vector construction don't appear to be particularly well-suited to JIT optimization:
(let [coll #{:a :b :c :d :e :f :g :h :i :j :k}]
  (time (dotimes [_ 100000] (set coll)))                       ; ~480ms
  (time (dotimes [_ 100000] (if (set? coll) coll (set coll)))) ; ~11ms
  )

Will a difference like this be significant? Depends on the context, I guess. I tend to deal with large collections, and these tend to exaggerate the effects of unnecessary reconstruction. Spotted this when a tight loop was performing worse than expected.

It would also be interesting to know how often (set already-a-set) was called in some existing projects.  In my code, it would be rare, but that's just me.

Anecdotally, I count around 36 cases in the ~100k lines I've got on hand. Of those maybe 3-4 that matter. That may or may not be representative though.

To clarify: I don't think this is a big deal either way (and it's easy to work around once you're aware of it).

Just a suggestion about the proposed patches in CLJ-1410: I would not change the doc strings.  I would like to leave some room for the implementation to do what it thinks is best without demanding that the result is identical to an existing set.

Sure, wasn't sure what the preference there would be.

Cheers :-)
Re: Pass through pre-constructed data structures? Jean Niklas L'orange 4/26/14 8:00 AM
On 26 April 2014 09:53, Peter Taoussanis <ptaou...@gmail.com> wrote:
Hey Alex,
 
I think I'm more in agreement with Kyle that "set" only needs to guarantee that the return is an IPersistentSet, not that it's a PersistentHashSet.

Given that this is a core fn, my thinking was just to suggest the most conservative possible move. I can imagine that some folks may be incorrectly depending on `set` to return a PersistentHashSet specifically.

That was my thought as well, but if it's okay to change the semantics of the implementation, I'd agree with Kyle and Alex that it is better to have `set`/`vec` short-circuit if the input collection already satisfies `set?`/`vector?`.

--
Regards,
Jean Niklas L'orange
Re: Pass through pre-constructed data structures? Alex Miller 4/26/14 8:13 AM
I agree with Steve's comments on this ticket but I'd recommend moving this discussion to the ticket comments at this point.

As a general rule of thumb, any change made for performance reasons should demonstrate repeatable benefits in the ticket.



--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/d/optout.