Transient Data Structures

228 visualizações
Pular para a primeira mensagem não lida

Rich Hickey

não lida,
3 de ago. de 2009, 17:25:0303/08/2009
para Clojure
I've been doing some work on Transient Data Structures. You can read
about them here:

http://clojure.org/transients

Feedback welcome,

Rich

luke

não lida,
3 de ago. de 2009, 17:52:0803/08/2009
para Clojure
Interesting.

So I take it that these are (or should be) entirely a speed
optimization? i.e, most the time, you'd want to write your code using
the normal persistent data structures, and only go back and implement
this within specific functions if they're a bottleneck? Sounds great.

And for a somewhat crazy thought.. it seems like using them would be
as easy as adding an exclamation point to all the modification
functions. So you could easily wrap an entirely functional code block
in a transform-to-transient macro that translates the functions to
their transient counterparts, and gain all the performance benefits?

Thanks,
-Luke

Vagif Verdi

não lida,
3 de ago. de 2009, 18:08:4303/08/2009
para Clojure
On Aug 3, 1:52 pm, luke <luke.vanderh...@gmail.com> wrote:
> So you could easily wrap an entirely functional code block
> in a transform-to-transient macro that translates the functions to
> their transient counterparts, and gain all the performance benefits?

I do not think it would be that easy. Transient mode cannot be used
with lists and sequences, only with some particular data structures,
like vectors. Your macro would need to know at compile time types of
variables.

I think this is the case when dynamic nature of clojure prevents us
from teaching a compiler to do sophisticated performance optimizations
and forces a programmer to do them manually instead.

Rich Hickey

não lida,
3 de ago. de 2009, 18:14:2503/08/2009
para Clojure


On Aug 3, 5:52 pm, luke <luke.vanderh...@gmail.com> wrote:
> Interesting.
>
> So I take it that these are (or should be) entirely a speed
> optimization? i.e, most the time, you'd want to write your code using
> the normal persistent data structures, and only go back and implement
> this within specific functions if they're a bottleneck? Sounds great.
>

Yes, that's the idea.

> And for a somewhat crazy thought.. it seems like using them would be
> as easy as adding an exclamation point to all the modification
> functions. So you could easily wrap an entirely functional code block
> in a transform-to-transient macro that translates the functions to
> their transient counterparts, and gain all the performance benefits?
>

That might be possible, I guess it depends on the structure of the
code. The interfaces for transient creation/persistent return are all
generic at least.

Rich

CuppoJava

não lida,
3 de ago. de 2009, 19:27:3103/08/2009
para Clojure
Hi Rich,
This is a very useful addition thanks. I personally find the O(1)
transformation to and back most useful.

I have a question about capturing the return values of conj! and
assoc!.

in this code:
(let [v (transient [])
v2 (conj! v 0)])

v2 is the captured return value from conj!, which you're supposed to
use for future operations. But what does v point to now?
-Patrick

Mark Engelberg

não lida,
3 de ago. de 2009, 20:06:1103/08/2009
para clo...@googlegroups.com
So if you want to make 10 changes to a vector, would it be worthwhile
to turn it into a transient, make the 10 changes, and then turn it
back to persistent? If no, then 100 changes? 1000?

In other words, how much overhead is there in the transformation back
and forth, and therefore, about how much transient work does it take
before it becomes worth doing?

Rich Hickey

não lida,
3 de ago. de 2009, 20:29:2203/08/2009
para Clojure
I'm not promising it points to anything useful. It is an
implementation detail of the current version that (conj! v x) returns
v. You should consider each operation to destroy the transient
argument and return another.

Rich

Rich Hickey

não lida,
3 de ago. de 2009, 20:39:0903/08/2009
para Clojure
If you are going to make 10 changes once ever in an application, then
don't bother making your code uglier. If you are going to make 10
changes in a function that gets called a lot, it will not be slower to
use transients. It may not be faster depending on the locality of the
changes (i.e. 10 evenly distributed assocs in a large vector would be
a wash, 10 conj! calls definitely much faster.

In short, the O(1) overhead is less than the cost of even a single
edit. So, e.g. into/vec/vector now use transients unconditionally if
possible.

Rich

Mark Engelberg

não lida,
3 de ago. de 2009, 20:50:1403/08/2009
para clo...@googlegroups.com
On Mon, Aug 3, 2009 at 5:39 PM, Rich Hickey<richh...@gmail.com> wrote:
> In short, the O(1) overhead is less than the cost of even a single
> edit. So, e.g. into/vec/vector now use transients unconditionally if
> possible.

Excellent. I'm glad to hear that the core functions will use
transients so I don't have to make case-by-case decisions as to
whether to rewrite with transients things that are easily expressed
with those functions. (Although you didn't mention it, I assume assoc
with a sequence of changes would also be something that now uses
transients).

John Harrop

não lida,
3 de ago. de 2009, 19:58:2603/08/2009
para clo...@googlegroups.com
The site says: "Note in particular that transients are not designed to be bashed in-place. You must capture and use the return value in the next call."

So v is not safe to use. Probably either using it throws an exception or else it points to a node in the implementation of v2, but not necessarily the correct "head" or "root" node, so that using it will have questionable effects. (Common Lisp has similar cases. Let l be a list, s the results of calling sort on l, and examine l. It's likely to now be a tail of, but not the entirety of, s, because it points to the same cons it always did and sort moved that cons so it isn't the head of the list any more, while returning the head cons. This doesn't happen with Clojure's sort because Clojure's sort doesn't mutate; in Clojure, l stays pointing to the unsorted list and s points to a sorted version. But these new transient mutators may behave more like Common Lisp's sequence mutators in this regard. Rich?)

Andy Fingerhut

não lida,
3 de ago. de 2009, 20:14:2303/08/2009
para Clojure
I like it. I can see significant use of these to help speed up some
of the benchmark programs I've been hacking on:

git://github.com/jafingerhut/clojure-benchmarks.git

and more importantly, that means they can be good in optimizing useful
code, too :-)

I was pondering this question "If a pure function mutates some local
data in order to produce an immutable return value, is that ok?" that
Rich posed just a day or three ago, when thinking about whether there
was a way to automatically determine whether a function is pure /
"referentially transparent" or not (at least for a useful subset of
actual Clojure code). In particular, Clojure's str function does this
already with StringBuilder.append:

(defn str
;; doc string deleted for brevity
([] "")
([#^Object x]
(if (nil? x) "" (. x (toString))))
([x & ys]
((fn [#^StringBuilder sb more]
(if more
(recur (. sb (append (str (first more)))) (next more))
(str sb)))
(new StringBuilder #^String (str x)) ys)))

Very nice to see the idea generalized to Clojure data structures, too,
and the safety nets built into it in case someone forgets to call
"persistent!" on a data structure before using it in another thread.

Thanks,
Andy

Garth Sheldon-Coulson

não lida,
3 de ago. de 2009, 23:48:1603/08/2009
para clo...@googlegroups.com
Holy smokes. This is great. It's been said before and it'll be said again: Rich really knows what he's doing.

This will make number crunching in Clojure even more attractive.

Question: Suppose I want to write code that will run on 1.0 but take advantage of transients if available. Is there a more appropriate thing to do than the following?

(if (resolve 'transient)
  (transient code)
  (persistent code))

Garth

Garth Sheldon-Coulson

não lida,
3 de ago. de 2009, 23:54:1403/08/2009
para clo...@googlegroups.com
On second thought, I see that my code snippet wouldn't work. The following analogous code throws an exception.

(if (resolve 'foo) (foo 1) 1)

I also see that I can't use try-catch to catch symbol resolution exceptions.

Does this mean I have to test for the Clojure version before runtime? In an ant build file? Is there a better way?

Garth

Laurent PETIT

não lida,
4 de ago. de 2009, 07:54:2404/08/2009
para clo...@googlegroups.com
Looks very interesting !

One question: wouldn't seem more natural to have transient named transient! and persistent! named persistent ?

I see a call to transient as "Enter the mutable world", so it seems to me (transient! []) conveys more this meaning than (transient []).

I see a call to persistent! as "Enter back the immutable world", so (persistent v) seems more interesting than (persistent! v) ?

And also, there may be the use case where some pure functions would protect their arguments by calling persistent! on them :

This:
(defn some-fn [v]
  (let [v (persistent v)] ...)

looks better in a pure function than this:
(defn some-fn [v]
  (let [v (persistent! v)] ...)
where the ! catches the eye ...

HTH,

--
Laurent


2009/8/3 Rich Hickey <richh...@gmail.com>

Rich Hickey

não lida,
4 de ago. de 2009, 08:56:4704/08/2009
para clo...@googlegroups.com
On Tue, Aug 4, 2009 at 7:54 AM, Laurent PETIT<lauren...@gmail.com> wrote:
> Looks very interesting !
>
> One question: wouldn't seem more natural to have transient named transient!
> and persistent! named persistent ?
>
> I see a call to transient as "Enter the mutable world", so it seems to me
> (transient! []) conveys more this meaning than (transient []).
>
> I see a call to persistent! as "Enter back the immutable world", so
> (persistent v) seems more interesting than (persistent! v) ?
>
> And also, there may be the use case where some pure functions would protect
> their arguments by calling persistent! on them :
>
> This:
> (defn some-fn [v]
>   (let [v (persistent v)] ...)
>
> looks better in a pure function than this:
> (defn some-fn [v]
>   (let [v (persistent! v)] ...)
> where the ! catches the eye ...
>

The transient function has no side effects, the persistent! function
does, thus the names.

Rich

Laurent PETIT

não lida,
4 de ago. de 2009, 09:02:2704/08/2009
para clo...@googlegroups.com
ok...

2009/8/4 Rich Hickey <richh...@gmail.com>

Sean Devlin

não lida,
4 de ago. de 2009, 09:19:4504/08/2009
para Clojure
Rich,

First of all, thank you for informing the community about this before
you push it into Clojure 1.1/2.0. Developers are people, and it takes
time for us to adjust to change. Advance warning helps a lot.

As for the changes themselves, I don't know yet. Now, I've only
thought about this briefly, and you have been working on this a long
time. I don't have your experience. I'm sure there a ton of details
I don't see, but the following is my "gut" response.

I don't like this addition. I absolutely love that every data
structure is persistent in Clojure, and that I don't have to think
about sharing, etc. You make a great argument in your "Clojure for
Lispers" videos about why persistent data structures are required, and
human understanding is not enough. This change seems to be a step
backwards.

The above argument is not perfect. I will admit that using Java
interaction, it is possible to deal with mutable data structures
anyway. I do have some code the relies on Java classes, and I mutate
the object sequentially. In those cases I wrap all the changes inside
of one higher level function and return the result. So maybe I am
doing some type of transient thing already.

As a second point, I don't like the introduction of assoc!, conj!,
etc. It just strikes me as another bug to have (oh, right, I need
assoc! not assoc...).

With all this being said, I'm looking forward to your final version.
Your speedup is impressive, and I know parts of my code (lists of hash
maps) that could use it. You've done some pretty bad ass stuff with
Clojure so far, and I think there is a chance that you could pull this
off beautifully.

Good luck.

harrison clarke

não lida,
4 de ago. de 2009, 10:29:5904/08/2009
para Clojure
you can't really use this as regular old mutable data.
if you store it somewhere, and then you change it, the original will
break.
i don't really see how you could use this in a non-functional way and
still
have it work at all.

like, you probably won't actually be let-binding them very often.
seems
like the sort of thing that you'll have a block of code that looks
like:
(-> []
(transient)
(conj! stuff)
(assoc! stuff)
(persistent!))

in fact, i'll probably end up writing a macro that does just that.

Boris Mizhen - 迷阵

não lida,
4 de ago. de 2009, 11:51:3604/08/2009
para clo...@googlegroups.com
> As a second point, I don't like the introduction of assoc!, conj!,
> etc.  It just strikes me as another bug to have (oh, right, I need
> assoc! not assoc...).
At least you will get a very clear error, I think it's possible to get
a compile time error if you are dealing with a local.
It will be runtime if you pass a parameter, but you should not, right ? :)

I would advocate a VERY STRONG WARNING about these structures,
otherwise people will litter their code ...
Maybe a compiler warning when a local transient is returned from a
function (with some way to locally suppress it?)

Kudos for making them single threaded!

Boris

Howard Lewis Ship

não lida,
4 de ago. de 2009, 12:35:4104/08/2009
para clo...@googlegroups.com
These aren't criticisms, just trying to understand this better.

So assoc! etc. return a new instance of the transient? Or do they
return the same transient? If they return a new instance, what
exactly is the advantage (though, obviously there is one, from the
transcript). If they return the same instance do you prevent the
calling code from invoking further changes or not?

I like the idea of restricting the transient to just the originating thread.

Does this mean we'll need a reduce! and map!, etc? Would there be an
advantage to those? Is there an intersection (or potential danger)
between transient structures and laziness?
--
Howard M. Lewis Ship

Creator of Apache Tapestry
Director of Open Source Technology at Formos

Rich Hickey

não lida,
4 de ago. de 2009, 13:03:2304/08/2009
para Clojure


On Aug 4, 12:35 pm, Howard Lewis Ship <hls...@gmail.com> wrote:
> These aren't criticisms, just trying to understand this better.
>
> So assoc! etc. return a new instance of the transient?

They return the next value of the transient.

> Or do they
> return the same transient?

I'm not promising that they do.

>  If they return a new instance, what
> exactly is the advantage (though, obviously there is one, from the
> transcript).

Right.

> If they return the same instance do you prevent the
> calling code from invoking further changes or not?
>

This one I don't understand, you can use the return value, don't reuse
the argument. If your code is structured functionally and non-
persistently that is how it will be anyway.

> I like the idea of restricting the transient to just the originating thread.
>
> Does this mean we'll need a reduce!

No, reduce farms out the work to the reducing function and doesn't
create a composite data structure itself. The reducing fn might use a
transient but that's an implementation detail of it.

> and map!, etc?

No, map returns a sequence.

> Would there be an
> advantage to those? Is there an intersection (or potential danger)
> between transient structures and laziness?
>

Transient data structures should never be part of the public interface
of anything. They are just an implementation detail, an optimization,
e.g. into/vec/vector now use transients, but aren't now into!/vec!/
vector! - the transients are inside.

Hope that helps,

Rich

Howard Lewis Ship

não lida,
4 de ago. de 2009, 13:06:4504/08/2009
para clo...@googlegroups.com
It does, pretty much what I expected. Care to summarize why they are
so much faster than persistent types (for those of us too lazy/busy to
read the source)?

Rich Hickey

não lida,
4 de ago. de 2009, 13:23:4204/08/2009
para clo...@googlegroups.com

Fully persistent modifications require full path copies, while
transient modifications copy nodes as needed, and once a copy has been
made, subsequent modifications affecting that node can be made in
place.

Rich

John Newman

não lida,
4 de ago. de 2009, 13:32:5504/08/2009
para clo...@googlegroups.com

I'm a noob, so this is probably a dumb question but, how does this work with closures?  Can transients be closed over?

Like,

(defn make-transient-counter [init-val]
  (let [acounter (transient [init-val])]
    (fn [add-val] (assoc! acounter 0 (+ add-val (nth acounter 0))))))

Is that possible?  And if so, what are the thread-safety implications (if any)?
...
Oh, I just re-read the description page where Rich says,

Not persistent, so you can't hang onto interim values or alias

So, would the above code throw an exception?

Thanks,

--
John

Rich Hickey

não lida,
4 de ago. de 2009, 13:39:2904/08/2009
para clo...@googlegroups.com

No, it wouldn't (currently), but there is no reason to do that. There
are the reference types for sharing things into unknown contexts, with
multithreaded semantics. Refs, atoms etc.

Rich

John Harrop

não lida,
4 de ago. de 2009, 16:31:3104/08/2009
para clo...@googlegroups.com
What about things like:

(persistent!
  (reduce
    (fn [x [i v]] (assoc! x i v))
    (transient (vec (repeat 0 (reduce max (map first coll-of-index-val-pairs)))))
    coll-of-index-val-pairs))

which is just the transientification of

(reduce
  (fn [x [i v]] (assoc x i v))
  (vec (repeat 0 (reduce max (map first coll-of-index-val-pairs))))
  coll-of-index-val-pairs))

and converts a sparse vector into a dense one.

Are sets and maps going to be transientable? Sets are often subjected to mass additions or filterings, and maps to mass additions.

P.S. anyone know if gmail is ever going to fix this thing so that pagedown and shift-pagedown and pageup and shift-pageup work correctly? Shift-downarrow to trim huge amounts of quoted text gets awkward fast. I know I could use outlook or whatever but I like having access to this group, and other stuff, from any computer with what's new/read/unread staying synchronized and all the messages I ever chose to keep available to re-examine, and I don't like outlook.

Rich Hickey

não lida,
4 de ago. de 2009, 17:50:1904/08/2009
para Clojure


On Aug 4, 4:31 pm, John Harrop <jharrop...@gmail.com> wrote:
> On Tue, Aug 4, 2009 at 1:39 PM, Rich Hickey <richhic...@gmail.com> wrote:
>
> > On Tue, Aug 4, 2009 at 1:32 PM, John Newman<john...@gmail.com> wrote:
>
> > > I'm a noob, so this is probably a dumb question but, how does this work
> > with
> > > closures?  Can transients be closed over?
>
> > > Like,
>
> > >> (defn make-transient-counter [init-val]
> > >>   (let [acounter (transient [init-val])]
> > >>     (fn [add-val] (assoc! acounter 0 (+ add-val (nth acounter 0))))))
>
> > > Is that possible?  And if so, what are the thread-safety implications (if
> > > any)?
> > > ...
> > > Oh, I just re-read the description page where Rich says,
>
> > >> Not persistent, so you can't hang onto interim values or alias
>
> > > So, would the above code throw an exception?
>
> > No, it wouldn't (currently), but there is no reason to do that. There
> > are the reference types for sharing things into unknown contexts, with
> > multithreaded semantics. Refs, atoms etc.
>
> What about things like:
>
> (persistent!
>   (reduce
>     (fn [x [i v]] (assoc! x i v))
>     (transient (vec (repeat 0 (reduce max (map first
> coll-of-index-val-pairs)))))
>     coll-of-index-val-pairs))
>

Yes, that's completely fine intended usage, as the return value of the
reducing fn becomes an argument to the next call.

> which is just the transientification of
>

Nice word - transientification.

Rich

John Harrop

não lida,
4 de ago. de 2009, 22:13:5004/08/2009
para clo...@googlegroups.com
Thanks.

Of course, this makes me think of ways to possibly speed up other operations. For example:

(defn vmap* [fun vect]
  (let [c (count vect)]
    (loop [out (transient []) i 0]
      (if (= i c)
        out
        (recur (conj! out (fun (nth vect i))) (inc i))))))

(defn vmap [fun vect]
  (persistent! (vmap* fun vect)))

Vector in, vector out, conj'd up using a transient.

And, of course:

(defn vpmap [fun vect]
  (loop [out (vmap* #(future (fun %)) vect) i (dec (count vect))]
    (let [o2 (assoc! out i @(nth out i))]
      (if (zero? i)
        (persistent! o2)
        (recur o2 (dec i)))))

Note that this last manipulates a transient vector in a single thread, though other threads (from the agent pool) calculate a bunch of futures that are stored in it. The transient vector of futures is generated using the vmap* from above, and then the futures are replaced with their values in-place by the loop in vpmap, before this persistentizes and returns that vector. The future-dereferencing loop works backwards from the end of the vector in case zero? is a quicker test than a general equality test. This is likely on hardware that implements an equality test as a subtraction followed by a zero test, because it eliminates the subtraction. On the other hand, hardware with a 1-cycle equality test of 32-bit ints is plausible, as is hardware that optimizes forward traversal of vectors, so I can't vouch for that being an optimization. The first loop has to go forward to conj up the output vector without reversing it, though.

Rich Hickey

não lida,
5 de ago. de 2009, 09:05:3805/08/2009
para clo...@googlegroups.com

Mapping into vectors and similar ops are coming, although nothing like
vmap* would ever be exposed.

> And, of course:
> (defn vpmap [fun vect]
>   (loop [out (vmap* #(future (fun %)) vect) i (dec (count vect))]
>     (let [o2 (assoc! out i @(nth out i))]
>       (if (zero? i)
>         (persistent! o2)
>         (recur o2 (dec i)))))
> Note that this last manipulates a transient vector in a single thread,
> though other threads (from the agent pool) calculate a bunch of futures that
> are stored in it. The transient vector of futures is generated using the
> vmap* from above, and then the futures are replaced with their values
> in-place by the loop in vpmap, before this persistentizes and returns that
> vector. The future-dereferencing loop works backwards from the end of the
> vector in case zero? is a quicker test than a general equality test. This is
> likely on hardware that implements an equality test as a subtraction
> followed by a zero test, because it eliminates the subtraction. On the other
> hand, hardware with a 1-cycle equality test of 32-bit ints is plausible, as
> is hardware that optimizes forward traversal of vectors, so I can't vouch
> for that being an optimization. The first loop has to go forward to conj up
> the output vector without reversing it, though.
>

There is already a very nice pvmap in the par branch that uses
ForkJoin. I strongly recommend against trying to write parallel ops
with transients - that's not what they are for.

What's nice about pvmap is that, just like transients, it also takes,
manipulates, and returns ordinary Clojure vectors (unlike the old
parallel lib which copied into and out of ForkJoin's ParallelArrays).
The goal is to make using the normal data structures as powerful as
possible, and not needing to switch to something else for performance.

Rich

Stu Hood

não lida,
5 de ago. de 2009, 19:48:5705/08/2009
para clo...@googlegroups.com
I really, really like this feature. My only complaint is that you have to use different names for the modifying functions. If the function signatures will be identical to their non-transient variants, then I guess the primary arguments would be:
 * Clojure convention for names of functions with side-effects,
 * An additional interface check in conj/assoc?

But if after calling (conj v 1), you can't use the 'v' reference anymore, then did you really cause a side effect? Its another tree falling in the woods situation.

Thanks,
Stu

Rich Hickey

não lida,
5 de ago. de 2009, 20:09:1405/08/2009
para Clojure


On Aug 5, 7:48 pm, Stu Hood <stuh...@gmail.com> wrote:
> I really, really like this feature. My only complaint is that you have to
> use different names for the modifying functions. If the function signatures
> will be identical to their non-transient variants, then I guess the primary
> arguments would be:
>  * Clojure convention for names of functions with side-effects,
>  * An additional interface check in conj/assoc?
>
> But if after calling (conj v 1), you can't use the 'v' reference anymore,
> then did you really cause a side effect? Its another tree falling in the
> woods situation.
>

Not at all. The normal persistent functions, e.g. conj/assoc, make a
promise that the thing they return *can* be held onto, indefinitely,
and immutably, and reused, thus 'persistent'. Calling an operation
that does not make those guarantees the same thing would be destroying
that promise. The names have to be different.

Rich

Luc Prefontaine

não lida,
5 de ago. de 2009, 22:10:2005/08/2009
para clo...@googlegroups.com
I like this very much... that's the kind of clever optimizations that preserves Clojure principles and
can yield significant performance increases. This could also help dealing with performance critics
in these small mutable languages "benchmarks" that newbies attempt to clone in Clojure.

Thank's Rich !

Luc
Luc Préfontaine

Armageddon was yesterday, today we have a real problem...

Eric

não lida,
5 de ago. de 2009, 22:56:3405/08/2009
para Clojure
On Aug 3, 8:39 pm, Rich Hickey <richhic...@gmail.com> wrote:
>
> In short, the O(1) overhead is less than the cost of even a single
> edit. So, e.g. into/vec/vector now use transients unconditionally if
> possible.
>
> Rich

Are we going to feel a big performance boost once all of the core
functions are rewritten using transients? I can't wait.

Are hashmaps next? I can think of a few places in my code that would
definitely benefit from them.

Eric

Rich Hickey

não lida,
6 de ago. de 2009, 07:53:4306/08/2009
para Clojure


On Aug 5, 10:10 pm, Luc Prefontaine <lprefonta...@softaddicts.ca>
wrote:
> I like this very much... that's the kind of clever optimizations that
> preserves Clojure principles and
> can yield significant performance increases. This could also help
> dealing with performance critics
> in these small mutable languages "benchmarks" that newbies attempt to
> clone in Clojure.
>
> Thank's Rich !
>

You're welcome!

And special thanks to Christophe Grand, who (quickly!) applied the
same technique to the hash maps and contributed that yesterday. So
now, in the master branch, vectors and hash maps support transients.
Everyone please try them out (where appropriate :).

Rich

Andy Fingerhut

não lida,
6 de ago. de 2009, 13:40:4306/08/2009
para Clojure
Thank you, Christophe! I've been wanting to try those out.

I made changes to 3 lines of my Clojure program for the k-nucleotide
benchmark, which spends most of its time in a function tally-dna-subs-
with-len that creates a hash map counting the number of times that
each of a bunch of length k strings occurs in a long string. Source
here, if you're curious:

http://github.com/jafingerhut/clojure-benchmarks/blob/38e1f592ca3befe94a0674a5f5a43d952cd369b3/knuc/knucleotide.clj-7.clj

It went from about 19 minutes down to about 12 minutes. Excellent
improvement for a small change to the code. That brings it down to
about 7.7 times the run time of the Java version from the language
shootout web site.

Andy

Patrick Sullivan

não lida,
7 de ago. de 2009, 04:07:1907/08/2009
para Clojure
Testing Transient w/Hashmaps (Thanks Cristophe!) and it seems like the
object won't store more then 8 keys. At first I thought it was my
frequency function
that was rolling it up, but then I simply tried creating a transient
object and manually assoc! ing a bunch of items into it. After the
8th it seemed to stop taking new
keys.

Am I doing something silly here or is this a bug?

~Patrick Sullivan

Christophe Grand

não lida,
7 de ago. de 2009, 09:38:4407/08/2009
para clo...@googlegroups.com
Hi Patrick !

Can you post some code. here is what I get:
user=> (-> {} transient (assoc! :a 1) (assoc! :b 2) (assoc! :c 3) (assoc! :d 4)
(assoc! :e 5) (assoc! :f 6) (assoc! :g 7) (assoc! :h 8) (assoc! :i 9) persistent!)
{:a 1, :c 3, :b 2, :f 6, :g 7, :d 4, :e 5, :i 9, :h 8}
user=> (persistent! (reduce #(assoc! %1 (str "k" %2) %2) (transient {}) (range 2
0)))
{"k0" 0, "k1" 1, "k2" 2, "k3" 3, "k4" 4, "k5" 5, "k10" 10, "k6" 6, "k11" 11, "k7"
 7, "k12" 12, "k8" 8, "k13" 13, "k9" 9, "k14" 14, "k15" 15, "k16" 16, "k17" 17,
 "k18" 18, "k19" 19}

Christophe
--
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)

tmountain

não lida,
7 de ago. de 2009, 10:07:2107/08/2009
para Clojure
This is awesome. I'm curious if support for maps is planned in
addition to vectors? A lot of my code makes heavy use of maps, and it
would be great to get a performance boost.

Travis

tmountain

não lida,
7 de ago. de 2009, 10:11:5807/08/2009
para Clojure
Oops, only saw the first page of this thread (still getting used to
Google Groups). I apologize for missing this one.

"And special thanks to Christophe Grand, who (quickly!) applied the
same technique to the hash maps and contributed that yesterday. So
now, in the master branch, vectors and hash maps support transients.
Everyone please try them out (where appropriate :). "

Patrick Sullivan

não lida,
7 de ago. de 2009, 10:15:0807/08/2009
para Clojure
I don't have the EXACT code handy to c/p (at work now) but I did
something like the following.
(apologies for doing it such an iterative looking way, never got
comfortable with -> ;-))

(def transhashmap (transient {})
(assoc transhashmap "a" 1)
(assoc transhashmap "b" 2)
etc

Then when I did (count transhashmap) I never got higher than 8.
Perhaps something about the way it is handling strings/characters as a
hash is broken? I know my original code that screwed me up was taking
a long text and breaking it down into wordcounts.

~Patrick

Patrick Sullivan

não lida,
7 de ago. de 2009, 10:18:0607/08/2009
para Clojure
Err assoc! obviously ;-)

(Sorry for the double post, didn't want to confuse Cristophe).

On Aug 7, 8:15 am, Patrick Sullivan <wizardofwestma...@gmail.com>
wrote:

John Newman

não lida,
7 de ago. de 2009, 10:20:3307/08/2009
para clo...@googlegroups.com
(def transhashmap (transient {}) 
(assoc transhashmap "a" 1) 
(assoc transhashmap "b" 2) 
etc

Isn't that what Rich was talking about, about not bashing in place?
--
John

AlexK

não lida,
7 de ago. de 2009, 09:30:0007/08/2009
para Clojure


On 7 Aug., 10:07, Patrick Sullivan <wizardofwestma...@gmail.com>
wrote:
>
> Am I doing something silly here or is this a bug?

You probably are using conj! for the side-effect, but after growing
the hashmap to size 8 conj! returns a different map.

user> (def foo (transient {1 1 2 2 3 3 4 4 5 5 6 6 7 7 8
8}))
#'user/
foo
user> (class
foo)
clojure.lang.PersistentArrayMap
$TransientArrayMap
user> (def new-foo (conj! foo {9
9}))
#'user/new-
foo
user> (class
foo)
clojure.lang.PersistentArrayMap
$TransientArrayMap
user> (class new-
foo)
clojure.lang.PersistentHashMap
$TransientHashMap
user> (identical? foo new-
foo)
false
user> (def foo (conj! foo {9
9}))
#'user/
foo
user> (persistent!
foo)
{1 1, 2 2, 3 3, 4 4, 5 5, 6 6, 7 7, 8 8, 9
9}

remember to capture the return value of the modifying function calls.

Alex

Patrick Sullivan

não lida,
7 de ago. de 2009, 10:30:3207/08/2009
para Clojure
Ah hah, yeah I'm dumb, thanks to you and AlexK for catching my
silliness.

Funny how when I'm using normal clojure persistant structs I don't
think about doing it the right way twice, but when doing it as a
transient I slip into old imperative habits *headslap*

~Patrick

Christophe Grand

não lida,
10 de ago. de 2009, 14:15:1510/08/2009
para clo...@googlegroups.com
Hi Andy,


On Thu, Aug 6, 2009 at 7:40 PM, Andy Fingerhut <andy_fi...@alum.wustl.edu> wrote:
Thank you, Christophe!  I've been wanting to try those out.

I made changes to 3 lines of my Clojure program for the k-nucleotide
benchmark, which spends most of its time in a function tally-dna-subs-
with-len that creates a hash map counting the number of times that
each of a bunch of length k strings occurs in a long string.  Source
here, if you're curious:

http://github.com/jafingerhut/clojure-benchmarks/blob/38e1f592ca3befe94a0674a5f5a43d952cd369b3/knuc/knucleotide.clj-7.clj

It went from about 19 minutes down to about 12 minutes.  Excellent
improvement for a small change to the code.  That brings it down to
about 7.7 times the run time of the Java version from the language
shootout web site.


Could you try my "leafless" branch; http://github.com/cgrand/clojure/tree/leafless ?

Thanks,

Christophe

samppi

não lida,
10 de ago. de 2009, 23:08:5010/08/2009
para Clojure
Excellent, excellent. But I'm wondering, is it planned (or feasible)
for structmap transients to be supported too? I often use and "modify"
protean structmaps in loops, and I'd love to know if the concept of
transients can be applied to them.

Andy Fingerhut

não lida,
11 de ago. de 2009, 06:28:3111/08/2009
para Clojure
On Aug 10, 11:15 am, Christophe Grand <christo...@cgrand.net> wrote:
> Hi Andy,
>
> On Thu, Aug 6, 2009 at 7:40 PM, Andy Fingerhut <
>
>
>
> andy_finger...@alum.wustl.edu> wrote:
> > Thank you, Christophe!  I've been wanting to try those out.
>
> > I made changes to 3 lines of my Clojure program for the k-nucleotide
> > benchmark, which spends most of its time in a function tally-dna-subs-
> > with-len that creates a hash map counting the number of times that
> > each of a bunch of length k strings occurs in a long string.  Source
> > here, if you're curious:
>
> >http://github.com/jafingerhut/clojure-benchmarks/blob/38e1f592ca3befe...
>
> > It went from about 19 minutes down to about 12 minutes.  Excellent
> > improvement for a small change to the code.  That brings it down to
> > about 7.7 times the run time of the Java version from the language
> > shootout web site.
>
> Could you try my "leafless" branch;http://github.com/cgrand/clojure/tree/leafless?
>
> Thanks,
>
> Christophe

And Christophe's latest improvements to transient support for maps
improves the running time of my k-nucleotide benchmark program from
about 12 minutes to about 9.5 minutes. Very nice stuff. It isn't in
Rich's clojure repository yet, but you can get the changes from
Christophe's github repo if you want to try it out.

Thanks,
Andy

Christophe Grand

não lida,
11 de ago. de 2009, 07:58:0311/08/2009
para clo...@googlegroups.com

Great news!
If anyone is willing to try, I merged the "leafless" branch in the master branch, so pull master. Plus, in master, you'll get transient support for hash sets.
http://github.com/cgrand/clojure/tree/master

Christophe
Responder a todos
Responder ao autor
Encaminhar
0 nova mensagem