On Thu, Nov 1, 2012 at 8:27 PM, Wolodja Wentland <bab...@gmail.com> wrote:
> It seems to me as if we are currently figuring out which (boilerplate?)
> functions are missing in reducers.clj and that we will have a nice and
> well-integrated library at the end.
To be fair, it's in beta and it's open source; so if anyone thinks it
needs something they can always volunteer. :)
>> > P.S. Would it be possible to have something like fold-into-vec in clojure.reducers?
>>
>> Don't forget fold-into-map and fold-into-map-with, but both of those
>> will likely require a better merge/merge-with function for maps. :(
>
> Oh, fold-into-map and fold-into-map-with would be wonderful and I tried to
> implement the former along the lines of fold-into-vec, but the performance was
> abysmal. I am now using fold-into-vec + r/map with zipmap which is better, but
> I wouldn't consider that optimal.
The bottleneck with fold-into-map is that merge doesn't scale well:
every combine step of the fold has to merge two partial maps, so you
end up doing a significant amount of merging.
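
To make that concrete, here is a rough sketch of the kind of
fold-into-map being discussed. The name comes from this thread, but the
body is only my assumption of how one might write it on top of r/fold;
it is not something that ships with reducers.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical helper: folds a reducible collection of [k v] pairs
;; into a single map.
(defn fold-into-map
  [coll]
  (r/fold (r/monoid merge hash-map) ; combine: merge two partial maps,
                                    ; with (hash-map) as the identity
          conj                      ; reduce: add one [k v] pair
          coll))

;; Example: map each number to its square.
(fold-into-map (r/map (fn [x] [x (* x x)]) (vec (range 10))))
```

Note that merge can't be passed as the combine function on its own,
since (merge) returns nil rather than an empty map. Every call to the
combine function above is a full merge of two partial maps, which is
where the time goes.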
Currently a merge operation takes elements one at a time from one map,
and adds them one at a time to another.
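
In other words, merging two maps behaves roughly like the sketch below.
This is a simplified illustration, not the actual clojure.core source,
and naive-merge is a name I made up for it.

```clojure
;; Simplified model of merging m2 into m1: walk every entry of m2
;; and assoc it into the accumulated map, one entry at a time.
(defn naive-merge
  [m1 m2]
  (reduce (fn [acc [k v]] (assoc acc k v))
          m1
          m2))

;; Gives the same result as merge; values from m2 win on key clashes.
(naive-merge {:a 1 :b 2} {:b 3 :c 4})
```

The cost is proportional to the size of the second map, no matter how
much structure the two maps might have in common.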
This is incredibly wasteful for large maps, as you already have two
"pre-sorted" tries and could pair their branches together recursively.
If each branch node also stored the number of child nodes, you could
assign different threads to work on different branches as well. This
would be perfect for reducers, but from a quick look it didn't appear
that any of the key internals are exposed in a way that would let you
take advantage of this.
Unfortunately I haven't discovered a way to facilitate more efficient
merging without modifying the actual Clojure core source. I guess this
is my next step.
If you're interested in using reducers over text files, I've uploaded
a library to facilitate that here:
http://github.com/thebusby/iota
I'll be cleaning up the code with more documentation and examples at
some point in the near future, and will send out an announcement then.
In the meantime, though, feel free to use it.