--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
>> Also, I think the transducer version should always be faster, no matter the size of the source collection (no threshold).
It's a bit more complicated than that, mostly because transducer pipelines require about 2 allocations per step during creation. Also, most of the performance boost from transducers is due to less garbage being created, and sometimes the heap of the JVM is so large you'll never see much change from switching to transducers.
Don't get me wrong, transducers are great and I often default to them over seqs, but in micro-benchmarks like this there's too much in play to always see a 100% performance boost.
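To make the comparison concrete, here is a minimal sketch of the same computation written both ways. The data and the names `xs`, `sum-seq`, `xform` and `sum-xf` are made up for illustration and are not taken from the gist. The seq version builds an intermediate lazy sequence at each step, which is where the extra garbage comes from; the transducer version composes the steps into one reducing function.

```clojure
;; Illustrative input only; the gist's real data is not shown in this thread.
(def xs (vec (range 1000)))

;; Seq version: filter and map each produce an intermediate lazy sequence.
(defn sum-seq [coll]
  (reduce + (map inc (filter even? coll))))

;; Transducer version: the steps are composed once, then applied in a single pass.
(def xform (comp (filter even?) (map inc)))

(defn sum-xf [coll]
  (transduce xform + coll))

;; Both versions compute the same value.
(= (sum-seq xs) (sum-xf xs))
```

Whether the second form is measurably faster depends on the input size and the JVM settings, as noted above, so it is worth timing both on realistic data.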
On Mon, Nov 27, 2017 at 12:55 PM, David Bürgin <dbue...@gluet.ch> wrote:
Jiacai –
I saw you updated the gist. Just in case it passed you by: performance
profits from the source collection being reducible. So pouring ‘dataset’
into a vector beforehand should speed up the processing quite a bit.
Also, I think the transducer version should always be faster, no matter
the size of the source collection (no threshold).
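As a rough sketch of the first suggestion, assuming `dataset` is a lazy sequence (the definition below is a hypothetical stand-in, as are `dataset-vec` and `xform`), pouring it into a vector looks like this:

```clojure
;; Hypothetical stand-in for the gist's `dataset`: a lazy sequence.
(def dataset (map #(* 2 %) (range 100000)))

;; Realize it once into a vector, which can reduce itself.
(def dataset-vec (into [] dataset))

;; Illustrative pipeline; the gist's actual steps may differ.
(def xform (comp (filter #(zero? (mod % 3))) (map inc)))

;; Same result from either source; only the iteration mechanism differs.
(= (transduce xform + dataset)
   (transduce xform + dataset-vec))
```

Note that building the vector has its own cost, so it pays off mainly when the collection is processed more than once or is already available as a vector.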
--
David
| You're starting from a lazy sequence, not a self-reducible collection. That's not wrong, but it's removing a key transduce/reduce power to work with reducible colls.
I think this means that collections which implement CollReduce have an implementation of reduce that is faster than just calling first and next on them. Transducers will use this implementation when available. This is one of the biggest speedups of transducers over seqs, which always use first and next to iterate. That said, if the collection does not offer a faster CollReduce implementation, it will fall back to using first and next, and thus won't get that speedup over sequences.
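A small sketch of what "bringing your own reduce" can look like. It uses the `clojure.lang.IReduceInit` interface, which is one of the routes `reduce` and `transduce` check for; the `countdown` function is a made-up example, not something from clojure.core.

```clojure
;; Hypothetical reducible: yields n, n-1, ..., 1 without ever building a seq.
(defn countdown [n]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (loop [acc init, i n]
        (cond
          (reduced? acc) @acc          ; honor early termination (e.g. from take)
          (zero? i)      acc           ; source exhausted
          :else          (recur (f acc i) (dec i)))))))

;; reduce, transduce and into all drive the loop above directly;
;; first and next are never called on the source.
(reduce + 0 (countdown 3))               ; => 6
(transduce (map inc) + 0 (countdown 3))  ; => 9
(into [] (filter odd?) (countdown 5))    ; => [5 3 1]
(into [] (take 2) (countdown 5))         ; => [5 4]
```

The loop is plain iteration over a counter, which is the kind of specialized reduce a concrete collection can offer in place of a first/next walk.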
| Roughly how much performance lag do we get when not working a transduction from a (self) reducible collection, and more so why exactly?
I covered why above, but I do not know how much performance lag there is. It would depend on the concrete collection and how much faster its CollReduce implementation is than using first and next.
| Should we typically choose a different vehicle for stream processing from large files, over using transducers? My current use case is stream-processing from large files.
I think as a stream, you won't benefit from CollReduce, but I'm not sure. I don't think you can really reduce over the stream faster than first and next. That said, you might benefit from loop fusion if your operations can be fused.
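For what it's worth, here is one possible sketch of running a transducer over the lines of a file without going through a lazy sequence. `lines-reducible` is a hypothetical helper written for this example, built on `clojure.lang.IReduceInit`; whether it is actually faster than `line-seq` for a given file is not something I have measured.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: a reducible view over the lines of `source`
;; (anything io/reader accepts, such as a file path or a Reader).
;; The reader is opened when reduction starts and closed when it ends.
(defn lines-reducible [source]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [rdr (io/reader source)]
        (loop [acc init]
          (if (reduced? acc)
            @acc                                   ; stop early, e.g. after (take n)
            (if-let [line (.readLine ^java.io.BufferedReader rdr)]
              (recur (f acc line))
              acc)))))))                           ; end of input

;; A StringReader stands in for a large file to keep the example self-contained.
(defn demo-source []
  (java.io.StringReader. "a\nbb\n\nccc"))

;; The remove and map steps are fused into a single pass over the lines.
(transduce (comp (remove str/blank?) (map count)) + 0
           (lines-reducible (demo-source)))        ; => 6

(into [] (take 2) (lines-reducible (demo-source))) ; => ["a" "bb"]
```

Only one line is held at a time, and the file is closed even when the reduction stops early.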
Disclaimer: I only vaguely know what I'm talking about, you probably want a more expert opinion.
Hi,
As this thread seems to have been going down this path, I am joining it after having spent some time fiddling with the source code of some clojure.core transducers and familiarizing myself with how to create, compose and use transducers in transducing processes. By the way, I think the reference could be more explicit about the relationship between transducers, transducing processes and contexts for applying transducers (as is, IMO a lot of ambiguity arises, causing a lot of confusion in getting started).
So, it was noted earlier in this thread by Alex Miller:
> You're starting from a lazy sequence, not a self-reducible collection. That's not wrong, but it's removing a key transduce/reduce power to work with reducible colls.
I think that's also the case with applying any transducer to a file input (?!) and I am therefore wondering about:
- I didn't fully grasp the difference between self-reducible collections and other ones (in this context, and in general). Can you please delineate?
- Roughly how much performance lag do we get when not working a transduction from a (self) reducible collection, and more so why exactly?
- Should we typically choose a different vehicle for stream processing from large files, over using transducers? My current use case is stream-processing from large files.
---
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/JjiYPEMQK4s/unsubscribe.