srfi-171 transducers for racket

79 visualizzazioni
Passa al primo messaggio da leggere

Linus Björnstam

da leggere,
10 dic 2019, 04:16:1010/12/19
a us...@racket-lang.org
Hi everyone!

I was asked to port my srfi-171 (transducers) to racket by a staggering number of people (3!!), so by extremely popular demand (by my standards) I have done so: https://git.sr.ht/~bjoli/racket-srfi-171-transducers

These transducers are more or less the same as the ones in clojure, so they have very little overhead (each transduction step is just a procedure call) and work eagerly. They currently use mutable hidden state (like the clojure transducers) for the stateful transducers, but someone clever could probably make the state at least visible without too much performance impact. The simplest way would be visible mutable state, which I had working for guile-scheme but chose not to use because it became very slow in some schemes.

The documentation in the SRFI document still holds with some small caveats: bytevector-u8-reduce/transduce is now bytes-transduce/reduce and tdelete-duplicates now takes three symbols 'eq?, 'eqv? or 'equal? instead of arbitrary equality predicates.

I have really no idea how to package things for racket, nor do I have much interest in doing so. I have played with it a bit, it seems to work. There is a small test-suite still written in guile scheme, which should be trivial to port. If someone wants to package this as a proper racket package, I would be happy to accept pull requests or even transfer the repository to someone else. The license used is sub-licenseable, so feel free. If I find the repo to my liking I will deprecate my port and link to you.

Ideas for changes I would have done if I was using racket more than I actually do: a (transduce ...)-form that uses rackets sequence interface. Package it so It could be installed using raco. Port the test-suite. Another part is figuring out what to call the library.

The SRFI-document can be found here: https://srfi.schemers.org/srfi-171/srfi-171.html

The rebellion transducers seem to be aimed at another use-case (streaming data, and parallelism) with other kind of guarrantees. It has a _significantly_ higher overhead. I tried the following in rebellion

#lang racket
(require rebellion/streaming/transducer)
(require rebellion/collection/list)

(define lst (build-list 100000 values))

(time (length (transduce (in-list lst)
(filtering odd?)
(mapping (lambda (x) (* x x)))
#:into into-list)))

and it takes about 5 seconds. The same thing using srfi-171 styled transducers takes about 3ms (if my head is with me this morning, that is about 1650x). I suspect I might be doing something wrong, or that there might be some quadratic behaviour somewhere.

Anyway, go forth and have some fun!

Best regards
Linus Björnstam
- Just a lowly classical musician programming for fun

Jack Firth

da leggere,
10 dic 2019, 06:02:0010/12/19
a Racket Users
This is great! Thank you for your work on this srfi. Transducers are relatively new to Racket's ecosystem and I'm happy to see more of the design space being explored.


The documentation in the SRFI document still holds with some small caveats: bytevector-u8-reduce/transduce is now bytes-transduce/reduce and tdelete-duplicates now takes three symbols 'eq?, 'eqv? or 'equal? instead of arbitrary equality predicates.

I have a question about that. While implementing the deduplicating transducer, I tried to figure out a way to make it support arbitrary equality relations. But as far as I could tell, it's impossible to do that in less than quadratic time because there's no way to check whether something is a duplicate without comparing it to every unique element seen so far. Is there an implementation of this SRFI that figured out an efficient way to do this in the general case?


I have really no idea how to package things for racket, nor do I have much interest in doing so. I have played with it a bit, it seems to work. There is a small test-suite still written in guile scheme, which should be trivial to port. If someone wants to package this as a proper racket package, I would be happy to accept pull requests or even transfer the repository to someone else. The license used is sub-licenseable, so feel free. If I find the repo to my liking I will deprecate my port and link to you.

Ideas for changes I would have done if I was using racket more than I actually do: a (transduce ...)-form that uses rackets sequence interface. Package it so It could be installed using raco. Port the test-suite. Another part is figuring out what to call the library.

I'd be happy to write an `info.rkt` that packages the code. For a package name, I suggest `srfi-171`. Would you be interested in writing Scribble documentation for the package?


The rebellion transducers seem to be aimed at another use-case (streaming data, and parallelism) with other kind of guarrantees. It has a _significantly_ higher overhead. I tried the following in rebellion

#lang racket
(require rebellion/streaming/transducer)
(require rebellion/collection/list)

(define lst (build-list 100000 values))

(time (length (transduce (in-list lst)
                         (filtering odd?)
                         (mapping (lambda (x) (* x x)))
                         #:into into-list)))

and it takes about 5 seconds. The same thing using srfi-171 styled transducers takes about 3ms (if my head is with me this morning, that is about 1650x). I suspect I might be doing something wrong, or that there might be some quadratic behaviour somewhere.

You're not doing anything wrong. Rebellion's transducers have awful constant (but not quadratic) factors and make lots of unnecessary allocations. There's no fundamental reason for this; the protocol can support efficient zero-allocation transducer implementations perfectly fine. I just haven't put work into that yet. See this issue for a summary of the problem and some possible ways to improve performance. For anyone reading this who's interested in benchmarking and optimizing tight loops, I'd love to hear your input.

Linus Björnstam

da leggere,
10 dic 2019, 06:50:4410/12/19
a racket...@googlegroups.com
Hi Jack!

The "tdelete-duplicates" uses a hash-table to store already seen elements: if the element is in the hash table, just filter it out. If it is not, we do: (hash-set! already-seen element #t). That should be constant timeish. This means that you cannot (and should not) use the same reducing function as returned by ((tdelete-duplicates) rcons) for reducing different collections (unless you want to delete the duplicates in the first collection in the second collection as well). the transducer->reducer process is o(n), and the amount of transducers composed is usually small, so there is no need to re-use a reducer function unless you actually want to.

I would love some help into making it a racket pkg if you have the time. I have absolutely zero experience. I can probbly port the test suite myself, but I have quite a lot of similar work to do (srfi-document->texinfo for guile), so that will be quite some time in the future.
--
Linus Björnstam

On Tue, 10 Dec 2019, at 12:02, Jack Firth wrote:
> This is great! Thank you for your work on this srfi. Transducers are
> relatively new to Racket's ecosystem and I'm happy to see more of the
> design space being explored.
>
> > The documentation in the SRFI document still holds with some small caveats: bytevector-u8-reduce/transduce is now bytes-transduce/reduce and tdelete-duplicates now takes three symbols 'eq?, 'eqv? or 'equal? instead of arbitrary equality predicates.
>
> I have a question about that. While implementing the deduplicating
> <https://docs.racket-lang.org/rebellion/Transducers.html#%28def._%28%28lib._rebellion%2Fstreaming%2Ftransducer..rkt%29._deduplicating%29%29> transducer, I tried to figure out a way to make it support arbitrary equality relations. But as far as I could tell <https://github.com/jackfirth/rebellion/issues/310#issuecomment-546567800>, it's impossible to do that in less than quadratic time because there's no way to check whether something is a duplicate without comparing it to every unique element seen so far. Is there an implementation of this SRFI that figured out an efficient way to do this in the general case?
>
> > I have really no idea how to package things for racket, nor do I have much interest in doing so. I have played with it a bit, it seems to work. There is a small test-suite still written in guile scheme, which should be trivial to port. If someone wants to package this as a proper racket package, I would be happy to accept pull requests or even transfer the repository to someone else. The license used is sub-licenseable, so feel free. If I find the repo to my liking I will deprecate my port and link to you.
> >
> > Ideas for changes I would have done if I was using racket more than I
> actually do: a (transduce ...)-form that uses rackets sequence
> interface. Package it so It could be installed using raco. Port the
> test-suite. Another part is figuring out what to call the library.
> I'd be happy to write an `info.rkt` that packages the code. For a
> package name, I suggest `srfi-171`. Would you be interested in writing
> Scribble documentation for the package?
>
> > The rebellion transducers seem to be aimed at another use-case (streaming data, and parallelism) with other kind of guarrantees. It has a _significantly_ higher overhead. I tried the following in rebellion
> >
> > #lang racket
> > (require rebellion/streaming/transducer)
> > (require rebellion/collection/list)
> >
> > (define lst (build-list 100000 values))
> >
> > (time (length (transduce (in-list lst)
> > (filtering odd?)
> > (mapping (lambda (x) (* x x)))
> > #:into into-list)))
> >
> > and it takes about 5 seconds. The same thing using srfi-171 styled transducers takes about 3ms (if my head is with me this morning, that is about 1650x). I suspect I might be doing something wrong, or that there might be some quadratic behaviour somewhere.
>
> You're not doing anything wrong. Rebellion's transducers have awful
> constant (but not quadratic) factors and make lots of unnecessary
> allocations. There's no fundamental reason for this; the protocol can
> support efficient zero-allocation transducer implementations perfectly
> fine. I just haven't put work into that yet. See this issue
> <https://github.com/jackfirth/rebellion/issues/351> for a summary of
> the problem and some possible ways to improve performance. For anyone
> reading this who's interested in benchmarking and optimizing tight
> loops, I'd love to hear your input.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to racket-users...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/8c036a47-6d90-4bcd-bfd3-fdfdd2f8d05b%40googlegroups.com <https://groups.google.com/d/msgid/racket-users/8c036a47-6d90-4bcd-bfd3-fdfdd2f8d05b%40googlegroups.com?utm_medium=email&utm_source=footer>.

Jack Firth

da leggere,
10 dic 2019, 13:54:0810/12/19
a Racket Users
The "tdelete-duplicates" uses a hash-table to store already seen elements: if the element is in the hash table, just filter it out.  If it is not, we do: (hash-set! already-seen element #t). That should be constant timeish.
 
I understand how that works for the usual eq? / eqv? / equal? equality relations, but how could that work for arbitrary equivalence relations? If you passed a case-insensitive string comparison function to tdelete-duplicates, storing already seen elements in a hash table wouldn't help because two "equal" strings could have different hash codes.

Linus Björnstam

da leggere,
10 dic 2019, 16:14:5310/12/19
a racket...@googlegroups.com
For racket it does not work out of the box, but for scheme srfi69 allows for arbitrary equality predicates and specialized hash functions. I believe r6rs hashtables has a string-CI-hash, but I can be wrong.

For the racket port I removed that functionality, since I couldn't do it easily using the default hash tables. For that you can use make-custom-hash: https://docs.racket-lang.org/reference/dicts.html?q=hash#%28def._%28%28lib._racket%2Fdict..rkt%29._make-custom-hash%29%29 b

You could just wrap the hash function with something that normalizes strings and chars and whatever other data types you think you can use, but that would mean an eternal chase for edge cases.

--
Linus Björnstam

On Tue, 10 Dec 2019, at 19:54, Jack Firth wrote:
> > The "tdelete-duplicates" uses a hash-table to store already seen
> elements: if the element is in the hash table, just filter it out. If
> it is not, we do: (hash-set! already-seen element #t). That should be
> constant timeish.
> I understand how that works for the usual eq? / eqv? / equal? equality
> relations, but how could that work for *arbitrary* equivalence
> relations? If you passed a case-insensitive string comparison function
> to tdelete-duplicates, storing already seen elements in a hash table
> wouldn't help because two "equal" strings could have different hash
> codes.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to racket-users...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/90506548-c7fc-4f0d-9321-00cb03490aec%40googlegroups.com <https://groups.google.com/d/msgid/racket-users/90506548-c7fc-4f0d-9321-00cb03490aec%40googlegroups.com?utm_medium=email&utm_source=footer>.
Rispondi a tutti
Rispondi all'autore
Inoltra
0 nuovi messaggi