hash-filter: PR or not PR

35 views
Skip to first unread message

unlimitedscolobb

unread,
Oct 30, 2020, 7:35:31 PM10/30/20
to Racket Users
Hi,

I am currently using hash tables a lot, so inevitably I end up writing general functions.  I wrote the following `hash-filter`:

```
(define (hash-filter
         ht
         #:predicate [predicate #f]
         #:predicate/key [predicate/key
                          (if predicate
                              (λ (_ v) (predicate v))
                              (error 'hash-filter))])
  (for/fold ([filtered-pairs (hash-clear ht)])
            ([(k v) (in-hash ht)])
    (if (predicate/key k v)
        (hash-set filtered-pairs k v)
        filtered-pairs)))
```

Before I submit a pull request to the Racket repository, I would like to know whether adding this function to `racket/hash` is a good idea, or whether I should create a separate package with extra functions for hash tables.

-
Sergiu

jackh...@gmail.com

unread,
Oct 31, 2020, 5:32:22 AM10/31/20
to Racket Users
This is definitely a useful thing to have and I've wanted it myself before. However I'm generally of the opinion that we should avoid making more collection manipulation functions that are unnecessarily specialized to one type of collection. I'd like to see functions that operate on all sequences rather than on hash tables alone. Here are some usages of hash-filter compared with some existing alternatives:

(hash-filter ht #:predicate pred)
==
; Sequence composition
(make-immutable-hash
 (sequence->list
  (sequence-filter (lambda (pair) (pred (cdr pair)))
                   (sequence-map cons ht))))
==
; For macros
(for/hash ([(k v) (in-hash ht)] #:when (pred v)) (values k v))

The for macro is probably what I'd reach for in standard racket code. It has some drawbacks: the first time I tried to write it I forgot to write (values k v) and instead just wrote v. Keeping track of pulling apart the key and value, naming them with variables, and putting them back together is a little annoying.

The sequence composition approach is, in my humble opinion, completely unreadable and difficult to write to boot. The sequence-map function works totally fine on multivalued sequences but sequence-filter only works on single-value sequences, so we have to do a dance with cons car and cdr to turn the multivalued sequence of keys and values into a single-valued sequence of key-value cons pairs. Furthermore, the only built-in function for building a hash table from a sequence is overly specialized to lists, so you have to copy the sequence into a list solely so you can turn that list into a hash table with make-immutable-hash.

Instead of adding hash-filter to racket/hash, I think there's a few improvements we could make to the sequence composition side of things:

- Make sequence-filter allow multivalued sequences, provided the arity of the predicate is consistent with the arity of the sequence.
- Add a sequence->hash function that accepts a multivalued sequence of keys and values (exactly like what in-hash produces) and copies them into a hash table.

For the filter-by-value, filter-by-key, and filter-by-key-and-value cases, this lets us write:

(sequence->hash (sequence-filter (lambda (k v) (pred v)) ht)) ; Filter values
(sequence->hash (sequence-filter (lambda (k v) (pred k)) ht)) ; Filter keys
(sequence->hash (sequence-filter pred ht)) ; Filter key-value pairs

Which is close enough to hash-filter to satisfy my desire for conciseness, while still being general enough to work with other kinds of key-value sequences.

Shameless self plug: you might be interested in Rebellion's take on this, which uses transducers and key-value structs called entries:

(transduce (in-hash-entries ht) (filtering-values pred) #:into into-hash) ; Filter values
(transduce (in-hash-entries ht) (filtering-keys pred) #:into into-hash) ; Filter keys
(transduce (in-hash-entries ht) (filtering (lambda (e) (pred (entry-key e) (entry-value e)))) #:into into-hash) ; Filter key-value entries

unlimitedscolobb

unread,
Oct 31, 2020, 10:00:45 AM10/31/20
to Racket Users
On Saturday, October 31, 2020 at 10:32:22 AM UTC+1 jackh...@gmail.com wrote:
This is definitely a useful thing to have and I've wanted it myself before. However I'm generally of the opinion that we should avoid making more collection manipulation functions that are unnecessarily specialized to one type of collection.

I completely agree, and I think this is one of the reasons I felt the need to ask the question.

 
I'd like to see functions that operate on all sequences rather than on hash tables alone. Here are some usages of hash-filter compared with some existing alternatives:

(hash-filter ht #:predicate pred)
==
; Sequence composition
(make-immutable-hash
 (sequence->list
  (sequence-filter (lambda (pair) (pred (cdr pair)))
                   (sequence-map cons ht))))
==
; For macros
(for/hash ([(k v) (in-hash ht)] #:when (pred v)) (values k v))

Yeah!  I was looking at iteration macros, but forgot about #:when!  Thank you for reminding me, this code is good enough for my immediate purposes and I'll ditch hash-filter.  Iteration macros in Racket are ridiculously powerful and even though I use them all the time, I still don't feel like I know them well.

 
The for macro is probably what I'd reach for in standard racket code. It has some drawbacks: the first time I tried to write it I forgot to write (values k v) and instead just wrote v. Keeping track of pulling apart the key and value, naming them with variables, and putting them back together is a little annoying.

The sequence composition approach is, in my humble opinion, completely unreadable and difficult to write to boot. The sequence-map function works totally fine on multivalued sequences but sequence-filter only works on single-value sequences, so we have to do a dance with cons car and cdr to turn the multivalued sequence of keys and values into a single-valued sequence of key-value cons pairs. Furthermore, the only built-in function for building a hash table from a sequence is overly specialized to lists, so you have to copy the sequence into a list solely so you can turn that list into a hash table with make-immutable-hash.

I completely agree with your reasoning about the readability.  I'm also a little bit concerned about the performance of converting sequences to lists before building the hash table, but I'm not yet knowledgeable enough about these questions.

 
Instead of adding hash-filter to racket/hash, I think there's a few improvements we could make to the sequence composition side of things:

- Make sequence-filter allow multivalued sequences, provided the arity of the predicate is consistent with the arity of the sequence.
- Add a sequence->hash function that accepts a multivalued sequence of keys and values (exactly like what in-hash produces) and copies them into a hash table.

These sound interesting, thank you!  I feel like I'd give both a try, but no promises :-)


For the filter-by-value, filter-by-key, and filter-by-key-and-value cases, this lets us write:

(sequence->hash (sequence-filter (lambda (k v) (pred v)) ht)) ; Filter values
(sequence->hash (sequence-filter (lambda (k v) (pred k)) ht)) ; Filter keys
(sequence->hash (sequence-filter pred ht)) ; Filter key-value pairs

Very nice !


Which is close enough to hash-filter to satisfy my desire for conciseness, while still being general enough to work with other kinds of key-value sequences.

Shameless self plug: you might be interested in Rebellion's take on this, which uses transducers and key-value structs called entries:

I've never heard about Rebellion before in fact, thank you for the self plug :-) I started reading the docs, and my first impressions are that your library is very featureful and that it solves some of the small issues I've had with core Racket collections (e.g., the option structure, the unification of collection algorithms, etc.).

I'll keep on reading and see how I can use Rebellion and/or contribute, thanks!

-
Sergiu
Reply all
Reply to author
Forward
0 new messages