hash->list with try-order? (like hash-map)

119 views
Skip to first unread message

unlimitedscolobb

unread,
Oct 12, 2021, 7:01:06 PM10/12/21
to Racket Users
Hi,

I wrote myself this little function:

(define (hash->ordered-list h)
  (hash-map h cons #t))

which uses the try-order? argument of hash-map.

Is there a reason for hash->list not have an optional argument try-order?  Or perhaps having such a standalone function would be better?

I was planning to submit a patch, but then I thought I may be missing something.

-
Sergiu

George Neuner

unread,
Oct 13, 2021, 11:37:39 AM10/13/21
to racket users
I can't speak for the Racket team, but ...

Hash tables entries inherently are unordered, so there really is no
reason to expect an ordered list from reading out the data.  Also, the
docs indicate that 'try-order?' doesn't work for all data types - so it
may produce unexpected results.  Further, sorting the output potentially
can take a lot of extra time ... having never tried to get ordered
output from hash-map, I can only hypothesize that (at least in some
cases) it may be faster to create the unordered list and then sort it
than to try to produce an ordered list with 'try-order?'.

Certainly, you are welcome to submit a change, but I think it would be
best to leave the existing behavior and make ordered output an addition.

YMMV,
George

George Neuner

unread,
Oct 13, 2021, 11:54:40 AM10/13/21
to racket users

On 10/12/2021 7:01 PM, unlimitedscolobb wrote:

unlimitedscolobb

unread,
Oct 20, 2021, 5:53:35 PM10/20/21
to Racket Users
Hi,

Thank you George for your answer.

On Wednesday, October 13, 2021 at 5:37:39 PM UTC+2 gneuner2 wrote:

On 10/12/2021 7:01 PM, unlimitedscolobb wrote:
> I wrote myself this little function:
>
> (define (hash->ordered-list h)
>   (hash-map h cons #t))
>
> which uses the try-order? argument of hash-map.
>
> Is there a reason for hash->list not have an optional argument
> try-order?  Or perhaps having such a standalone function would be better?
>
> I was planning to submit a patch, but then I thought I may be missing
> something.
>
> -
> Sergiu
>

I can't speak for the Racket team, but ...

Hash tables entries inherently are unordered, so there really is no
reason to expect an ordered list from reading out the data.  Also, the
docs indicate that 'try-order?' doesn't work for all data types - so it
may produce unexpected results. 

I have two main use cases for producing an ordered list from a hash table:

1. A canonical way to pretty print a hash table: In my projects, I carry around and print out hash tables a lot, so I like the elements to appear in the same order all the time, whatever that order may be. Luckily, `hash-map` orders symbols alphabetically and numbers according to the natural order, so it's perfect for my use.

2. Testing the contents of a mutable hash table: The only way I found to that is to convert the hash table to an ordered list and compared it to the test list. This is clearly not the most efficient use of a hash table, but I can totally go with that, since it's about testing and not the actual performance.

Of course, I am totally open to learning better ways of doing these things!
 
Further, sorting the output potentially
can take a lot of extra time ... having never tried to get ordered
output from hash-map, I can only hypothesize that (at least in some
cases) it may be faster to create the unordered list and then sort it
than to try to produce an ordered list with 'try-order?'.

I scrounged around Racket sources a bit, and even though I am not sure I looked in the right place, I got the impression that that is what hash-map with try-order? set to #t does already: creating an unordered list out of the hash table and then `sort`ing it.

Certainly, you are welcome to submit a change, but I think it would be
best to leave the existing behavior and make ordered output an addition.



I pored a lot over this, and the reasoning which won at the end was that `hash->list` is basically a call to `hash-map` with `cons` as the function to apply. Thus, adding try-order? to hash->list is a very easy change, so why not have it.

Just to be clear, my PR does not alter the default behavior of `hash->list`, and only adds an optional argument.

-
Sergiu

George Neuner

unread,
Oct 21, 2021, 5:26:16 AM10/21/21
to racket...@googlegroups.com

On 10/20/2021 5:53 PM, unlimitedscolobb wrote:
I have two main use cases for producing an ordered list from a hash table:

1. A canonical way to pretty print a hash table: In my projects, I carry around and print out hash tables a lot, so I like the elements to appear in the same order all the time, whatever that order may be. Luckily, `hash-map` orders symbols alphabetically and numbers according to the natural order, so it's perfect for my use.

It's fine if it works for you.  Just beware hashing on things like lists, structs, objects, etc.

You might look into Laurent Orseau's "text-table" package.   Tables are a great way to print structured output.



2. Testing the contents of a mutable hash table: The only way I found to that is to convert the hash table to an ordered list and compared it to the test list. This is clearly not the most efficient use of a hash table, but I can totally go with that, since it's about testing and not the actual performance.

Of course, I am totally open to learning better ways of doing these things!

Depends on what you're trying to do.  Sometimes changing the data format IS the best way.  That said ...

Be aware that the code fragments below were just made up as I wrote this ... they have not been tested and may contain unbalanced parentheses but they should give you the idea.  Nothing here depends on the ordering of data in the hashes.  Also if you prefer "do" loops to "for" loops, you can use them instead.

Note also the use of  "in-list", "in-hash-pairs","in-mutable-hash-pairs".  Racket "for" loops work with sequences, and although many data types - including lists and hashes - will implicitly ACT as sequences, explicitly using the relevant sequence constructors can make your "for" loops run faster.

    see https://docs.racket-lang.org/reference/sequences.html


   =====================


To see if 2 hashes contain the same set of keys:

    (and (= (hash-count hash1) (hash-count hash2))
         (for/and ([k (in-list (hash-keys hash1))])
           (hash-has-key? hash2 k)))

There is a function "hash-keys-subset?"  that checks if the keys in one hash are a subset of keys in another hash.  It generally will be faster than an equivalent loop, but it requires that both hashes use the same key comparison function.


To see if 2 hashes contain the same set of (k,v) pairs:

    ; immutable
    (for/and ([(k,v) (in-hash-pairs hash1)])
        (equal v (hash-ref hash2 k fail))

    ; mutable
    (for/and ([(k,v) (in-mutable-hash-pairs hash1)])
        (equal v (hash-ref hash2 k fail))



Figuring out the difference between one hash vs another is a bit harder, but a loop similar to the equality check works for this also:

    (for/list ([(k,v) (in-{mutable-}hash-pairs hash1)]
                #:unless (equal v (hash-ref hash2 k fail)))
       (cons k v))

Note that the ordering matters - the loop finds things that are in hash1 but not in hash2.  Also instead of creating a list of what's missing, you could create another hash:

    ; create immutable hash
    (for/hash ([(k,v) (in-{mutable-}hash-pairs hash1)]
                #:unless (equal v (hash-ref hash2 k fail)))
       (values k v))

    ; update a mutable hash
    (for ([(k,v) (in-{mutable-}hash-pairs hash1)]
                #:unless (equal v (hash-ref hash2 k fail)))
       (hash-set! result k v))

Unfortunately, there is no variant of "for" that creates mutable hashes.  But the general form works for anything.



Obviously there is a theme here.  <grin>

You are free to mix and match things: if your test data already is in lists, you can use the lists directly - either as the source sequences or as the lookup targets (since it's only a test, searching lists with "member" et al shouldn't matter  <wink>).

Hope this helps,
George

David Storrs

unread,
Oct 22, 2021, 12:45:18 PM10/22/21
to George Neuner, Racket Users
On Thu, Oct 21, 2021 at 5:26 AM George Neuner <gneu...@comcast.net> wrote:

On 10/20/2021 5:53 PM, unlimitedscolobb wrote:



You can get a lot of mileage out of the 'set' datatype, which removes ordering from the equation, especially since lists will act as sets in a pinch.  (When you want to improve performance, use 'in-set' and/or 'list->set'.

To see if 2 hashes contain the same set of keys:

    (and (= (hash-count hash1) (hash-count hash2))
         (for/and ([k (in-list (hash-keys hash1))])
           (hash-has-key? hash2 k)))

Alternatively:

(set=? (hash-keys hash1) (hash-keys hash2))


; Return an unordered list of the keys that are in hash1 but not in hash2
(set-subtract (hash-keys hash1) (hash-keys hash2))

; Get a new hash consisting of the key/values that are in hash1 but not in hash2
(for/hash ([k (set-subtract (hash-keys hash1) (hash-keys hash2))])
  (values k (hash-ref hash1 k)))

; Get a ore detailed breakdown:
(require handy)
(define hash1 (for/hash ([k '(a b c d e f g)] [v 10]) (values k v)))
(define hash2 (for/hash ([k '(a b c d e z y)] [v 10]) (values k v)))
(define hash3 (hash-set* hash2 'c 111 'd 184))
(disjunction hash1 hash3)
Result:
(dict-disjunction
 '#hash((c . (2 111)) (d . (3 184))); values that differ between the hashes
 '#hash((f . 5) (g . 6)) ; key/values that exist only in hash1
 '#hash((y . 6) (z . 5)) ; key/values that exist only in hash3
 '#hash((a . 0) (b . 1) (c . 2) (d . 3) (e . 4) (f . 5) (g . 6)) ; hash1
 '#hash((a . 0) (b . 1) (c . 111) (d . 3) (e . 4) (y . 6) (z . 5))) ; hash3


Unfortunately, there is no variant of "for" that creates mutable hashes.  But the general form works for anything.

If you don't mind inefficiency then handy can be, well, handy:

(define imm-h  (for/hash ([k '(a b c)][v 3]) (values k v)))
(immutable? imm-h)
(immutable? (hash->mutable imm-h))

hash->mutable takes an existing hash, which can be either immutable or mutable, and adds its key/values to a new mutable hash one by one, then returns that hash.

The handy module is a bit of a Fibber McGee that really needs to be broken out.  It's thoroughly documented, but unfortunately only in comments.  Converting that to proper scribble is one of my Copious Free Times projects.

unlimitedscolobb

unread,
Oct 23, 2021, 3:37:01 AM10/23/21
to Racket Users
On Thursday, October 21, 2021 at 11:26:16 AM UTC+2 gneuner2 wrote:

On 10/20/2021 5:53 PM, unlimitedscolobb wrote:
I have two main use cases for producing an ordered list from a hash table:

1. A canonical way to pretty print a hash table: In my projects, I carry around and print out hash tables a lot, so I like the elements to appear in the same order all the time, whatever that order may be. Luckily, `hash-map` orders symbols alphabetically and numbers according to the natural order, so it's perfect for my use.

It's fine if it works for you.  Just beware hashing on things like lists, structs, objects, etc.


Good point indeed, thank you.
 
You might look into Laurent Orseau's "text-table" package.   Tables are a great way to print structured output.

Nice package!  These are things which I typically want to do sometimes.

At the moment I rely on Org-mode: when evaluating a Racket source code block, I can ask Org-mode to typeset the resulting list as a table, which is often perfect for my purposes.
 

2. Testing the contents of a mutable hash table: The only way I found to that is to convert the hash table to an ordered list and compared it to the test list. This is clearly not the most efficient use of a hash table, but I can totally go with that, since it's about testing and not the actual performance.

Of course, I am totally open to learning better ways of doing these things!

Depends on what you're trying to do.  Sometimes changing the data format IS the best way.  That said ...

Be aware that the code fragments below were just made up as I wrote this ... they have not been tested and may contain unbalanced parentheses but they should give you the idea.  Nothing here depends on the ordering of data in the hashes.  Also if you prefer "do" loops to "for" loops, you can use them instead.

Note also the use of  "in-list", "in-hash-pairs","in-mutable-hash-pairs".  Racket "for" loops work with sequences, and although many data types - including lists and hashes - will implicitly ACT as sequences, explicitly using the relevant sequence constructors can make your "for" loops run faster.

    see https://docs.racket-lang.org/reference/sequences.html


Oh, sequences, of course!  I try using them as much as I can in my code because they are so nice, but I just forgot about them in this particular situation :D
Thank you for all these examples George!

Which makes me wonder: why is there not a hash table comparison function which would be built like one of your suggestions?  I'd typically expect such a comparison function to be part of a hash table library.  Another opportunity for contributions I guess.

-
Sergiu
 

Hope this helps,
George

unlimitedscolobb

unread,
Oct 23, 2021, 10:16:09 AM10/23/21
to Racket Users
On Friday, October 22, 2021 at 6:45:18 PM UTC+2 david....@gmail.com wrote:
On Thu, Oct 21, 2021 at 5:26 AM George Neuner <gneu...@comcast.net> wrote:

On 10/20/2021 5:53 PM, unlimitedscolobb wrote:



You can get a lot of mileage out of the 'set' datatype, which removes ordering from the equation, especially since lists will act as sets in a pinch.  (When you want to improve performance, use 'in-set' and/or 'list->set'.

To see if 2 hashes contain the same set of keys:

    (and (= (hash-count hash1) (hash-count hash2))
         (for/and ([k (in-list (hash-keys hash1))])
           (hash-has-key? hash2 k)))

Alternatively:

(set=? (hash-keys hash1) (hash-keys hash2))

Ah, sure, good point!

; Return an unordered list of the keys that are in hash1 but not in hash2
(set-subtract (hash-keys hash1) (hash-keys hash2))

; Get a new hash consisting of the key/values that are in hash1 but not in hash2
(for/hash ([k (set-subtract (hash-keys hash1) (hash-keys hash2))])
  (values k (hash-ref hash1 k)))

; Get a ore detailed breakdown:
(require handy)
(define hash1 (for/hash ([k '(a b c d e f g)] [v 10]) (values k v)))
(define hash2 (for/hash ([k '(a b c d e z y)] [v 10]) (values k v)))
(define hash3 (hash-set* hash2 'c 111 'd 184))
(disjunction hash1 hash3)
Result:
(dict-disjunction
 '#hash((c . (2 111)) (d . (3 184))); values that differ between the hashes
 '#hash((f . 5) (g . 6)) ; key/values that exist only in hash1
 '#hash((y . 6) (z . 5)) ; key/values that exist only in hash3
 '#hash((a . 0) (b . 1) (c . 2) (d . 3) (e . 4) (f . 5) (g . 6)) ; hash1
 '#hash((a . 0) (b . 1) (c . 111) (d . 3) (e . 4) (y . 6) (z . 5))) ; hash3

Wow, `handy` is very handy!  I wasn't aware of its existence, but I'll guess you've got yourself a new user :-)
 

Unfortunately, there is no variant of "for" that creates mutable hashes.  But the general form works for anything.

If you don't mind inefficiency then handy can be, well, handy:

(define imm-h  (for/hash ([k '(a b c)][v 3]) (values k v)))
(immutable? imm-h)
(immutable? (hash->mutable imm-h))

hash->mutable takes an existing hash, which can be either immutable or mutable, and adds its key/values to a new mutable hash one by one, then returns that hash.

Very nice!

The handy module is a bit of a Fibber McGee that really needs to be broken out.  It's thoroughly documented, but unfortunately only in comments.  Converting that to proper scribble is one of my Copious Free Times projects.

 Ah, I see :-)

/me looks at his own CFT projects and sighs lightly.

-
Sergiu
Reply all
Reply to author
Forward
0 new messages