warning on use trie functions in #lang racket?

100 views
Skip to first unread message

John Clements

unread,
Jan 5, 2016, 1:49:29 PM1/5/16
to Racket Users
This program constructs a trie containing exactly two keys; each is a list of 256 integers. On my machine, it takes *twelve seconds*. The time taken appears to be n^2 in the length of the key, so doubling it to 256 means it’ll take about a minute to add a key.

#lang racket

(require pfds/trie)

(define (rand-list)
(for/list ([i (in-range 128)])
(random 256)))

(define t (trie (list (rand-list))))
(define u (time (bind (rand-list) 0 t)))

When I translate this to typed racket, the time taken is 0 ms.

I feel like I got a bit burned here, and an ordinary user might simply conclude that Racket libraries are teh suck.

Can we add a warning to the beginning of the documentation page for Tries (and perhaps for the other data structures as well) indicating that these functions are likely to be unusably slow in #lang racket?

I propose the following, at the top of the documentation. Possibly in boldface.

“NB: these data structures are written in Typed Racket. While they may be nominally usable in the (untyped) Racket language, the resulting dynamic checks will probably cause them to be unacceptably slow”

Suggestions? May I make a pull request here?

John



Sam Tobin-Hochstadt

unread,
Jan 5, 2016, 2:01:09 PM1/5/16
to John Clements, Racket Users
Yes, I think a warning at the top of the documentation for the `pfds` package would make sense, and probably Asumu would accept such a pull request. You might follow the phrasing in the math/array library.

Sam

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Clements

unread,
Jan 5, 2016, 2:39:21 PM1/5/16
to Asumu Takikawa, Racket Users
Asumu, does this make sense to you? Note that in particular, I think that a warning at the top of the pfds package wouldn’t have helped me; I think a warning at the top of each pfds page would make a lot more sense.

John

Jay McCarthy

unread,
Jan 5, 2016, 2:40:34 PM1/5/16
to John Clements, Asumu Takikawa, Racket Users
I would prefer taking this useful library and making it actually
useful to all Racket users.

Jay

On Tue, Jan 5, 2016 at 2:39 PM, 'John Clements' via Racket Users
--
Jay McCarthy
Associate Professor
PLT @ CS @ UMass Lowell
http://jeapostrophe.github.io

"Wherefore, be not weary in well-doing,
for ye are laying the foundation of a great work.
And out of small things proceedeth that which is great."
- D&C 64:33

Vincent St-Amour

unread,
Jan 5, 2016, 3:19:15 PM1/5/16
to Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
(add1 Jay)

That kind of overhead has to be a bug somewhere, or a bad way to write
something. I hope.

Vincent

Benjamin Greenman

unread,
Jan 5, 2016, 3:29:04 PM1/5/16
to Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
I'm afraid it's just O(n) contracts. For example:

#lang racket/base

(module stack typed/racket/base
  (define-type Stack (Listof Integer))
  (provide:
    [create (-> Stack)]
    [push (-> Integer Stack Stack)]
    [pop  (-> Stack Stack)])

  (define (create) '())
  (define push cons)
  (define pop cdr))

(require 'stack contract-profile)

(define (test)
  (for/fold ([st (create)])
            ([i (in-range 5000)])
    (push i st))
  (void))

(contract-profile (time (test)))
;; Running time is 80.46% contracts
;; 70/87 ms



Jay McCarthy

unread,
Jan 5, 2016, 3:30:12 PM1/5/16
to Benjamin Greenman, Vincent St-Amour, John Clements, Asumu Takikawa, Racket Users
Your comment is not evidence that Typed Racket libraries are useful to
Racket programmers, but in fact the opposite.

Jay

Benjamin Greenman

unread,
Jan 5, 2016, 3:33:16 PM1/5/16
to Jay McCarthy, Vincent St-Amour, John Clements, Asumu Takikawa, Racket Users
Right, my comment was intended to show that innocent Typed Racket libraries may be harmful to Racket programmers.

Sam Tobin-Hochstadt

unread,
Jan 5, 2016, 3:33:31 PM1/5/16
to Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
The overhead appears to be a combination of the following:

- Typed Racket should see that checking whether something is a Trie
is a flat contract, but it doesn't.
- Typed Racket should be able to drop the result contract check
entirely, but it can't (I don't think this is expensive, though).
- Applying chaperones to a recursive data structure, and then
traversing that data structure, is quite slow.

Note that there's 2 passes over the data structure implied by the
contract, but they aren't expensive -- if you just write them yourself
with Racket code, they're basically instant. Everything is about the
cost of chaperoning the data.

Sam

On Tue, Jan 5, 2016 at 3:19 PM, Vincent St-Amour
<stam...@eecs.northwestern.edu> wrote:

Vincent St-Amour

unread,
Jan 5, 2016, 3:42:13 PM1/5/16
to Benjamin Greenman, Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
Easy to work around.

#lang racket/base

(module stack typed/racket/base
(struct Stack ([s : (Listof Integer)]))
(provide:
[create (-> Stack)]
[push (-> Integer Stack Stack)]
[pop (-> Stack Stack)])

(define (create) (Stack '()))
(: push : Integer Stack -> Stack)
(define (push x s) (Stack (cons x (Stack-s s))))
(: pop : Stack -> Stack)
(define (pop s) (Stack (cdr (Stack-s s)))))

(require 'stack contract-profile)

(define (test)
(for/fold ([st (create)])
([i (in-range 5000)])
(push i st))
(void))

(contract-profile (test)) ; Running time is 0% contracts


TR works with the programs it's given, I agree. Maybe it can't do better
with the current version of pfds. That's fine.

But if pfds is doing somthing dumb, let's fix pfds.

Vincent

Sam Tobin-Hochstadt

unread,
Jan 5, 2016, 4:22:15 PM1/5/16
to Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
The actual problem (in addition to the slowness of chaperones that
exist at every level of a structure) is that TR doesn't distinguish
between mutable and immutable hash tables. Therefore, a contract that
uses `hash/c` has to be a chaperone contract, thus causing the
slowdown John reported.

Sam

Robby Findler

unread,
Jan 5, 2016, 4:53:24 PM1/5/16
to Sam Tobin-Hochstadt, Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
Could the contract do something helpful here, or is this really a fix
that needs to happen in TR and/or the pfds library?

Robby


On Tue, Jan 5, 2016 at 3:21 PM, Sam Tobin-Hochstadt

Sam Tobin-Hochstadt

unread,
Jan 5, 2016, 5:08:46 PM1/5/16
to Robby Findler, Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
It's possible; I'm not sure. The essence of the issue is here: https://gist.github.com/38f4ef694d1df822b3e2 with this contract:

(recursive-contract (struct/c Trie number? (or/c #f (hash/c any/c Trie/c)))
                        #:chaperone)

The similar contract:

(recursive-contract (struct/c Trie number? (or/c #f (hash/c any/c Trie/c-flat #:flat? #t)))
                        #:flat)

Is at least 100x faster when traversing the tree after the contract is applied.

It seems like it would be tricky for the contract system to figure out that one can be used for the other.

Sam

Robby Findler

unread,
Jan 5, 2016, 6:03:25 PM1/5/16
to Sam Tobin-Hochstadt, Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
This is related to a thought I've had, tho. It seems like it is hard
for TR or for the contract system to know if a lazy or a strict check
is the most useful, but the programmer may have some good information.
So maybe there could be a way for the TR programmer to say "make
things with this type, when doing contract generation, have eager
semantics" and then maybe even give the same things two different
types if it would be used in two different situations that would have
different performance characteristics.

In this particular case, however, the pdfs programmer could just say
"make it eager" and be done with it.

Robby


On Tue, Jan 5, 2016 at 4:08 PM, Sam Tobin-Hochstadt

Sam Tobin-Hochstadt

unread,
Jan 5, 2016, 6:07:30 PM1/5/16
to Robby Findler, Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
The problem in this specific case is that the hash tables in the data
structure might be mutable, and so the fully-eager check doesn't
really help. But in general, I definitely agree.

Sam

Robby Findler

unread,
Jan 5, 2016, 6:11:47 PM1/5/16
to Sam Tobin-Hochstadt, Vincent St-Amour, Jay McCarthy, John Clements, Asumu Takikawa, Racket Users
Well, if they aren't, the contract could dynamically decide to put in
a chaperone.

Robby

On Tue, Jan 5, 2016 at 5:07 PM, Sam Tobin-Hochstadt

Asumu Takikawa

unread,
Jan 8, 2016, 7:51:41 AM1/8/16
to John Clements, Racket Users
On 2016-01-05 14:39:17 -0500, 'John Clements' via Racket Users wrote:
> Asumu, does this make sense to you? Note that in particular, I think that a
> warning at the top of the pfds package wouldn’t have helped me; I think a
> warning at the top of each pfds page would make a lot more sense.

I'd be happy to take a pull request for this. Though I'd also prefer that the
library just worked well.

The pfds could library could also just export everything without contracts
unsafely. Alternatively we could set it up to generate untyped variants of
each (with a different module path) that don't optimize or use contracts.

Cheers,
Asumu

Robby Findler

unread,
Jan 8, 2016, 9:06:10 AM1/8/16
to Asumu Takikawa, John Clements, Racket Users
We know there is a much more efficient set of contracts than what TR generates right? How about an unsafe export of TR functions to a wrapper module that implements the safe checks by hand (by macro?)? Maybe that exercise will even feed back into an improvement to TR?

Robby

Sam Tobin-Hochstadt

unread,
Jan 8, 2016, 9:19:43 AM1/8/16
to Robby Findler, Asumu Takikawa, John Clements, Racket Users
I think a better solution is just to add immutable hash tables to TR,
and then use them in the trie modules. That would allow TR to generate
exactly the contracts that we could write by hand.

Sam

Robby Findler

unread,
Jan 8, 2016, 9:42:45 AM1/8/16
to Sam Tobin-Hochstadt, Asumu Takikawa, John Clements, Racket Users
We can do the simpler thing for this release, tho. And changing TR
won't happen for months. The simple suggestion is an easy, immediate
fix, and it will also give us goalposts to shoot for.

Robby

Sam Tobin-Hochstadt

unread,
Jan 8, 2016, 9:45:28 AM1/8/16
to Robby Findler, Asumu Takikawa, John Clements, Racket Users
This library isn't part of the Racket distribution, so the release schedule doesn't really matter here.

Sam

Robby Findler

unread,
Jan 8, 2016, 9:48:12 AM1/8/16
to Sam Tobin-Hochstadt, Asumu Takikawa, John Clements, Racket Users
I thought your suggestion involved a change to TR.

But whatever. Do what you want.

Robby

On Fri, Jan 8, 2016 at 8:45 AM, Sam Tobin-Hochstadt

Sam Tobin-Hochstadt

unread,
Jan 8, 2016, 9:50:10 AM1/8/16
to Robby Findler, Asumu Takikawa, John Clements, Racket Users
My suggestion did involve changing TR. I think your suggestion was to
change the trie library, which doesn't have to worry about the Racket
release schedule.

Sam

On Fri, Jan 8, 2016 at 9:48 AM, Robby Findler

Robby Findler

unread,
Jan 8, 2016, 10:10:37 AM1/8/16
to Sam Tobin-Hochstadt, Asumu Takikawa, John Clements, Racket Users
I think we must be talking past each other. Let me try to back out a bit.

It is my understanding that the trie library's slowness that John
reported is all about contract overhead. Specifically, there are some
(effectively) lazy contracts on the trie structs that pile up, leading
to surprisingly bad performance and that these contracts are generated
by TR.

Our initial round of discussion came to the conclusion there's no
simple fix to TR or the contract system that can avoid this overhead.

So, instead of adding a big warning to the trie library (that Asumu
rightly is not excited about :), we could use TR's unsafe require
support to implement a small wrapper file that does a more efficient
version of the contract checking and then John's problem would be
fixed.

Over the period between the current and next release we could, in
addition, improve TR and the contract system to the point where that
wrapper file isn't necessary. That seems good too. It also seems
useful to have the short-term fix as part of doing the longer-term fix
because that will help us be sure we've really gotten the right
contracts in there by comparing the performance of the two versions.

Does this make sense as the right way to proceed?

Robby



On Fri, Jan 8, 2016 at 8:49 AM, Sam Tobin-Hochstadt

Sam Tobin-Hochstadt

unread,
Jan 8, 2016, 10:15:11 AM1/8/16
to Robby Findler, Asumu Takikawa, John Clements, Racket Users
On Fri, Jan 8, 2016 at 10:10 AM, Robby Findler
<ro...@eecs.northwestern.edu> wrote:
> I think we must be talking past each other. Let me try to back out a bit.
>
> It is my understanding that the trie library's slowness that John
> reported is all about contract overhead. Specifically, there are some
> (effectively) lazy contracts on the trie structs that pile up, leading
> to surprisingly bad performance and that these contracts are generated
> by TR.
>
> Our initial round of discussion came to the conclusion there's no
> simple fix to TR or the contract system that can avoid this overhead.
>
> So, instead of adding a big warning to the trie library (that Asumu
> rightly is not excited about :), we could use TR's unsafe require
> support to implement a small wrapper file that does a more efficient
> version of the contract checking and then John's problem would be
> fixed.
>
> Over the period between the current and next release we could, in
> addition, improve TR and the contract system to the point where that
> wrapper file isn't necessary. That seems good too. It also seems
> useful to have the short-term fix as part of doing the longer-term fix
> because that will help us be sure we've really gotten the right
> contracts in there by comparing the performance of the two versions.
>
> Does this make sense as the right way to proceed?

Yes, that's definitely the right thing to do. I was thinking that
since the pfds library isn't part of the main distribution, if we can
fix TR to do better here, even if not before the release, then adding
this workaround wouldn't be necessary. But of course that's wrong --
anyone using 6.3 or 6.4 would still see the awful performance that
John noticed. So we should fix pfds now using the technique you
suggest.

Sorry being confused here,
Sam

Robby Findler

unread,
Jan 9, 2016, 2:21:36 PM1/9/16
to Sam Tobin-Hochstadt, Asumu Takikawa, John Clements, Racket Users
I gave this a try here: https://github.com/rfindler/tr-pfds but got stuck.

Robby


On Fri, Jan 8, 2016 at 9:14 AM, Sam Tobin-Hochstadt

Sam Tobin-Hochstadt

unread,
Jan 9, 2016, 10:01:17 PM1/9/16
to Robby Findler, Asumu Takikawa, John Clements, Racket Users
Here's an initial pull request: https://github.com/takikawa/tr-pfds/pull/6

It runs John's test case in about 2 ms, instead of multiple seconds.

Sam

On Sat, Jan 9, 2016 at 2:21 PM, Robby Findler

John Clements

unread,
Jan 10, 2016, 6:38:21 PM1/10/16
to Sam Tobin-Hochstadt, Robby Findler, Asumu Takikawa, Racket Users

> On Jan 9, 2016, at 7:00 PM, Sam Tobin-Hochstadt <sa...@cs.indiana.edu> wrote:
>
> Here's an initial pull request: https://github.com/takikawa/tr-pfds/pull/6
>
> It runs John's test case in about 2 ms, instead of multiple seconds.

Many thanks!

John


signature.asc
Reply all
Reply to author
Forward
0 new messages