Bug in Typed Racket optimizer and snapshots

95 views
Skip to first unread message

Antonio Menezes Leitao

unread,
Aug 30, 2015, 6:21:06 AM8/30/15
to Racket Dev
Hi,

I recently found a strange behavior in Typed Racket, where immediately after type-checking my code, in generates an error that looks like this:

..\..\..\..\..\..\..\..\Program Files\Racket-6.2.900.10\share\pkgs\typed-racket-lib\typed-racket\static-contracts\optimize.rkt:158:6: free-id-table-ref: no mapping for #<syntax n*463>

Note that I'm using a recent Racket snapshot: 6.2.900.10--2015-08-25(a1e6c94/a)

I did lots of experiments to identify the cause of the error but I could only make it disappear by reducing the number of definitions in the file, or by replacing some macro calls with their expansion.

Unfortunately, I could not try a more recent snapshot as the page in http://www.cs.utah.edu/plt/snapshots/ is not working properly. The links go nowhere.

Any suggestions?

Best,
António.




Antonio Leitao

unread,
Sep 1, 2015, 9:28:37 AM9/1/15
to Racket Developers, d...@racket-lang.org
Hi,

Meanwhile, I reduced the problematic code to a maneageable fragment.

The following code typechecks and runs fine:

#lang typed/racket

(define-type (Base-Shape R)
 
(U (foo-shape R)
     
(bar-shape R)))

(define-syntax (def-base-shape stx)
 
(syntax-case stx ()
   
[(_ (R (func-name shape-name) [param-name : param-type] ...))
     
(syntax/loc stx
       
(begin
         
(struct (R) shape-name ([param-name : param-type] ...))
         
(define #:forall (R) (func-name [param-name : param-type] ...)
           
(shape-name param-name ...))))]))

(def-base-shape (R (foo-func foo-shape) [s : (Base-Shape R)]))
(def-base-shape (R (bar-func bar-shape) [s : R]))

(provide foo)
(define (foo [shapes : (Base-Shape String)])
 
(foo-func shapes))

;(provide bar)
(define (bar [shapes : (Base-Shape String)])
 
(bar-func shapes))


However, if you uncomment the last provide, it generates an error:

..\..\..\..\..\..\..\..\Program Files\Racket-6.2.900.10\share\pkgs\typed-racket-lib\typed-racket\static-contracts\optimize.rkt:155:2: free-id-table-ref: no mapping for #<syntax n*11>

Is there a workaround?

Best,
António.

Vincent St-Amour

unread,
Sep 7, 2015, 1:12:21 PM9/7/15
to Antonio Leitao, Racket Developers, d...@racket-lang.org
Antonio,

Thanks for the report! Your test case has resisted all my attempts at
simplification, which made it a bit tricky to work with. :)

I think I've found the cause of the problem, though. It looks like the
type to static contract translation's cache causes it to generate open
static contract terms, which the optimizer chokes on, which causes the
error you're seeing.

Specifically, the following static contract (for `bar`) is (I think) the
problem:

#<->/sc #<or/sc #<rec/sc ((n*9 #<struct/sc #<or/sc n*9 #<rec/sc ((n*10 #<struct/sc #<flat/sc String>>)) n*10>>>)) n*9>
#<rec/sc ((n*10 #<struct/sc #<flat/sc String>>)) n*10>>
#<rec/sc ((n*24 #<struct/sc #<or/sc n*11 #<rec/sc ((n*12 #<struct/sc #<flat/sc String>>)) n*12>>>)) n*24>>

Note the `n*11` on the last line, which is not bound by any `rec/sc`
(and is the particular identifier the optimizer chokes on).

The bit with `n*11` is present in the previous (very similar) static
contract (for `foo`), where it is in fact bound. Generating the SC for
`foo` puts that term in the cache, which causes SC generation for `bar`
to reuse it. (And that explains why the error occurred only when both
bindings were provided.)

Disabling the SC cache (in the source, there is no user-visible option)
makes the problem go away, as expected, but is not a real solution.

IIUC, the fix would be to not put open SC terms in the SC cache. It
looks like there's already some checking for that in place, but it
doesn't seem to be working, as this example shows. I'll look into this.

Asumu: You added this caching. Any thoughts on that?

Vincent
> --
> You received this message because you are subscribed to the Google
> Groups "Racket Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to racket-dev+...@googlegroups.com.
> To post to this group, send email to racke...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-dev/3a2f30ad-218e-4684-a769-7371c069e3c0%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Vincent St-Amour

unread,
Sep 7, 2015, 2:52:07 PM9/7/15
to Antonio Leitao, Racket Developers, d...@racket-lang.org
Antonio,

Yep. That was the problem. Fix pushed!

Thanks for your patience.

Vincent

Vincent St-Amour

unread,
Sep 9, 2015, 6:25:24 PM9/9/15
to Antonio Leitao, racke...@googlegroups.com
[Re-adding dev]

Did you observe similar increases with the previous version of Racket
(i.e. before my fix)? Maybe with previous versions of your program (that
didn't trigger the SC bug you reported earlier).

The fix I pushed *does* reduce the caching done during contract
generation, which could explain a performance regression. But if the fix
caused the program to go from "erroring" to "taking a long time to
compile", the fix may just have uncovered an existing performance issue
that the error was masking.

The question at this point is: where is that time being spent?

To find out:
- Start without any compiled files for your program (i.e. clear out the
relevant `compiled` directories)
- Run: PLTSTDERR=debug@tr-timing raco make my-program.rkt

That should dump some timing info that shows where time is being spent
in TR.

Depending on the results, there may be different solutions.

Vincent



On Wed, 09 Sep 2015 04:12:33 -0500,
Antonio Leitao wrote:
>
> Hi Vincent,
>
> It seems that we now have a different problem. I tried a recent snapshot
> (6.2.900.15--2015-09-08(5f2d44c/a)) and although that bug is no longer
> present, it now takes forever to typecheck my code.
>
> I have a type definition that looks like
>
> (define-type (Foo T) (U (s1 T) (s2 T) (s3 T) ...))
>
> where the s1, s2, ... are struct types, and I have a very small number
> of polymorphic functions that accept a (Foo T). Now, if I comment all
> but one of the (si T) in the Foo type definition, it takes 17 seconds
> from pressing the Run button in DrRacket (background expansion disabled)
> till the prompt appears in the REPL, which is already a long (but
> acceptable) amount of time.
>
> However, if I incrementally "uncover" additional (si T) in the Foo type
> definition, I get the following times (the column on the left is the
> number of (si T) visible in the Foo type definition):
>
> 1 - 17s
> 2 - 19s
> 3 - 20s
> 4 - 21s
> 5 - 25s
> 6 - 44s
> 7 - 113s
> 8 - ran out of memory, after 195s
>
> Note that these rough timings were obtained in a very fast desktop.
>
> Any ideas?
>
> Best,
> António.

Vincent St-Amour

unread,
Sep 10, 2015, 12:16:22 PM9/10/15
to Antonio Leitao, racke...@googlegroups.com
[Re-adding dev. Please reply all in the future.]

The relevant bits of the logs (for context, for those who did not get
the attachments) are:


Union of 30 types:
tr-timing: Fixed contract ids at 15740 last step: 0 gc: 0 total: 15116
tr-timing: Starting optimizer at 31028 last step: 15288 gc: 3166 total: 30404
tr-timing: Optimized at 31278 last step: 250 gc: 63 total: 30654
tr-timing: Finished, returning to Racket at 31278 last step: 0 gc: 0 total: 30654

Union of 31 types:
tr-timing: Fixed contract ids at 16270 last step: 0 gc: 0 total: 15615
tr-timing: Starting optimizer at 66269 last step: 49999 gc: 9246 total: 65614
tr-timing: Optimized at 66550 last step: 281 gc: 77 total: 65895
tr-timing: Finished, returning to Racket at 66550 last step: 0 gc: 0 total: 65895


The way I read it, it's not the optimizer taking 3 times as much time in
the second case, but the step before. I.e., what happens between fixing
contract ids and launching the optimizer. Which is these two lines in
TR's `core.rkt`:

[(before-code ...) (change-provide-fixups (flatten-all-begins pre-before-code))]
[(after-code ...) (change-provide-fixups (flatten-all-begins pre-after-code))]

(If you want to confirm that the optimizer is not at fault, you change
change the #lang line to "#lang typed/racket #:no-optimize", which turns
off the optimizer altogether.)

I'll look into `change-provide-fixups`, but it would be really helpful
if you could share your problematic program, or at least a fragment that
exhibits this behavior. Otherwise, I don't have a good example program
to measure.

Vincent



On Wed, 09 Sep 2015 18:09:00 -0500,
Antonio Leitao wrote:
>
> Hi,
> Attached you'll find two dumps. One (out.txt) is for a union of 30
> types. The other (out2.txt) is for a similar union but containing 31
> types. It seems to me the problem is in the optimizer but the dumps do
> now show much information regarding this final step.
>
> Does this help you identify the problem?
>
> Best,
> António.

Antonio Leitao

unread,
Sep 10, 2015, 5:15:04 PM9/10/15
to Vincent St-Amour, racke...@googlegroups.com
Hi,

On Thu, Sep 10, 2015 at 5:16 PM, Vincent St-Amour <stam...@eecs.northwestern.edu> wrote:
[Re-adding dev. Please reply all in the future.]


Sorry. I was afraid the log files would annoy people.
 
I'll look into `change-provide-fixups`, but it would be really helpful
if you could share your problematic program, or at least a fragment that
exhibits this behavior. Otherwise, I don't have a good example program
to measure.

Here is a MWE:

#lang typed/racket
(require (for-syntax racket/syntax))

(define-syntax (structs stx)
  (syntax-case stx ()
    [(structs n)
     (with-syntax ([(name ...)
                    (for/list ([i (in-range (syntax->datum #'n))])
                      (format-id #'structs "struct-~a" i))]
                   [(fname ...)
                    (for/list ([i (in-range (syntax->datum #'n))])
                      (format-id #'structs "func-~a" i))])
       #'(begin
           (define-type (Union T)
             (U Number (name T) ...))
           (begin
             (struct (T) name
               ([x : (Union T)]))
             (define #:forall (T) (fname [x : (Union T)])
               (name x)))
           ...))]))

The approximate times I got are:

(structs 4) ;2s
(structs 5) ;7s
(structs 6) ;42s

Hope this helps and thanks in advance for your efforts.

Best,
António. 

Vincent St-Amour

unread,
Sep 10, 2015, 5:57:37 PM9/10/15
to Antonio Leitao, Vincent St-Amour, racke...@googlegroups.com
Antonio,

That's weird. With that example, I do see the timings you mentioned, but
if I turn on TR's timing logging, I see that almost all the time is
being spent in pass 1 of the typecheck. I.e., in a different place than
your original example, way earlier in the TR implementation.

That makes me suspect that this new example exhibits a different issue,
which is less likely to be related to the original bug you reported.

Vincent

Sam Tobin-Hochstadt

unread,
Sep 10, 2015, 6:21:45 PM9/10/15
to Vincent St-Amour, Antonio Leitao, racke...@googlegroups.com
For other reasons, I have a version of TR which looks at how long type inference is taking, and that's clearly a big culprit. Almost all of the time for `(structs 6)` is in a few calls to inference. Unfortunately, this doesn't yet give me a way to fix it. Based on the very high GC times, I think there's a lot too much substitution going on here, which is something I'm looking to fix in Typed Racket generally.

Sam

On Thu, Sep 10, 2015 at 5:57 PM Vincent St-Amour <stam...@eecs.northwestern.edu> wrote:
Antonio,

That's weird. With that example, I do see the timings you mentioned, but
if I turn on TR's timing logging, I see that almost all the time is

being spent in pass 1 of the typecheck. I.e., in a different place than
your original example, way earlier in the TR implementation.

That makes me suspect that this new example exhibits a different issue,
which is less likely to be related to the original bug you reported.

Vincent



On Thu, 10 Sep 2015 16:14:43 -0500,
Antonio Leitao wrote:
>
--
You received this message because you are subscribed to the Google Groups "Racket Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-dev+...@googlegroups.com.
To post to this group, send email to racke...@googlegroups.com.

Antonio Leitao

unread,
Sep 10, 2015, 6:41:21 PM9/10/15
to Vincent St-Amour, racke...@googlegroups.com
On Thu, Sep 10, 2015 at 10:57 PM, Vincent St-Amour <stam...@eecs.northwestern.edu> wrote:
Antonio,

That's weird. With that example, I do see the timings you mentioned, but
if I turn on TR's timing logging, I see that almost all the time is
being spent in pass 1 of the typecheck. I.e., in a different place than
your original example, way earlier in the TR implementation.

That makes me suspect that this new example exhibits a different issue,
which is less likely to be related to the original bug you reported.

OK. Attached is another example that is almost like the real code.

Near the top of the file, you will see the Base-Shape type, with some stuff inside a block comment. Even as it is, with only one type in the union, it already takes a very long time to process. But as you uncomment more and more types in that union it becomes really, really slow. I'm afraid you will not be able to uncomment everything.

 
testS.rkt

Vincent St-Amour

unread,
Sep 11, 2015, 2:25:53 PM9/11/15
to Antonio Leitao, Vincent St-Amour, racke...@googlegroups.com
Antonio,

I can confirm that, with this example, the contract generation code we
identified as a possible culprit is indeed the problem.

For reference, here are a few data points.

| # of union members | time in problem code | total time |
+--------------------+----------------------+------------+
| 1 | 1712 | 6820 |
| 2 | 2073 | 7421 |
| 3 | 2030 | 7745 |
| 4 | 2188 | 7335 |
| 24 | 11174 | 23656 |
| 25 | 9824 | 20024 |
| 30 | 17879 | 36099 |
| 32 | 42412 | 63279 |
| 39 | killed after ~12 minutes |

Pass 1 of the typechecker is the only other significant time
contributor, as was the case in the earlier, simpler example.
After around 30 union members, the calls to `change-provide-fixups`
begin to dominate.


Sam: You wrote `change-provide-fixups`, and left this helpful comment:

;; TODO: These are probably all in a specific place, which could avoid
;; the big traversal

This sounds like a potential solution. Can you elaborate?

Vincent
Reply all
Reply to author
Forward
0 new messages