updated Racket-on-Chez status


Matthew Flatt

Jan 29, 2019, 9:49:57 AM
to Racket Users
Here's a new status report on Racket CS:

http://blog.racket-lang.org/2019/01/racket-on-chez-status.html

Short version: Racket CS is done in a useful sense, but we'll wait
until it gets better before making it the default Racket
implementation.


Matthew

Alex Harsanyi

Jan 29, 2019, 8:32:43 PM
to Racket Users
I know the report explains the various cases where Racket CS is slower than
Racket 7.1, but I would like to mention that these slowdowns are very
significant, at least for my application -- I mention this because the report
recommends making Racket CS the default, which I am very concerned about:

> To maximize the maintenance benefits of Racket CS, it’s better to make it
> the default Racket variant sooner rather than later

To understand the performance impact, below are the numbers for my
application.  I mentioned this before [1] and I added some new timings to the
Google Sheets Document [2].

* the total Travis build time (build + tests) is ~13 minutes using Racket 7.1
  and ~20 minutes using Racket CS, a 7 minute increase.

* the build time itself grows from 3.75 minutes in Racket 7.1 to 8.5 minutes
  in Racket CS, a 5 minute increase.  My Edit-Compile-Run cycle is already
  slow with Racket 7.1.

* even ignoring compile and load time, timings for code sections that run
  after Racket initialization and library loading show a 33% increase in
  running time -- given that execution time is several seconds to several
  minutes in Racket 7.1, a 33% increase is very visible to the end user.

As it stands now, the cases where Racket CS is slow have a significant impact
on my application.  Do others see the same performance degradation in their
applications?

Best Regards,
Alex.


Paulo Matos

Jan 30, 2019, 2:46:16 AM
to racket...@googlegroups.com
I am about to run some thorough tests and benchmarks on my app. Will
report back on this. Thanks for raising these issues.

--
Paulo Matos

Luke Whittlesey

Jan 30, 2019, 9:51:06 AM
to Matthew Flatt, Racket Users
This is really impressive work!


Matthew Flatt

Jan 30, 2019, 9:53:21 AM
to Alex Harsanyi, Racket Users
At Tue, 29 Jan 2019 17:32:42 -0800 (PST), Alex Harsanyi wrote:
> I mention this because the report recommends making
> Racket CS the default, which I am very concerned about:
>
> > To maximize the maintenance benefits of Racket CS, it’s better to make it
> > the default Racket variant sooner rather than later

I hope the context of this sentence is clear: The report is mostly
about performance, and it says explicitly that we won't switch until
performance is good enough (despite the maintenance benefits of
switching).

> As it stands now, the cases where RacketCS is slow have a significant impact
> on my application. Do others see the same performance degradation in their
> applications?

I'm interested, too, and I expect others to report similar results.

To connect your specific results more to the blog post, here are some
things I took from looking at your program a few weeks ago:

* A lot of code is generated from your source programs. Some of that
seems to be Typed Racket; for example, 15% of the generated code is
contracts to be used if typed code is called from an untyped
context. (Thanks to Sam for helping with that experiment.) In any
case, lots of generated code means even slower loading.

* The plots for distribution builds show that TR-driven compilation
gets slower in general. That by itself is probably due to a combination
of factors, but it certainly affects other programs that are written
in TR.

* The test suite involves significant I/O. As you mentioned, the tests
drive database operations, and those operations should be about
the same. But the database is populated by parsing files, so slow
I/O may be the bottleneck.

Alex Harsanyi

Jan 30, 2019, 7:31:53 PM
to Racket Users


On Wednesday, January 30, 2019 at 10:53:21 PM UTC+8, Matthew Flatt wrote:
At Tue, 29 Jan 2019 17:32:42 -0800 (PST), Alex Harsanyi wrote:
> I mention this because the report recommends making
> Racket CS the default, which I am very concerned about:
>
> > To maximize the maintenance benefits of Racket CS, it’s better to make it
> > the default Racket variant sooner rather than later

I hope the context of this sentence is clear: The report is mostly
about performance, and it says explicitly that we won't switch until
performance is good enough (despite the maintenance benefits of
switching).

Thanks for clarifying that.  My interpretation of the report was that, while
some performance problems exist, these are acceptable and Racket CS should
become Racket 7.3 (or 8.0).
 

[...]


To connect your specific results more to the blog post, here are some
things I took from looking at your program a few weeks ago:

 * A lot of code is generated from your source programs.

Do you mean (1) that the source code is large or (2) that the generated
byte-code or machine code is unexpectedly large given the size of the
sources?

If (1), I don't consider it large, at least not when compared to the size of
the programs I write and develop at work.  Also, the program would be twice as
large if I had the time to add all the features I would like to :-)

If (2), is there a way to structure it such that less code is generated from
these sources?  Is there a "best practices" document for structuring Racket
programs?

 
Some of that
   seems to be Typed Racket; for example, 15% of the generated code is
   contracts to be used if typed code is called from an untyped
   context. (Thanks to Sam for helping with that experiment.) In any
   case, lots of generated code means even slower loading.

I always assumed that the contracts generated when TR code is used from
untyped Racket would be the same as the ones written by hand if the module
were untyped and had contracts on the same exported functions.  Is this
not the case?

Is there some documentation on how to measure the amount of code used by
contracts (the 15% value you mention above)?  I experimented with TR for a few
things and left them in the code base, but if they create significant
performance issues, I can convert them to untyped Racket easily.

[...]


 * The test suite involves significant I/O. As you mentioned, the test
   driving database operations, and those operations should be about
   the same. But the database is populated by parsing files, so slow
   I/O may be the bottleneck.

I am not sure what "significant" means, but the tests don't read and write a
large amount of data:

* The total test data size is less than 10Mb: the largest file is 4Mb, the
  second largest is 1.3Mb, the third largest is 500Kb and the fourth is
  400Kb. The remaining files are less than 300Kb each, and half of them are
  under 100Kb. These are binary files, read into a byte vector in one go
  using `file->bytes`.

* Some tests also read in a SQL schema file from disk, and this is 50Kb in
  size, plus a handful of small SQL files read in once only for each of the 6
  test programs.

* For most tests, the database is created in memory, so, while there are a lot
  of SQL statements run, they don't create any disk IO.

* One test that uses on-disk databases reads in about 12 SQL files, less than
  5Kb each.  The database IO is done by the SQLite library, not by racket.

* Another test using on-disk databases does no IO apart from SQLite (i.e.,
  the Racket code just runs SQL statements).

* The third test using on-disk databases writes about 20 small files to disk
  and reads half of them back in.

I would be happy to help you identify where the performance degradation
between Racket 7.1 and CS is when running these tests.  If you have any
questions or want me to run other tests or modified ones, I can do that.

Alex.

Matthew Flatt

Jan 30, 2019, 8:23:39 PM
to Alex Harsanyi, Racket Users
At Wed, 30 Jan 2019 16:31:52 -0800 (PST), Alex Harsanyi wrote:
> On Wednesday, January 30, 2019 at 10:53:21 PM UTC+8, Matthew Flatt wrote:
> >
> > * A lot of code is generated from your source programs.
>
>
> Do you mean (1) that the source code is large or (2) that the generated
> byte-code or machine code is unexpectedly large given the size of the
> sources?

(2)

> If (2), is there a way to structure it such that less code is generated from
> these sources? Is there a "best practices" document for structuring Racket
> programs?

I don't think you should have to change your program, for now. Instead,
I think we need to better understand how different layers of the language
implementation contribute to the code size. Maybe we'll find that
everything is really as it should be, or maybe not.

> I always assumed that the contracts generated when TR code is used from
> untyped Racket would be the same as the ones written by hand if the module
> would be untyped and had contracts on the same exported functions. Is this
> not the case?

That's pretty much the case, as far as I know.

> Is there some documentation on how to measure the amount of code used by
> contracts (the 15% value you mention above)?

No. Sam set up a branch of TR that skipped contract generation, and I
compared ".zo" sizes using that branch versus the normal one.

> I experimented with TR for a few things and left them in the code
> base, but if they create significant performance issues, I can
> convert them to untyped Racket easily.

Although there are costs to TR in compile time and load time,
especially in a program that also has untyped components, I generally
would not recommend moving away from TR.

> > * The test suite involves significant I/O. As you mentioned, the test
> > driving database operations, and those operations should be about
> > the same. But the database is populated by parsing files, so slow
> > I/O may be the bottleneck.
>
> I am not sure what "significant" means, but the tests don't read and
> write a large amount of data:

I was probably wrong about this, because I didn't look before into
"fit-file.rkt". I see now that the data-parsing code there doesn't use
Racket's port API, so the I/O layer in Racket CS is probably not the
issue --- at least not in the way that I thought.

> I would be happy to help you identify where the performance degradation
> between Racket 7.1 and CS is when running these tests.

Small examples that illustrate slowness in a specific subsystem are
always helpful. I can't always make the subsystem go faster right away,
but sometimes I can.

Sam Tobin-Hochstadt

Jan 30, 2019, 8:52:53 PM
to Matthew Flatt, Alex Harsanyi, Racket Users
On Wed, Jan 30, 2019 at 8:23 PM Matthew Flatt <mfl...@cs.utah.edu> wrote:
>
> At Wed, 30 Jan 2019 16:31:52 -0800 (PST), Alex Harsanyi wrote:
> > I always assumed that the contracts generated when TR code is used from
> > untyped Racket would be the same as the ones written by hand if the module
> > would be untyped and had contracts on the same exported functions. Is this
> > not the case?
>
> That's pretty much the case, as far as I know.

That is (mostly) the case, except that Typed Racket generates
completely comprehensive contracts, which can mean they're much larger
than what you might write idiomatically.

> > Is there some documentation on how to measure the amount of code used by
> > contracts (the 15% value you mention above)?
>
> No. Sam set up a branch of TR that skipped contract generation, and I
> compared ".zo" sizes using that branch versus the normal one.

That branch is at https://github.com/samth/typed-racket/tree/no-contracts

> > I experimented with TR for a few things and left them in the code
> > base, but if they create significant performance issues, I can
> > convert them to untyped Racket easily.
>
> Although there are costs to TR in compile time and load time,
> especially in a program that also has untyped components, I generally
> would not recommend moving away from TR.

Unlike Matthew, I can be sure not to offend the creator of Typed
Racket by saying that in some cases, contract generation can take too
much compile time or bytecode space, or result in too big a runtime
overhead, to be acceptable. In those cases, I recommend either using
the `typed/racket/unsafe` library to omit contracts (and protect
things manually) or moving away from Typed Racket entirely. The places I
know where this has been an issue are very large OO hierarchies (as in
the `racket/gui` library) and very large data types constructed from
many unions of many distinct structs. I'm happy to take a look at your
code if that would be helpful.
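For anyone curious what that looks like, here is a minimal sketch of the `typed/racket/unsafe` approach (the module contents and function names here are made up for illustration; only `unsafe-provide` is the point):

```racket
;; fast-math.rkt -- a hypothetical typed module
#lang typed/racket
(require typed/racket/unsafe)

(: sum-of-squares (-> (Listof Flonum) Flonum))
(define (sum-of-squares xs)
  (for/fold ([s : Flonum 0.0]) ([x (in-list xs)])
    (+ s (* x x))))

;; `unsafe-provide` exports without wrapping the function in a
;; contract; untyped clients get no dynamic checks, so they must
;; respect the declared type on their own.
(unsafe-provide sum-of-squares)
```

An untyped module that requires this gets the raw function, with no boundary contract generated (and therefore none of the compile-time, bytecode-size, or run-time costs discussed above).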

Sam

Robby Findler

Jan 30, 2019, 8:58:53 PM
to Sam Tobin-Hochstadt, Matthew Flatt, Alex Harsanyi, Racket Users
Also I think that the size of the code generated by arrow contracts
(when keywords are involved and other situations that perhaps aren't
worth spelling out in detail in this email) is larger than we would
like, which also doesn't help.

Robby

Alex Harsanyi

Jan 31, 2019, 7:58:13 AM
to Racket Users

On Thursday, January 31, 2019 at 9:23:39 AM UTC+8, Matthew Flatt wrote:
> I would be happy to help you identify where the performance degradation
> between Racket 7.1 and CS is when running these tests.

Small examples that illustrate slowness in a specific subsystem are
always helpful. I can't always make the subsystem go faster right away,
but sometimes.


I timed some key functions in my application to understand which parts of Racket CS are slow.  I did a write-up in the Gist listed below, but the result seems to be that even functions that run Racket-only code, with no IO or calls into C libraries, run slower in Racket CS.  Code that calls into the database library to run SQL insert queries runs significantly slower.  The only things which were faster in Racket CS were one "Racket only" function, `df-histogram`, and a function which retrieves data from an SQL query, `df-read/sql`.


Alex.

Laurent

Jan 31, 2019, 8:37:57 AM
to Matthew Flatt, Racket Users
Just wanted to say thank you for the update and for the honest report.

I look forward to using Racket CS, and to seeing how easily new features can be incorporated :)

David Storrs

Jan 31, 2019, 11:08:49 AM
to Racket Users
Thank you for all the hard work you've put into this, everyone.

The benchmark graphs are impressive! One thing that surprised me is
that there are a handful of tests (tak1, dynamic2, tak, mazefun,
maze2, collatz-q, collatz) where Racket CS actually outperformed plain
Chez Scheme. How is that possible?

Matthias Felleisen

Jan 31, 2019, 11:24:39 AM
to Alex Harsanyi, Matthew Flatt, Racket Users, Sam Tobin-Hochstadt

> On Jan 30, 2019, at 8:52 PM, Sam Tobin-Hochstadt <sa...@cs.indiana.edu> wrote:
>
>> Although there are costs to TR in compile time and load time,
>> especially in a program that also has untyped components, I generally
>> would not recommend moving away from TR.
>
> Unlike Matthew, I can be sure not to offend the creator of Typed
> Racket by saying that in some cases, contract generation can take too
> much compile time or bytecode space, or result in too big a runtime
> overhead, to be acceptable. In those cases, I recommend either using
> the `typed/racket/unsafe` library to omit contracts (and protect
> things manually) or move away from Typed Racket entirely. The places I
> know where this has been an issue are very large OO hierarchies (as in
> the `racket/gui` library) and very large data types constructed from
> many unions of many distinct structs. I'm happy to take a look at your
> code if that would be helpful.


I have no problem offending the lead TR designer and maintainer
(also thanks to Asumu, Stevie, Kent, and everyone else who designed
& implemented essential elements; the offense is to Sam alone :)

;; - - -

If adding Typed Racket to Unityped Racket code (I sure do like
your insistence on Dana Scott’s terminology) feels like it’s
slowing down your code, please try the feature-specific profiler
to determine the module boundary that imposes the cost. By
moving this boundary (like adding another typed module) you
might be able to get the full performance benefits of TR (there
are some).

Yes, we should improve FSP and document it for these cases,
but from looking at your blog and emails, I suspect you can
cope. This would be tremendously helpful for the continued
TR development.

Thanks — Matthias

Matthew Flatt

Feb 1, 2019, 7:44:17 AM
to David Storrs, Racket Users
At Thu, 31 Jan 2019 11:08:35 -0500, David Storrs wrote:
> One thing that surprised me is
> that there are a handful of tests (tak1, dynamic2, tak, mazefun,
> maze2, collatz-q, collatz) where Racket CS actually outperformed plain
> Chez Scheme.
> How is that possible?

I have not investigated closely, but Racket CS might perform more
function inlining than plain Chez Scheme. That's because Racket CS
performs its own inlining pass to better support cross-module
optimization, and then it hands over the result to Chez Scheme, which
performs its usual inlining.

Another possibility is that Racket CS changes evaluation order for some
function-call expressions by forcing left-to-right evaluation of the
arguments. That is, Racket CS might force an order of evaluation that
just happens to be slightly better.
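For what it's worth, that left-to-right guarantee is easy to observe from Racket itself; a tiny made-up example (R6RS, and thus plain Chez Scheme, leaves the argument-evaluation order unspecified, so Racket CS must force it):

```racket
#lang racket

;; Record the order in which arguments are evaluated, via a side effect.
(define order '())
(define (note! tag v)
  (set! order (cons tag order))
  v)

;; Racket guarantees the left argument is evaluated before the right one.
(+ (note! 'left 1) (note! 'right 2))
(reverse order)  ; always '(left right) in Racket
```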

Along similar lines, the differences are small enough that it could be
just from accidents of allocation order and size that trigger a
memory-alignment pattern that happens to be slightly better for some
layer of caching.

Gustavo Massaccesi

Feb 5, 2019, 8:49:12 AM
to Alex Harsanyi, Racket Users
I have been trying a few variations of the code. It would be nice to have a test branch that uses only the data in the repository. I used some fake data instead.

For the tests, I used the function get-mean-max-bounds https://github.com/alex-hhh/ActivityLog2/blob/master/rkt/data-frame/meanmax.rkt#L409 with this data

  (define fake-data2
    (for/list ([_ (in-range 10000000)])
      (if (< (random) .01)
          (vector #f #f)
          (vector (- (random) .5) (- (random) .5)))))

so, I tested with

  (time (get-mean-max-bounds fake-data2))



*** The main time improvement was changing
  (for ([b bavg] #:when (vector-ref b 1))
    ...)
to
  (for ([b (in-list bavg)] #:when (vector-ref b 1))
    ...)

This doubles the speed or more. In the microbenchmark, the new duration is 40%-50% of the original duration.

IIUC, in all these functions you know the sequence type of the arguments, so my advice is to add in-list or in-vector to each and every for in the whole file (or project).

This is a good general recommendation. With in-list or in-vector or in-range, the generated code is very efficient. Without them, the code has to create a generic object to track the iteration, and the code is much slower.
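To make that concrete, here is a small self-contained sketch (with made-up data in the same #(x y) shape) that can be timed under both variants; the two functions compute the same sum and differ only in the iteration form:

```racket
#lang racket

;; Made-up data: vectors of #(x y), where y is occasionally #f.
(define xs
  (for/list ([i (in-range 100000)])
    (if (zero? (modulo i 100))
        (vector i #f)
        (vector i i))))

;; Generic iteration: `for` must allocate a sequence object and
;; dispatch on its type at run time.
(define (sum-generic bs)
  (for/sum ([b bs] #:when (vector-ref b 1))
    (vector-ref b 0)))

;; Specialized iteration: `in-list` lets the compiler emit a tight
;; cdr-walking loop instead.
(define (sum-in-list bs)
  (for/sum ([b (in-list bs)] #:when (vector-ref b 1))
    (vector-ref b 0)))

(time (sum-generic xs))
(time (sum-in-list xs))
```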


*** I tried eliminating the set! and using for/fold instead. The problem is that the code is slower :(. In general it's better to avoid mutable variables, but in this case removing them makes the program slower. We should take a look at the internal code of Racket and try to fix it, because in a perfect world the version without set! should be faster. Meanwhile, keep the current version...
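For reference, a for/fold version of that bounds computation might look roughly like this (a hypothetical reconstruction over #(x y) vectors, not the actual code in meanmax.rkt):

```racket
#lang racket

;; Track min/max of x and y with for/fold accumulators instead of set!.
;; Entries whose y is #f are skipped, as in the original loop.
(define (bounds bavg)
  (for/fold ([min-x #f] [max-x #f] [min-y #f] [max-y #f])
            ([b (in-list bavg)] #:when (vector-ref b 1))
    (define x (vector-ref b 0))
    (define y (vector-ref b 1))
    (values (if (and min-x (< min-x x)) min-x x)
            (if (and max-x (> max-x x)) max-x x)
            (if (and min-y (< min-y y)) min-y y)
            (if (and max-y (> max-y y)) max-y y))))
```

For example, (bounds (list (vector 1 2) (vector 3 -1) (vector 0 5))) returns the four values 0, 3, -1, and 5.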


*** I tried replacing the for and set! with an explicit loop. Something like
     (let loop ([bavg bavg] [min-x #f] [max-x #f]  [min-y #f] [max-y #f])
       ...)

With this change, there is an additional 5% improvement in speed, but legibility suffers too much. So this is faster than the version with for and in-list, but I recommend keeping the legible version.


*** I tried replacing the initial value of min-x and friends with +inf.0, and removing the if in the updates. I'm convinced this is a good idea, but the change in speed is negligible.




In conclusion, try adding as many in-list, in-vector and in-range as you can.

Gustavo





Alex Harsanyi

Feb 5, 2019, 10:59:25 PM
to Racket Users
First, thanks for looking into this.  Rather than answer inline, I will just comment on a few things:

* Unfortunately, the tests use real data because they try to pick up problems with the code, not measure performance.  However, some of the tests do run against only data from the repository.  I will not make my personal training data available publicly, but I gave Matthew Flatt access to it and he can run those tests.

* there are two additional tests that can be run directly; they are on the "ah/perf-test" branch and are named "test/t-db-insert-activity.rkt" and "test/t-df-mean-max.rkt" -- the second one sets up a minimal `df-mean-max` test with some real data.  Interestingly, `df-mean-max` runs faster in Racket CS in that test, which is not what I found when I ran the other tests -- this needs more investigation.

* I was aware that using `in-list`, `in-vector`, etc. gives better performance, but I prefer not to use them unless necessary: it is much nicer when code works with a variety of container formats.  I have occasionally changed data representations, and Racket will not pick up that a function which uses `in-list` is now being passed a vector; this will result in a run-time error instead.  In fact, most of the tests just run the code in the hope of triggering contract violations.

Which brings me to the last point: the BAVG set that `get-mean-max-bounds` receives will have about 95 items in it; it will be this list [1]. If I used a data set of 10 million items in BAVG, I would have bigger problems: the `spline` function, which also uses this BAVG set, would run out of memory as it tried to construct a matrix of 1e14 items, which would take about 720 terabytes of memory (assuming floating point numbers).

More realistic values, which I am aiming for, are:

* input data processing functions, such as `df-mean-max`, should work reasonably fast for a data-set of 3600 to 10800 items, which corresponds to activities that last 1 to 3 hours.  They should also work acceptably for data sets of 21600 items (an Ironman bike split).
* data plot and summary functions should work reasonably fast for data sets of 100 to 5000 items (depending on the case) -- in fact my code goes to great lengths to ensure that the data that is passed to any plot function is reasonably small, so plots are displayed in 1 second or less.

I am always looking for feedback, and if you want to spend some time diagnosing performance issues, I can set up more individual tests, or I can let you know what realistic data sets look like so you can set up tests of your own.

Best Regards,
Alex.
