Racket BC versus CS unsafe flonums

46 views
Skip to first unread message

Dominik Pantůček

unread,
Nov 6, 2020, 2:15:13 PM11/6/20
to Racket Users
Hello Racketeers,

going through some more possible optimizations of my side projects, I
encountered a different behavior of unsafe-flvector-set! between BC and
CS implementations:

==== CUT HERE ====
#lang racket

(require racket/unsafe/ops
racket/flonum)

(define flv (make-flvector 10))
(for ((i (in-range 10)))
(unsafe-flvector-set! flv i i))
(for ((i (in-range 10)))
(display (~a #:width 4 (unsafe-flvector-ref flv i))))
(newline)
==== CUT HERE ====

$ racket test-flvector.rkt
SIGSEGV MAPERR si_code 1 fault on addr 0x9
Aborted (core dumped)
$ racketcs test-flvector.rkt
0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
$

The SIGSEGV is expected behavior (although might not get triggered every
time). The latter looks like implicit fx->fl operation.

Changing to safe flvector-set! results in expected contract violation
with both variants:

$ racketcs test-flvector.rkt
0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
$ racket test-flvector.rkt
flvector-set!: contract violation
expected: flonum?
given: 0
argument position: 3rd
other arguments...:
(flvector 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0)
0
context...:
"/home/joe/Projects/Programming/TD4/test-flvector.rkt": [running body]
temp35_0
for-loop
run-module-instance!
perform-require!
$ racketcs test-flvector.rkt
flvector-set!: contract violation
expected: flonum?
given: 0
context...:

/home/joe/Projects/Programming/racket-lang/racket/racket/collects/racket/private/for.rkt:1534:9
body of "/home/joe/Projects/Programming/TD4/test-flvector.rkt"
$

I assume that CS' unsafe-flvector-set! is actually pretty safe when it
comes to flonum-convertible numbers. It might be a bit faster because it
lacks the contract, but definitely it is not a "low level" variant.

Also given the fact that unsafe-fl+ and others give no performance
advantage (the opposite is true for well-written algorithms), maybe the
performance guide and unsafe ops documentation should get an update
before CS becomes the default implementation.

Any thoughts?


Cheers,
Dominik

Matthew Flatt

unread,
Nov 6, 2020, 2:45:52 PM11/6/20
to Dominik Pantůček, Racket Users
At Fri, 6 Nov 2020 20:14:51 +0100, Dominik Pantůček wrote:
> I assume that CS' unsafe-flvector-set! is actually pretty safe when it
> comes to flonum-convertible numbers. It might be a bit faster because it
> lacks the contract, but definitely it is not a "low level" variant.

Hm, right --- it's implemented by `bytevector-ieee-double-native-set!`,
which is defined to convert its last argument to a flonum (so the
conversion applies even in unsafe mode). I hadn't noticed that before,
so thanks for pointing it out!

I will investigate faster option. A primitive without conversion could
make the safe `flvector-set!` slightly faster, too, by avoiding a
redundant check.

> Also given the fact that unsafe-fl+ and others give no performance
> advantage

I think `unsafe-fl+` can still be faster. For example, the second loop
below is the unsafe one, and it's 10-20% faster. I constructed this
program carefully, though, to avoid flonum inference and to avoid
allocating flonum results (which swamps the cost of checks).

The performance guide is pretty conservative about suggesting
performance improvements via unsafe operations, already, but
suggestions are welcome.

Mathew

----------------------------------------

#lang racket/base
(require racket/unsafe/ops
racket/flonum)

(define N 100000000)

(define (f x)
(fl< (fl+ x x) 0.0))

(define (u-f x)
(fl< (unsafe-fl+ x x) 0.0))

(set! f f)
(set! u-f u-f)

(collect-garbage)

(time
(for/fold ([v #f]) ([i (in-range N)])
(f 1.0)))

(collect-garbage)

(time
(for/fold ([v #f]) ([i (in-range N)])
(u-f 1.0)))

Matthew Flatt

unread,
Nov 7, 2020, 9:36:15 AM11/7/20
to Dominik Pantůček, Racket Users
At Fri, 6 Nov 2020 12:45:46 -0700, Matthew Flatt wrote:
> I will investigate faster option. A primitive without conversion could
> make the safe `flvector-set!` slightly faster, too, by avoiding a
> redundant check.

Long story short, I added flvectors to the Chez Scheme level as of
v7.9.0.4. With that change, `flvector-set!` is faster, and
`flvector-ref` and `flvector-set!` cooperate better with flonum
unboxing.

For example, this microbenchmark now avoids allocation and runs about 8
times as fast:

(let ([v (make-flvector 100)])
(time
(for ([j (in-range 100000)])
(for ([i (in-range (flvector-length v))])
(flvector-set! v i (fl+ 1.0 (flvector-ref v i)))))))

Also, your program now crashes as you intended.


(To make room in Chez Scheme's type encoding for flvectors, I removed
immutable fxvectors. Immutable fxvectors do not seem useful, and
there's no such thing at the Racket level.)


Matthew

Dominik Pantůček

unread,
Nov 7, 2020, 10:43:05 AM11/7/20
to racket...@googlegroups.com
Wow, you are faster than I :)

On 07. 11. 20 15:36, Matthew Flatt wrote:
> At Fri, 6 Nov 2020 12:45:46 -0700, Matthew Flatt wrote:
>> I will investigate faster option. A primitive without conversion could
>> make the safe `flvector-set!` slightly faster, too, by avoiding a
>> redundant check.
>
> Long story short, I added flvectors to the Chez Scheme level as of
> v7.9.0.4. With that change, `flvector-set!` is faster, and
> `flvector-ref` and `flvector-set!` cooperate better with flonum
> unboxing.

Ok, I originally wanted to reply to your original remark about
unsafe/safe flonum operations, but this actually extends my questions.

My current understanding is that the best performance you get from
unsafe operations while using safe operations as hints for the flonum
unboxing algorithm, right?

So with flvectors it is now the same? When I convey some safe hint to
the unboxing code, is it the best (in terms of performance) to use
unsafe flvector procedures then? To be honest, when I try to write down
"definitive" rules how to structure the code (mainly tight loops) and
use safe/unsafe operations properly, I get quickly lost.

I'll do some empirical benchmarks and see whether I can get some
generally valid answer.

>
> For example, this microbenchmark now avoids allocation and runs about 8
> times as fast:
>
> (let ([v (make-flvector 100)])
> (time
> (for ([j (in-range 100000)])
> (for ([i (in-range (flvector-length v))])
> (flvector-set! v i (fl+ 1.0 (flvector-ref v i)))))))

I was just about to ask. This is how most of my code looks right now
(albeit more complex and with very short flvectors - 3 or 9 elements
mostly). I'll test the new implementation and see if there is a difference.

>
> Also, your program now crashes as you intended.
>

Aweseome! (May sound weird, but I really like consistent behavior).

>
> (To make room in Chez Scheme's type encoding for flvectors, I removed
> immutable fxvectors. Immutable fxvectors do not seem useful, and
> there's no such thing at the Racket level.)

There definitely are scenarios, where immutable fxvectors may be a good
idea. However in those scenarios, allocations ruin the performance most
of the time. And although CS performs way better with extensive box
allocations than BC, it can still quickly become a noticeable bottleneck.


Dominik

Matthew Flatt

unread,
Nov 7, 2020, 4:59:39 PM11/7/20
to Dominik Pantůček, racket...@googlegroups.com
At Sat, 7 Nov 2020 16:42:43 +0100, Dominik Pantůček wrote:
> My current understanding is that the best performance you get from
> unsafe operations while using safe operations as hints for the flonum
> unboxing algorithm, right?

I'm not sure I understand what you mean, but I don't think unsafe
operations are particularly helpful for unboxing hints.

Let's separate out safety checks from unboxing. For safety checks, I'd
put the unsafe flonum operations into two categories:

* `fl+`, `fl-`, etc., where the only relevant check is whether an
argument is a flonum --- In this case, the benefits of unsafe
variants are small to nothing. Depending on context, the compiler
can often infer that the arguments are definitely flonums, so the
generated code is the same. It's only when the compiler cannot infer
flonumness that an unsafe variant has any performance advantage, and
then it's just skipping a type-tag check.

* `flvector-ref` and `flvector-set!` --- In these cases, there's an
array-bounds requirement that is well beyond the reach of the
compiler to infer. So, there will be a check on every use of the
safe variants in a loop, for example. The check also requires more
instructions than checking flonumness, so it's easier to see some
speedup with unsafe variants of these operations.

Unboxing is a bigger deal for performance than safety checks, though,
and unboxing is completely driven by what the compiler can infer. Using
unsafe operations provides no more or less information to the
compiler's unboxing inference than safe operations do.


Matthew
Reply all
Reply to author
Forward
0 new messages