Bound checks elimination hint.

349 views
Skip to first unread message

Bruno Albuquerque

unread,
Feb 21, 2020, 11:36:56 AM2/21/20
to golang-nuts
I wrote some simple code to convert a RGB24 image represented as a []byte to a NRGBA image (also as a []byte), like so:


Unfortunately, the performance is lacking here and I am trying to improve it. The first low hanging fruit seems to be taking advantage of BCE:

$go test -bench .
goos: linux
goarch: amd64
BenchmarkNRGBA-8       484   2636344 ns/op

$ go test -gcflags="-B" -bench .
goos: linux
goarch: amd64
BenchmarkNRGBA-8       855   1654882 ns/op

Unfortunately, I could not find an incantation that would remove the bound checks. My naive attempt was to just do

_ = nrgbaData[len(nrgbaData)-1]

at around line 7 in the program above but this did not help and added one extra bound check.

Funny enough, if I do 

_ = nrgbaData[len(nrgbaData)]

all the bound checks seem to be eliminated but, obviously, the program panics at this line.

I am using

go build -gcflags="-d=ssa/check_bce/debug=1"

to check for the bound checks.

What am I missing here? Any suggestions on how to make this work?

And while we are at it, adding goroutines to convert partitions of the image also helps (2 goroutines gives a 50% increase in performance more or less). Other these 2 things, any other suggestion s0on how to make it faster?

Thanks in advance.

Sebastien Binet

unread,
Feb 21, 2020, 12:35:24 PM2/21/20
to Bruno Albuquerque, golang-nuts
On Fri, Feb 21, 2020 at 5:36 PM Bruno Albuquerque <b...@gmail.com> wrote:
I wrote some simple code to convert a RGB24 image represented as a []byte to a NRGBA image (also as a []byte), like so:


Unfortunately, the performance is lacking here and I am trying to improve it. The first low hanging fruit seems to be taking advantage of BCE:

$go test -bench .
goos: linux
goarch: amd64
BenchmarkNRGBA-8       484   2636344 ns/op

$ go test -gcflags="-B" -bench .
goos: linux
goarch: amd64
BenchmarkNRGBA-8       855   1654882 ns/op

Unfortunately, I could not find an incantation that would remove the bound checks. My naive attempt was to just do

_ = nrgbaData[len(nrgbaData)-1]

at around line 7 in the program above but this did not help and added one extra bound check.

Funny enough, if I do 

_ = nrgbaData[len(nrgbaData)]

all the bound checks seem to be eliminated but, obviously, the program panics at this line.
what about:
   _ = nrgbaData[:len(nrgbaData)]

does this also remove the bound checks?

-s

Bruno Albuquerque

unread,
Feb 21, 2020, 1:08:27 PM2/21/20
to Sebastien Binet, golang-nuts
Nope. Bound checks are still there. I am puzzled by this one.

Bruno Albuquerque

unread,
Feb 21, 2020, 1:25:09 PM2/21/20
to Sebastien Binet, golang-nuts
This is interesting. If I simplify the loop to something like this:

nrgbaData := make([]byte, len(rgbData)+(len(rgbData)/3))

        _ = nrgbaData[len(rgbData)]

        for i := 0; i < len(rgbData); i++ {
                nrgbaData[i] = rgbData[i]
        }

then the bounds check at the line inside the for loop is removed.

But if I change it to this:

nrgbaData := make([]byte, len(rgbData)+(len(rgbData)/3))

        _ = nrgbaData[len(rgbData)]

        for i := 0; i < len(rgbData); i += 3 {
                nrgbaData[i] = rgbData[i]
        }

then the bounds check is not eliminated.

Considering it is still guaranteed that "i" inside the loop will never be equal to or greater than len(rgbData), I do not understand why the bounds check is now required.

Any ideas?

andrey mirtchovski

unread,
Feb 21, 2020, 2:11:25 PM2/21/20
to Bruno Albuquerque, Sebastien Binet, golang-nuts
got it down to two:

https://play.golang.org/p/jmTqhLGaLY_T
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAEd86TwC19nJDwAe-kR6Ze8zPf4zjPQp0dMksPAwMwsXCGWRiw%40mail.gmail.com.

Bruno Albuquerque

unread,
Feb 21, 2020, 2:20:55 PM2/21/20
to Sebastien Binet, golang-nuts
[now sending to the entire group]

And the plot thickens.


Basically, this is a very simple loop iterated with different step increments.

For step increments of 1, 2, 4, 5 and 10, there are no bound checks.

For step increments of 3, 6, 7, 8 and 9 there are bound checks.

So, in this simplified case, if the step divides the length evenly, then there are no bound checks. If it does not, then bound checks are inserted.

This seems to be an unnecessary check as the actual relevant thing is to make sure that the index is not equal to or greater than the slice length. Either that or I am missing something

Robert Engels

unread,
Feb 21, 2020, 4:36:08 PM2/21/20
to Bruno Albuquerque, Sebastien Binet, golang-nuts
Curious, in the cases of where you were able to eliminate the bounds checks what was the performance difference in the overall process?

I would be very surprised, even in matrix type loops, that this makes an appreciable difference given modern cpu branch predictors. 

I think a better area of attack would be adding cpu vector operations support. 

On Feb 21, 2020, at 1:20 PM, Bruno Albuquerque <b...@gmail.com> wrote:


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

Nigel Tao

unread,
Feb 22, 2020, 6:37:09 AM2/22/20
to Bruno Albuquerque, golang-nuts
On 2/22/20, Bruno Albuquerque <b...@gmail.com> wrote:
> https://play.golang.org/p/Y7ICg7t4_nd
>
> ...
>
> Unfortunately, I could not find an incantation that would remove the bound
> checks.

The code from that play.golang.org link is:

----
func NRGBA(rgbData []byte) []byte {
nrgbaData := make([]byte, len(rgbData)+(len(rgbData)/3))

offset := 0
for i := 0; i < len(rgbData); i = i + 3 {
nrgbaData[i+0+offset] = rgbData[i]
nrgbaData[i+1+offset] = rgbData[i+1]
nrgbaData[i+2+offset] = rgbData[i+2]
nrgbaData[i+3+offset] = 255
offset++
}

return nrgbaData
}
----

Well, the compiler cannot remove the bounds checks because that
function isn't actually bounds-safe, right? For example, if I call
that func with an rgbData slice such that len(rgbData) == 1, then:

nrgbaData is make'd with length (1 + (1/3)), which is 1.
offset is 0.
The loop over i runs 1 time (at i := 0, which satisfies i < 1).

But inside the loop, this line of code:
nrgbaData[i+1+offset] = rgbData[i+1]
is:
nrgbaData[1] = rgbData[1]
which is two bounds violations. As discussed above, both nrgbaData and
rgbData have length 1.

For the broader point, if you're trying to optimize pixel format
conversions, you might find
https://github.com/golang/exp/tree/master/shiny/driver/internal/swizzle
educational.

Nigel Tao

unread,
Feb 22, 2020, 6:54:43 AM2/22/20
to Bruno Albuquerque, Sebastien Binet, golang-nuts
On 2/22/20, Bruno Albuquerque <b...@gmail.com> wrote:
> https://play.golang.org/p/P2JPI42YJa8
>
> ...
>
> So, in this simplified case, if the step divides the length evenly, then
> there are no bound checks. If it does not, then bound checks are inserted.
>
> This seems to be an unnecessary check

If the step (e.g. 3) does not divide the length evenly, then e.g. "i
+= 3" can overflow such that i becomes negative, even though the
"len(a)" in the "i < len(a)" condition is a legitimate array or slice
length: a non-negative int, less than or equal to INT_MAX.

See:
https://play.golang.org/p/QGNKOtw3m62

Brian Candler

unread,
Feb 22, 2020, 8:04:02 AM2/22/20
to golang-nuts
On Saturday, 22 February 2020 11:54:43 UTC, Nigel Tao wrote:

If the step (e.g. 3) does not divide the length evenly, then e.g. "i
+= 3" can overflow such that i becomes negative, even though the
"len(a)" in the "i < len(a)" condition is a legitimate array or slice
length: a non-negative int, less than or equal to INT_MAX.

See:
https://play.golang.org/p/QGNKOtw3m62


That's an interesting thought.  I note that in the original code, the array is being indexed by an int. If the platform is one where int is 64 bits, the problem is moot as no existing hardware could handle an array with 2^63 elements.

My guess is it's do do with the boundary conditions for which the optimisation is written.  For a loop with step N which divides into the array size, the compiler knows that all of a[i] to a[i + N-1] are safe, i.e. the slice a[i:i+N] is safe, inside the loop.

However this is not true if N doesn't divide into the array size, as the final iteration has a smaller range of safe offsets.  It probably wasn't worth calculating the smaller safe range, as it's unlikely to benefit any real-world program.  It *could* have said that in that circumstance, only a[i] is safe - but again, probably doesn't benefit many real-world programs.

Bruno Albuquerque

unread,
Feb 22, 2020, 9:58:14 AM2/22/20
to Nigel Tao, Sebastien Binet, golang-nuts
Ah. Now it makes more sense. I completely abstracted away overflows (again, not a scenario that would happen in practice in any case).

Bruno Albuquerque

unread,
Feb 22, 2020, 9:59:38 AM2/22/20
to Nigel Tao, golang-nuts
Fair enough. This would never happen in practice but the compiler has no way to know. Would adding an explicit len() check before the loop help the compiler in any way (can not test right now)?

Concerning Swizzle: Although what I am doing is technically a color space conversion, going from RGB to (N)RGBA (as the alpha channel is always 255, it can be converted to non-normalized and normalized RGBA de same way), the actual conversion is just adding the alpha channel (or an extra byte every 3 bytes). I guess one can do this in a pretty fast way without having to resort to assembly.

Nigel Tao

unread,
Feb 22, 2020, 4:42:03 PM2/22/20
to Bruno Albuquerque, golang-nuts
On 2/23/20, Bruno Albuquerque <b...@gmail.com> wrote:
> Would adding an explicit len() check before the loop help the
> compiler in any way (can not test right now)?

I don't know. Sorry.

drc...@google.com

unread,
Feb 27, 2020, 12:00:34 PM2/27/20
to golang-nuts
FYI, strided iteration is something we're trying to do better but this is very tricky code, and it is also possible for it to get very expensive at compile time.  
Reply all
Reply to author
Forward
0 new messages