[Sbcl-devel] Potential contrib: sb-simd

Heisig, Marco

Dec 20, 2021, 9:40:06 AM
to sbcl-...@lists.sourceforge.net
Hello everyone,

I have been working on a convenient SIMD interface for SBCL (https://github.com/marcoheisig/sb-simd), and it has slowly reached the point of being usable - at least according to Reddit ( https://www.reddit.com/r/Common_Lisp/comments/riedio/quite_amazing_sbcl_benchmark_speed_with_sbsimd/ ).

The question is whether it makes sense to turn sb-simd into an SBCL contrib or whether I should ship it via Quicklisp.

Pro (turning sb-simd into a contrib):

- Makes heavy use of SBCL internals and defines hundreds of VOPs.

- No external dependencies.

Contra:

- Rather large for a contrib. Compiling/loading it takes almost as long as compiling SBCL itself.

- Some features, especially the vectorizer, are still under development and subject to frequent changes.

- The project includes a database of instructions that may be useful independently of SBCL. If I decide to extract this database into a separate project one day, we'd have a contrib that depends on a Quicklisp project. Or I'd have to duplicate a lot of code, with all the problems that come with that.

The last item concerns me most. Any thoughts on the issue?

Best regards,
Marco

PS: While I have the attention of the experts - can someone think of an elegant way to mark a piece of x86-64 code so that it either always uses VEX-encoded instructions or never does? In sb-simd, I have a separate package for each instruction set, and all instructions therein use the correct encoding for their instruction set. But something like (cl:+ float1 float2), (cl:coerce fixnum 'single-float), or the instructions emitted by a MOVE to/from XMM registers don't know or care about the encoding and frequently incur an SSE<->AVX transition penalty. I wrote a hack to store instruction set information in the lexical environment (https://github.com/marcoheisig/sb-simd/blob/master/code/tweak-sbcl.lisp), but that is not pretty and doesn't cover all cases. It also forces me to overwrite several frequently used SBCL VOPs like move-to-single-reg. Any better ideas? Could one somehow write a pass that checks whether a piece of x86-64 code contains at least one VEX-encoded instruction, and if so, forces all relevant instructions to use VEX encoding?
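
As a rough illustration of the pass asked about above, here is a toy model in portable Common Lisp. It is only a sketch: instructions are plain lists rather than SBCL assembler objects, and the names *sse-to-vex*, vex-encoded-p, to-vex and force-vex, as well as the three-entry mnemonic table, are made up for this example and are not part of SBCL or sb-simd.

```lisp
;; Toy model, not SBCL's assembler: an instruction is a list (MNEMONIC . OPERANDS).
;; Each table entry is (SSE-MNEMONIC VEX-MNEMONIC REPEAT-DESTINATION-P); the
;; table is a tiny hand-picked sample, chosen for illustration only.
(defparameter *sse-to-vex*
  '((addss    vaddss    t)
    (cvtsi2ss vcvtsi2ss t)
    (movaps   vmovaps   nil)))

(defun vex-encoded-p (instruction)
  "True if INSTRUCTION uses one of the VEX mnemonics in *SSE-TO-VEX*."
  (and (find (first instruction) *sse-to-vex* :key #'second) t))

(defun to-vex (instruction)
  "Return the VEX form of INSTRUCTION, or INSTRUCTION itself if it has none."
  (let ((entry (assoc (first instruction) *sse-to-vex*)))
    (cond ((null entry) instruction)
          ;; Two-operand SSE arithmetic becomes three-operand VEX code by
          ;; repeating the destination as the first source.
          ((third entry)
           (list* (second entry) (second instruction) (rest instruction)))
          (t (cons (second entry) (rest instruction))))))

(defun force-vex (instructions)
  "If any instruction is VEX-encoded, convert all of them; else change nothing."
  (if (some #'vex-encoded-p instructions)
      (mapcar #'to-vex instructions)
      instructions))
```

For example, (force-vex '((addss xmm0 xmm1) (vmovaps xmm2 xmm3))) returns ((vaddss xmm0 xmm0 xmm1) (vmovaps xmm2 xmm3)), while a sequence without any VEX mnemonic is returned unchanged. A real pass would have to work on the compiler's own instruction representation and cover every affected instruction.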

_______________________________________________
Sbcl-devel mailing list
Sbcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel

Charles Zhang via Sbcl-devel

Dec 20, 2021, 12:44:25 PM
to marco....@fau.de, sbcl-...@lists.sourceforge.net
Hi Marco,

This stuff looks really great. I checked out the repository and I'm impressed by all the work put into it, especially the vectorizer.

I think it makes the most sense to make sb-simd a contrib that builds with the rest of the system, if only to keep it from bit-rotting, since it will be annoying if the quicklisp world or whatever breaks whenever an internal change is made. That could be localized and the simd stuff could be updated much better if it was part of the source repository. There's probably some stuff in sb-simd that could logically be included in the system itself too, especially the non-x86 specific things.

As for the cons (in order):
- There may be some tricks to lower the compile time or at least make the contrib build optional at developer build time.
- The code would be subject to change whether it's a contrib or not, so there's not really an issue there. People would need to update sb-simd-using-code either way.
- This isn't such a big deal at all actually. If you decide to factor out a database into a separate repository, there's no reason why you can't pull that repository into sb-simd when building sb-simd as a git submodule or whatever. (sbcl bundles asdf as a contrib, and all other contribs rely on it. quicklisp doesn't need to be in the picture at all, so just use plain ASDF) The contrib stuff happened before the git age, so things like submodules are not currently exercised for that reason. The source distribution could just include some pinned version of the database known to work with the version of the contrib.
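
The plain-ASDF arrangement from the last point could look roughly like the system definition below. This is a sketch under assumptions: the system names and file names are placeholders invented for illustration, and the database system is assumed to be findable by ASDF, for instance because a pinned checkout (a git submodule or a copy in the source distribution) sits in a directory that ASDF searches.

```lisp
;; Hypothetical system definition: every system and file name below is a
;; placeholder, not the layout of sb-simd or of any existing SBCL contrib.
(asdf:defsystem "sb-simd-sketch"
  :description "Sketch of a contrib that depends on a separately maintained system."
  ;; The dependency is resolved by plain ASDF; Quicklisp is not involved.
  :depends-on ("instruction-database-sketch")
  :components ((:file "packages")
               (:file "instructions" :depends-on ("packages"))))
```

With this shape the contrib names its dependency only by system name, so the pinned version can be swapped without touching the contrib's own sources.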

Re. PS: I don't really know the details of sb-simd, but this sounds like what move vops and storage classes were designed for. You should be able to specify move vops (see backend/move.lisp) that take the sse <-> avx coercion costs into account, and representation selection should deal with those coercions. You don't want or need a pass or environment hackery to choose the right instructions; that probably just means you need to choose the right vops. A good way of doing this is to define vops with the right storage base for avx vs sse, and/or to create instructions or instruction macros that are parameterized by avx vs sse, and to choose the appropriate vop based on SC restrictions. (Macrology, vop variants, and instruction macros may help with this.)

Good work!
Charles

Philipp Marek via Sbcl-devel

Dec 20, 2021, 2:01:17 PM
to Charles Zhang, sbcl-...@lists.sourceforge.net
> I think it makes the most sense to make sb-simd a contrib that builds
> with the rest of the system, if only to keep it from bit-rotting,

I second that; I also believe that having it tightly integrated
with SBCL is better for the long-term.



I'd like to bring up a related topic:

SBCL, Babel, and Flexi-Streams all have very similar,
potentially identical, tables for Unicode characters.

Here's the LOC:

17629  babel-20200925-git          src/jpn-table.lisp
43589  sbcl                        src/code/external-formats/enc-cn-tbl.lisp
44973  sbcl                        src/code/external-formats/enc-jpn-tbl.lisp
48314  flexi-streams-20200925-git  enc-cn-tbl.lisp

My estimate is that when both Babel and Flexi-Streams are loaded into an
image there's 1/2 to 1 MB of duplicated data here.

Could we provide some good, fast functions that do lookups on the SBCL
internal tables, so that the other projects can just reuse them via some
feature? (#+common-unicode-tables or something like that, perhaps.)
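
A library could then pick its lookup at read time, roughly as sketched below. The feature name is the one floated above and, as far as this sketch assumes, is not provided by any implementation today; shared-table-lookup, decode-double-byte and *private-table* are likewise hypothetical names, and the one-entry table only stands in for a library's bundled data.

```lisp
;; Illustrative private table with a single sample entry:
;; Shift_JIS #x8140 maps to U+3000 (IDEOGRAPHIC SPACE).
(defparameter *private-table*
  (let ((table (make-hash-table)))
    (setf (gethash #x8140 table) #x3000)
    table))

(defun decode-double-byte (code)
  "Map a double-byte CODE to a Unicode code point, or NIL if unknown."
  ;; COMMON-UNICODE-TABLES is the feature name suggested above; it is assumed
  ;; to be absent here, and SHARED-TABLE-LOOKUP is a hypothetical name for the
  ;; lookup function that the feature would advertise.
  #+common-unicode-tables (shared-table-lookup code)
  #-common-unicode-tables (values (gethash code *private-table*)))
```

On an implementation without the feature, the reader skips the first branch entirely, so the library falls back to its own table and would only need to load that table in this case.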


Thanks for any ideas or efforts!


Ph.

Bela Pecsek

Dec 20, 2021, 2:05:43 PM
to Heisig, Marco, sbcl-...@lists.sourceforge.net
Hi Marco,

I only wanted to comment on the compile time “issue” but ended up commenting on contrib as well.

On my laptop, building sb-simd takes a touch less than 30 seconds, which would be roughly a 20 percent increase in compile time over a full SBCL build.

I think making it a contrib would make a lot of sense: it would keep it stable between builds
and put it into everyone’s hands automatically.

Kindest Regards,
Bela


Stas Boukarev

Dec 20, 2021, 2:27:07 PM
to Heisig, Marco, sbcl-...@lists.sourceforge.net
I think it can be kept in sync without making it a contrib. Automated
builds should catch any errors.

Charles Zhang via Sbcl-devel

Dec 20, 2021, 2:35:30 PM
to stas...@gmail.com, Heisig, Marco, sbcl-...@lists.sourceforge.net
The difference between a contrib and an out-of-tree project shows up when a change is made in the compiler proper. It's not just about having some automated build system. Say someone changes the EA constructor for x86s:

As a contrib, it's on the person who changes the EA constructor to update sb-simd appropriately.

If it's out of tree, it's on whoever is maintaining the out of tree repository.

So there's a clear organizational difference in how effectively bitrot is staved off.

Stas Boukarev

Dec 20, 2021, 2:56:13 PM
to Charles Zhang, sbcl-...@lists.sourceforge.net
Say someone changes the make-array constructor and it breaks on some
project, then we go and fix it. Should be the same here.
It's not the location of the source code that matters, but who is
maintaining it. Right now sb-simd is managed by a separate group of
people. It has its own bug tracker, mailing list or whatever. That all
is an additional maintenance burden outside of just keeping the
sb-assem APIs stable.
If something's actively looked after then there will be no bitrot. But
if the original authors lose interest at some point, no amount of
inclusion into the source tree will help keep it alive.

Charles Zhang via Sbcl-devel

Dec 20, 2021, 3:15:11 PM
to stas...@gmail.com, sbcl-...@lists.sourceforge.net
Agree with you that the "who" matters. But I still think that there's a layer of friction that gets removed by inclusion in source tree. For example, someone may not know anything about sb-simd, and makes a breaking change, but will know something broke immediately by building, and can figure out a fix right at the point of breakage. If it's out of tree, it gets reported to mailing list, bisection may be needed, original breaker may not be available at that time, etc. I agree that in principle "we go fix it" should suffice, but there is an additional layer of stuff that needs to happen.

The question is whether sb-simd warrants inclusion in the source tree to minimize this friction. Given the amount of internal stuff sb-simd uses (vops, patches, etc), and the general utility of such a contrib, I think it makes sense to classify this differently than say, a quicklisp project written in totally portable common lisp which happens to have a type incorrect make-array constructor. I would even say some of the existing contribs in tree are probably even more "portable/stable" than sb-simd.

Perhaps more opinionated portions of the library which are higher level such as auto-vectorizer IR could be maintained out of tree, as they are less subject to random breakage from backend changes.

Heisig, Marco

Dec 21, 2021, 6:49:50 AM
to stas...@gmail.com, Charles Zhang, sbcl-...@lists.sourceforge.net
Thanks for the quick replies. I tend to agree with karlosz. If the past is an indicator of the future, most commits of sb-simd will depend on a particular version of SBCL. And while I'm willing to maintain sb-simd for the foreseeable future, I'm not willing to support backward compatibility with older versions of SBCL. So an sb-simd on Quicklisp would be broken for most people most of the time.

I can separate sb-simd into a relatively stable part that strongly depends on SBCL internals, and an experimental part (mostly the vectorizer) that doesn't. The stable part could be made a contrib and the experimental part could be put on Quicklisp. How does that sound?

Charles Zhang via Sbcl-devel

Dec 21, 2021, 4:09:42 PM
to marco....@fau.de, stas...@gmail.com, sbcl-...@lists.sourceforge.net
That sounds ideal to me.

Christophe Rhodes

Jan 6, 2022, 6:32:55 AM
to Heisig, Marco, sbcl-...@lists.sourceforge.net
Hi!

I spent a little time over the break thinking about this, and working
through some of the issues on a smaller problem. The `nibbles` library
has optimized implementations of some accessors on x86 and x86-64, and I
wanted to try and see concretely what the benefits and issues would be
in turning those optimizations into a contrib.

The results of that are at:
<https://github.com/sharplispers/nibbles/pull/12>
in conjunction with
<https://github.com/csrhodes/sbcl/tree/sb-nibbles>

Potential benefits:

- no more user-visible breakage if we decide to change assembler syntax
(has happened at least twice in the past with nibbles);

- easy to imagine adding other architecture support for these operators,
with continuous integration "in the right place";

Potential issues:

- additions to the nibbles interface will not gain optimized support in
sync: it will depend on updating the SBCL contrib, and then also on
which version of SBCL the library is run on;

- it becomes very hard to deal with any backwards-incompatible changes
because of the matrix of possible nibbles/SBCL version combinations;

Not solved by this:

- remaining dependence in nibbles on some SBCL internals,
e.g. `defknown`, `deftransform`;

I think, having done this exercise, I am in favour of this sb-nibbles
contrib, and if there is a "backend" subset of sb-simd that looks fairly
stable and uncontroversial I would similarly be in favour of getting
that integrated. I would be hesitant to suggest having things which are
under heavy development or speculative as part of the same thing, at
least at first, but would rather suggest building that on top for now
and moving them into the contrib once a stable interface to them
emerges.

Does that make sense? Thoughts, either generally or specifically on
sb-nibbles?

Best wishes,

Christophe

Heisig, Marco

Jan 6, 2022, 9:56:28 AM
to Christophe Rhodes, sbcl-...@lists.sourceforge.net
Hi Christophe,

thank you for looking into this! I am already in the process of separating sb-simd into a part that is stable and particularly dependent on SBCL internals and one that isn't. I will reach out to sbcl-devel once that process is complete.

For those interested - the repository for the portable, experimental part that I'm now separating from sb-simd is https://github.com/marcoheisig/Loopus (still in an embryonic state)

There are also some details left to be worked out, like whether a CPUID function should be in sb-simd or directly in SBCL, or how to handle the SSE/AVX transitions I mentioned earlier. But those discussions can wait until I have prepared a pull request.

Best regards,
Marco