[Sbcl-devel] SB-TRANSACTION - announcement and some questions

max

Jun 22, 2013, 4:13:10 PM
to sbcl-...@lists.sourceforge.net
Hello,

I would like to announce my first contribution to SBCL:

SB-TRANSACTION is an SBCL compiler plugin for the x86-64 family of CPUs
to access the new Restricted Transactional Memory (RTM) assembler
instructions dealing with hardware memory transactions (described for
example at
http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions)
introduced by Intel on Core i5 4570, Core i5 4670 and Core i7 4770.

The assembler instructions XBEGIN, XEND, XABORT and XTEST are wrapped
into regular Lisp functions:
* XBEGIN -> (transaction-begin)
* XEND -> (transaction-end)
* XABORT -> (transaction-abort)
* XTEST -> (transaction-running-p)

The additional function (transaction-supported-p) internally uses CPUID
to determine if RTM instructions are supported by the CPU.
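
For reference, a minimal sketch of the bit test that such a CPUID check
boils down to. Intel documents RTM support as bit 11, and HLE support as
bit 4, of the EBX value returned by CPUID leaf 7, subleaf 0. The function
names below are hypothetical and are not part of SB-TRANSACTION; the
sketch only decodes an EBX value that is assumed to have been obtained
elsewhere (for example from a CPUID VOP).

```lisp
;;; Hypothetical helpers, not part of SB-TRANSACTION.
;;; EBX is assumed to be the EBX result of CPUID with EAX=7, ECX=0.

(defun ebx-reports-rtm-p (ebx)
  "True if bit 11 (RTM) is set in EBX."
  (logbitp 11 ebx))

(defun ebx-reports-hle-p (ebx)
  "True if bit 4 (HLE) is set in EBX."
  (logbitp 4 ebx))
```

Keeping the decoding separate from the CPUID call itself makes the bit
positions easy to check without needing a CPU that supports RTM.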

SB-TRANSACTION is available from GitHub
https://github.com/cosmos72/stmx/tree/master/sb-transaction and is
currently packaged as a subfolder inside my library STMX
https://github.com/cosmos72/stmx - a high-performance implementation of
transactional memory.

At the moment SB-TRANSACTION is hardware-only and STMX is software-only;
my objective is to integrate them.

Constructive feedback is welcome :)


-----------------------------------------------------------------------
Now, the questions:

1) Where can I find documentation about the differences between
calling (sb-c::%primitive SOME-VOP-WITH-ARGS) and calling a DEFKNOWN
function?
I examined various online resources, including:
* Paul Khuong's blog
  http://www.pvk.ca/Blog/2013/04/13/starting-to-hack-on-sbcl/
  http://pvk.ca/Blog/Lisp/hacking_SSE_intrinsics-part_1.html
* RAM's "The Python Compiler for CMU Common Lisp"
  http://www.cs.cmu.edu/~ram/pub/lfp.ps
* SBCL Internals
  http://www.sbcl.org/sbcl-internals/index.html
* SSE intrinsics implementation for ECL & SBCL
  https://github.com/angavrilov/cl-simd
* A relevant Stack Overflow topic
  http://stackoverflow.com/questions/15350409/is-there-a-way-to-get-sbcl-to-print-out-the-value-of-a-cpu-register
* Some very useful non-English pages defining simple VOPs
  (Russian) my-xor VOP http://lisper.ru/apps/format/138
  (Japanese) cpuid VOP http://kurohuku.blogspot.fi/2009/11/sbclcpuid.html

The canonical solution seems to be to define a function that calls a
DEFKNOWN function, where the two functions typically have the same name,
in an apparent self-recursion. Unfortunately, such a solution seems quite
fragile to me: a simple C-c C-c on the function definition in SLIME is all
it takes to generate a real self-recursing function, bypassing the correct
expansion.
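
A minimal sketch of the pattern described above, assuming SBCL on x86-64.
%ADD3 is a made-up example function, and the SB-C, SB-VM and SB-ASSEM
symbols are SBCL internals whose details may differ between versions, so
treat this as an illustration rather than a reference.

```lisp
;;; Sketch only: %ADD3 is a made-up example, and the SB-C/SB-VM/SB-ASSEM
;;; symbols are SBCL internals that may change between versions.
;;; Evaluate the forms in order (e.g. LOAD the source file).

;; 1. Tell the compiler about the function: argument and result types.
#+(and sbcl x86-64)
(sb-c::defknown %add3 ((unsigned-byte 62)) (unsigned-byte 64)
    (sb-c::flushable))

;; 2. A VOP that translates calls to %ADD3 into inline machine code.
#+(and sbcl x86-64)
(sb-c::define-vop (%add3)
  (:translate %add3)
  (:policy :fast-safe)
  (:args (x :scs (sb-vm::unsigned-reg) :target r))
  (:arg-types sb-vm::unsigned-num)
  (:results (r :scs (sb-vm::unsigned-reg)))
  (:result-types sb-vm::unsigned-num)
  (:generator 2
    (sb-vm::move r x)            ; copy X into R unless they share a register
    (sb-assem::inst add r 3)))   ; R := R + 3

;; 3. The stub: it looks self-recursive, but the inner call is meant to be
;;    replaced by the VOP above when the stub is compiled.
#+(and sbcl x86-64)
(defun %add3 (x)
  (%add3 x))
```

The stub gives the operation an ordinary function object, so it can be
passed to FUNCALL or called from code where the VOP is not selected. The
fragility mentioned above comes from step 3: if the stub is compiled when
the DEFKNOWN and VOP are not in effect, the inner call stays a real call.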

On the other hand, calling (sb-c::%primitive ...) seems to generate less
optimized code with more type checks and register moves, and it does not
work for :conditional VOPs.

In any case, I could not find an explanation of the theoretical
difference between the two anywhere.

2) I have not yet implemented Hardware Lock Elision (HLE), an alternative
set of assembler instructions for using hardware transactional memory.
It looks like a considerably harder task to me, as the relevant
instructions XACQUIRE and XRELEASE must be used as prefixes for other
instructions such as MOV, LOCK XADD, and memory writes in general. In my
ignorance I see no other solution than duplicating a lot of existing VOPs,
such as compare-and-swap and regular memory writes, to add variants which
insert the prefixes XACQUIRE or XRELEASE. A lot of work, and a lot of
duplicated code.

Any suggestions for a better way to implement them?


Regards,

Massimiliano


Lutz Euler

Jun 23, 2013, 6:43:37 AM
to max, sbcl-...@lists.sourceforge.net
Hi Massimiliano,

> SB-TRANSACTION is a SBCL compiler plugin for the x86-64 family of CPUs
> to access the new Restricted Transactional Memory (RTM) assembler
> instructions dealing with hardware memory transactions (described for
> example at
> http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions)
> introduced by Intel on Core i5 4570, Core i5 4670 and Core i7 4770.

I won't comment on most of your work as I don't have a CPU supporting
these instructions.

But I can point you at some info about prefix instructions.

> 2) I did not yet implement Hardware Lock Elision (HLE), an alternative
> set of assembler instructions to use hardware transactional memory,
> and it seems quite a more difficult task to me, as the relevant
> instructions XACQUIRE and XRELEASE must be used as prefixes for other
> instructions as MOV, LOCK XADD, and in general memory writes. In my
> ignorance I see no other solution than duplicating a lot of existing
> VOPs as compare-and-swap, regular memory writes... to add variants
> which insert the prefixes XACQUIRE or XRELEASE. A lot of work, and a
> lot of duplicated code.
>
> Any suggestion about a better way to implement them?

See my commits 65bdee4ba534e82c352cff3eec16473daaf285dd
"Improve handling of x86[-64] prefix instructions in the disassembler."
and eb53f2bf913aa34aee83b35eb2b709a2e0d40366 "Make the disassembler
understand instruction prefixes." from 2011, and the ones around these
two.

The commit messages explain how to define instruction prefixes that
avoid the duplicated code you rightly don't want. This way is already
used for REX, LOCK, REP etc. Please compare the assembly side of LOCK
with REP: You have the choice whether to make the prefix an additional
argument to the main instruction as done with REP or have it be a
separate instruction to be inserted by (INST ...).

(Don't look at REX because that additionally is suppressed in the
disassembly and must sometimes be placed after the first byte of the
opcode.)
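
To illustrate the two styles at the byte level, here is a toy model in
plain Common Lisp. It is not the SBCL assembler API: the function names
are hypothetical, and a "segment" is simply a list of bytes. The prefix
bytes themselves (F2 for XACQUIRE, F3 for XRELEASE, F0 for LOCK) are the
documented encodings.

```lisp
;;; Toy byte-level model; these functions are hypothetical and are NOT
;;; the SBCL assembler API. A segment is just a list of bytes.

(defconstant +xacquire+ #xF2) ; XACQUIRE prefix byte
(defconstant +xrelease+ #xF3) ; XRELEASE prefix byte
(defconstant +lock+     #xF0) ; LOCK prefix byte

;; Style 1: the prefix is an extra argument of the main instruction.
(defun emit-cmpxchg (segment &key prefixes)
  "Append PREFIXES, then CMPXCHG [RDI], RSI (48 0F B1 37), to SEGMENT."
  (append segment prefixes (list #x48 #x0F #xB1 #x37)))

;; Style 2: the prefix is emitted as its own one-byte instruction.
(defun emit-prefix (segment byte)
  "Append the single prefix BYTE to SEGMENT."
  (append segment (list byte)))

;; Both styles yield the same bytes for XACQUIRE LOCK CMPXCHG [RDI], RSI.
(defun hle-cmpxchg-as-argument ()
  (emit-cmpxchg '() :prefixes (list +xacquire+ +lock+)))

(defun hle-cmpxchg-as-separate-instructions ()
  (emit-cmpxchg (emit-prefix (emit-prefix '() +xacquire+) +lock+)))
```

In both styles the encoder for CMPXCHG itself is written once; the prefix
is only one more byte in front of it, which is why no VOP or instruction
definition needs to be duplicated on the assembler side.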

Also, you have the choice to make your X... simply normal instructions.
That makes the disassembly look less nice but otherwise works just as
well (see how the disassembly of LOCK looked before my changes, when
LOCK was a normal instruction, shown in the commit message of the
"Improve ..." commit cited above).

If you want your prefixes printed like FS on x86, that is, directly
before the memory reference, you need to find out how that is done with
FS. I believe that means modifying some internal function in the
disassembler.

Hope that helps! Regards

Lutz