Intel has an article that includes a link to my library:
http://www.intel.com/cd/ids/developer/asmo-na/eng/151201.htm
(page 4)
Sun has an article that includes a link to my library:
http://developers.sun.com/solaris/articles/chip_multi_thread.html
(section 2.2.1.1.2)
A professor (cs.nyu.edu) includes a link to my library in one of his recent
lectures:
http://www.cs.nyu.edu/artg/internet/Spring2006/lectures/DavidBuksbaum-BuildingHighThroughputMulti-threadedServersInCSharpAndDotNet.ppt
(PowerPoint link)
Should I be getting nervous?
:O
I am not sure if this guy is a professor or if this slide-show is referenced
in the [...]/lectures directory of the cs.nyu.edu web server... He seems to
have developed a high-performance server that may deal with sensitive financial
transactions... I think you have to have experience in other areas besides
programming to be hired for this kind of stuff...
AFAICT, financial programming involves knowing some of the messaging
protocols like FIX, some database programming, GUI stuff. Some
distributed multi-tier whatever they're calling it now. Multi-threading
shows up but it seems mostly to be Java so they can't be desperate for
performance.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
The presentation also discusses massively-multiplayer game servers, so
I think it's more a general overview of some of the tech. behind
demanding multiplexing servers than it is specific to finance.
> AFAICT, financial programming involves knowing some of the messaging
> protocols like FIX, some database programming, GUI stuff. Some
> distributed multi-tier whatever they're calling it now. Multi-threading
> shows up but it seems mostly to be Java so they can't be desperate for
> performance.
It depends what you qualify as financial programming. For example,
does the code handling trade order management (NASDAQ, for example)
count? What about data dissemination services such as Reuters or
Bloomberg? Or complex accounting systems for investment banks? Not to
mention the high-end number crunching required for risk analysis and
such. Most of the above benefit greatly from a concurrent design at
some level, but it's largely an issue of scale. A small shop doesn't
care if their system has a low throughput, but a large firm may.
Sean
Not desperate enough. As far as the performance benefits to be gained
from threading go, most don't know any better. Of course, as a threading
person, I would say that threads aren't accorded enough importance. :)
Anyway, I just put in a little OT recently helping out in an emergency,
manually running some Windows-based Java application. GUI-based, so
it couldn't be automated. I annoyed some of the others by mentioning that
in Unix we would probably automate something like this with a script
so it wouldn't take hours or days of manual work. And if I wrote it,
the execution part would take seconds rather than minutes. Java
may not be all that slow in theory, but in practice the current APIs and
design patterns tend toward serious bloat. Watching a seriously slow GUI
form refresh/repaint caused by just clicking a select button, you knew
there was a lot of bloated code running. The execute part, which you
couldn't see since it was running on another server, was probably in the
same bad shape. But you can sell this kind of crap since the GUI looks
nice and the people making the purchase decisions aren't programmers and
don't know any better. Unfortunately, this kind of ignorance isn't just
limited to customers. It exists in programming groups at places you'd
think should know better, but obviously don't. Otherwise, why would
crap software like the program above exist in the market?
:)
[...]
> AFAICT, financial programming involves knowing some of the messaging
> protocols like FIX, some database programming, GUI stuff. Some
> distributed multi-tier whatever they're calling it now.
Humm... Good point. I can use lock-free algorithms to implement servers, but
I guess that would be at a lower level... He mentions C#.
Well, the C# memory model screws you real good when you try to implement
lock-free algorithms with it. Have you seen Jeffrey Richter's PowerThreading
library, Joe?
http://www.wintellect.com/Login.aspx?ReturnUrl=%2fMemberOnly%2fPowerThreading.aspx
(you have to sign up to download) ;(...
It's implemented in C#. I think he is "trying" to use C# garbage collection
to protect his lock-free queue and stack from ABA. However, he doesn't quite
get around to it... His queue and stack do not use DWCAS. He crams no
monotonic version counter into his CAS, so ABA will devastate him.
I "think I saw" a major bug in his design... He uses a lock-free LIFO for a
node cache... His lock-free queue can pop a node, store it in the node
cache, and another thread can pop the node from the cache and push it into the
same queue...
Boom, ABA....
Can you or anyone else see the same bug I can see? No DWCAS means that he
is going to have to remove all references from a popped node and wait for
GC; that means he cannot cache in the lock-free LIFO.
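To make the ABA hazard described above concrete, here is a minimal
single-threaded C++ sketch. It is not code from the PowerThreading library, and
every name in it is hypothetical. It replays the interleaving by hand: one
popper reads the head and its successor, a second popper removes two nodes and
pushes the first one back, and then the first popper's stale CAS runs. Nodes
are named by 32-bit index purely so that an index plus a version counter fits
in one 64-bit CAS; that packing is an assumption of the sketch, standing in for
pointer + counter under DWCAS.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using std::uint32_t;
using std::uint64_t;

// Hypothetical sketch; none of these names come from the PowerThreading library.
// Nodes are named by index so that a head "pointer" plus a version counter
// fits in one 64-bit word (a stand-in for pointer + counter under DWCAS).
constexpr uint32_t A = 0, B = 1, C = 2, NIL = 0xFFFFFFFFu;

static uint64_t pack(uint32_t index, uint32_t version) {
    return (static_cast<uint64_t>(version) << 32) | index;
}

// Replays the bad interleaving by hand on a single thread. Returns true if
// the stale CAS of the "preempted" popper succeeds, i.e. if ABA bites.
bool stale_pop_succeeds(bool use_version) {
    uint32_t next[3] = { B, C, NIL };        // stack is A -> B -> C
    std::atomic<uint64_t> head(pack(A, 0));
    uint32_t v = 0;
    // Each successful push/pop bumps the counter when versioning is enabled.
    auto bump = [&]() -> uint32_t { return use_version ? ++v : 0u; };

    // Popper 1 reads the head (A) and A's successor (B), then stalls.
    uint64_t seen = head.load();
    uint32_t seen_next = next[A];
    assert(seen_next == B);

    // Popper 2 runs: pops A, pops B (B goes into a node cache and is reused
    // elsewhere), then pushes A back. Head is A again, but A now links to C.
    head.store(pack(B, bump()));             // pop A
    head.store(pack(C, bump()));             // pop B
    next[A] = C;
    head.store(pack(A, bump()));             // push A

    // Popper 1 resumes and tries to swing the head from A to the stale B.
    uint32_t next_ver = use_version ? static_cast<uint32_t>(seen >> 32) + 1 : 0u;
    return head.compare_exchange_strong(seen, pack(seen_next, next_ver));
}
```

Calling stale_pop_succeeds(false) models a CAS on the bare head: the comparison
passes because the head is A again, and the head ends up pointing at B, which
is no longer in the stack. Calling stale_pop_succeeds(true) models the counter
crammed into the CAS: the head word has changed even though the index has not,
so the stale CAS is rejected.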
I wonder if he knows all of the caveats... Humm...
Ooops!
:)
> Multi-threading shows up but it seems mostly to be Java so they can't be
> desperate for performance.
Perhaps he is thinking of building a JVM...
;)
[...]
> Well, the C# memory model screws you real good when you try to implement
> lock-free algorithms with it. Have you seen Jeffrey Richter's
> PowerThreading library, Joe?
>
>
> http://www.wintellect.com/Login.aspx?ReturnUrl=%2fMemberOnly%2fPowerThreading.aspx
> (you have to sign up to download) ;(...
>
>
[...]
He has explicit full memory barriers in the wrong places... I thought that
C# volatile took care of this with its overly strict memory model... He uses
full memory barriers after loads... I thought that loads in C# had
#LoadStore | #LoadLoad...
Why do you think he uses full barriers after the loads from the queue anchor?
The barriers are in the CAS loop of the queue... He is using a variation of the
Michael and Scott algorithm. Lots of atomic operations in a loop...
Seems like the lock-free collections in this PowerThreading library are
extremely expensive. Man, I would just go with mutexes vs. this thing...
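On the barrier question above, a rough C++ analogy may help. It assumes, as the
post suggests, that a C# volatile load already behaves like an acquire load
(#LoadLoad | #LoadStore). C++ atomics stand in for C# volatile fields here, and
all names are hypothetical. Both readers below are correct; the second one
simply pays for a full fence that the acquire load makes redundant.

```cpp
#include <atomic>

// Hypothetical names; C++ atomics stand in for C# volatile fields here.
struct Anchor {
    int payload = 0;                       // plain data guarded by `ready`
    std::atomic<bool> ready{false};
};

// Writer: fill in the payload, then publish it with a release store.
void publish(Anchor& a, int value) {
    a.payload = value;
    a.ready.store(true, std::memory_order_release);
}

// Reader, cheap form: the acquire load alone orders the later payload read.
bool read_acquire(Anchor& a, int& out) {
    if (!a.ready.load(std::memory_order_acquire)) return false;
    out = a.payload;
    return true;
}

// Reader, expensive form: a relaxed load followed by a full fence. Still
// correct, but the full fence is stronger than this reader needs.
bool read_full_fence(Anchor& a, int& out) {
    const bool r = a.ready.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (!r) return false;
    out = a.payload;
    return true;
}
```

Inside a CAS loop the difference is multiplied by every retry, which is one
plausible reading of why the library's collections look so expensive.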
Okay here is some info...
David Buksbaum's job title is SVP Development Manager of Systematic Trading,
Citadel Investment
http://en.wikipedia.org/wiki/Citadel_Investment_Group
He created order execution servers that serve a few hundred thousand users
and are currently processing stuff from the New York Stock Exchange...
The professor's name is Prof. Arthur Goldberg. Here is the syllabus for spring
2006:
http://www.cs.nyu.edu/artg/internet/Spring2006/syllabus.html
Search for the text "David Buksbaum" and you will go directly to a link to
the PowerPoint slide-show I linked to, and a short description...
Hummm...
Any thoughts?
I've seen some of his stuff. Enough that I don't think I need to check
out everything he does on its own account unless someone else says he's
come up with something interesting.
>> I wonder if he knows all of the caveats... Humm...
>>
>
> I've seen some of his stuff. Enough that I don't think I need to check
> out everything he does on its own account unless someone else says he's
> come up with something interesting.
The PowerThreading library has an implementation of this:
http://groups.google.com/group/comp.programming.threads/msg/5093fda45d8da381
I bet that the following very crude lock-free reader-writer will outperform
it:
Humm...
Yeah. That would amortize the cost of the load barrier on Alpha:
http://groups.google.com/group/comp.programming.threads/msg/7b427bceff6f75da
?
The queue in the library doesn't allow for traversals... Just push and
pop... Volatile loads have acquire semantics in C#, so Richter has unnecessary
barriers in his code... If it did allow for traversals, a user would be
executing a load acquire on every node visited during the traversal... This is
an example of how the C# memory model can screw you...
;^(...
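As a rough illustration of the traversal cost discussed above, here is a small
C++ sketch (hypothetical names, not library code) in which the reader pays for
one acquire load on the list anchor and then walks the nodes with plain loads.
The big assumption is that the list is an immutable snapshot: every node is
fully linked before the head is published and is never modified afterwards.
For a list that is mutated concurrently this would not be enough, and each
next-pointer load would need its own ordering, which is exactly the per-node
cost being complained about.

```cpp
#include <atomic>

// Hypothetical sketch: an immutable singly linked snapshot, published once.
struct Node {
    int value;
    const Node* next;
};

std::atomic<const Node*> g_head{nullptr};

// Writer: link every node first, then publish the head with one release store.
void publish_list(const Node* head) {
    g_head.store(head, std::memory_order_release);
}

// Reader: one acquire load on the anchor; the per-node reads are plain loads,
// so the ordering cost is paid once per traversal instead of once per node.
int sum_snapshot() {
    int sum = 0;
    for (const Node* n = g_head.load(std::memory_order_acquire); n; n = n->next)
        sum += n->value;
    return sum;
}
```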
Humm... I am thinking of a theoretical 64-bit JVM implementation that is
based on a collection of clever high-performance lock-free algorithms. I
wonder if it would be able to run, as-is, on a SPARC in 64-bit mode. As we
all should know by now, it's lacking a DWCAS instruction...
Ahhh, I will just wait for Sun to release that "obstruction-free KCSS wonder
tonic"; Yummy!
;)
However, I am making great use of many different types of lock-free offset
techniques in order to implement some rather advanced writer-side lock-free
stuff on SPARC64... Humm... I am sometimes basically forced to use some
"special stuff" to "skirt around the missing DWCAS instruction", so as a
side-effect, it's forcing me to come up with different ways to
utilize various forms of offset trickery; it's kind of entertaining... For
instance, I have atomic_ptr running smoothly in 64-bit mode. It doesn't have
to use any of Sun's LL/SC emulation stuff.... Humm...
:)
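The post does not show its offset techniques, so the following is only a guess
at the general flavor, not the poster's atomic_ptr or any real SPARC64 code.
One common way to live without DWCAS on a 64-bit machine is to keep nodes in a
fixed pool and address them by a 32-bit offset, which leaves the other 32 bits
of the word for a version counter. A single 64-bit CAS then covers both. All
names below (OffsetStack, kNil, kCapacity) are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch, not the poster's atomic_ptr. Nodes live in one fixed
// pool and are addressed by a 32-bit offset (slot index), which leaves 32 bits
// of the head word for a version counter. One 64-bit CAS covers both.
constexpr std::uint32_t kNil = 0xFFFFFFFFu;
constexpr std::uint32_t kCapacity = 64;

class OffsetStack {
public:
    OffsetStack() : head_(pack(kNil, 0)) {}

    // Push pool slot `idx` (owned by the caller) carrying `value`.
    void push(std::uint32_t idx, int value) {
        pool_[idx].value = value;
        std::uint64_t old = head_.load(std::memory_order_relaxed);
        for (;;) {
            pool_[idx].next.store(index_of(old), std::memory_order_relaxed);
            const std::uint64_t desired = pack(idx, version_of(old) + 1);
            if (head_.compare_exchange_weak(old, desired,
                                            std::memory_order_release,
                                            std::memory_order_relaxed))
                return;
        }
    }

    // Pop the top slot; returns its index (now owned by the caller) or kNil.
    std::uint32_t pop(int& value) {
        std::uint64_t old = head_.load(std::memory_order_acquire);
        for (;;) {
            const std::uint32_t idx = index_of(old);
            if (idx == kNil) return kNil;
            // May read a stale link if the slot was recycled meanwhile; the
            // version check in the CAS below rejects that case.
            const std::uint32_t next =
                pool_[idx].next.load(std::memory_order_relaxed);
            const std::uint64_t desired = pack(next, version_of(old) + 1);
            if (head_.compare_exchange_weak(old, desired,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
                value = pool_[idx].value;
                return idx;
            }
        }
    }

private:
    struct Slot {
        std::atomic<std::uint32_t> next;
        int value;
    };

    static std::uint64_t pack(std::uint32_t idx, std::uint32_t ver) {
        return (static_cast<std::uint64_t>(ver) << 32) | idx;
    }
    static std::uint32_t index_of(std::uint64_t w) {
        return static_cast<std::uint32_t>(w);
    }
    static std::uint32_t version_of(std::uint64_t w) {
        return static_cast<std::uint32_t>(w >> 32);
    }

    Slot pool_[kCapacity];
    std::atomic<std::uint64_t> head_;
};
```

The caller is responsible for handing push() only slots it currently owns, for
example ones returned by pop(). The 32-bit counter can wrap around in
principle, so this narrows the ABA window rather than closing it outright.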