On 12/06/2016 10:38 PM, Smit Shah wrote:
> I started programming as many young folks do these days, diving into
> a scripting language (Ruby) and using it to build CRUD apps etc.
>
> As time went on, I started dealing with more interesting problems
> and solving them efficiently became increasingly complex. I started
> looking into more performant languages (which let you exercise more
> control), concurrent programs and distributed systems etc.
>
> However, I realized that all these complex systems are built on
> fundamental knowledge of how computers work, e.g. the TCP/UDP/IP network
> stack, disks, memory, processors, compilers etc. I think it's
> paramount to know such things to come up with projects like
> Disruptor/Aeron etc.
>
> So my question is rather a simple one, how does one start to get good
> at these things? Did you folks pick these things up from books and
> whitepapers? Where does one start basically? Also, what should I pick
> up first? There is just so much to learn, e.g. Algorithms, Data
> structures, Design Fundamentals, etc.
Hi Smit,
I agree with the advice of others: "write code and profile it" -- you
need empirical feedback about what really works.
But when you're starting out, it also helps to have a map of the
territory. All of the things that you mentioned are relevant -- some are
new, some are old, and some are changing. For the new and changing stuff
you may need to research online; for the old stuff, a second-hand book
from 20 years ago may be fine.
I read a lot of research papers when I'm starting on a new problem. That
gives me a feel for what researchers have found. Search for papers
online with Google Scholar. You can often also browse the listings of
papers at relevant conferences (e.g. find a paper that interests you,
find the conference it was published at, review other publications from
that conference, and recurse into the references of relevant papers).
I've put some book recommendations below, because I like books, but a
lot of this information is available on-line.
For TCP/IP networking, Stevens' "Unix Network Programming" is the bible.
After that you can start reading people's papers about optimizing
network stacks.
You probably want to read a book on operating systems. I don't have a
good recommendation, but any OS internals documentation will teach you
something. (Recently, I learned a lot from reading the design documents
for RSX11M, an old precursor of VMS:
http://www.mirrorservice.org/sites/www.bitsavers.org/pdf/dec/pdp11/rsx11/ )
Same goes for compiler internals documentation. For example, I remember
reading some good papers about IBM's XL compiler implementation at one
stage.
If you haven't taken a data structures course it would be worth reading
an introductory text. Then at least implement linked lists (single,
double), a red-black or AVL tree, and a hash table (or some other
similar data structures that you're interested in). Test, profile,
optimize, and understand the performance constraints. Alternatively, once
you understand an algorithm you could compare, profile and improve other
people's implementations. For example, I was surprised by the diversity I
found when I started looking into the different ways of implementing
singly linked lists:
http://www.rossbencina.com/static/blog/singly-linked-list-data-structure-variants/singly-linked-lists.pdf
The algorithms and data structures bible, of course, is Knuth's "Art of
Computer Programming." But there's plenty of info on the web once you
know what to look for. NIST has a dictionary of algorithms and data
structures that will at least give you names for things, then you can
search for implementations, or information about how to implement them:
https://xlinux.nist.gov/dads/
When I was starting out I read a lot of books on software design. You
can go many different ways with this and I hesitate to make
recommendations. (In the '90s I went down the OO Booch, Martin, Lakos
(3 books) path, which was good because it covered micro to macro design
considerations. On the other hand, SICP is popular in some circles.) One
way or another you need to learn to design software, not just plug
components together. Knowledge in this area has been incrementally
growing and there are good books from every era.
It's nice to at least try to implement a compiler of some sort.
Memory allocation is another low-level thing you should know something
about. (e.g. different types of garbage collectors, different allocation
algorithms and their constraints). One specific example: I like this
talk about Bonwick's Slab allocator (or you could just read the original
paper):
http://paperswelove.org/2015/video/ryan-zezeski-memory-by-the-slab/
Most of the above are not really mechanical sympathy though, just basic
CS stuff from the pre-virtualized age.
Cache interactions are an important factor in mechanical sympathy. You
need to learn about how caches are structured, latencies, how they
communicate and synchronise. Any Computer systems architecture book will
have a chapter on cache coherency protocols, or you could start here:
https://en.wikipedia.org/wiki/Cache_coherence
https://en.wikipedia.org/wiki/MOESI_protocol
Also this book:
David Loshin, "Efficient Memory Programming"
(In summary: know about how data locality and access patterns interact
with caches. There are also heaps of blog posts about cache-optimized
data structures and algorithms.)
Going deeper:
Curt Schimmel, "Unix Systems for Modern Architectures" covers
kernel-level CPU cache management and multi-processor kernel architectures.
At some point it's a good idea to read CPU architecture manuals (e.g.
from Intel) and papers published about specific algorithms.
Regarding concurrency, lock-free concurrency is a popular topic, but
probably not worth getting deeply into until you've mastered the above.
There are two good books though:
Scott, "Shared-Memory Synchronization"
Herlihy, Shavit, "The Art of Multiprocessor Programming"
The Scott book is a really excellent summary/introduction to the
mechanisms of both locking and lock-free synchronization. It covers
low-level details like memory barriers and caches. I wish I had read it
15 years ago when I was struggling to learn that stuff. Highly recommended!
Nitsan Wakart collected a bunch of interesting recent papers about
lock-free algorithms; you might like to take a look, just to see how
much of them you understand:
https://github.com/JCTools/JCTools/tree/master/resources
(but there are also a bunch of older papers worth looking at).
If you're so inclined, a close-to-the-metal project like programming a
microcontroller in assembly language might be informative. It helps to
have a reason to do this, though.
A link that needs no introduction:
http://www.agner.org/optimize/
Finally, if you're practically minded, working on projects that require
maximum performance will be a powerful motivator.
Ok, enough. I hope this helps,
Good luck!
Ross.