Mechanical sympathy in scripting languages?


William Pietri

unread,
May 31, 2016, 11:23:17 AM5/31/16
to mechanical-sympathy
Hi! I've been lurking for a year or so, and I've greatly enjoyed the list. This recent comment has me thinking, so I wanted to ask a question of the folks here:


On 05/20/2016 09:09 AM, Gil Tene wrote:
Perhaps it's because performance and concurrency issues seem to be top of mind and relevant to Java folks more than others? [...] I would personally categorize C, C++, C#, Java, Rust, Go, and some related langs as "closer to the metal" than some other environments (Python, JS, Ruby, Perl, PHP), mostly in the sense that you can more easily and predictably reason about what the code actually does in terms of machine operations. But I'm sure there is mechanical sympathy to be applied in those other environments too.

My background has some closer-to-the-metal work. Some Apple ][ assembler as a kid; a bunch of C; lots of Java (including happily using Prevayler); tons of performance tuning. But most work these last several years has been in startup-land, where speed of iteration has been the most important criterion, so I've mainly been working in Ruby and Python.

95% of the time, those languages are fine. But sometimes when I hit a performance issue, I'm at a bit of a loss. I know my technical options, of course, but there's a... philosophical issue between me and most people who work in those languages (which use global interpreter locks). When something is taking too long, my instinct is to say, "Hey, let's keep the data right where it is in RAM and use the box's 7 idle cores." Theirs is to look brightly at me and say, "Oh, we'll just run more processes and communicate over sockets. And maybe use more boxes. And memcache must work in here somehow, right?"

What happens in my head is an accounting of the serialization and deserialization costs, plus the RAM soaked up by many runtimes, plus a lot of unnecessary trips through the kernel, plus maybe crossing the network to other boxes, plus a lot of programming shenanigans to break apart and reunify data, plus the assorted ops issues. I try not to look pained or sigh audibly, at least until they're out of sight. Then I think about my grandmother's basement, which had shelves full of coffee cans and paper bags and egg cartons, saved because she had grown up in the Great Depression, and her value metrics were hopelessly out of date. I wonder if that's me now, and I sigh again.
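That accounting can be made concrete. A toy measurement (payload shape and sizes are made up, not from any real system) of what a cross-process hop adds on top of simply touching the data where it already sits in RAM:

```python
import pickle
import time

# Illustrative payload: the kind of structure you would otherwise keep hot in RAM.
data = [{"id": i, "score": i * 0.5} for i in range(100_000)]

# In-process: just walk the data where it already lives.
t0 = time.perf_counter()
total = sum(row["score"] for row in data)
in_process = time.perf_counter() - t0

# Cross-process: pay serialize + copy + deserialize before any useful work.
t0 = time.perf_counter()
wire = pickle.dumps(data)        # what would cross the socket
restored = pickle.loads(wire)
total2 = sum(row["score"] for row in restored)
round_trip = time.perf_counter() - t0

assert total == total2           # same answer, very different cost
print(f"payload {len(wire)} bytes; in-process {in_process:.4f}s, "
      f"serialize+deserialize+work {round_trip:.4f}s")
```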

After one of these architectural conversations, I'm sure one of us is missing something important, but I'm not always sure who that is.

So I guess I'm wondering three things:
  • How do others with mechanical sympathy in scripting-heavy environments deal with thinking about and collaborating around system design?
  • My normal heuristic for deciding when to put on my mechanical sympathy goggles is "user-visible performance degradation". Before that, I mainly favor other things. This has pluses and minuses. What other approaches do people use?
  • Do folks here see rising language options that provide the flexibility of languages like Python and Ruby but still let those of us with mechanical sympathy make good use of the resources at hand?

Thanks,

William

Dan Eloff

unread,
May 31, 2016, 11:55:54 AM5/31/16
to mechanica...@googlegroups.com
Not really to address your concerns, but just to share my experience (pain?) of being a mainly Python developer for the last decade:

-Computers are ridiculously fast, so most of the time in a typical web application / mobile backend performance doesn't matter.
-Web developers have appalling mechanical sympathy, but like you say, maybe we're the ones with the outdated viewpoint. Usually shipping code with low levels of defects is a more sensible priority than shipping fast code - except in the few cases where speed really matters
-It's not unusual to see thousands of (serial) roundtrips to the database in a single page. Scaling that usually means a combination of caching and combining queries - both of which complicate the code and the application stack
-If you want to scale something to multiple cores, you're usually shit out of luck. Even if you get around the GIL you are probably limited by the database, and it's nearly impossible to split your database transaction across multiple threads and connections. And even if you're lucky enough to have a pure compute problem that you could code in C to get around the GIL, good luck getting C into the build process, your whole team will vote you down
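The GIL point above is easy to demonstrate. A minimal sketch (the function and workload sizes are illustrative) showing that a thread pool buys nothing for pure-Python compute on CPython:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def spin(n):
    # Pure-Python compute: the interpreter holds the GIL for the duration.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1_000_000

t0 = time.perf_counter()
serial = [spin(N) for _ in range(4)]
serial_time = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(spin, [N] * 4))
threaded_time = time.perf_counter() - t0

assert serial == threaded
# On CPython the threaded run is typically no faster (often slower) than the
# serial one: only one thread executes Python bytecode at a time.
print(f"serial {serial_time:.2f}s vs 4 threads {threaded_time:.2f}s")
```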

Eventually you end up with a pretty involved stack: database servers, cache servers (maybe Redis once memcached is no longer enough), app servers, maybe some in-process caching too - plus typically nginx or apache or both in front of / hosting the Python process(es). Because of the GIL, that (es) had better not be omitted. Don't forget the overhead and impedance mismatch from using an ORM - because who doesn't these days. The whole thing gets very complex, very unwieldy, and very slow. Plus, because it's a dynamic language, once the code base gets large enough it becomes very hard to change things with confidence - even with good tests (which take forever to run). Now, change to a microservices-based architecture because you need to scale development too, multiply that stack complexity for every service, and then square it to account for all the ways the services can interact/depend on each other. Have fun.

I'm sick of it. I'm working full-time on a new kind of embedded relational in-memory database that embodies mechanical sympathy. Use it from an efficient runtime like Go, or Java, or .NET. No need for caches, roundtrips for queries, query construction and parsing, etc. Just throw up a cluster of 3 servers running one app server per microservice each, and scale vertically until you can't anymore. By the time you run out of headroom, you'll be a top-100 site handling a million requests a second and you can spend the resources to partition your services correctly (bonus points if you already have things nicely separated). And the best part is, thanks to using fast, static languages you can change code with confidence and run the tests quickly - that iteration time is critical to actually enjoying the work we do, and I've observed it's faster for static languages! Oh, the irony. That's my dream for the future of web/app backend development anyway. I'm done with Python and the like.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Trent Nelson

unread,
May 31, 2016, 2:27:47 PM5/31/16
to mechanical-sympathy


On Tuesday, May 31, 2016 at 11:55:54 AM UTC-4, Daniel Eloff wrote:
[...]
-If you want to scale something to multiple cores, you're usually shit out of luck. Even if you get around the GIL you are probably limited by the database, and it's nearly impossible to split your database transaction across multiple threads and connections. And even if you're lucky enough to have a pure compute problem that you could code in C to get around the GIL, good luck getting C into the build process, your whole team will vote you down

I had pretty good results with PyParallel (http://pyparallel.org). I even slip a mechanical sympathy reference into the GitHub README (https://github.com/pyparallel/pyparallel):

  • Designing around mechanical sympathy at the hardware, kernel, OS userspace, and protocol level. The reason Windows was used to develop PyParallel is because it simply has the best kernel and userspace primitives for not only solving the problem, but solving it as optimally as the underlying hardware will allow. Windows itself exhibits excellent mechanical sympathy between all levels of the stack, which we leverage extensively. The reception of a TCP/IP packet off the wire, to the I/O request completion by the NIC's device driver, to the threadpool I/O completion, to the dispatching of the most optimal thread within our process, to calling back into the C function we requested, to us immediately invoking the relevant Python object's data_received() call; mechanical sympathy is maintained at every layer of the stack, facilitating the lowest possible latency.
With the underlying implication being that the NT kernel has more mechanical sympathy than the Linux kernel.

How's that for a contentious point-of-view? ;-)

(I stand by it though.)

    Trent.

Daniel Janzon

unread,
May 31, 2016, 4:58:00 PM5/31/16
to mechanica...@googlegroups.com
On Tue, May 31, 2016 at 5:23 PM William Pietri <wil...@scissor.com> wrote:
 

So I guess I'm wondering three things:
  • How do others with mechanical sympathy in scripting-heavy environments deal with thinking about and collaborating around system design?
  • My normal heuristic for deciding when to put on my mechanical sympathy goggles is "user-visible performance degradation". Before that, I mainly favor other things. This has pluses and minuses. What other approaches do people use?
  • Do folks here see rising language options that provide the flexibility of languages like Python and Ruby but still let those of us with mechanical sympathy make good use of the resources at hand? 

1. I think the Python crowd tries to deal with it by delegating everything that requires performance to a third-party multi-core service or distributed database. Those are often written in Java or C++. I think getting further than that, i.e. discussing which super-new database to use, is difficult with most of these dynamically typed people.

2. I think the most timeless approach to when to wear the optimization goggles is the economic view. Does your Python product/service need to run on a large number of servers? Then *maybe* you're at the economic optimum, but it is likely that using a mechanically sympathetic language would bring you down to a single machine (and one more for redundancy). That's not only less hardware, but also less complexity, and hence a cheaper solution. The cost of troubleshooting a failed request served by microservices on twenty machines can easily be estimated - just go and ask the support team how much time they spend on it and multiply by estimated salaries.
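That economic view fits in a back-of-envelope model. Every number below is a made-up placeholder, not data from a real deployment:

```python
# All numbers are hypothetical placeholders -- substitute your own.
servers = 20
server_cost_per_month = 200.0       # hosting per box
support_hours_per_month = 30        # time spent chasing cross-service failures
loaded_hourly_rate = 100.0          # estimated salary cost per hour

current = (servers * server_cost_per_month
           + support_hours_per_month * loaded_hourly_rate)

# Alternative: a mechanically sympathetic rewrite on one box plus one for
# redundancy, assuming troubleshooting time shrinks with the machine count.
alt = 2 * server_cost_per_month + 5 * loaded_hourly_rate

print(f"current ${current:.0f}/month vs consolidated ${alt:.0f}/month")
```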

3. I think the mainstream languages will get more flexibility. C++, Java and C# get new features all the time. As another poster mentioned, it is hard to maintain and develop large pieces of software written in Python, so I think something else will come along to address that problem. Or maybe the static analyzers for Python and the like will be developed to the point where they can analyze complete applications and not only single files.

Evan Meagher

unread,
May 31, 2016, 6:21:32 PM5/31/16
to mechanica...@googlegroups.com
As a caveat to Daniel's first bullet point, I'd point out the leverage one can gain for math-heavy workloads from Python libraries like NumPy, which rely on bindings to underlying C and Fortran libraries for the heavy lifting of numerical computing. I've been able to put off a native rewrite of a bunch of Python code that my company runs on ARM SoC devices specifically because of the performance we're able to squeeze out of NumPy's ability to vectorize array operations. For most of these daemons, the only code that actually runs in the Python interpreter relates to process lifecycle management and interaction with the outside world (e.g. option parsing, logging), while the guts of the data path are mostly invocations of NumPy APIs.
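A quick illustration of that leverage (sizes and timings are illustrative; this is not the code described above): the same dot product as a pure-Python loop and as a single NumPy call:

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Pure-Python loop: every element access goes through the interpreter.
t0 = time.perf_counter()
loop_dot = sum(x * y for x, y in zip(a.tolist(), b.tolist()))
loop_time = time.perf_counter() - t0

# Vectorized: one call, and the loop runs in compiled (BLAS) code.
t0 = time.perf_counter()
vec_dot = float(a @ b)
vec_time = time.perf_counter() - t0

assert abs(loop_dot - vec_dot) < 1e-6 * vec_dot   # same result, different path
print(f"loop {loop_time:.3f}s vs numpy {vec_time:.4f}s")
```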




--
Evan Meagher

William Pietri

unread,
Jun 2, 2016, 5:51:39 PM6/2/16
to mechanica...@googlegroups.com

On 05/31/2016 08:55 AM, Dan Eloff wrote:
Not really to address your concerns, but just to share my experience (pain?) of being a mainly Python developer for the last decade:

Thanks, Dan. It's good to know I'm not alone in these struggles.


I'm sick of it. I'm working full-time on a new kind of embedded relational in-memory database that embodies mechanical sympathy. Use it from an efficient runtime like Go, or Java, or .NET. No need for caches, roundtrips for queries, query construction and parsing, etc. Just throw up a cluster of 3 servers running one app server per microservice each, and scale vertically until you can't anymore. By the time you run out of headroom, you'll be a top-100 site handling a million requests a second and you can spend the resources to partition your services correctly (bonus points if you already have things nicely separated). And the best part is, thanks to using fast, static languages you can change code with confidence and run the tests quickly - that iteration time is critical to actually enjoying the work we do, and I've observed it's faster for static languages! Oh, the irony. That's my dream for the future of web/app backend development anyway. I'm done with Python and the like.

That sounds great.


You might check out the history of Prevayler, which sounds in some ways similar. It was ahead of its time; I think I first came across it in 2002 and used it heavily in 2004-5. Its basic notion: if you keep your data hot in RAM and log all your changes to disk before applying them, you get the ACID guarantees, a very simple system, and unbelievable speed.
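That basic notion fits in a few lines. A toy prevalence-style store (not Prevayler's actual API, just the idea): state lives in RAM, every command is journaled before it is applied, and recovery is replay:

```python
import json
import os

class ToyPrevalence:
    """In-RAM state plus an append-only journal; recovery is replay."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.state = {}
        if os.path.exists(journal_path):       # rebuild by replaying the log
            with open(journal_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def execute(self, command):
        # Durability first: the change hits the disk log before the state.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps(command) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(command)

    def _apply(self, command):
        if command["op"] == "set":
            self.state[command["key"]] = command["value"]
        elif command["op"] == "delete":
            self.state.pop(command["key"], None)

# Usage: a write survives a "crash" because a fresh instance replays the journal.
if os.path.exists("journal.log"):
    os.remove("journal.log")
db = ToyPrevalence("journal.log")
db.execute({"op": "set", "key": "user:1", "value": "William"})
recovered = ToyPrevalence("journal.log")
assert recovered.state == {"user:1": "William"}
```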

Once we got used to it, it was an amazing development experience. When you can run your 1000+ unit tests in a second or so, progress is rapid. If everything's in one JVM, the app is easy to deploy and maintain. Spinning up a new environment literally took two seconds. Having a complete log of all changes made it easier to run down bugs and provided a bunch of analytics at low cost.

It never really caught on, though. It was at an early-adopter stage so you had to be willing to write some things yourself, like indexes and migrations. You'd be surprised how many developers can't even conceive of finding a user without being able to say "SELECT * FROM Users WHERE...". But I think the main problem was that the model was so alien that only people with mechanical sympathy understood the benefits.

In your shoes, I'd definitely try to find some way to make it extremely easy to get started with. That is, instead of targeting developers like you and me, make something ridiculously easy for junior developers to pick up. At the time, with Prevayler, my take was, "Well duh, it isn't that hard to figure out." But later, watching the rise of Rails, I realized how much adoption depends on getting that much broader audience on board. In retrospect, I wish I had tried to push Prevayler down that path.

William

Rajiv Kurian

unread,
Jun 2, 2016, 9:10:54 PM6/2/16
to mechanical-sympathy
So much of a language's speed comes down to its implementation. Java, once upon a time, was also really slow. My experience has been that dynamic languages with a JIT (LuaJIT, Javascript etc) are surprisingly competitive when used with knowledge of how the JIT works. Compared to Java, an additional thing programmers need to do is make sure that the shape of their objects remains the same, i.e. don't use objects as hash maps. The JIT then works really well.

My experience writing Java is that the optimizer is so good that straightforward implementations of hot loops (all primitives, no allocs) sometimes end up beating the exact same C++ equivalent. I was quite shocked by this the first time it happened, since there isn't too much run-time information for the JIT to exploit. My conclusion was that JVMs just have state-of-the-art compilers that are quite competitive with GCC/Clang. I sometimes have to use SIMD instructions in C++ to finally beat the straightforward Java implementation.

Similarly, modern Javascript compilers are quite good too. There is regular competition between browser vendors and a lot of good compiler engineers working on them. For numerical code, if you write code in asm.js style (or rather compile to it) it will give you near-C++ speeds (maybe 60-70%). These numbers seem to be improving every day too. Check out the Unreal engine browser demo to see what it's capable of doing. Javascript also has things like ArrayBuffers, DataView etc. which make it fairly easy to simulate structs and arrays of structs, which is probably the biggest win when it comes to most applications. Emscripten allows for translation of C code with structs to Javascript. SIMD is being actively worked on too and seems like it will be in Javascript before it gets to Java. For throughput-hungry code, there are quite a few libraries that translate OpenCL/CUDA-like code to WebGL shaders that can be run from Javascript. Web workers etc. allow for limited threading capability too. So I think if you use the right set of features, Javascript allows for fast code. Most of the techniques from C++ or Java still apply. I picked Javascript because we don't have a choice (yet) for web applications, but LuaJIT has its own share of exotic capabilities.
  • Do folks here see rising language options that provide the flexibility of languages like Python and Ruby but still let those of us with mechanical sympathy make good use of the resources at hand?
I am beating a dead horse here, but I like Rust. It is less verbose than Java/C++, though maybe not quite as compact as Python. It also eliminates almost all undefined behavior, which IMO is one of the biggest gotchas for newcomers to C++. The lifetime analysis logic is getting some upgrades soon which will make it more newcomer-friendly.


Dan Eloff

unread,
Jun 3, 2016, 12:53:23 PM6/3/16
to mechanica...@googlegroups.com
Yeah, it's similar in concept to Prevayler, but more of a database than a persistence framework, including indices. I'm very much thinking about what made Rails successful: keeping things as simple as possible, making it easy and natural to use, and producing really good documentation. Do you mind if I contact you, William, off-list later in the year to get your feedback?

@Rajiv, Implementations matter, but so does the language - Mike Pall thought that he could never have achieved the results with Python that he did with LuaJIT, and if anyone knows about that, he would. So far, looking at PyPy's much larger and better-financed efforts over more than a decade - he's spot on. That said, I was very impressed with LuaJIT and got amazing performance with it in the past. That man (if he really is just one man) is an incredible engineer. The problem with all dynamic languages, though, is that it's very hard to reason about the underlying "machine". With a natively compiled language you just have to think about the OS and the physical hardware - which is insanely complex and difficult to reason about as it is. With a slightly higher-level language like Java or Go you also have to give thought to the runtime and the GC. With a dynamic language you have to think about all the optimizations the JIT can do and what can cause them to be enabled or disabled on particular functions or traces. That's not only an order of magnitude harder (and never documented), it's a constantly moving target. Then you find yourself writing very unnatural code to take advantage of those optimizations - things might have been simpler in a lower-level language, depending on how much of that you find yourself doing.






Rajiv Kurian

unread,
Jun 3, 2016, 3:10:26 PM6/3/16
to mechanical-sympathy
Yup, some static languages make for an easier job for the optimizer. My point is that there is room for mechanical sympathy in dynamic languages too. We don't have an option when it comes to Javascript as of now, so thinking about how to make it fast is a worthwhile investment even if it is in some ways tougher than thinking about C. Javascript especially has surprisingly accessible escape hatches. With SIMD access, asm.js optimizations (which lead to AOT compilation in a few browsers), ArrayBuffers etc. there is enough room to do some really cool things. You can get reasonably predictable assembly if you use the right things. At the very least, Javascript alone is no longer a good excuse for lousy web performance.


Eric Wong

unread,
Jun 4, 2016, 1:27:36 AM6/4/16
to mechanica...@googlegroups.com
William Pietri <wil...@scissor.com> wrote:
> My background has some closer-to-the-metal work. Some Apple ][
> assembler as a kid; a bunch of C; lots of Java (including happily
> using Prevayler); tons of performance tuning. But most work these
> last several years has been in startup-land, where speed of
> iteration has been the most important criterion, so I've mainly been
> working in Ruby and Python.

I mainly work in Perl5 or Ruby (and C if required),
but I'll often use GNU make, shell + awk as necessary, too.

Working with the ruby-core team the past few years, we're
certainly aware of mechanical sympathy; but there are a lot of
trade-offs involved in how far we can go given how dynamic
Ruby is.

I'm certain the core developers of other scripting
implementations are similarly knowledgeable about performance;
but there's a huge gap between those developers and most users
of the language :)

> 95% of the time, those languages are fine. But sometimes when I hit
> a performance issue, I'm at a bit of a loss. I know my technical
> options, of course, but there's a... philosophical issue between me
> and most people who work in those languages (which use global
> interpreter locks
> <https://en.wikipedia.org/wiki/Global_interpreter_lock>). When
> something is taking too long, my instinct is to say, "Hey, let's
> keep the data right where it is in RAM and use the box's 7 idle
> cores." Theirs is to look brightly at me and say, "Oh, we'll just
> run more processes and communicate over sockets. And maybe use more
> boxes. And memcache must work in here somehow, right?"

Yeah, it sucks. We haven't figured out how to remove the GVL
in Ruby without hurting single-threaded performance (which we're
already bad at! :<) or breaking compatibility (we aren't great
there, either).

matz is still thinking and experimenting with better concurrency
APIs, too. Personally, I find the declarative style of Makefiles
to be an excellent way to express concurrency.

> What happens in my head is an accounting of the serialization and
> deserialization costs, plus the RAM soaked up by many runtimes, plus
> a lot of unnecessary trips through the kernel, plus maybe crossing
> the network to other boxes, plus a lot of programming shenanigans to
> break apart and reunify data, plus the assorted ops issues. I try
> not to look pained or sigh audibly, at least until they're out of
> sight.

I certainly think of the data, first: reducing transfers/copies,
reducing round trips, etc. For those reasons, I could never
stand how bloated the web is with graphics/JS/CSS, either.

> Then I think about my grandmother's basement, which had
> shelves full of coffee cans and paper bags and egg cartons, saved
> because she had grown up in the Great Depression, and her value
> metrics were hopelessly out of date. I wonder if that's me now, and
> I sigh again.

I see top-posting and HTML mail on mailing lists wasting my
bandwidth + storage and sigh, too :)

> After one of these architectural conversations, I'm sure one of us
> is missing something important, but I'm not always sure who that is.
>
> So I guess I'm wondering three things:
>
> * How do others with mechanical sympathy in scripting-heavy
> environments deal with thinking about and collaborating around
> system design?

I consider data optimization heavily: ways to reduce I/O,
optimizing DB design, data structures, deduplication,
deltafication, cache/memoizability, compression etc.

Given my time is limited, I think my time is better spent
optimizing data before code; as important data tends to
outlive code (code is just plumbing for data).

But data optimizations apply to code, too: reducing allocations,
smaller data structures, etc...

> * My normal heuristic for deciding when to put on my mechanical
> sympathy goggles is "user-visible performance degradation". Before
> that, I mainly favor other things. This has pluses and minuses. What
> other approaches do people use?

There are generic optimizations to apply throughout the codebase
as long as they do not decrease readability.

Things like: limiting object lifetimes, favoring fast features
of the particular language and avoiding the slow ones,
streaming data to parsers instead of slurping.
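The slurping-versus-streaming point, as a tiny sketch (the data source is a stand-in for a large file or socket):

```python
import io

# Stand-in for a large file or socket carrying one record per line.
source = io.StringIO("\n".join(str(i) for i in range(10_000)))

# Slurping: the whole payload sits in memory before parsing starts.
slurped_total = sum(int(line) for line in source.getvalue().splitlines())

# Streaming: consume one record at a time; peak memory stays at one record.
source.seek(0)
streamed_total = 0
for line in source:            # iteration yields lines lazily
    streamed_total += int(line)

assert slurped_total == streamed_total == 10_000 * 9_999 // 2
```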

I suppose some of this knowledge comes from studying
interpreter/VM internals and contributing patches to the VM,
etc.

But often it's golfing and writing terser code :)
Many programmers I've seen seem to believe they're
paid by the quantity of code they produce :<

I'm also cheap and anti-consumerist, so I use older/slower
hardware, which forces me to notice problems sooner rather
than later.

> * Do folks here see rising language options that provide the
> flexibility of languages like Python and Ruby but still let those of
> us with mechanical sympathy make good use of the resources at hand?

I've actually been using Perl5 more in recent years and
appreciate the stability and longevity of the language.

If needed, I can (v)fork off and use pipes/sockets or use
Inline::C (I find XS too ugly). But I'll still use
shell/make/awk/sed in some places, too. I like using what's
already bundled on a typical GNU/Linux system so I don't have to
wait for new stuff to download or build.

William Pietri

unread,
Jun 4, 2016, 10:01:39 AM6/4/16
to mechanica...@googlegroups.com
Thanks, Eric, for the thoughtful note. I read it with great interest.
(And thanks to all responders in this thread; I really appreciate the
discussion.)

One small reply in case I gave the wrong impression:


On 06/03/2016 10:27 PM, Eric Wong wrote:
> [...]
> Working with the ruby-core team the past few years, we're certainly
> aware of mechanical sympathy; but there are a lot of trade-offs involved
> with how far we can go given how dynamic Ruby is. I'm certain the core
> developers of other scripting implementations are similarly
> knowledgeable about performance; but there's a huge gap between those
> developers and most users of the language :)
>
> [...]
>
> Yeah, it sucks. We haven't figured out how to remove the GVL in Ruby
> without hurting single-threaded performance (which we're already bad
> at! :<) or breaking compatibility (we aren't great there, either).
> matz is still thinking and experimenting with better concurrency APIs,
> too. Personally, I find the declarative style of Makefiles to be an
> excellent way to express concurrency.


I should say here that my philosophical issue isn't with Ruby itself. I
totally appreciate how Ruby and Python ended up with GILs, and I have a
lot of respect for their development teams. And I love Matz's choice to
favor developer ease over machine ease. Whenever I have a quick problem
to solve, Ruby's what I turn to.

The struggle I mentioned is more with people who have never done
anything but scripting. Not knowing how to turn inward, closer to the
machine, they turn outward. I'm not opposed to that; distributed systems
are interesting and fun. But it pains me as a first choice.

William

Ben Evans

unread,
Jun 4, 2016, 11:32:08 AM6/4/16
to mechanica...@googlegroups.com
On Sat, Jun 4, 2016 at 3:01 PM, William Pietri <wil...@scissor.com> wrote:
>
> One small reply in case I gave the wrong impression:
>
> On 06/03/2016 10:27 PM, Eric Wong wrote:
>>
>> Yeah, it sucks. We haven't figured out how to remove the GVL in Ruby
>> without hurting single-threaded performance (which we're already bad at! :<)
>> or breaking compatibility (we aren't great there, either). matz is still
>> thinking and experimenting with better concurrency APIs, too. Personally, I
>> find the declarative style of Makefiles to be an excellent way to express
>> concurrency.
>
> I should say here that my philosophical issue isn't with Ruby itself. I
> totally appreciate how Ruby and Python ended up with GILs, and I have a lot
> of respect for their development teams.

This put me in mind of experiences with trying to work with threaded
Perls. In the end the community basically gave up on trying to support
threading, another testament to how hard it is to add full support for
it after the fact. The evolution of Java's support for threads has
been at times painful, but the conceptual and implementation bear traps
are always more obvious with the benefit of hindsight.

An interesting discussion & I've enjoyed reading it - thanks all.

Ben