Gentle Introduction to Lockless Concurrency


Alex P

Jul 29, 2015, 7:50:35 AM
to mechanical-sympathy
Hi everyone,

I've written up a short Introduction to Lockless Concurrency in Java, covering the basic ideas and intuition behind lock-free code.
I'm also working on a more advanced post on the same subject, covering implementation details of several simple lockless data structures (Circular Buffer, Concurrent Bit Set and Concurrent Linked Queue), which might be more interesting for anyone who's already familiar with the general idea.

It's a medium-sized read; you can find it here: http://coffeenco.de/articles/lockless_concurrency.html

If you have any comments, corrections or suggestions, hit me up.
Looking forward to your feedback.

Thanks!
Alex

Vitaly Davidovich

Jul 29, 2015, 8:19:28 AM
to mechanica...@googlegroups.com
Hi Alex,

I skimmed the article, and noticed this bit:

The alert reader will notice that such operations are prone to so called ABA Problems, or a false-positive match, when the value between read and write operations is getting changed from A to B and then back to A. Although CPU designers have already solved this problem for us by adding a counter alongside with the value being swapped. Every operation will receive a value together with a counter, which both will be later used for when attempting a commit. If you need such guarantees on JVM, you have to use an AtomicStampedReference.

What exactly are you referring to by "Although CPU designers have already solved this problem for us by adding a counter alongside with the value being swapped..."?

Also, you may want to briefly mention that shared-memory concurrency involving multiple writers, even if lockless, is unlikely to scale well as core counts go up. As a guiding principle, one should try to minimize cross-core (write) communication.





Jorge Branco

Jul 29, 2015, 8:21:46 AM
to mechanica...@googlegroups.com
Thanks! I'll read it later tonight.


Alex P

Jul 29, 2015, 9:22:14 AM
to mechanical-sympathy, vit...@gmail.com

> What exactly are you referring to by "Although CPU designers have already solved this problem for us by adding a counter alongside with the value being swapped..."?

I was mostly referring to DCAS/CAS2 (double-word compare-and-swap), which lets you track a sequence number together with the pointer to the swapped structure. I didn't want to go too deep into the low-level details, and wanted to stick to the general concepts...
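Roughly, the JVM-level equivalent pairs the reference with an integer "stamp" and CASes both at once. Just a sketch of that idea -- the class and field names below are illustrative, they're not from the article:

import java.util.concurrent.atomic.AtomicStampedReference;

// Sketch: a reference guarded by a generation counter ("stamp").
class StampedSlot<T> {
    private final AtomicStampedReference<T> slot =
            new AtomicStampedReference<>(null, 0);

    boolean replace(T expected, T update) {
        int[] stampHolder = new int[1];
        // Read the current reference together with its stamp.
        T current = slot.get(stampHolder);
        if (current != expected) {
            return false;
        }
        // The CAS succeeds only if both the reference AND the stamp are unchanged,
        // so an A -> B -> A change in between is caught by the bumped stamp.
        return slot.compareAndSet(expected, update, stampHolder[0], stampHolder[0] + 1);
    }
}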

> Also, you may want to briefly mention that shared memory concurrency involving multiple writers, even if lockless, is unlikely to scale well as core counts go up.  As a guiding principle, one should try to minimize cross core (write) communication.

True. I've mentioned it briefly; maybe I should elaborate a bit more on that one.

Thanks for the feedback!

Vitaly Davidovich

Jul 29, 2015, 9:37:25 AM
to Alex P, mechanical-sympathy
I was mostly referring to DCAS/CAS2 (double-word compare-and-swap), which lets you track a sequence number together with the pointer to the swapped structure. I didn't want to go too deep into the low-level details, and wanted to stick to the general concepts...

So I think you mean DWCAS (double-width CAS, and not double CAS).  AFAIK, Hotspot does not use DWCAS for AtomicStampedReference -- has that changed? DCAS isn't implemented by any mainstream CPU that I know of.

Being a gentle introduction, you probably shouldn't delve too deep, you're right, but it may be nice to mention the (searchable) concept so someone reading this has a starting point for more elaboration. It's also good for other readers to ensure they understand what you're referring to; I personally thought this was some reference to either DWCAS or HTM, but then I was also somewhat confused when you said CPU designers solved this by adding a counter. They didn't add a counter themselves; they added the ability for code to make use of double-width memory as it sees fit, and one use is adding a generation counter. Maybe this is pedantic, so apologies...

True. I've mentioned it briefly, maybe I should elaborate a bit more on that one.

Ok, I may have missed it.

Martin Thompson

Jul 30, 2015, 3:19:01 AM
to mechanical-sympathy, oleksand...@gmail.com, oleksand...@gmail.com
I'd argue that "state of the art" in scalable design is to have no contention. It does not matter if you manage contention with locks or CAS techniques. Once you have contention, the Universal Scalability Law kicks in, as you have to face the contention and coherence penalty that contended access to shared state/resources brings. Multiple writers to shared state is a major limitation to the scalability of any design. Persistent data structures make this problem worse, not better, due to path-copy semantics, which are amplified by the richness of the domain model.

A special case of shared state is when you have only one writer and many readers. This is a much simpler design to scale and understand. Here persistent data structures can sometimes be of great benefit, but not always, as allocation is a major limiting factor on most JVMs.
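As a rough illustration of the single-writer case -- just a sketch, the snapshot type and names are made up for the example -- one thread owns all mutation and publishes immutable snapshots that any number of readers consume without coordinating with each other:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch: single writer publishes copy-on-write snapshots; readers never write.
class PriceBoard {
    private volatile Map<String, Long> snapshot = Collections.emptyMap();

    // Called from the single writer thread only -- no CAS or lock needed,
    // just a volatile publish of a freshly built, effectively immutable map.
    void update(String symbol, long price) {
        Map<String, Long> next = new HashMap<>(snapshot);
        next.put(symbol, price);
        snapshot = Collections.unmodifiableMap(next);
    }

    // Callable from any number of reader threads concurrently.
    Long read(String symbol) {
        return snapshot.get(symbol);
    }
}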

Alex P

Jul 30, 2015, 4:08:01 AM
to mechanical-sympathy, mjp...@gmail.com
I guess this is more of a wording problem, since you stated it yourself: "multiple writers to shared state is a major limitation to the scalability of any design".

As regards persistent / immutable data structures, I should have mentioned something along these lines: there's a cost introduced by structural sharing, that's for sure.

Roman Leventov

Aug 25, 2015, 8:05:08 PM
to mechanica...@googlegroups.com
I notice that a major factor limiting scalability, even in mostly-read workloads, is coherence traffic on the cache lines (CLs) holding the lock words themselves. So I expect HTM/optimistic (no-write) locking should help a lot, though I haven't experimented with this yet.
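For what it's worth, Java 8's StampedLock already exposes an optimistic, no-write read path along these lines; a minimal sketch (the guarded fields are illustrative):

import java.util.concurrent.locks.StampedLock;

// Sketch: optimistic reads never write the lock word, so uncontended readers
// generate no coherence traffic on it.
class Point {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();   // no store to the lock word
        double cx = x, cy = y;
        if (!lock.validate(stamp)) {             // a writer slipped in; fall back
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(cx * cx + cy * cy);
    }

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}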


pron

Aug 30, 2015, 10:40:36 AM
to mechanical-sympathy, oleksand...@gmail.com


On Thursday, July 30, 2015 at 10:19:01 AM UTC+3, Martin Thompson wrote:
I'd argue that "state of the art" in scalable design is to have no contention.

... and I'd argue that state-of-the-art, or "perfect", scalability can be achieved if contention is kept to O(1) in the size of the data, amortized across operations. That significantly widens the scope of possible scalable data structures, as many balanced trees (e.g. B+-trees) can be constructed to have this property.

Otherwise, state-of-the-art would only apply to "embarrassingly concurrent" domains...

Vitaly Davidovich

Sep 1, 2015, 10:08:43 AM
to mechanical-sympathy

I think the scaling talked about here is cpu count growth, not data set size.

sent from my phone


pron

Sep 1, 2015, 10:47:25 AM
to mechanical-sympathy
Right, but there's a subtle interaction between the number of cores and the size of the dataset which makes what I said about perfect scalability true (when talking about online, concurrent systems).

Suppose you have a data structure (and concurrent balanced trees have this property), where keeping data size the same but increasing the number of cores (i.e. distributing the same number of items across more cores) would increase contention -- so it would seem like you have negative scalability -- however, keeping the number of cores the same and increasing the number of data items reduces contention. Then such a data structure would have the property that if you keep the number of cores linearly related to the size of the data set in a given application, you'd get perfect scalability, as you'd be able to increase your workload by adding cores while maintaining the same throughput per core and amortized latency. When talking about concurrent online workloads, this is the kind of scalability you want (as opposed to data-parallel workloads, where perfect scalability means linearly reducing processing time for a given data set by adding more cores).

Ron

Vitaly Davidovich

Sep 1, 2015, 11:52:58 AM
to mechanical-sympathy
Suppose you have a data structure (and concurrent balanced trees have this property), where keeping data size the same but increasing the number of cores (i.e. distributing the same number of items across more cores) would increase contention -- so it would seem like you have negative scalability -- however, keeping the number of cores the same and increasing the number of data items reduces contention

It *is* negative with respect to CPU scalability.  It may have positive scalability with data set size, but that's a separate topic.  Oftentimes you have fixed (or near fixed) working set, and want to throw more CPUs at the problem.  The working set may be fixed because there's already partitioning of load across machines (for a variety of reasons).

What you're basically describing is bulk/batch operations where you can amortize the expensive operation(s) by doing more work in between the expensive operations.  No argument that batch/bulk design is good, but I'm not sure I'd consider it a scalability aspect (at least in what I understood this thread to be about).



pron

Sep 1, 2015, 12:22:03 PM
to mechanical-sympathy


On Tuesday, September 1, 2015 at 6:52:58 PM UTC+3, Vitaly Davidovich wrote:

It *is* negative with respect to CPU scalability.  It may have positive scalability with data set size, but that's a separate topic.  Oftentimes you have fixed (or near fixed) working set, and want to throw more CPUs at the problem.  The working set may be fixed because there's already partitioning of load across machines (for a variety of reasons).

What you're basically describing is bulk/batch operations where you can amortize the expensive operation(s) by doing more work in between the expensive operations.  No argument that batch/bulk design is good, but I'm not sure I'd consider it a scalability aspect (at least in what I understood this thread to be about).


I think it's the opposite. For batch applications you want the computation to run faster by adding more cores. For online, concurrent applications you usually have some distribution of latency you'd like to maintain, and perfect scalability means not reducing the latency for the same workload, but maintaining the latency for a growing workload by increasing the number of cores linearly with the data set. You'll note that with Martin's original definition -- namely no contention -- you get the same kind of scalability. Adding CPUs doesn't decrease latency but preserves it with a growing workload. My definition covers more designs and has the exact same property: namely, linearly growing your workload/concurrency while preserving latency.

Perhaps you're referring to my mention of "amortized latency". I did not mean increasing maximum latency, but rather preserving the same required latency distribution. It is true, however, that if your required latency distribution is flat and very, very low, then your choice of architectures is greatly reduced. But that would limit your choice of OSs (or other low-level infrastructure) as well, because even most OSs amortize their latency (and, obviously, most GCs).
 
Ron



Vitaly Davidovich

Sep 1, 2015, 12:44:23 PM
to mechanical-sympathy
I think it's the opposite. For batch applications you want the computation to run faster by adding more cores. For online, concurrent applications you usually have some distribution of latency you'd like to maintain, and perfect scalability means not reducing the latency for the same workload, but maintaining the latency for a growing workload by increasing the number of cores linearly with the data set. You'll note that with Martin's original definition -- namely no contention -- you get the same kind of scalability. Adding CPUs doesn't decrease latency but preserves it with a growing workload. My definition covers more designs and has the exact same property: namely, linearly growing your workload/concurrency while preserving latency.

The idea is to increase throughput by recruiting more CPUs when data set size stays the same (we can talk about increasing data set sizes, but that's a related although different discussion); this can reduce overall latency though.  If you recruit more CPUs but the design has heavy contention, then you'll get sublinear throughput scaling and at some extreme, it could go negative (i.e. the contention/communication overhead starts dominating the useful work).

To give a simple but concrete example, say you're ingesting quotes for X stocks.  If you run the entire processing chain on 1 CPU and that processing cannot keep up with the ingress rate, you'll end up queuing and having to apply some backpressure somewhere.  If you shard these symbols across multiple CPUs, then you can of course run the processing in parallel.  Time it takes to process each tick stays the same (assuming perfect parallelism), but latency will go down.  The "no contention" aspect comes into play because it's likely some data sharing/communication needs to occur between the CPUs.  The less you share/communicate between them, the better the system will scale.  This is true even if you didn't care about thread safety and simply had multiple CPUs writing to same cachelines with no atomics/locks/etc.
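To make the sharding idea concrete, a rough sketch -- the Tick class, queue capacity and hash-based assignment are illustrative, not a real feed handler:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: one ingress thread routes each tick to a worker chosen by symbol,
// so every worker is the sole mutator of its own state and workers share
// nothing on the write path.
class ShardedTickProcessor {
    static final class Tick {
        final String symbol;
        final long price;
        Tick(String symbol, long price) { this.symbol = symbol; this.price = price; }
    }

    private final BlockingQueue<Tick>[] queues;

    @SuppressWarnings("unchecked")
    ShardedTickProcessor(int workers) {
        queues = new BlockingQueue[workers];
        for (int i = 0; i < workers; i++) {
            queues[i] = new ArrayBlockingQueue<>(64 * 1024);
            final int shard = i;
            Thread worker = new Thread(() -> runWorker(shard));
            worker.setDaemon(true);
            worker.start();
        }
    }

    // Called by the single ingress thread: static, hash-based shard assignment.
    void onTick(Tick tick) throws InterruptedException {
        int shard = Math.floorMod(tick.symbol.hashCode(), queues.length);
        queues[shard].put(tick);
    }

    private void runWorker(int shard) {
        try {
            while (true) {
                Tick tick = queues[shard].take();
                process(tick); // per-shard state lives only on this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void process(Tick tick) { /* domain logic elided */ }
}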



pron

Sep 1, 2015, 1:02:38 PM
to mechanical-sympathy


On Tuesday, September 1, 2015 at 7:44:23 PM UTC+3, Vitaly Davidovich wrote:
To give a simple but concrete example, say you're ingesting quotes for X stocks.  If you run the entire processing chain on 1 CPU and that processing cannot keep up with the ingress rate, you'll end up queuing and having to apply some backpressure somewhere.  If you shard these symbols across multiple CPUs, then you can of course run the processing in parallel.  Time it takes to process each tick stays the same (assuming perfect parallelism), but latency will go down.  The "no contention" aspect comes into play because it's likely some data sharing/communication needs to occur between the CPUs.  The less you share/communicate between them, the better the system will scale.  This is true even if you didn't care about thread safety and simply had multiple CPUs writing to same cachelines with no atomics/locks/etc.


Right. The same is true if you keep contention to a constant relative to X (or however you define your working-set size).  That constant doesn't have to be zero. 

The reason this is important has to do with how you do the load balancing. Techniques that do key-based sharding can get zero contention, but they limit the domain interaction between your data items (I don't know if it matters for stocks, but it certainly matters for, say, online games). However, if you can do the load balancing in a way that doesn't enforce sharding but guarantees constant (non-zero) contention -- and doing that is possible (with a caveat below) -- you can get perfect linear scalability without placing (possibly onerous) constraints on your domain modeling or code. The latency will only be higher by a constant.

The caveat for the data structures (that I'm familiar with) with that property is an increase in maximum latency but with guaranteed commensurate decrease in high-latency events' frequency. So your latency distribution remains the same (i.e. 99th, 99.99th percentiles etc. are the same), but the maximum increases. If you can't ever, ever tolerate higher latency -- no matter how rare -- then you need a different data structure for perfect scalability. So the data structures I'm talking about give you perfect scalability w.r.t the required latency distribution, but they may increase the (increasingly rare) worst case.

Ron
 


Vitaly Davidovich

Sep 1, 2015, 8:48:35 PM
to mechanical-sympathy

What limits the interaction for games or makes domain modeling "onerous"? How you load balance across cpus is a domain specific function, orthogonal to scalability (in terms of cpu).  The "no contention"/shared-nothing aspect requires you have a single mutator (and thus "owner") of some state with communication channels for many readers.  This isn't really a datastructure issue (although they play a role, obviously), but system design.

Can you give a concrete example of what you're referring to?

sent from my phone


pron

Sep 2, 2015, 5:01:23 AM
to mechanical-sympathy
On Wednesday, September 2, 2015 at 3:48:35 AM UTC+3, Vitaly Davidovich wrote:

What limits the interaction for games or makes domain modeling "onerous"? How you load balance across cpus is a domain specific function, orthogonal to scalability (in terms of cpu).  The "no contention"/shared-nothing aspect requires you have a single mutator (and thus "owner") of some state with communication channels for many readers.  This isn't really a datastructure issue (although they play a role, obviously), but system design.

Can you give a concrete example of what you're referring to?



Certainly. Suppose we have a stream of messages coming in, each of which updates a single domain entity (e.g. a stock), with no interaction whatsoever with other entities. The problem, in this first example, is not with the domain but with the load-balancing strategy. We can shard by, say, entity ID, but we don't know the distribution of entities in advance, nor how much processing each entity will require (e.g. how many messages each entity receives), so a static sharding scheme will result in sub-optimal distribution among compute nodes (I call them compute nodes rather than cores, because all this applies in a distributed setting as well). So, instead, we look for a dynamic load-balancing scheme, but the data structure providing the dynamic load balancing is no longer single-writer and will introduce some contention. To make it more concrete, let's say we still balance by entity ID, but instead of a simple hashing scheme (zero contention), we use a B+-tree to store the IDs, and balance the compute nodes in such a way that each is responsible for handling all entities whose ID is stored in a subtree of the root. That B+-tree -- while no longer guaranteeing zero write contention -- can be made to guarantee a constant level of contention regardless of how many entities we have. This improves our latency distribution on the whole, because "busy" entities will be better balanced, and have a less adverse effect on other entities' latency.

Another, similar example, is an MMO. You'd want to load-balance your compute nodes based on some spatial partitioning, but you don't know in advance how the players will be distributed in your game map, so you'd want a dynamic load-balancing scheme, which introduces (constant) contention.

An example where the domain itself requires contention is, again, an MMO, where a message sent by player A tells you it's just shot at player B. Player B will need to be updated somehow -- either with message passing or direct manipulation, but in either case you have contention. 

You can see how the single-writer principle affected the design of VoltDB, and not for the better, IMO. VoltDB shards the in-memory database using some static strategy (say, by key), and only a single thread is allowed to write to a specific row. However, if you have a transaction that requires updating more than one row, VoltDB must lock and stop all writer threads responsible for the entities participating in the transaction. Instead, your data structure can forgo the single-writer principle and opt for constant contention. Everything will scale just as well as with zero contention (single writer), but you will no longer need to trade off domain constraints (no transactions) for zero contention.

So, single-writer is certainly very good, but it's not state-of-the-art because it precludes dynamic balancing and might even limit the domain unnecessarily. With constant contention you can achieve a more stable system that is less prone to bad load/load-balancer interaction (I believe that in CS parlance we say that the constant-contention online algorithm is more competitive against adversaries). 

Ron
 

Vitaly Davidovich

Sep 2, 2015, 7:21:02 AM
to mechanical-sympathy

With stocks, you usually know upfront which symbols you'll receive, or you can design your system to know this a priori. You don't need dynamic load balancing - you can design your system to have a cost function based on prior days to deduce the expected number of messages. Even if you do go dynamic, you can avoid contention by having a single CPU responsible for figuring out which worker a stock should get assigned to. Once it decides, based on whatever dynamic strategy, it communicates that to the worker and the worker handles the tick. Think of it like, e.g., nginx having a single accept thread that then hands off the file descriptor for the accepted connection to a worker thread, and from that point forward the worker handles all the comms on that fd.
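A minimal sketch of that single-routing-thread idea -- the Worker interface and the load heuristic are illustrative placeholders:

import java.util.HashMap;
import java.util.Map;

// Sketch: the dispatcher alone owns the symbol -> worker assignment, so the
// (dynamic) assignment decision itself is never contended.
final class Dispatcher {
    interface Worker {
        void enqueue(String symbol, long price); // e.g. backed by a per-worker SPSC queue
        int queuedWork();                        // rough load estimate
    }

    private final Worker[] workers;
    private final Map<String, Worker> assignment = new HashMap<>(); // dispatcher-thread only

    Dispatcher(Worker[] workers) { this.workers = workers; }

    // Called only from the dispatcher thread.
    void onTick(String symbol, long price) {
        Worker w = assignment.get(symbol);
        if (w == null) {
            w = leastLoaded();           // dynamic decision, made by one thread
            assignment.put(symbol, w);   // plain HashMap is fine: single writer/reader
        }
        w.enqueue(symbol, price);
    }

    private Worker leastLoaded() {
        Worker best = workers[0];
        for (Worker w : workers) {
            if (w.queuedWork() < best.queuedWork()) best = w;
        }
        return best;
    }
}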

"Player A shoots player B" can be handled in a similar manner, by having CPUs handle subsets of players and communicate with each other contention-free.

I don't quite understand the constant contention point - certainly if you have more cpus participating in the writing (definition of contention), that will not scale well.  Increasing the working set to then say you've kept contention constant because your data set size is larger isn't really the same thing.  Also, the contention won't be constant if you have cpus contending and increase cpu count, by definition.  Even if you increase the data set size and cpu count, it's unlikely that all the data is actually in use all the time.

There are other benefits to single writer, such as easier reasoning/debuggability/determinism/etc.  But as I mentioned earlier, the trick is to design your system such that it can do that, and so it's not really a datastructure issue.

sent from my phone


pron

Sep 2, 2015, 7:48:46 AM
to mechanical-sympathy
Obviously, if you don't need dynamic balancing then you don't need it. If you do (or if you don't want to carefully design the system for a static strategy), though, you can still have perfect scalability. Static load balancing in MMOs, for example, usually requires a specific game design or various community-management measures that are not necessary if you use good dynamic load balancing.

Message-passing is contention, or, more accurately, it's what we try to avoid when we try to avoid contention. We care about contention because it triggers slow communication  (when we analyze concurrent data structures for contention, we count the number of messages required to establish cache-coherence). If you communicate explicitly, your performance isn't any better -- it just uses a different abstraction. For example, in my own simulations, I sometimes use a message-passing abstraction and sometimes shared-memory abstraction. The latter is more convenient when you need transactions.

Obviously, there are many benefits to single-writer; being the only way to achieve perfect scalability is not one of them. It's certainly the simplest, if it works well enough. When it doesn't, algorithms that cause constant contention can still achieve the same scalability with better adversary-competitiveness, and sometimes with fewer restrictions on the domain model and more possibilities (like multi-entity transactions).

Ron

Vitaly Davidovich

Sep 2, 2015, 8:23:16 AM
to mechanical-sympathy
Obviously, if you don't need dynamic balancing then you don't need it. If you do (or if you don't want to carefully design the system for a static strategy), though, you can still have perfect scalability. Static load balancing in MMOs, for example, usually requires a specific game design or various community-management measures that are not necessary if you use good dynamic load balancing.

It's not always obvious that you can have good static load balancing, that's part of my point about thinking through the design.  I'm also not buying "perfect scalability" by adding contending CPUs but also increasing data set size.  As I mentioned, the increased data set size won't necessarily mean an increase in working set size but the extra CPUs will contend no matter what.

Message-passing is contention, or, more accurately, it's what we try to avoid when we try to avoid contention. We care about contention because it triggers slow communication  (when we analyze concurrent data structures for contention, we count the number of messages required to establish cache-coherence). If you communicate explicitly, your performance isn't any better -- it just uses a different abstraction. For example, in my own simulations, I sometimes use a message-passing abstraction and sometimes shared-memory abstraction. The latter is more convenient when you need transactions.

I'm not sure what your definition of contention is. As I mentioned earlier, it's ideal to minimize cross-core communication to reduce cache traffic. However, occasionally sharing a cacheline modified by one CPU with 1+ other reading CPUs is fine, particularly if the CPUs are on the same socket and share LLC. Concurrent and lock-free data structures always assume they can be updated from multiple CPUs, and at best use atomic instructions (and at worst granular locks or CAS loops); with single writer/multiple readers you can do this with no atomic instructions (on some archs like x86/64).
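For example, a single-writer sequence can be published with an ordered store -- on x86/64 lazySet boils down to a plain store, no LOCK-prefixed instruction -- while any number of readers just load it. A sketch (names illustrative):

import java.util.concurrent.atomic.AtomicLong;

// Sketch: one writer, many readers, no CAS and no atomic RMW instruction.
final class SingleWriterSequence {
    private final AtomicLong sequence = new AtomicLong(0);

    // Writer thread only: an ordered store (plain MOV on x86/64), no CAS, no lock.
    void advance() {
        sequence.lazySet(sequence.get() + 1);
    }

    // Any reader thread.
    long current() {
        return sequence.get();
    }
}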

Obviously, there are many benefits to single-writer; being the only way to achieve perfect scalability is not one of them. It's certainly the simplest, if it works well enough. When it doesn't, algorithms that cause constant contention can still achieve the same scalability with better adversary-competitiveness, and sometimes with fewer restrictions on the domain model and more possibilities (like multi-entity transactions).

It's actually far from simple to design single-writer for anything other than toy examples -- that's again part of my point.  It's easier to reach for a concurrent data structure.  And again, your definition of constant contention/perfect scalability by virtue of increasing data set size doesn't jive.


pron

Sep 2, 2015, 9:49:05 AM
to mechanical-sympathy


On Wednesday, September 2, 2015 at 3:23:16 PM UTC+3, Vitaly Davidovich wrote:
It's not always obvious that you can have good static load balancing, that's part of my point about thinking through the design.  I'm also not buying "perfect scalability" by adding contending CPUs but also increasing data set size.  As I mentioned, the increased data set size won't necessarily mean an increase in working set size but the extra CPUs will contend no matter what.

They won't. Here's the math (it talks about a distributed cache-coherent grid, but the result is the same). The amortized-worst-case number of cache coherence messages turns out to be kM/n (for a concurrent B-tree) where k is some constant of the data structure, M is the number of compute nodes and n is the size of the data set (and if M/n turn out to be a constant -- more or less -- and they usually are for most systems, you get constant contention).
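Spelling the arithmetic out with the same symbols, and assuming the number of compute nodes grows in constant proportion to the data (M = c * n):

    kM/n = k(c * n)/n = k * c, which is O(1) -- independent of both M and n.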
 
With concurrent and lock-free datastructures, they always assume they can be updated from multiple CPUs, and at best use atomic instructions (and at worst granular locks or CAS loops); with single writer/multiple readers you can do this with no atomic instructions (on some archs like x86/64).  

Yes. But you suggested a message-passing solution to the MMO player-shoots-player problem, which requires multiple writers (i.e. multiple shooters can send a message to the shootee's queue). If the domain allows any incoming event to modify more than a single entity, you just can't ensure those entities are all in the same shard (although a dynamic load-balancing strategy might make it true often enough).
 
It's actually far from simple to design single-writer for anything other than toy examples -- that's again part of my point.  It's easier to reach for a concurrent data structure.  And again, your definition of constant contention/perfect scalability by virtue of increasing data set size doesn't jive.

Yeah, I think we're talking past each other, but in any case, there is no known scalable single-writer solution to a general dataset problem (think in-memory database) that allows multiple-entity transactions.

Ron

 

Vitaly Davidovich

Sep 2, 2015, 10:57:42 AM
to mechanical-sympathy
They won't. Here's the math (it talks about a distributed cache-coherent grid, but the result is the same). The amortized-worst-case number of cache coherence messages turns out to be kM/n (for a concurrent B-tree) where k is some constant of the data structure, M is the number of compute nodes and n is the size of the data set (and if M/n turn out to be a constant -- more or less -- and they usually are for most systems, you get constant contention).

How does k scale with an increase in CPUs hammering the datastructure? Also, I'm not sure what cache coherence messages you're talking about, but I'm assuming this is modeling your distributed system as if it was multi-CPU system.  Otherwise, the number of CPU cache coherence messages would depend on number of cachelines involved, and not simply kM/n.  Perhaps this is part of the disconnect here -- the contention this thread has been talking about is within a single multi-CPU system taking microarch details into account (e.g. cheapest way to communicate between a single writer and multiple reader CPUs).

Yes. But you suggested a message-passing solution to the MMO player-shoots-player problem, which requires multiple writers (i.e. multiple shooters can send a message to the shootee's queue). If the domain allows any incoming event to modify more than a single entity, you just can't ensure those entities are all in the same shard (although a dynamic load-balancing strategy might make it true often enough).

Multiple shooters sending to the same queue is back to shared memory, which is not a shared-nothing design. You can, e.g., have separate queues between the CPUs in the system dealing with players. And the cross-CPU message would only need to happen if the players are handled by different workers. And yes, one would hope/try to minimize that with intelligent load balancing.


pron

unread,
Sep 2, 2015, 11:39:38 AM9/2/15
to mechanical-sympathy


On Wednesday, September 2, 2015 at 5:57:42 PM UTC+3, Vitaly Davidovich wrote:

How does k scale with an increase in CPUs hammering the datastructure? Also, I'm not sure what cache coherence messages you're talking about, but I'm assuming this is modeling your distributed system as if it was multi-CPU system.  Otherwise, the number of CPU cache coherence messages would depend on number of cachelines involved, and not simply kM/n.  Perhaps this is part of the disconnect here -- the contention this thread has been talking about is within a single multi-CPU system taking microarch details into account (e.g. cheapest way to communicate between a single writer and multiple reader CPUs).

That doesn't matter -- the result is the same. k is a constant that depends on the structure of the balanced tree, in particular its fanout. As the fanout directly determines the size, in cache lines, of each tree-node update message (it's a cache-aware data structure), the number of CPU bus messages is some constant multiple of the number of update messages (in fact, the fanout can be -- and often is -- chosen so that each tree node occupies exactly one cache line) so the math is the same.

There is a difference between such a distributed system and the intra-CPU case -- it's not contention but the total number of cache misses (on either reads or writes). In the distributed case, the "L1 cache" is very large (the RAM of the entire machine, or of the NUMA socket etc.), it's what's known as a cache-only memory architecture (COMA) design. The concurrent balanced tree can also minimize the total number of "cache misses" in this case, if reads (or writes) have some spatial locality (where spatial is defined by the tree -- e.g., for a B+-tree, it means key ranges, and spatial locality means that most queries are single-range queries; spatial could also mean graph span, if the data structure is designed for graphs). You can achieve the same with sharding if you shard by range, but again, then you don't get dynamic load balancing if you need it (adversary competitiveness, which, BTW, has brought down many MMOs that relied on sharding). In the intra-CPU case, the L1 cache is small relative to the data set each core ends up serving, so this doesn't matter.

Ron

 

Vitaly Davidovich

Sep 2, 2015, 11:57:00 AM
to mechanical-sympathy

I don't think you answered my question about how k reacts to an increased number of CPUs modifying the data structure. Specifically, as the number of concurrent writers and readers increases, what are the progress, latency, and throughput properties? Assume all readers and writers are working with the same leaf page.

sent from my phone


pron

Sep 2, 2015, 12:11:41 PM
to mechanical-sympathy
k is constant. It depends on the fanout, which, in turn, depends on the cache-line size. The number of mutating (or reading) CPUs doesn't affect k at all -- that number is just M in the formula. Given the same data set, adding CPUs would increase contention and hurt latency linearly. But the assumption is that you can achieve your required latency with some small dataset, and now want to increase it. Adding CPUs in constant proportion to the dataset size would keep the latency profile (distribution) exactly the same, but, as you can see in the post, the worst case will increase (but become rarer, which is why the distribution stays the same).

Note, however, that what "data" and "mutation" mean depends on the domain. If the data is location of player characters, then each message coming from each player is a mutation. If, OTOH, your data is some set of stocks and you use the tree for dynamic load balancing, then messages updating some internal value of the stock are not mutators as they don't change the tree structure. A mutation would be adding a new stock. Of course, it doesn't make much sense to use such a data structure in this case because all it does is distribute the number of stocks evenly among threads without exploiting any sort of locality -- a simple round-robin assignment would achieve the same. If, however, it's better somehow for a group of stocks to be handled by the same thread (based on some property), then this could be a good dynamic load-balancing solution.

Ron

Vitaly Davidovich

unread,
Sep 2, 2015, 12:24:31 PM9/2/15
to mechanical-sympathy
 Given the same data set, adding CPUs would increase contention and hurt latency linearly

This is what I'm trying to figure out -- how did you determine it's linear?

Note, however, that what "data" and "mutation" mean depends on the domain

By mutation I mean structural changes to the tree.  And this gets to the point of contention that I'm talking about.  With a single writer, there's no contention on mutation because, by definition, it's the sole owner and always makes forward progress with no algorithmic delays.  With a shared structure that always has to support the possibility of multiple mutators, there will be a weakening of forward progress (the system will make progress, but the pace will slow down).  If the structure is using CAS loops or something akin to that, then increasing the number of concurrent updaters will make all but one of them fail and have to retry; at some CPU count, this may end up choking the system with coherence traffic, and a lock would actually achieve better progress properties.  However, if you use a lock, you're now involving the kernel in the arbitration, and this all becomes somewhat unpredictable as kernel scheduling comes into play.
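A minimal sketch of the CAS retry loop being described, using plain java.util.concurrent.atomic (illustrative only -- not code from the article or from the tree under discussion): with W concurrent writers, each round lets exactly one compareAndSet win, and the other W-1 re-read and retry, which is where the coherence traffic comes from.

import java.util.concurrent.atomic.AtomicLong;

final class CasRetryLoop {
    private final AtomicLong value = new AtomicLong();

    // With W concurrent callers, each round of coherence traffic lets exactly
    // one compareAndSet succeed; the other W-1 observe a stale snapshot and retry.
    long addCapped(long delta, long cap) {
        while (true) {
            long current = value.get();                 // read the shared value
            long next = Math.min(cap, current + delta); // compute the update off that snapshot
            if (value.compareAndSet(current, next)) {
                return next;                            // this caller won the round
            }
            // lost the race: another writer published first; retry with a fresh read
        }
    }
}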


Ron Pressler

unread,
Sep 2, 2015, 12:37:07 PM9/2/15
to mechanica...@googlegroups.com, mechanical-sympathy
On Sep 2, 2015, at 7:24 PM, Vitaly Davidovich <vit...@gmail.com> wrote:

This is what I'm trying to figure out -- how did you determine it's linear?

Because that’s the formula: amortized worst-case latency = kM/n 
The full derivation is in the blog post I linked to.

By mutation I mean structural changes to the tree.  And this gets to the point of contention that I'm talking about.  With single writer, there's no contention on mutation because, by definition, it's the sole owner and always makes forward progress with no algorithmic delays.  With a shared structure that always has to support the possibility of multiple mutators, there will be a weakening of forward progress (the system will make progress, but the pace will slow down)

Coherence traffic is equal to kM/n 

You’re trying to come up with a mental model, but you need to look at the algorithm itself, because it takes care of your concerns -- it most certainly doesn’t work like a queue. The idea is that a balanced tree will see heavier node-write contention the higher a node is in the tree, but, because of the tree’s design, the chance of a mutation is lower the higher the node is, and the two cancel each other out perfectly (note that by “chance” I don’t mean probabilistically, but amortized). The data structure ensures, then, that most tree mutations involve a single core, fewer might involve two cores, and so on, and the whole thing cancels out.
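(A back-of-the-envelope reading of that cancellation, not the derivation from the linked post: if a node h levels up can be contended by at most c^h cores but is written at most once per F^h leaf writes, then the amortized contention added per leaf write is bounded by sum over h >= 1 of (c/F)^h = (c/F)/(1 - c/F), a constant whenever c < F. The concrete numbers given later in the thread -- at most 2, then 4 contending cores; at most once per 10, then 100 leaf writes -- give sum of (2/10)^h = 0.25.)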

Ron



Vitaly Davidovich

unread,
Sep 2, 2015, 12:52:07 PM9/2/15
to mechanical-sympathy
Because that’s the formula: amortized worst-case latency = kM/n 
The full derivation is in the blog post I linked to.

I'm talking about strictly increasing the number of CPUs, not the dataset as well.  The structure of the tree at some data set size X is going to be the same with increase in CPUs, and contention will go up.  I'm trying to understand how you arrived at linear contention given this case.  And by linear contention, I don't mean algorithmic theoretical but on an actual machine.

Coherence traffic is equal to kM/n
 
You’re trying to come up with a mental model, but you need to look at the algorithm itself, because it takes care of your concerns — it most certainly doesn’t work like a queue. The idea is that a balanced tree will cause heavier node-write contention the higher the node is in the tree, but because of the tree’s design, the chance of a mutation is lower the higher the node is, and the two cancel each other out perfectly (note that by “chance” I don’t mean probabilistically, but amortized). The data structure ensures, then, that most tree mutations, then, involve a single core, fewer might involve two cores, and so on, and the whole thing cancels out.

See above -- the tree structure stays the same, only the CPU count goes up.  I don't see how the tree *ensures* that most mutations involve 1 core vs 2 vs 4 vs X. The data is balanced out in terms of storage, so with uniformly distributed access to the key space you'd get the spreading behavior, but what if you don't have a uniform distribution? That's why I said earlier to assume the workload (reads + writes) is concentrated on the same subtrees.
 

pron

unread,
Sep 2, 2015, 3:35:05 PM9/2/15
to mechanical-sympathy


On Wednesday, September 2, 2015 at 7:52:07 PM UTC+3, Vitaly Davidovich wrote:

I'm talking about strictly increasing the number of CPUs, not the dataset as well.  The structure of the tree at some data set size X is going to be the same with increase in CPUs, and contention will go up.  I'm trying to understand how you arrived at linear contention given this case. 

I said contention, and hence latency, would increase.  It follows straight from the formula. This kind of design does not help you reduce latency by adding cores; it just preserves latency for growing data by adding cores.
 
And by linear contention, I don't mean algorithmic theoretical but on an actual machine.

What's the difference? 
In practice, on a single socket there's little reason to distribute work among threads in this way, because the cache is too small and cache misses dominate contention. There isn't much significance to a core "owning" a subtree, because nodes have to be brought in from shared cache or RAM (we're talking millions of elements), and the tree structure guarantees (probabilistically, this time) low contention even if each thread modifies a random subtree (collisions at the leaves are probabilistically rare, and contention at the inner nodes is guaranteed to be low by the amortized structure of the tree). So on a single CPU we get low (i.e. constant) contention rates, but those are not guaranteed in the amortized worst case. On NUMA or a grid, results so far seem to follow the theoretical model. We hope to be able to publish results in a few months.

See above -- the tree structure stays the same, only CPU count goes up.  I don't see how the tree *ensures* that most mutations involve 1 core vs 2 vs 4 vs X. 

Because contention can only occur in inner nodes, and inner-node writes are always amortized. E.g., we can have an inner-node write no more frequently than once every 10 leaf writes, and inner nodes at that level can only be contended by at most 2 cores; no more than once every 100 leaf writes we can have an inner-node write that may compete with up to 4 cores, and so on.
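As a rough single-threaded sketch of that amortization (illustrative only, using the once-per-10 / once-per-100 numbers above; it is not the concurrent algorithm itself): each node absorbs FANOUT updates before touching its parent, so a node h levels above a leaf is written at most once per FANOUT^h writes along that path.

// Illustrative only: shows why inner-node write frequency drops geometrically
// with height. Not thread-safe, and not the actual concurrent tree.
final class AmortizedPropagationSketch {
    static final int FANOUT = 10;   // assumed; a cache-aware tree would size this
                                    // so that a node fills one cache line

    static final class Node {
        final Node parent;
        int pending;                // updates absorbed since the last push upward
        Node(Node parent) { this.parent = parent; }
    }

    // Called once per leaf write. A node h levels above the leaf is touched at
    // most once per FANOUT^h calls, so the potentially contended writes high in
    // the tree are rare in exactly the proportion described above.
    static void recordLeafWrite(Node leaf) {
        for (Node n = leaf; n != null && ++n.pending == FANOUT; n = n.parent) {
            n.pending = 0;          // reset this level and push one update upward
        }
    }
}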
 
The data is balanced out in terms of storage and so with some uniform distribution access to the key space, you'd get the spreading behavior, but what if you don't have uniform distribution? That's why I mentioned earlier that assume the workload (reads + writes) are concentrating on same subtrees.

Unlike the sharded design, we don't need to assume uniform distribution of transactions over the keyspace. We do, however, assume uniform distribution of transactions over actual data items (regardless of their distribution over the keyspace), which makes the tree more stable in the face of adversarial conditions (as is often the case in games, where players might congregate unexpectedly). Whether that assumption is reasonable or not depends on the domain, but it's certainly weaker (i.e. more lenient, or less constraining) than the sharded assumption of uniformity over the keyspace.

Ron
 



Vitaly Davidovich

unread,
Sep 2, 2015, 4:12:10 PM9/2/15
to mechanical-sympathy
I said increase in contention, and hence latency.  It follows straight from the formula. This kind of design does not help you to reduce latency by adding cores, just preserving latency for growing data by adding cores.

And that's what I've been trying to say the whole time -- the scaling we're talking about here is "I have a problem of size X -- can I get better performance (throughput and/or latency) by throwing more cores at the *exact* same problem of size X, and if so, how does the performance scale with each bump in CPU count", not how it scales with more cores *and* more data.  The former is interesting because it exposes bottlenecks in concurrency management, and that's where the lock, lock-free, and shared-nothing topics come into play.  Improving cache access for a larger data set is an interesting problem, but it's an interesting problem even in the single-threaded case.

What's the difference? 
In practice, on a single socket there's little reason to distribute work among threads in this way because the cache is too small and cache misses dominate contention. There isn't much significance to a core "owning" a subtree because nodes have to be brought in from shared cache or RAM (we're talking millions of elements), and the tree structure guarantees (probabilistically, this time) low contention even if each thread modifies a random subtree (collisions at the leaves are probabilistically rare, and contention at the inner nodes is guaranteed to be low by the amortized structure of the tree). So on a single CPU, we get low (i.e. constant) contention rates, but those are not guaranteed in amortized worst-case. On NUMA or a grid, results so far seem to follow the theoretical model. We hope to be able to publish results in a few months.

The difference is that constant factors play a big role in real machines, and the classic asymptotic complexity analyses of various algos and data structures never take machine architecture into account, never mind that they blatantly disregard constant factors (for the most part).

The LLC is a few tens of MB per socket nowadays (Haswell is up to 45MB, to be exact), which is a sizeable piece of data given that it's a cache (i.e. the entire data set can be several orders of magnitude larger, but we only care about the hot data).  Intel Broadwell has a 128MB L4 cache.  And yes, access patterns do matter for cache utilization irrespective of shared-nothing/locking/etc. -- it's an orthogonal concern.
 



pron

unread,
Sep 2, 2015, 4:47:57 PM9/2/15
to mechanical-sympathy


On Wednesday, September 2, 2015 at 11:12:10 PM UTC+3, Vitaly Davidovich wrote:

And that's what I've been trying to say the whole time -- the scaling we're talking about here is "I have a problem of size X -- can I get better performance (throughput and/or latency) by throwing more cores at the *exact* same problem of size X, and if so, how does the performance scale with each bump in CPU count", not how it scales with more cores *and* more data.  The reason the former is interesting is because it exposes bottlenecks in concurrency management, and that's where lock, lock-free, shared-nothing topics come into play.  Improving cache access for larger data set is an interesting problem, but it's an interesting problem even in singlethreaded cases.

I think the second scalability problem (how to maintain my latency with growing workloads) is more common in online systems, and more important (workload can increase unboundedly, but you can only decrease latency so much).
 
The difference is that constant factors play a big role in real machines, and the classic asymptotic complexity analyses of various algos and data structures never take machine architecture into account, nevermind blatantly disregarding constant factors (for the most part).

Oh, absolutely, but I wasn't talking about constant factors but about constants. I.e., the algorithm guarantees no more than, say, 0.001 (amortized) contentions per transaction. We're talking adding nanoseconds (or similar relative constants when you go off-machine) to your latency and that added latency remains constant when you increase the number of cores to handle increased workloads. Again, the difference between single-writer and constant-contention is between having zero contentions per transaction and 0.001 (amortized) contentions per transaction. No hidden constant factors.
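(Illustrative arithmetic only, with assumed costs: at, say, ~1 µs per contended update once you go off-machine, 0.001 amortized contentions per transaction works out to about 1 ns of added amortized latency per transaction; on a single machine, with a contended cache-line transfer in the ~100 ns range, it's roughly 0.1 ns.)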

Ron




Vitaly Davidovich

unread,
Sep 2, 2015, 5:18:53 PM9/2/15
to mechanical-sympathy

The "how does this processing scale across CPUs for constant data" is a measurement of the concurrency bottlenecks in the design.  It's quite possible that you're already at memory capacity of a machine you're on, but not yet using more CPUs - there's nowhere to place more data even, you just want to optimize processing on this machine at its memory capacity (other machines are recruited for larger data sets).  I don't think this is atypical of online systems - you load them up data wise (typically pretty easy) and then try to spread the workload over CPUs efficiently (harder).

Once you figure out the scaling properties of the system design, you can then tackle the load-balancing problem.  After all, running the problem of size X in parallel already required some way to divide it up.

In your example, what happens if 1000 concurrent requests all want to structurally mutate the same segment of the tree? You have 64 CPUs that can process these requests.

sent from my phone



pron

unread,
Sep 3, 2015, 5:06:25 AM9/3/15
to mechanical-sympathy


On Thursday, September 3, 2015 at 12:18:53 AM UTC+3, Vitaly Davidovich wrote:

In your example, what happens if 1000 concurrent requests all want to structurally mutate the same segment of the tree? You have 64 CPUs that can process these requests.


Well, one of two things will happen: either that segment of the tree will split and balance the load, or not. If not, it's still better than the sharded approach with 1000 concurrent requests hitting the same key-space region. As I said, sharding assumes uniformity over the keyspace, while the tree assumes uniformity over the actual keys, which is a much weaker assumption. If the weaker assumption is false, so is the stronger one.

As a concrete example, if many players congregate in one region of a game world, that region will split and be simulated by multiple compute nodes. With MMOs like EVE or Second Life, the compute node assigned to the region would just crawl to a halt when people congregate.

Ron
 


Richard Warburton

unread,
Sep 3, 2015, 9:36:16 AM9/3/15
to mechanica...@googlegroups.com
Hi,

And by linear contention, I don't mean algorithmic theoretical but on an actual machine.

What's the difference?

Well in theory they're the same, but in practice they're different.

regards,

  Richard Warburton
