The alert reader will notice that such operations are prone to so-called ABA problems: a false-positive match that occurs when the value changes from A to B and then back to A between the read and the write. CPU designers have addressed this by adding a counter alongside the value being swapped: every read returns the value together with the counter, and both are checked when attempting the commit. If you need such guarantees on the JVM, you can use an AtomicStampedReference.
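A minimal sketch of how that looks in Java (the class name and the A/B/A values are just for illustration): the stamp travels with the reference, so a compare-and-set with a stale stamp fails even though the value has returned to A.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedDemo {
    // Returns whether the final CAS succeeded; it should NOT, because the
    // stamp exposes the intervening A -> B -> A swap.
    static boolean demo() {
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder);      // value and stamp, read together
        int seenStamp = stampHolder[0];

        // Another thread swaps A -> B -> A, bumping the stamp each time.
        ref.compareAndSet("A", "B", seenStamp, seenStamp + 1);
        ref.compareAndSet("B", "A", seenStamp + 1, seenStamp + 2);

        // The value is "A" again, but the stale stamp makes this CAS fail --
        // exactly how the ABA case is detected.
        return ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(demo());              // prints "false"
    }
}
```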
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
I was mostly referring to DCAS/CAS2 (double-word compare-and-swap), which lets you track the sequence number together with the pointer to the swapped structure. I didn't want to go too deep into the low-level details, preferring to stick to the general concepts.
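On the JVM, where DCAS isn't exposed, a common stand-in for the same idea is to pack a version counter and a (narrow) value into a single 64-bit word, so that one ordinary CAS covers both. A rough sketch, with made-up names:

```java
import java.util.concurrent.atomic.AtomicLong;

public class VersionedSlot {
    // High 32 bits: version counter; low 32 bits: the value (e.g. an index
    // into a node array). One 64-bit CAS updates both atomically, which is
    // the effect DCAS/CAS2 gives you for a full-width pointer.
    private final AtomicLong state = new AtomicLong(pack(0, 0));

    static long pack(int version, int value) {
        return ((long) version << 32) | (value & 0xFFFFFFFFL);
    }
    static int version(long s) { return (int) (s >>> 32); }
    static int value(long s)   { return (int) s; }

    public int get() { return value(state.get()); }

    // Replace expected with next, but only if nobody has touched the slot
    // since we read it: the version must match as well as the value.
    public boolean compareAndSwap(int expected, int next) {
        long s = state.get();
        if (value(s) != expected) return false;
        return state.compareAndSet(s, pack(version(s) + 1, next));
    }
}
```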
True. I mentioned it briefly; maybe I should elaborate a bit more on that one.
--
I'd argue that "state of the art" in scalable design is to have no contention.
I think the scaling talked about here is cpu count growth, not data set size.
sent from my phone
Right, but there's a subtle interaction between the number of cores and the size of the dataset which makes what I said true about perfect scalability (when talking about online, concurrent systems).

Suppose you have a data structure (and concurrent balanced trees have this property) where keeping the data size the same but increasing the number of cores (i.e. distributing the same number of items across more cores) would increase contention -- so it would seem you have negative scalability -- yet keeping the number of cores the same and increasing the number of data items reduces contention. Such a data structure would have the property that if you keep the number of cores linearly related to the size of the data set in a given application, you get perfect scalability: you can increase your workload by adding cores while maintaining the same throughput per core and amortized latency. For concurrent online workloads, this is the kind of scalability you want (as opposed to data-parallel workloads, where perfect scalability means linearly reducing processing time for a given data set by adding more cores).
Ron
It *is* negative with respect to CPU scalability. It may have positive scalability with data set size, but that's a separate topic. Oftentimes you have a fixed (or near-fixed) working set and want to throw more CPUs at the problem. The working set may be fixed because load is already partitioned across machines (for a variety of reasons).

What you're basically describing is bulk/batch operations, where you amortize the expensive operation(s) by doing more work between them. No argument that batch/bulk design is good, but I'm not sure I'd consider it a scalability aspect (at least in what I understood this thread to be about).
On Aug 30, 2015 10:40 AM, "pron" <ron.pr...@gmail.com> wrote:
On Thursday, July 30, 2015 at 10:19:01 AM UTC+3, Martin Thompson wrote:

I'd argue that "state of the art" in scalable design is to have no contention.

... and I'd argue that state-of-the-art, or "perfect", scalability can be achieved if contention is kept to O(1) in the size of the data, amortized across operations. That significantly widens the scope of scalable data structures possible, as many balanced trees (e.g. B+-trees) can be constructed to have this property. Otherwise, state-of-the-art would apply only to "embarrassingly concurrent" domains...
I think it's the opposite. For batch applications you want the computation to run faster as you add more cores. For online, concurrent applications you usually have some distribution of latency you'd like to maintain, and perfect scalability means not reducing the latency for the same workload but maintaining the latency for a growing workload by increasing the number of cores linearly with the data set. You'll note that with Martin's original definition -- namely, no contention -- you get the same kind of scalability: adding CPUs doesn't decrease latency but preserves it under a growing workload. My definition covers more designs and has exactly the same property: linearly grow your workload/concurrency while preserving latency.
Ron
To give a simple but concrete example, say you're ingesting quotes for X stocks. If you run the entire processing chain on one CPU and that processing cannot keep up with the ingress rate, you'll end up queuing and having to apply backpressure somewhere. If you shard the symbols across multiple CPUs, you can of course run the processing in parallel: the time it takes to process each tick stays the same (assuming perfect parallelism), but latency goes down because the queuing disappears. The "no contention" aspect comes into play because some data sharing/communication likely needs to occur between the CPUs. The less you share/communicate between them, the better the system will scale. This is true even if you didn't care about thread safety and simply had multiple CPUs writing to the same cachelines with no atomics/locks/etc.
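The sharding step can be sketched as follows (all names are illustrative). Each worker owns a disjoint set of symbols, so per-symbol state has a single writer and the only cross-CPU traffic is the handoff queue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TickSharder {
    private final List<BlockingQueue<String>> queues = new ArrayList<>();

    public TickSharder(int workers) {
        for (int i = 0; i < workers; i++) {
            queues.add(new ArrayBlockingQueue<>(1024));
        }
    }

    // Static sharding: a symbol always maps to the same worker, so the
    // per-symbol state that worker keeps needs no locking at all.
    int shardFor(String symbol) {
        return (symbol.hashCode() & 0x7FFFFFFF) % queues.size();
    }

    // The only cross-CPU communication is this handoff; each worker thread
    // drains its own queue and mutates only its own symbols.
    public void onTick(String symbol) throws InterruptedException {
        queues.get(shardFor(symbol)).put(symbol);
    }
}
```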
On Tue, Sep 1, 2015 at 12:22 PM, pron <ron.pr...@gmail.com> wrote:
Perhaps you're referring to my mention of "amortized latency". I did not mean increasing maximum latency, but rather preserving the same required latency distribution. It is true, however, that if your required latency distribution is flat and very, very low, then your choice of architectures is greatly reduced. But that would limit your choice of OSs (or other low-level infrastructure) as well, because even most OSs amortize their latency (and, obviously, most GCs do too).

Ron
What limits the interaction for games, or makes domain modeling "onerous"? How you load-balance across CPUs is a domain-specific function, orthogonal to scalability (in terms of CPUs). The "no contention"/shared-nothing aspect requires that you have a single mutator (and thus "owner") of some state, with communication channels for the many readers. This isn't really a datastructure issue (although data structures obviously play a role), but a system-design one.
Can you give a concrete example of what you're referring to?
With stocks, you usually know upfront which symbols you'll receive, or you can design your system to know this a priori. You don't need dynamic load balancing: you can build a cost function from prior days' data to deduce the expected number of messages per symbol. Even if you do balance dynamically, you can avoid contention by making a single CPU responsible for figuring out which worker a stock gets assigned to. Once it decides, based on whatever dynamic strategy, it communicates that to the worker, and from then on the worker handles the ticks. Think of it like nginx having a single accept thread that hands off the file descriptor for an accepted connection to a worker thread, after which the worker handles all the comms on that fd.

"Player A shoots player B" can be handled in a similar manner by having CPUs handle subsets of players and communicate with each other contention-free.

I don't quite understand the constant-contention point. Certainly, if you have more CPUs participating in the writing (the definition of contention), that will not scale well. Increasing the working set and then saying you've kept contention constant because your data set is larger isn't really the same thing. Also, the contention won't stay constant if you have CPUs contending and increase the CPU count -- by definition. And even if you increase both the data set size and the CPU count, it's unlikely that all the data is actually in use all the time.

There are other benefits to single-writer, such as easier reasoning, debuggability, determinism, etc. But as I mentioned earlier, the trick is to design your system so that it can do that, and so it's not really a datastructure issue.
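The dynamic variant can be sketched like this (names are made up): a single dispatcher thread owns the symbol-to-worker map, so the assignment itself needs no synchronization, and workers only ever see symbols through their own queues:

```java
import java.util.HashMap;
import java.util.Map;

public class Dispatcher {
    private final Map<String, Integer> owner = new HashMap<>(); // touched only by the dispatcher thread
    private final int[] load;                                   // ticks routed per worker

    public Dispatcher(int workers) { load = new int[workers]; }

    // Called only from the single dispatcher thread.
    public int route(String symbol) {
        Integer w = owner.get(symbol);
        if (w == null) {
            w = leastLoaded();          // any dynamic strategy fits here
            owner.put(symbol, w);       // sticky: subsequent ticks go to the same worker
        }
        load[w]++;
        return w;                       // caller enqueues the tick on worker w's queue
    }

    private int leastLoaded() {
        int best = 0;
        for (int i = 1; i < load.length; i++) {
            if (load[i] < load[best]) best = i;
        }
        return best;
    }
}
```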
Obviously, if you don't need dynamic balancing then you don't need it. If you do (or if you don't want to carefully design the system around a static strategy), you can still have perfect scalability. Static load balancing in MMOs, for example, usually requires a specific game design or various community-management measures that are unnecessary with good dynamic load balancing.

Message-passing is contention -- or, more accurately, it's what we try to avoid when we try to avoid contention. We care about contention because it triggers slow communication (when we analyze concurrent data structures for contention, we count the number of messages required to establish cache coherence). If you communicate explicitly, your performance isn't any better; you're just using a different abstraction. For example, in my own simulations I sometimes use a message-passing abstraction and sometimes a shared-memory abstraction. The latter is more convenient when you need transactions.

Obviously, there are many benefits to single-writer; being the only way to achieve perfect scalability is not one of them. It's certainly the simplest, if it works well enough. When it doesn't, algorithms that cause constant contention can still achieve the same scalability with better adversary-competitiveness, and sometimes with fewer restrictions on the domain model and more possibilities (like multi-entity transactions).
Ron
It's not always obvious that you can have good static load balancing; that's part of my point about thinking through the design. I'm also not buying "perfect scalability" achieved by adding contending CPUs while also increasing the data set size. As I mentioned, a larger data set won't necessarily mean a larger working set, but the extra CPUs will contend no matter what.

Concurrent and lock-free datastructures always assume they can be updated from multiple CPUs, and at best use atomic instructions (at worst, granular locks or CAS loops); with single writer/multiple readers you can do it with no atomic instructions at all (on some archs, like x86/64).

It's actually far from simple to design single-writer for anything other than toy examples -- that's again part of my point. It's easier to reach for a concurrent data structure. And again, your definition of constant contention/perfect scalability by virtue of increasing the data set size doesn't jibe.
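A small illustration of the single-writer point, assuming a JVM target: with exactly one writer, a counter can be maintained with a plain read plus lazySet (an ordered store, not a lock-prefixed read-modify-write on x86/64), while any number of readers observe it safely.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SingleWriterCounter {
    private final AtomicLong count = new AtomicLong();

    // Must be called from ONE thread only; the read-modify-write is safe
    // without CAS precisely because there is no other writer. lazySet
    // compiles to a plain ordered store on x86/64 -- no lock prefix.
    public void increment() {
        count.lazySet(count.get() + 1);
    }

    // Any number of reader threads may call this concurrently.
    public long get() { return count.get(); }
}
```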
They won't. Here's the math (it talks about a distributed cache-coherent grid, but the result is the same): the amortized worst-case number of cache-coherence messages turns out to be kM/n (for a concurrent B-tree), where k is some constant of the data structure, M is the number of compute nodes, and n is the size of the data set. If M/n is (more or less) constant -- and it usually is for most systems -- you get constant contention.

Yes. But you suggested a message-passing solution to the MMO player-shoots-player problem, which requires multiple writers (i.e. multiple shooters can send a message to the shootee's queue). If the domain allows an incoming event to modify more than a single entity, you simply can't ensure those entities all live in the same shard (although a dynamic load-balancing strategy might make it true often enough).
Ron
How does k scale with an increase in the number of CPUs hammering the datastructure? Also, I'm not sure what cache-coherence messages you're talking about, but I assume this models your distributed system as if it were a multi-CPU system. Otherwise, the number of CPU cache-coherence messages would depend on the number of cachelines involved, not simply on kM/n. Perhaps this is part of the disconnect here: the contention this thread has been talking about is within a single multi-CPU system, taking microarchitectural details into account (e.g. the cheapest way to communicate between a single writer and multiple reader CPUs).
I don't think you answered my question about how k reacts to an increased number of CPUs modifying the data structure. Specifically, as the number of concurrent writers and readers increases, what are the progress, latency, and throughput properties? Assume all readers and writers are working on the same leaf page.
On Sep 2, 2015, at 7:24 PM, Vitaly Davidovich <vit...@gmail.com> wrote:
This is what I'm trying to figure out -- how did you determine it's linear?
By mutation I mean structural changes to the tree. And this gets to the point of contention that I'm talking about. With single writer, there's no contention on mutation because, by definition, it's the sole owner and always makes forward progress with no algorithmic delays. With a shared structure that always has to support the possibility of multiple mutators, there will be a weakening of forward progress (the system will make progress, but the pace will slow down)…
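The "weakening of forward progress" under multiple mutators can be illustrated with a toy retry model. This is not how any real CPU's CAS behaves; it's just a sketch under the (pessimistic) assumption that in each round of contention every pending writer attempts and exactly one wins:

```python
def total_attempts(writers, ops_per_writer):
    """Toy model of a contended CAS loop: each round, every pending
    writer attempts the CAS, exactly one succeeds, the rest retry."""
    total_ops = writers * ops_per_writer
    attempts = 0
    done = 0
    pending = min(writers, total_ops)
    while done < total_ops:
        attempts += pending            # everyone in the round tries
        done += 1                      # one winner per round
        pending = min(writers, total_ops - done)
    return attempts

# A single writer never retries: attempts == operations.
assert total_attempts(1, 40) == 40
# Four contending writers burn roughly 4x the attempts for the same 40 ops.
assert total_attempts(4, 10) == 154
```

The system still makes progress (one op completes per round), but the wasted attempts grow with the writer count, which is the slowdown being described.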
k is constant. It depends on the fanout, which, in turn, depends on the cache-line size. The number of mutating (or reading) CPUs doesn't affect k at all -- that number is M. Given the same data set, adding CPUs would increase contention and hurt latency linearly. But the assumption is that you can achieve your required latency with some small dataset, and now want to grow it. Adding CPUs in constant proportion to the dataset size would keep the latency profile (distribution) exactly the same, but, as you can see in the post, the worst case will increase (while becoming rarer, which is why the distribution stays the same).

Note, however, that what "data" and "mutation" mean depends on the domain. If the data is the location of player characters, then each message coming from each player is a mutation. If, OTOH, your data is some set of stocks and you use the tree for dynamic load balancing, then messages updating some internal value of a stock are not mutators, as they don't change the tree structure. A mutation would be adding a new stock. Of course, it doesn't make much sense to use such a data structure in this case, because all it does is distribute the stocks evenly among threads without exploiting any sort of locality -- a simple round-robin assignment would achieve the same. If, however, it's better somehow for a group of stocks to be handled by the same thread (based on some property), then this could be a good dynamic load-balancing solution.
Ron
On Wednesday, September 2, 2015 at 6:57:00 PM UTC+3, Vitaly Davidovich wrote:
I don't think you answered my question about how k reacts to increased number of CPUs modifying the data structure. Specifically, as the number of concurrent writers and readers increases, what are the progress, latency, and throughput properties? Assume all readers and writers are working with the same leaf page.
sent from my phone
On Sep 2, 2015 11:39 AM, "pron" <ron.pr...@gmail.com> wrote:
On Wednesday, September 2, 2015 at 5:57:42 PM UTC+3, Vitaly Davidovich wrote:
How does k scale with an increase in CPUs hammering the data structure? Also, I'm not sure what cache-coherence messages you're talking about, but I'm assuming this is modeling your distributed system as if it were a multi-CPU system. Otherwise, the number of CPU cache-coherence messages would depend on the number of cache lines involved, and not simply kM/n. Perhaps this is part of the disconnect here -- the contention this thread has been talking about is within a single multi-CPU system, taking microarch details into account (e.g. the cheapest way to communicate between a single writer and multiple reader CPUs).

That doesn't matter -- the result is the same. k is a constant that depends on the structure of the balanced tree, in particular its fanout. As the fanout directly determines the size, in cache lines, of each tree-node update message (it's a cache-aware data structure), the number of CPU bus messages is some constant multiple of the number of update messages (in fact, the fanout can be -- and often is -- chosen so that each tree node occupies exactly one cache line), so the math is the same.
There is a difference between such a distributed system and the intra-CPU case -- it's not contention but the total number of cache misses (on either reads or writes). In the distributed case, the "L1 cache" is very large (the RAM of the entire machine, or of the NUMA socket, etc.); it's what's known as a cache-only memory architecture (COMA) design. The concurrent balanced tree can also minimize the total number of "cache misses" in this case, if reads (or writes) have some spatial locality (where spatial is defined by the tree -- e.g., for a B+-tree it means key ranges, and spatial locality means that most queries are single-range queries; spatial could also mean graph span, if the data structure is designed for graphs). You can achieve the same with sharding if you shard by range, but again, then you don't get dynamic load balancing if you need it (adversary competitiveness, which, BTW, has brought down many MMOs that relied on sharding). In the intra-CPU case, the L1 cache is small relative to the data set each core ends up serving, so this doesn't matter.
Ron
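The "one node per cache line" fanout choice can be sketched in a few lines. The sizes here are hypothetical -- 64-byte lines and 8-byte entries are merely common values, not anything the thread specifies:

```python
CACHE_LINE_BYTES = 64   # typical x86 cache line (assumed)
ENTRY_BYTES = 8         # e.g. one long key or compressed pointer per slot (assumed)

# Choose the fanout so a full node occupies exactly one cache line:
fanout = CACHE_LINE_BYTES // ENTRY_BYTES

def coherence_messages_per_update(node_entries):
    """Lines invalidated by one node update: ceil(node_bytes / line_bytes)."""
    node_bytes = node_entries * ENTRY_BYTES
    return -(-node_bytes // CACHE_LINE_BYTES)  # ceiling division

assert fanout == 8
assert coherence_messages_per_update(fanout) == 1      # one line per node update
assert coherence_messages_per_update(2 * fanout) == 2  # oversized node costs more
```

With that choice, bus messages really are a constant (here, 1) multiple of tree-node update messages, which is why the kM/n math carries over.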
The caveat for the data structures (that I'm familiar with) with that property is an increase in maximum latency, but with a guaranteed commensurate decrease in the frequency of high-latency events. So your latency distribution remains the same (i.e. 99th, 99.99th percentiles etc. are the same), but the maximum increases. If you can't ever, ever tolerate higher latency -- no matter how rare -- then you need a different data structure for perfect scalability. So the data structures I'm talking about give you perfect scalability w.r.t. the required latency distribution, but they may increase the (increasingly rare) worst case.
Ron
On Tuesday, September 1, 2015 at 6:52:58 PM UTC+3, Vitaly Davidovich wrote:
It *is* negative with respect to CPU scalability. It may have positive scalability with data set size, but that's a separate topic. Oftentimes you have a fixed (or near-fixed) working set, and want to throw more CPUs at the problem. The working set may be fixed because there's already partitioning of load across machines (for a variety of reasons). What you're basically describing is bulk/batch operations, where you can amortize the expensive operation(s) by doing more work in between the expensive operations. No argument that batch/bulk design is good, but I'm not sure I'd consider it a scalability aspect (at least in what I understood this thread to be about).

I think it's the opposite. For batch applications you want the computation to run faster by adding more cores. For online, concurrent applications you usually have some distribution of latency you'd like to maintain, and perfect scalability means not reducing the latency for the same workload, but maintaining the latency for a growing workload by increasing the number of cores linearly with the data set. You'll note that with Martin's original definition -- namely, no contention -- you get the same kind of scalability: adding CPUs doesn't decrease latency but preserves it with a growing workload. My definition covers more designs and has the exact same property: namely, linearly grow your workload/concurrency while preserving latency.

Perhaps you're referring to my mention of "amortized latency". I did not mean increasing maximum latency, but rather preserving the same required latency distribution. It is true, however, that if your required latency distribution is flat and very, very low, then your choice of architectures is greatly reduced. But that would limit your choice of OSs (or other low-level infrastructure) as well, because even most OSs amortize their latency (and, obviously, most GCs).
Ron
Right, but there's a subtle interaction between the number of cores and size of the dataset which makes what I said true about perfect scalability (when talking about online, concurrent systems).
Suppose you have a data structure (and concurrent balanced trees have this property) where keeping the data size the same but increasing the number of cores (i.e. distributing the same number of items across more cores) would increase contention -- so it would seem like you have negative scalability -- however, keeping the number of cores the same and increasing the number of data items reduces contention. Then such a data structure would have the property that if you keep the number of cores linearly related to the size of the data set in a given application, you'd get perfect scalability, as you'd be able to increase your workload by adding cores while maintaining the same throughput per core and amortized latency. When talking about concurrent online workloads, this is the kind of scalability you want (as opposed to data-parallel workloads, where perfect scalability means linearly reducing processing time for a given data set by adding more cores).
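Ron's formula later in the thread (amortized worst-case latency = kM/n, with k a structure constant, M the CPU count, n the data-set size) encodes exactly this trade-off. A tiny sketch of the proportionalities -- the units here are abstract, and the numbers are arbitrary:

```python
def amortized_worst_case_latency(k, cpus, items):
    """kM/n from the thread: k is a tree-structure constant (fanout and
    cache-line dependent), M the number of CPUs, n the number of items.
    Units are abstract -- this only encodes the stated proportionalities."""
    return k * cpus / items

base = amortized_worst_case_latency(2.0, 8, 1_000_000)

# Same data, double the CPUs: contention (and latency) doubles.
assert amortized_worst_case_latency(2.0, 16, 1_000_000) == 2 * base

# Grow CPUs and data in the same proportion: latency is preserved.
assert amortized_worst_case_latency(2.0, 16, 2_000_000) == base

# Same CPUs, more data: contention drops.
assert amortized_worst_case_latency(2.0, 8, 2_000_000) < base
```

The middle assertion is the "perfect scalability" claim: add cores linearly with the data set and the latency profile stays put.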
Because that’s the formula: amortized worst-case latency = kM/n
The full derivation is in the blog post I linked to.
Coherence traffic is equal to kM/n
You're trying to come up with a mental model, but you need to look at the algorithm itself, because it takes care of your concerns -- it most certainly doesn't work like a queue. The idea is that a balanced tree will suffer heavier node-write contention the higher a node is in the tree, but, because of the tree's design, the chance of a mutation is lower the higher the node is, and the two cancel each other out perfectly (note that by "chance" I don't mean probabilistically, but amortized). The data structure ensures, then, that most tree mutations involve a single core, fewer involve two cores, and so on, and the whole thing cancels out.
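One way to sanity-check the cancellation claim is to model the tree as a base-f counter: every f updates at one level propagate a single split/update to the level above. This is a deliberately simplified sketch, not any specific concurrent tree:

```python
def updates_per_level(fanout, inserts, levels):
    """Model a balanced tree as a base-`fanout` counter: every `fanout`
    updates at level l trigger one update at level l+1 (a split
    propagating upward). Leaves are level 0."""
    counts = [0] * levels
    for _ in range(inserts):
        carry = True
        level = 0
        while carry and level < levels:
            counts[level] += 1
            carry = counts[level] % fanout == 0
            level += 1
    return counts

counts = updates_per_level(fanout=8, inserts=8**3, levels=4)
# Each level sees 1/fanout as many updates as the level below it.
assert counts == [512, 64, 8, 1]
```

Since level l also holds roughly 1/fanout^l as many nodes as the leaf level, the amortized update rate per node comes out the same at every level -- which is the cancellation being described: higher nodes are shared by more cores but mutated correspondingly less often.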
I'm talking about strictly increasing the number of CPUs, not the dataset as well. The structure of the tree at some data set size X is going to be the same with increase in CPUs, and contention will go up. I'm trying to understand how you arrived at linear contention given this case.
And by linear contention, I don't mean algorithmic theoretical but on an actual machine.
See above -- the tree structure stays the same, only CPU count goes up. I don't see how the tree *ensures* that most mutations involve 1 core vs 2 vs 4 vs X.
The data is balanced out in terms of storage and so with some uniform distribution access to the key space, you'd get the spreading behavior, but what if you don't have uniform distribution? That's why I mentioned earlier that assume the workload (reads + writes) are concentrating on same subtrees.
Ron
I said an increase in contention, and hence latency. It follows straight from the formula. This kind of design does not help you reduce latency by adding cores; it just preserves latency for growing data by adding cores.
What's the difference?
In practice, on a single socket there's little reason to distribute work among threads in this way because the cache is too small and cache misses dominate contention. There isn't much significance to a core "owning" a subtree because nodes have to be brought in from shared cache or RAM (we're talking millions of elements), and the tree structure guarantees (probabilistically, this time) low contention even if each thread modifies a random subtree (collisions at the leaves are probabilistically rare, and contention at the inner nodes is guaranteed to be low by the amortized structure of the tree). So on a single CPU, we get low (i.e. constant) contention rates, but those are not guaranteed in amortized worst-case. On NUMA or a grid, results so far seem to follow the theoretical model. We hope to be able to publish results in a few months.
Ron
And that's what I've been trying to say the whole time -- the scaling we're talking about here is "I have a problem of size X -- can I get better performance (throughput and/or latency) by throwing more cores at the *exact* same problem of size X, and if so, how does the performance scale with each bump in CPU count?" -- not how it scales with more cores *and* more data. The former is interesting because it exposes bottlenecks in concurrency management, and that's where the lock, lock-free, and shared-nothing topics come into play. Improving cache access for a larger data set is an interesting problem, but it's interesting even in the single-threaded case.
The difference is that constant factors play a big role on real machines, and the classic asymptotic complexity analyses of various algos and data structures never take machine architecture into account -- never mind that they blatantly disregard constant factors (for the most part).
Ron
The "how does this processing scale across CPUs for constant data" question is a measurement of the concurrency bottlenecks in the design. It's quite possible that you're already at the memory capacity of the machine you're on but not yet using all of its CPUs -- there's nowhere to place more data; you just want to optimize processing on this machine at its memory capacity (other machines are recruited for larger data sets). I don't think this is atypical of online systems -- you load them up data-wise (typically pretty easy) and then try to spread the workload over CPUs efficiently (harder).
Once you figure out the scaling properties of the system design, you can then tackle the load-balancing problem. After all, running the problem of size X in parallel already required some way to divide it up.
In your example, what happens if 1000 concurrent requests all want to structurally mutate the same segment of the tree? You have 64 CPUs that can process these requests.
sent from my phone
Ron