One of the major challenges of playing together online is timing. Actually, it's THE major challenge. I'm trying to understand how musicians can play in time in a peer-to-peer system like SonoBus. Does it only work when the delay is really small? How well does it work in practice, and what are the limitations?
Server-based systems such as Jamulus deal with this by making the server the time reference. All audio goes to the server, which mixes it and sends the mix back to each player. Everyone hears the same thing, although some hear it later than others. It's each musician's responsibility to listen to the (delayed) mix coming back from the server, and to play with that. This works because your brain can adapt, up to a point (that point seems to be around 50 ms).
Peer-to-peer solutions such as SonoBus eliminate the round trip to the server, which lowers latency, but the bandwidth requirements go up as the number of musicians increases.
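To put rough numbers on the bandwidth point, here is a small sketch that counts audio streams. It assumes a full mesh where every player sends a separate copy of their audio to every other player, and a server setup where each player sends one stream up and gets one mix back. The function names are just mine for illustration.

```python
def mesh_streams(players):
    """Total one-way audio streams in a full peer-to-peer mesh (assumed topology)."""
    return players * (players - 1)


def mesh_uploads_per_player(players):
    """Streams each player has to send in the mesh: one per other player."""
    return max(players - 1, 0)


def server_streams(players):
    """Total streams with a central server: one up and one mix down per player."""
    return 2 * players


for n in (2, 4, 8):
    print(n, mesh_streams(n), mesh_uploads_per_player(n), server_streams(n))
```

Under those assumptions the per-player upload grows linearly with the group size in the mesh, while it stays at one stream with a server.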
But how do you play in time without a common point of reference? If the delays are really small it might not matter much. But as delay increases, I am listening to me in the present and you in the past, while you listen to you in the present and me in the past.
So how can we play in time? Maybe I am missing something here, but it doesn't look to me like SonoBus currently has a way to introduce a common time reference. Without that, I think it would only work well if the delay is really low.
A couple of thoughts:
1) SonoBus has a metronome function that you can send to others. However, this appears to just send the audio, so others hear it with delay, which guarantees they will be out of time. I think it might be more useful if one player's metronome could command the other players' metronomes to click at the same (real) time. Then the players would have a common reference. They would still hear the other players' audio with delay, so I'm not sure how workable that would be in practice, but at least each musician would have a common time reference.
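Here is a rough sketch of what I mean by idea 1. This is not how SonoBus works today; it assumes each client can estimate the offset between its own clock and some shared clock (for example with an NTP-style exchange), and that the leader sends only a start time and a tempo instead of audio. All names are made up for illustration.

```python
import math


def to_shared_clock(local_time, clock_offset):
    """Convert a local timestamp to the shared clock (offset is an assumed estimate)."""
    return local_time + clock_offset


def next_click_time(shared_now, start_time, bpm):
    """Shared-clock time of the next metronome click at or after shared_now."""
    period = 60.0 / bpm
    if shared_now <= start_time:
        return start_time
    beats_elapsed = math.ceil((shared_now - start_time) / period)
    return start_time + beats_elapsed * period


# The leader announces: first click at shared time 10.0 s, tempo 120 bpm.
# Client A's clock matches the shared clock; client B's clock runs 3 s ahead.
click_a = next_click_time(to_shared_clock(10.25, 0.0), 10.0, 120)
click_b = next_click_time(to_shared_clock(13.25, -3.0), 10.0, 120)
print(click_a, click_b)
```

Both clients compute the same click time on the shared clock, so the clicks land together regardless of network delay. How accurate that is depends entirely on how well the clock offsets can be estimated.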
2) Another possibility would be to have each player's SonoBus client time-synchronize the audio streams, including the local user's, when it mixes them. To do this it would need to intentionally add different delay amounts to each stream, including the user's own, in order to produce a time-synchronized mix that people could play to. Like playing on a Jamulus server, it would mean each user would need to play to a delayed mix, but the upside is that everyone would hear the same thing. The advantage would be that, by eliminating the server hops and the buffer/decompress/mix/compress cycle at the server, SonoBus would have lower latency and be workable at greater distances.
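A minimal sketch of idea 2, from the point of view of one client. It assumes the client somehow knows the one-way latency of each incoming stream (measuring that is its own problem), and it pads every stream, including the local one, up to the slowest incoming stream. The function name and the "local" key are hypothetical, not anything from SonoBus.

```python
def alignment_delays(incoming_latency_ms):
    """Extra delay (ms) to add to each stream so they all line up in the mix.

    incoming_latency_ms maps peer name -> assumed one-way latency to this client.
    The local player's own audio is delayed by the full target amount.
    """
    target = max(incoming_latency_ms.values(), default=0)
    delays = {peer: target - latency for peer, latency in incoming_latency_ms.items()}
    delays["local"] = target
    return delays


# Example: Alice's audio arrives after 20 ms, Bob's after 35 ms.
print(alignment_delays({"alice": 20, "bob": 35}))
# → {'alice': 15, 'bob': 0, 'local': 35}
```

With these delays, every note played at the same real time shows up at the same point in this client's mix, 35 ms late in the example. Each client would end up with its own target delay, so, as with a server, some players would hear the aligned mix later than others.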
-Kevin