Phase/skew sensitivity

Don Y

May 22, 2012, 4:01:55 PM

Hi,

I have designed "network speakers" that use ethernet to
receive digital sound samples sourced from a remote
server. I.e., the "network" plays the role of "speaker
wire" (it also optionally delivers power to the speakers
but that's immaterial).

Unlike streaming audio to, for example, a "PC" (where
the entire audio stream "terminates" before being
delivered to the listener's ears), each speaker can
be located electrically and physically separated from
all the others.

For example, run a "network drop" to the left side
of your living room and *another* to the *right* side.
Attach two independent "network speakers" to these
two drops. Just like you would two "wired" speakers.

As such, data can arrive at LEFT at a different time
than at RIGHT. This is a result of the sequential
nature of packets on the network media. As well as
delays introduced by network fabric (switch/hub)
and the normal variation of network stacks (the
software that moves the data in and out of the boxes
sitting on the network).

Note that with traditional "analog" wiring, the differences
between LEFT and RIGHT have to do with the *length*
of the interconnect wires (neglecting group delay
imbalances between one side of the amp and the other).
And, it's hard to imagine having left and right
cables differing by huge amounts! :> ("The left
speaker is here and the right speaker is in my
neighbor's back yard!")

But, with packet switched technology, the delays
between packets arriving at one "speaker" and
the other can be larger. And, vary widely!

To get around this, I use a time synchronization
protocol that ensures each speaker shares a common
sense of "what time is it, now". And, thus,
knows when each packet *should* be "played"
(pushed out the D/A converter) so that all speakers
are "in sync".

Of course, I can't get/guarantee *perfect* synchronization
(an analog audiophile could trim his cables to identical
lengths, if so inclined, so the propagation delays were
perfectly matched). Though I can keep the dis-synchronization
from "changing too rapidly". (changes in synchronization
amount to instantaneous changes in *pitch*!)
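
To put a rough number on that last remark: if a node's playout clock is slewed at a steady rate to correct an offset, the audio plays faster or slower by the same ratio. The sketch below assumes a constant slew rate given in parts per million; the function name and the 100 ppm figure are illustrative and are not taken from the design described above.

```python
import math

def pitch_shift_cents(slew_ppm: float) -> float:
    """Pitch change, in cents, when the playout clock runs slew_ppm fast.

    A clock that gains slew_ppm microseconds per second plays audio
    (1 + slew_ppm * 1e-6) times too fast, raising pitch by that ratio.
    """
    ratio = 1.0 + slew_ppm * 1e-6
    return 1200.0 * math.log2(ratio)

# Hypothetical case: removing a 1 ms error over 10 s means slewing at 100 ppm.
print(round(pitch_shift_cents(100.0), 3))
```

At 100 ppm the shift is well under one cent, which is why the rate of correction matters at least as much as the size of the offset.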

OK, with all that behind us, how can I come up with a
target figure within which I should strive for "degree
of synchronization"? Where does the lag/phase shift start
becoming audible? Where does it start becoming a *nuisance*??

[Keep in mind that I am not limited to one LEFT and one RIGHT!]

Thx,
--don

Soundhaspriority

May 22, 2012, 4:09:29 PM

Humans can distinguish 10 microseconds OR LESS differential from ear to
ear. Applying Nyquist's criteria to the problem, this suggests 5
microseconds between channels, with an additional guard of 3 microseconds, =
2 microseconds.

One might argue that localization is not that critical, and that amplifiers,
transducers, etc, all throw curveballs into the image. If your scheme
results in a constant bias, this will simply shift the image, which happens
with any speaker in any room. But if the error is jitter, which seems likely
in your design, that implies fuzzing of the image, compared to wired
speakers.

Bob Morein
(310) 237-6511

"Don Y" <th...@isnotme.com> wrote in message
news:jpgrbi$a48$1...@speranza.aioe.org...

William Sommerwerck

May 22, 2012, 4:17:11 PM

Acoustically, one foot equals 0.9 milliseconds. I don't know for sure, but
I'm pretty certain that if the relative arrival times bounced around in the
+/- 0.9 millisecond range, you'd almost certainly hear it as an unstable
image -- and you might also hear phasing effects at lower frequencies.

I'd try to keep the maximum offset to no more than 0.09 milliseconds -- 90
microseconds, which is equivalent to 1.2 inches (30mm).
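
The time-to-distance figures quoted in this thread can be checked with a few lines of arithmetic. This is only a sketch: it assumes sound travels at 343 m/s (dry air at roughly 20 degrees C), and the function name is made up for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C

def offset_to_distance_mm(offset_us: float) -> float:
    """Distance sound travels during a timing offset given in microseconds."""
    return offset_us * 1e-6 * SPEED_OF_SOUND_M_S * 1000.0

# Offsets mentioned in the thread: 10 us, 90 us, 0.9 ms and 1 ms.
for offset_us in (10, 90, 900, 1000):
    mm = offset_to_distance_mm(offset_us)
    print(f"{offset_us:>5} us -> {mm:7.1f} mm ({mm / 25.4:5.2f} in)")
```

Under that assumption 90 microseconds comes out at about 31 mm and 0.9 ms at about one foot, which agrees with the numbers above.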

Scott Dorsey

May 22, 2012, 4:20:04 PM

In article <jpgrbi$a48$1...@speranza.aioe.org>, Don Y <th...@isnotme.com> wrote:
>
>OK, with all that behind us, how can I come up with a
>target figure within which I should strive for "degree
>of synchronization"? Where does the lag/phase shift start
>becoming audible? Where does it start becoming a *nuisance*??

Well, if you move your head one foot, you have just shifted by 1 ms or so.

I think as long as you keep delays constant to within 1ms, people won't
complain although less can be audible if you are looking for it.
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."

Randy Yates

May 22, 2012, 4:35:41 PM

Don Y <th...@isnotme.com> writes:

> Hi,
>
> I have designed "network speakers" that use ethernet

There's your first problem. You're basing a phase-sensitive
system on a communications mechanism that has no determinism
in delivery of data. This is flawed from the outset.
--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com

Don Y

May 22, 2012, 4:37:43 PM

On 5/22/2012 1:01 PM, Don Y wrote:

> Of course, I can't get/guarantee *perfect* synchronization
> (an analog audiophile could trim his cables to identical
> lengths, if so inclined, so the propagation delays were
> perfectly matched). Though I can keep the dis-synchronization
> from "changing too rapidly". (changes in synchronization
> amount to instantaneous changes in *pitch*!)
>
> OK, with all that behind us, how can I come up with a
> target figure within which I should strive for "degree
> of synchronization"? Where does the lag/phase shift start
> becoming audible? Where does it start becoming a *nuisance*??
>
> [Keep in mind that I am not limited to one LEFT and one RIGHT!]

(sigh) Peeking at the first set of responses, I realize
I should have elaborated on this aspect, further. Perhaps
a more aggressive example:

Imagine you are at an outdoor (?) concert. There are
stacks of "loudspeakers" (passive devices) everywhere.
Even out in the crowd.

Several different amplifiers fed off the main soundboard
with different "program content" (i.e., the monitor for
the lead vocalist might have a different mix delivered
to it than the main stack on stage left).

Chances are, the loudspeakers are triamped (or more?)
so that the high frequency components are processed
and amplified independent of the low/mid frequencies.
The signals delivered to stacks "out in the audience"
might be delayed so that their sound coincides with
the wavefront arriving from the main stacks at the stage.

Instead of running "miles" of balanced cable around to
the various amps and processors -- or, miles of
*high power* out to remote stacks -- you, instead,
run a network drop to the location of each "bit of
kit". Each then fed via something akin to the
"network speaker" (except without the *speaker*! :> ).

Now, you have dozens (or more) of signals each with
some potential skew with respect to the others "active"
at that time. I.e., the *top* "half" of the left stack's
tweeter array might be driven from *this* device
while the bottom half is driven by another -- potentially
having slightly different characteristics (by intent).

Now, rethink the issue (please) :>

Don Y

May 22, 2012, 4:43:58 PM

Hi Randy,

On 5/22/2012 1:35 PM, Randy Yates wrote:
> Don Y<th...@isnotme.com> writes:
>
>> Hi,
>>
>> I have designed "network speakers" that use ethernet
>
> There's your first problem. You're basing a phase-sensitive
> system on a communications mechanism that has no determinism
> in delivery of data. This is flawed from the outset.

Ethernet *itself* has no determinism. But, that doesn't
mean you can't layer deterministic protocols atop it.
For example, IEEE 1588:
<http://en.wikipedia.org/wiki/Precision_Time_Protocol>

There, you can effectively trim your speaker cables
*virtually* to within a fraction of a microsecond of
each other.
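
For readers unfamiliar with IEEE 1588, the core of the standard delay request-response exchange is a four-timestamp calculation. The sketch below shows only that textbook arithmetic, under the usual assumption that the network path is symmetric; it is not the actual protocol used in these devices, and the function name and example figures are illustrative.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay.

    t1: master sends Sync         (master clock)
    t2: slave receives Sync       (slave clock)
    t3: slave sends Delay_Req     (slave clock)
    t4: master receives Delay_Req (master clock)
    Assumes the path delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

# Hypothetical exchange, times in microseconds: the slave runs 150 us
# ahead of the master and the one-way path delay is 40 us.
offset_us, delay_us = ptp_offset_and_delay(t1=1000, t2=1190, t3=1300, t4=1190)
print(offset_us, delay_us)
```

Any asymmetry between the two directions shows up directly as an error in the offset estimate, which is one reason switch behavior matters when chasing sub-microsecond figures.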

Scott Dorsey

May 22, 2012, 4:46:36 PM

In article <jpgtem$g5d$1...@speranza.aioe.org>, Don Y <th...@isnotme.com> wrote:
>
>Chances are, the loudspeakers are triamped (or more?)
>so that the high frequency components are processed
>and amplified independent of the low/mid frequencies.
>The signals delivered to stacks "out in the audience"
>might be delayed so that their sound coincides with
>the wavefront arriving from the main stacks at the stage.
>
>Instead of running "miles" of balanced cable around to
>the various amps and processors -- or, miles of
>*high power* out to remote stacks -- you, instead,
>run a network drop to the location of each "bit of
>kit". Each then fed via something akin to the
>"network speaker" (except without the *speaker*! :> ).
>
>Now, you have dozens (or more) of signals each with
>some potential skew with respect to the others "active"
>at that time. I.e., the *top* "half" of the left stack's
>tweeter array might be driven from *this* device
>while the bottom half is driven by another -- potentially
>having slightly different characteristics (by intent).

Yes, this is why the protocols designed for these applications are
deterministic and have deadlines built into packets so that latency is
both low AND constant.

Digigram EtherSound and CobraNet are the two most common protocols used
for the applications today.

Frankly, I don't think it's any real savings over running analogue audio
pairs, until you get to large numbers of channels.

Don Y

May 22, 2012, 4:49:50 PM

Hi Bob,

On 5/22/2012 1:09 PM, Soundhaspriority wrote:
> Humans can distinguish 10 microseconds OR LESS differential from ear to
> ear. Applying Nyquist's criteria to the problem, this suggests 5
> microseconds between channels, with an additional guard of 3
> microseconds, = 2 microseconds.

Do you have a reference for this?

> One might argue that localization is not that critical, and that
> amplifiers, transducers, etc, all throw curveballs into the image. If

Yes. Though those all *follow* the availability of the digital
data (i.e., you are looking at digital streams being emitted
by these devices *before* they move into analog processes).

> your scheme results in a constant bias, this will simply shift the
> image, which happens with any speaker in any room. But if the error is

That assumes all of the sound emitted by that speaker exhibits the
same group delay. If, OTOH, the speaker is a composite of
several speakers -- each driven with a potentially different
source -- then the issue is muddied. See the example I posted
in the followup to my original post.

> jitter, which seems likely in your design, that implies fuzzing of the
> image, compared to wired speakers.

Jitter isn't a real problem. The amount of jitter is determined
by the time constants used in the control loops (i.e., data is
buffered in each such device and only emitted when it "should"
be, temporally). So, it boils down to how stable your timing
synchronization loop happens to be.
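
The "buffer it and emit it when it should be emitted" idea can be sketched as a small playout queue. This is a minimal illustration, assuming each packet carries a play time expressed in the shared (synchronized) clock; the class and method names are hypothetical and say nothing about how the real devices are implemented.

```python
import heapq

class PlayoutBuffer:
    """Hold timestamped audio packets until their play time arrives."""

    def __init__(self):
        self._heap = []  # entries are (play_time_us, sequence, payload)
        self._seq = 0    # tie-breaker so payloads are never compared

    def push(self, play_time_us, payload, now_us):
        """Queue a packet; return False if it arrived too late to play."""
        if play_time_us < now_us:
            return False  # deadline already passed: drop it
        heapq.heappush(self._heap, (play_time_us, self._seq, payload))
        self._seq += 1
        return True

    def pop_due(self, now_us):
        """Return payloads whose play time has been reached, in time order."""
        due = []
        while self._heap and self._heap[0][0] <= now_us:
            due.append(heapq.heappop(self._heap)[2])
        return due

buf = PlayoutBuffer()
buf.push(2000, b"B", now_us=500)   # arrives first, plays second
buf.push(1000, b"A", now_us=600)   # arrives second, plays first
print(buf.pop_due(now_us=1500))    # only the packet due by t=1500
```

With this structure, network arrival jitter only matters if a packet misses its deadline; the timing of what reaches the D/A converter depends on how well `now_us` tracks the shared clock.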

Randy Yates

May 22, 2012, 4:55:03 PM

Don Y <th...@isnotme.com> writes:

> Hi Randy,
>
> On 5/22/2012 1:35 PM, Randy Yates wrote:
>> Don Y<th...@isnotme.com> writes:
>>
>>> Hi,
>>>
>>> I have designed "network speakers" that use ethernet
>>
>> There's your first problem. You're basing a phase-sensitive
>> system on a communications mechanism that has no determinism
>> in delivery of data. This is flawed from the outset.
>
> Ethernet *itself* has no determinism.

Don,

Did you mean "ethernet" or "TCP/IP"? I read ethernet in your
original post and saw TCP/IP.

> But, that doesn't
> mean you can't layer deterministic protocols atop it.
> For example, IEEE 1588:
> <http://en.wikipedia.org/wiki/Precision_Time_Protocol>
>
> There, you can effectively trim your speaker cables
> *virtually* to within a fraction of a microsecond of
> each other.

Establishing timing and ensuring the timely delivery of data are two
different things. You *might* get this to work with enough caching
to compensate for delayed packets (which is what many of the open
source internet audio players do), but then you've got, er, delay
to deal with.

It just sounds to me, at first blush, that using 802.11.whatever and
TCP/IP is the wrong physical layer solution for your problem. On the
other hand, developing your own deterministic physical layer (perhaps
something based on the TI CC1100 transceivers and SimpliciTI?) is no
small job.

Soundhaspriority

May 22, 2012, 6:01:21 PM

"Don Y" <th...@isnotme.com> wrote in message

news:jpgu5c$i35$1...@speranza.aioe.org...

> Hi Bob,
>
> On 5/22/2012 1:09 PM, Soundhaspriority wrote:
>> Humans can distinguish 10 microseconds OR LESS differential from ear to
>> ear. Applying Nyquist's criteria to the problem, this suggests 5
>> microseconds between channels, with an additional guard of 3
>> microseconds, = 2 microseconds.
>
> Do you have a reference for this?
>

Yes: http://www.zainea.com/jorissmithyin98.pdf

Bob Morein
(310) 237-6511

Les Cargill

May 22, 2012, 6:39:21 PM

Don Y wrote:
> Hi,
<snip>

>
> To get around this, I use a time synchronization

RTP/RTCP?

> protocol that ensures each speaker shares a common
> sense of "what time is it, now". And, thus,
> knows when each packet *should* be "played"
> (pushed out the D/A converter) so that all speakers
> are "in sync".
>

Yeah, you need a jitter buffer.

> Of course, I can't get/guarantee *perfect* synchronization
> (an analog audiophile could trim his cables to identical
> lengths, if so inclined, so the propagation delays were
> perfectly matched). Though I can keep the dis-synchronization
> from "changing too rapidly". (changes in synchronization
> amount to instantaneous changes in *pitch*!)
>
> OK, with all that behind us, how can I come up with a
> target figure within which I should strive for "degree
> of synchronization"? Where does the lag/phase shift start
> becoming audible? Where does it start becoming a *nuisance*??
>

1 msec. This being said, the Haas limit is generally 10 to 20 msec.

http://en.wikipedia.org/wiki/Haas_effect

> [Keep in mind that I am not limited to one LEFT and one RIGHT!]
>
> Thx,
> --don

--
Les Cargill

PStamler

May 22, 2012, 7:01:57 PM

The whole thing sounds like an invitation to Murphy's Law; there are
so many potential ways in which things can go wrong that it strikes me
as a counterproductive setup. Sometimes it makes sense to keep it
simple; in this case, running a single cable from the delay generator
to the speaker site, with the frequency-splitting done at the
receiving end of that cable, is probably the most trouble-proof way to
go. Particular circumstances might point in the other direction --
severe EMI, for example -- but a good transformer-isolated balanced
line can get around that too. For reliability, KISS.

Peace,
Paul

Don Y

May 22, 2012, 7:19:22 PM

Hi Randy,

On 5/22/2012 1:55 PM, Randy Yates wrote:
> Don Y<th...@isnotme.com> writes:
>
>>>> I have designed "network speakers" that use ethernet
>>>
>>> There's your first problem. You're basing a phase-sensitive
>>> system on a communications mechanism that has no determinism
>>> in delivery of data. This is flawed from the outset.
>>
>> Ethernet *itself* has no determinism.
>
> Don,
>
> Did you mean "ethernet" or "TCP/IP"? I read ethernet in your
> original post and saw TCP/IP.

"ethernet" -- used loosely. TCP is too "heavy" a protocol.
Mine runs atop UDP -- *without* delivery guarantees (i.e.,
I have to deal with the possibility that data could "disappear";
though you can control the traffic on the physical network to
minimize the possibilities of this. E.g., I wouldn't run
WWW traffic over the same network! :> )

>> But, that doesn't
>> mean you can't layer deterministic protocols atop it.
>> For example, IEEE 1588:
>> <http://en.wikipedia.org/wiki/Precision_Time_Protocol>
>>
>> There, you can effectively trim your speaker cables
>> *virtually* to within a fraction of a microsecond of
>> each other.
>
> Establishing timing and ensuring the timely delivery of data are two
> different things. You *might* get this to work with enough caching
> to compensate for delayed packets (which is what many of the open
> source internet audio players do), but then you've got, er, delay
> to deal with.

My caches are very small. This is to allow the devices to
be physically small and dirt cheap (e.g., the initial design
has the electronics for each "about the size of an ice cube").

But, you could scale the design *up* to incorporate larger
caches (as would be the case in a "professional/commercial"
environment).

> It just sounds to me, at first blush, that using 802.11.whatever and
> TCP/IP is the wrong physical layer solution for your problem. On the
> other hand, developing your own deterministic physical layer (perhaps
> something based on the TI CC1100 transceivers and SimpliciTI?) is no
> small job.

I've leveraged "cheap/ubiquitous" hardware and fabric to make
everything affordable. Again, I control the deployment.
For example, it wouldn't make any sense to also let web traffic
run on the same fabric that you were using to distribute
audio in the "amphitheater" example I mentioned elsewhere!
(would you couple *video* onto your "speaker wires" in your
living room just to save the cost of a separate video cable??)

But, it's a different approach than an "internet audio player"
running on a PC -- where you have gobs of memory and gobs of
CPU ... AND DON'T HAVE TO BE SYNCHRONIZED TO THE AUDIO
DELIVERED TO THE GUY IN THE NEXT CUBICLE (even if you are
both "listening" to the same "program material").

Don Y

May 22, 2012, 7:21:14 PM

Hi Bob,

On 5/22/2012 3:01 PM, Soundhaspriority wrote:
>
> "Don Y" <th...@isnotme.com> wrote in message
> news:jpgu5c$i35$1...@speranza.aioe.org...

>> On 5/22/2012 1:09 PM, Soundhaspriority wrote:
>>> Humans can distinguish 10 microseconds OR LESS differential from ear to
>>> ear. Applying Nyquist's criteria to the problem, this suggests 5
>>> microseconds between channels, with an additional guard of 3
>>> microseconds, = 2 microseconds.
>>
>> Do you have a reference for this?
>>
> Yes: http://www.zainea.com/jorissmithyin98.pdf

Excellent, thanks! I'll try to take a look through
it later this evening...

My problem is having an understanding of the implementation
technologies but not of the "application technology" (psychoacoustics,
in this case).

Don Y

May 22, 2012, 7:34:21 PM

Hi Scott,

On 5/22/2012 1:46 PM, Scott Dorsey wrote:
> In article<jpgtem$g5d$1...@speranza.aioe.org>, Don Y<th...@isnotme.com> wrote:
>>
>> Chances are, the loudspeakers are triamped (or more?)
>> so that the high frequency components are processed
>> and amplified independent of the low/mid frequencies.
>> The signals delivered to stacks "out in the audience"
>> might be delayed so that their sound coincides with
>> the wavefront arriving from the main stacks at the stage.
>>
>> Instead of running "miles" of balanced cable around to
>> the various amps and processors -- or, miles of
>> *high power* out to remote stacks -- you, instead,
>> run a network drop to the location of each "bit of
>> kit". Each then fed via something akin to the
>> "network speaker" (except without the *speaker*! :> ).
>>
>> Now, you have dozens (or more) of signals each with
>> some potential skew with respect to the others "active"
>> at that time. I.e., the *top* "half" of the left stack's
>> tweeter array might be driven from *this* device
>> while the bottom half is driven by another -- potentially
>> having slightly different characteristics (by intent).
>
> Yes, this is why the protocols designed for these applications are
> deterministic and have deadlines built into packets so that latency is
> both low AND constant.

But you don't "need" either of those (having them makes life
easier -- but, doing so at the cost of "special hardware"
works against you in terms of cost, etc.).

E.g., if you are playing stored material, you don't care if the
latency is 5 ms or 5s! Nor do you care if it is constant -- as
long as your buffer never "runs dry".

> Digigram Ethersound and Cobranet are the two most common protocols used
> for the applications today.
>
> Frankly, I don't think it's any real savings over running analogue audio
> pairs, until you get to large numbers of channels.

Exactly. Or, if you want to be able to perform certain "common"
processing in those nodes. E.g., if you can handle the time
synchronization issue, then it's an obvious extension to be
able to introduce a deliberate offset to the local clock
wrt the "global clock". I.e., you can now delay the reproduction
of the audio signal to be emitted "100 ft from the proscenium"
just by specifying a delay corresponding to the speed of sound
over that distance at that altitude/atmospheric pressure.
(you can even have the devices, themselves, assist you in
empirically tuning that delay "on site").
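
As a rough illustration of the delay calculation: the speed of sound in air is set mainly by temperature, so the sketch below uses the common dry-air approximation c = 331.3 * sqrt(1 + T/273.15) m/s and ignores humidity and wind. The function names and the temperatures tried are assumptions for the example only.

```python
import math

FEET_TO_METERS = 0.3048

def speed_of_sound_m_s(temp_c: float) -> float:
    """Approximate speed of sound in dry air at temp_c degrees Celsius."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

def stack_delay_ms(distance_ft: float, temp_c: float) -> float:
    """Delay to apply to a stack distance_ft downfield of the main stacks."""
    distance_m = distance_ft * FEET_TO_METERS
    return 1000.0 * distance_m / speed_of_sound_m_s(temp_c)

# A stack 100 ft from the proscenium, on a cold day and on a mild one.
for temp_c in (0.0, 20.0):
    print(f"{temp_c:4.1f} C -> {stack_delay_ms(100.0, temp_c):.1f} ms")
```

The difference between the two temperatures is about 3 ms over 100 ft, which suggests why tuning the delay empirically on site is attractive.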

It also lets you distribute the amplification hardware while
keeping the "mixer/preamp" in a shared (virtual) location.
E.g., If I move into the kitchen, I can arrange for the
music to which I am listening to follow me there. And,
the audio for the television program my other half is watching
to take its place in the living room, etc.

Doing this with "wired" speakers requires a sizable
equipment closet and expensive kit therein that can easily
be controlled automatically.

In a commercial environment, it's easier to put a "smart
speaker" up in the ceiling behind a grill than to have
to run a 70V line everywhere. How do you handle "zones"
so you can break in and "page" someone in the hardware
department without having the page echo through the
entire store (esp if you expect the individual in question
to be *in* the hardware department)?

Don Y

May 22, 2012, 7:41:54 PM

Hi William,

On 5/22/2012 1:17 PM, William Sommerwerck wrote:
> Acoustically, one foot equals 0.9 milliseconds. I don't know for sure, but
> I'm pretty certain that if the relative arrival times bounced around in the
> +/- 0.9 millisecond range, you'd almost certainly hear it as an unstable
> image -- and you might also hear phasing effects at lower frequencies.

The delay would be constant. Or, varying at such a slow rate
("subsonic" -- fractional Hz) as to be imperceptible.

> I'd try to keep the maximum offset to no more than 0.09 milliseconds -- 90
> microseconds, which is equivalent to 1.2 inches (30mm).

Piece of cake. I can get a few microseconds without breaking a sweat.
Getting down to a handful of *nanoseconds* requires a bit more kit!

Don Y

May 22, 2012, 7:45:43 PM

Hi Les,

On 5/22/2012 3:39 PM, Les Cargill wrote:

>> To get around this, I use a time synchronization
>
> RTP/RTCP?

I use a modified version of PTP. Since I don't have
to "co-operate" with other devices, I can tweak the
protocol to better fit my needs and resources.

E.g., such an application could be easily attacked by
spoofing the timing protocol and coercing one (or more) nodes
to deliberately get out of sync with their peers. Or,
to convince them that they *are* so far out of sync that
they should voluntarily mute their outputs (to avoid
clashing with their peers audibly). Etc.

>> protocol that ensures each speaker shares a common
>> sense of "what time is it, now". And, thus,
>> knows when each packet *should* be "played"
>> (pushed out the D/A converter) so that all speakers
>> are "in sync".
>
> Yeah, you need a jitter buffer.
>
>> Of course, I can't get/guarantee *perfect* synchronization
>> (an analog audiophile could trim his cables to identical
>> lengths, if so inclined, so the propagation delays were
>> perfectly matched). Though I can keep the dis-synchronization
>> from "changing too rapidly". (changes in synchronization
>> amount to instantaneous changes in *pitch*!)
>>
>> OK, with all that behind us, how can I come up with a
>> target figure within which I should strive for "degree
>> of synchronization"? Where does the lag/phase shift start
>> becoming audible? Where does it start becoming a *nuisance*??
>
> 1 msec. This being said, the Haas limit is generally 10 to 20 msec.
>
> http://en.wikipedia.org/wiki/Haas_effect

This seems too big -- given the followup example I posted
to my original post.

Don Y

May 22, 2012, 7:52:27 PM

Hi Paul,

Welcome to the future! :> This project is the outgrowth of a
client's need for an ULF distribution system that currently requires
shipping thousands of pounds of cables to far-off locations.

After proposing it, I noticed that I could enhance it (improve
the accuracies and frequency response) so that I could use it
at home to distribute audio throughout the house -- from a central
server ("I want to listen to The Nightly News, now, in the garage
while I am working on the car. And, as I head off to the bathroom,
I'd like that to follow me there.")

From there, it's obvious that tweaking it a bit more would
suit it to still other applications.

William Sommerwerck

May 22, 2012, 7:56:54 PM

> Now, rethink the issue (please) :>

Given the description of what you're actually doing, I don't see where even
10 ms skew would make much difference.

Les Cargill

May 22, 2012, 8:28:53 PM

Don Y wrote:
> Hi Les,
>
> On 5/22/2012 3:39 PM, Les Cargill wrote:
>
>>> To get around this, I use a time synchronization
>>
>> RTP/RTCP?
>
> I use a modified version of PTP.

yeah, I saw downthread after replying.

> Since I don't have
> to "co-operate" with other devices, I can tweak the
> protocol to better fit my needs and resources.
>
> E.g., such an application could be easily attacked by
> spoofing the timing protocol and coercing one (or more) nodes
> to deliberately get out of sync with their peers. Or,
> to convince them that they *are* so far out of sync that
> they should voluntarily mute their outputs (to avoid
> clashing with their peers audibly). Etc.
>

--
Les Cargill

Soundhaspriority

May 22, 2012, 8:39:42 PM

"Don Y" <th...@isnotme.com> wrote in message

news:jph71c$8k4$2...@speranza.aioe.org...

Not a problem. Play it safe and over-design. A lot of folks would like to
get rid of the cables.

Bob Morein
(310) 237-6511

MarkK

May 22, 2012, 10:26:03 PM

ok
others have said sync within 1 to a few ms is sufficient, I agree.
(note 1 ms is about = to 1 foot of sound travel in air)

you say you can time stamp the packets so they are "displayed" at the right
time and in sync

and you can have sufficient buffer so you don't run dry

but that brings up 2 questions

1) what will you use for a common time base at the various nodes.. GPS??

2) if you have sufficient buffer, you may have excessive latency if this is a
live sound reinforcement situation.

you might want to investigate the MPEG system for video and audio (transport
stream and program stream); they time stamp the packets so the
video and audio are "displayed" in sync with each other even though they
travel via different packets

this problem sounds like an analog radio link would be a better solution..

I have seen a big sound (and video) system at the Washington reflecting pool
with multiple stacks all fed from a radio link (i could see the antennas) and
each had a delay so where-ever you were along the length, the sound from all
the stacks arrived at the same time...

When you were far from the stage, the sound and video were out of sync, but
that is the nature of the beast.

Mark

Arny Krueger

May 23, 2012, 7:40:44 AM

"Randy Yates" <ya...@digitalsignallabs.com> wrote in message
news:87r4uc3...@randy.site...

> Don Y <th...@isnotme.com> writes:

>> I have designed "network speakers" that use ethernet

> There's your first problem. You're basing a phase-sensitive
> system on a communications mechanism that has no determinism
> in delivery of data. This is flawed from the outset.

Here we see what for all the world looks like ivory tower black/white
excluded-middle pedantry. It is the same sort of thinking that *proved* that
flight is impossible, back in the 1800s. ;-)

Reality is that picoseconds-level determinism is usually difficult or
impossible with a network like TCP/IP Ethernet, but "get it there in the
next half hour" determinism is frequently observed. ;-)

As always, it's all about quantification.

Reality is that the human ear has tremendous acceptance of time shifts
within a potentially useful range.

The actual application could easily be a gigabit LAN that might even be
totally dedicated to just running a few (8 or less) speakers at modest
(44/16) bitrates.
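
A quick calculation shows how little of a gigabit link such a load would use. The sketch assumes one mono channel per speaker, reads "44/16" as 44.1 kHz at 16 bits, and ignores packet headers and the timing protocol's own traffic; the function name is illustrative.

```python
def payload_mbit_per_s(sample_rate_hz: int, bits_per_sample: int,
                       channels: int) -> float:
    """Raw PCM payload rate in Mbit/s, excluding all packet overhead."""
    return sample_rate_hz * bits_per_sample * channels / 1e6

LINK_MBIT_PER_S = 1000.0  # nominal gigabit Ethernet

rate = payload_mbit_per_s(44_100, 16, 8)
print(f"{rate:.4f} Mbit/s, {100.0 * rate / LINK_MBIT_PER_S:.2f}% of the link")
```

Eight channels come to under 6 Mbit/s of payload, well below 1% of the nominal link rate, so raw bandwidth is not the limiting factor here.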

In the modern context, the protocol may even be synchronous, as "Ethernet"
in 2012 means just about anything digital that runs over CAT-5 or CAT 6.

CSMA/CD can be remarkably deterministic if there is only one data source,
which might be the case with this application.

Randy Yates

May 23, 2012, 9:54:07 AM

Arny, did you miss the post where I stated I thought Don was speaking of
"WiFi" and not simply "ethernet?" And where I talked about caching? And
where caching isn't feasible in this application?

Scott Dorsey

May 23, 2012, 10:45:27 AM

In article <jph8ru$e4r$1...@speranza.aioe.org>, Don Y <th...@isnotme.com> wrote:
>
>Welcome to the future! :> This project is the outgrowth of a
>client's need for an ULF distribution system that currently requires
>shipping thousands of pounds of cables to far-off locations.

Well, there are plenty of those off the shelf, using ethernet, using
circuit-switched protocols on cat-5, and using fibre optics.

There are also some RF solutions available for delayed speaker stacks
at big outdoor concerts.

Personally I recommend the fibre optic systems if you are running a lot of
channels; I can carry half a mile of tactical fibre on my back and pay it
out as I walk. Copper is much, much heavier, even cat-5. You can run a
tank or a lawnmower over the tactical fibre with no problem.

>After proposing it, I noticed that I could enhance it (improve
>the accuracies and frequency response) so that I could use it
>at home to distribute audio throughout the house -- from a central
>server ("I want to listen to The Nightly News, now, in the garage
>while I am working on the car. And, as I head off to the bathroom,
>I'd like that to follow me there.")

70V system is a better application for that sort of thing.

Scott Dorsey

May 23, 2012, 2:30:36 PM

In article <jphhqn$en5$1...@dont-email.me>, MarkK <mako...@yahoo.com> wrote:
>and you can have sufficient buffer so you don't run dry
>
>but that brings up 2 questions
>
>1) what will you use for a common time base at the various nodes.. GPS??

There are a couple of ways to do this. Gibson has a patent on one of them.
In general you can have local free-running clocks that you synch up every
once in a while and get away with it because crystals today are very stable.
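
How often "every once in a while" has to be depends on the crystal tolerance and the skew you can accept. The sketch below assumes two free-running clocks, each off by up to the stated tolerance in opposite directions, with no frequency correction between resyncs; the 50 ppm figure is a hypothetical example, not a measured value.

```python
def max_resync_interval_s(skew_budget_s: float, tolerance_ppm: float) -> float:
    """Longest time two free-running clocks can go between resyncs.

    Worst case: one crystal runs tolerance_ppm fast and the other
    tolerance_ppm slow, so they diverge at twice the tolerance.
    """
    worst_case_rate = 2.0 * tolerance_ppm * 1e-6  # seconds of skew per second
    return skew_budget_s / worst_case_rate

# Hypothetical 50 ppm crystals against two of the budgets discussed here.
for budget_s in (1e-3, 10e-6):
    interval = max_resync_interval_s(budget_s, 50.0)
    print(f"budget {budget_s * 1e6:7.1f} us -> resync every {interval:.2f} s")
```

A 1 ms budget tolerates resyncing every 10 s under these assumptions, while a 10 microsecond budget needs it roughly ten times a second unless the nodes also correct their clock rate.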

I do prefer TDM solutions like the Fiberplex Lightviper just for elegance.
The problem with these systems is that you can't run them over existing
ethernet infrastructure and you can't use cheap industrial switches for
routing, etc. But, on the other hand, that's also a huge advantage.

>2) if you have sufficient buffer, you may have excessive latency if this is a
>live sound reinforcement situation.

Yes. It's not an easy problem, BUT it's a problem that has been solved already
by several people. The AES has a committee that is already arguing about
trying to standardize audio over ethernet. The LAST thing we need is another
competing standard.

> this problem sounds like an analog radio link would be a better solution..
>
>I have seen a big sound (and video) system at the Washington reflecting pool
>with multiple stacks all fed from a radio link (i could see the antennas) and
>each had a delay so where-ever you were along the length, the sound from all
>the stacks arrived at the same time...

National Events has a system like this which gets used for the Fourth of
July and other events on the mall. Frequency coordination is a major pain
in the neck. They use a Sennheiser IFB system with high gain antennas and
something else to prevent jamming.

>When you were far from the stage, the sound and video were out of sync, but
>that is the nature of the beast.

In the modern line array era, it's possible to arrange things so that there
is a front coverage area and a rear coverage area that are both sending sound
out at the same time, but doing this means making absolutely sure that there
is no leakage from the front area speakers into the rear area and that
means having a big space in-between them which is devoid of people. NASA did
this for shuttle launches for a while.

Frank

May 23, 2012, 9:15:18 PM5/23/12

to

On 23 May 2012 14:30:36 -0400, in 'rec.audio.pro',
in article <Re: Phase/skew sensitivity>,

klu...@panix.com (Scott Dorsey) wrote:

>The AES has a committee that is already arguing about
>trying to standardize audio over ethernet. The LAST thing we need is another
>competing standard.

Just curious, Scott, but is this AES effort *in addition to* the work
on AVB (Audio Video Bridging) already done by the IEEE?

Reference:

Audio Video Bridging - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Audio_Video_Bridging

It seems to me that if the AES is going to come up with its own
standard - one that's different from IEEE AVB - then yes, it's
probably one standard too many.

OTOH, if the AES is merely "adopting" IEEE AVB as its own standard,
much perhaps in the way that the ITU-T adopts previously-published
ISO/IEC MPEG standards, then that's a different story and isn't
necessarily bad at all.

--
Frank, Independent Consultant, New York, NY
[Please remove 'nojunkmail.' from address to reply via e-mail.]
Read Frank's thoughts on HDV at http://www.humanvalues.net/hdv/
[also covers AVCHD (including AVCCAM & NXCAM) and XDCAM EX].

Don Y

May 24, 2012, 3:29:10 AM5/24/12

to

Hi Scott,

On 5/23/2012 7:45 AM, Scott Dorsey wrote:
> In article<jph8ru$e4r$1...@speranza.aioe.org>, Don Y<th...@isnotme.com> wrote:
>>
>> Welcome to the future! :> This project is the outgrowth of a
>> client's need for an ULF distribution system that currently requires
>> shipping thousands of pounds of cables to far-off locations.
>
> Well, there are plenty of those off the shelf, using ethernet, using
> circuit-switched protocols on cat-5, and using fibre optics.

Do they allow a foreign application to COEXIST on the medium
with the *signal*? I.e., if I want to accompany the signal
with command and control data, can I do so? Can I control
the phase and magnitude of the signal at *each* receiving
node with respect to a common time reference? Can I turn
*off* the signal and just use the medium for data transport?

Can I send (a different) signal back up the cable at the same
time?

Can I route power over the same "cable" (or, do I need
to carry power *with* each of those end-node devices?) Is
it as *inexpensive* (or, has someone come up with a
scheme they can protect with a patent)? Can I buy
additional supplies "locally"? ("Hmmm, I need another
500 ft of cable") Can I *discard* those (inexpensive)
supplies instead of having to bring them back with me
(i.e., get them through customs, *again*?)

> There are also some RF solutions available for delayed speaker stacks
> at big outdoor concerts.

There are lots of solutions to *individual* problems that I
posed. OTOH, the solution I proposed can be applied to lots
of *applications*!

Should I be running optical fiber in the walls and ceilings of
my home to get audio (and video) to the various locations that
I want it? Do I then run a *separate* pair of copper conductors
alongside the fibre so I can distribute *power* to each of
those nodes? Or, glue wall-warts around the periphery of
the room to handle each of these devices?

Should I rely on *anything* RF based in an environment already
known to be polluted with wireless transmissions? Your WiFi
has to share a set of channels cooperatively with your neighbor
(and your microwave oven). In the event of interference, those
packets can safely be replayed (many!) seconds later. Would you
tolerate your "HiFi" going silent while it recovers from a
series of dropped packets? What about your TV -- the image
freezing each time the network can't deliver enough packets
*before* they are "needed"?

What happens when I don't want a display or speaker in a
particular location but, instead, want a telephone? Or,
a PTZ camera? Or, a bit of "automation" to, for example,
allow the HVAC to be monitored and controlled? Or, <gasp>
a computer?? :> And, would you want all that traffic on
a "remotely accessible" medium (i.e., the neighbor sitting
in his living room snooping on or interfering with "your"
traffic)?

That's the difference between the approaches cited -- mine
fits *each* of these application domains. And, it's dirt cheap!

> Personally I recommend the fibre optic systems if you are running a lot of
> channels; I can carry half a mile of tactical fibre on my back and pay it
> out as I walk. Copper is much, much heavier, even cat-5. You can run a
> tank or a lawnmover over the tactical fibre with no problem.
>
>> After proposing it, I noticed that I could enhance it (improve
>> the accuracies and frequency response) so that I could use it
>> at home to distribute audio throughout the house -- from a central
>> server ("I want to listen to The Nightly News, now, in the garage
>> while I am working on the car. And, as I head off to the bathroom,
>> I'd like that to follow me there.")
>
> 70V system is a better application for that sort of thing.

So, a 70V distribution amp feeding each "small area"?
I.e., one for the kitchen, another for the dining room,
another for each bedroom, living room, garage, back porch,
etc.?

After all, it's unlikely that the audio signal that I
want distributed to my home office will be the same as the
audio signal that someone else might want to enjoy in
the dining room, etc. Likewise for someone watching TV
in the den.

And, how do we get from the media server to the MANY
distribution amps? Lots of sound cards in a server,
each tethered to a particular amp (feeding a particular
area)? Then, have that media server moving program
material to each sound card, mixing in secondary
audio streams, etc.?

If you think in terms of old technology, you tend to get
all the capabilities (limitations!) of that old technology!

I expect to be able to walk through the house and have
my "program material" follow me (subject to constraints
of other people present). No, I don't want to have to "pause"
the TV and "resume" the program on some *other* TV. I want
the material to *follow* me of its own accord. Without
me having to chase down the "remote" for this particular
TV, "stereo", etc.

If I'm in the bedroom, I want the phone to ring *there*,
not in the kitchen. If someone comes to the front door,
and I'm in the garage, it would be pretty useless for the
doorbell to sound in the *house* since I won't hear it.
Likewise, I would hate like hell to have to trek to the
front door (imagine being in the middle of replacing the
lifters) while briskly cleaning the grease from my hands...
only to discover it's some bozo trying to sell me lawn care
services!

Think about how you would do these sorts of things with
"old technology". Then, think of how much easier it is to
do when you're just routing "homogenous" information
over a shared medium.

[Trust me, I've thought about this for a LONG time! :> ]

Don Y

May 24, 2012, 3:28:37 AM5/24/12

to

Hi Scott,

On 5/23/2012 11:30 AM, Scott Dorsey wrote:
> In article<jphhqn$en5$1...@dont-email.me>, MarkK<mako...@yahoo.com> wrote:

>> 2) if you have sufficent buffer, you may have excessive latency if this is a
>> live sound reenfrcment situation.
>
> Yes. It's not an easy problem, BUT it's a problem that has been solved already
> by several people. The AES has a committee that is
> already arguing about trying to standardize

--^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Where are the inexpensive, ubiquitous products? :>

> audio over ethernet. The LAST thing we need is another
> competing standard.

Standards (and compliance with them) make sense if:
- interoperability is a concern
- the standard, plus whatever tweaks your application needs,
addresses all of your design criteria (size, cost, performance,
reliability, functionality, etc.)

There are standards for how to organize books in a library -- yet
I suspect few of us keep our *personal* bookshelves thusly ordered.
Why bother? We aren't catering to "visiting librarians"!

There are standards dictating the roles and colors of each conductor
in a POTS drop. Yet, these have changed and/or been ignored over
the years -- to the point that you hardly know *what* to expect
if you peek inside your RJ11! "Who cares?! The CO doesn't
care if you swap tip and ring!"

Look at how RS-232 has "changed" over the years -- from ~20
defined signals to as few as *two* (in a simplex application).
Even the *roles* of various signals have changed over the
years (e.g., RTS used to initiate a transmission over the local
TxD; now it is commonly used as a "pacing" signal notifying the
remote end of the link when the local node is "Ready To Receive"!)
How much of a "standard" is something wherein the very functions
implemented on its signals are subject to change??

[Of course, this came about because the standard's original role
evolved as other "users of the Standard" bastardized it to fit
*their* needs. I.e., when the costs of using the standard as
originally intended proved too great for their application.
(having spent some time designing telecom equipment, I can
attest to how many wacky variations of "RS232" *big* equipment
makers rationalized! Weird signals used for handshaking and
pacing: RLSD?? Oddball voltage supplies on pins: "Aw, no one
ever uses SRTS anyway..." etc.)]

Cisco opted to use the "unused pairs" on 10/100Mb networks for
a crude, *proprietary* PoE capability before IEEE 802.3af-2003
was formalized. The "existing standard", at the time, didn't
suit their needs!

"Standards are great! Everybody should have one! :> "

When I decided I was tired of all the various bits of audio and
video kit lying around the house and never having the material
that I wanted *where* I wanted it, it was only natural to think
about a media server. They exist. Just *buy* something and
plug it into the network! "Instantly", the problem will be
solved!

A bit of research and the applicable "standard" was UPnP.
*Any* device can pull whatever it wants off a UPnP server!
Gee, what more could you ask for!

But, wait. Now I am replacing the existing "HiFi"s with
a *different* box. But, it's *still* a box. Still wants
to *sit* someplace ACCESSIBLE so I can push its buttons,
point *its* remote at it, view its *display*, etc. And,
it still needs to *feed* an amplifier. And the amplifier
still needs to be tethered to speakers. And it's yet
another power cord to plug in. Yet another box to
configure. No doubt it will need software updates, etc.

"What am I gaining?" I'm just REPURCHASING all my audio kit
ALL OVER AGAIN (like buying a cassette deck to replace the
turntable; a CD-player to replace the cassette deck; a PMP
to replace the CD player; etc.) And, all I've really gained
for this effort is access to my media library regardless of
location (as long as I purchase one of these boxes for *each*
room in which I might want to hear music!)

I've still got lots of matte black boxes that need to be
dusted every few days. Lots of LED/VFD/PGD displays eerily
lighting up the rooms at night. Even *more* remote controls
to keep track of, etc. I wonder how many of them will be
"clever" and display the "current time" when idle? I wonder
how many of them will use a time service to keep that time
*current*? Will I be running around the house after each
power outage resetting a dozen *more* clocks??? :<

And, if I want to add smarts to the system (i.e., so the music
"follows me" as I move around the house; so the music is
attenuated when I answer the phone; so the "doorbell annunciator"
can be mixed in with the audio signal; etc.), I now have to
hope the folks who made these boxes did so with external command
and control capabilities in mind. And, that it actually *works*
as advertised, is "secure" from hacking, will *keep* working 5/10/15
years from now, etc. If I start a "play list" on *all* of these
boxes simultaneously (think: party), will they all be in sync?
Will they *remain* in sync?

Even without doing much research, it seemed pretty obvious that
my wish list was going to far exceed the capabilities of anything
available! (can you spell "EXPENSIVE custom system"?)

When the ULF distribution system project came along, it seemed
like kismet! A "no brainer"! Sure, there are all sorts of
audio/video containers out there. That's why COTS media servers
and "players" are so costly -- they have to do too much! And,
they are designed as if *they* were controlling the media
stream (it's a "pull" model: they fetch what you want to hear
from the server) which means it will always be difficult getting
two or more of them to "play nicely", together.

If, instead, you adopt a *push* model, you eliminate the need
for a display and controls. Something *else* decides what the
node will "play" and it just sits and waits for "audio" to
come down the pipe! Once the display and controls are gone,
you can change the packaging to something less conventional:
"Hey, why not make it really *tiny* and cram it into a
standard 1 gang electrical box? Something about the size
of an ice cube. Put a wall plate on that Jbox that has a
couple of binding posts to attach the loudspeaker (or an
RCA/SPDIF jack to connect to an amplifier) and *hide* it
in the wall! Don't even need to put it in a pretty *case*
since it is never going to be "exposed"! No more clutter!
No more dusting! Heck, put a couple in the ceiling so
you don't have to have loudspeakers sitting in various
corners of the room -- where would I put them in the
*kitchen*, otherwise? On the *counters*???"

And, since the device only *takes* what you *give* it (and not
the other way around), you can decide to only *give* it files
in a single format. No need to put a slew of different CODECs
into the device. And, have to update them each time someone
comes up with another CODEC-du-jour. Instead, handle that on
the server end where you (conceptually) have more resources!

Ah, but wait! Why use *any* existing "file format"? The network
is just acting as a speaker cable. Pick whatever scheme you want
to push bytes down it. As long as your protocol will operate
within standard fabric, you can leverage the commodity nature of
those items and repurpose them for your own needs! So, we can
pick a suitable CODEC ("standard") and *bend* it to fit our needs.
No reason to include support for a variety of different sample
rates, data formats, metadata, etc. So, strip that cruft out of
the standard...
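To make the "pick whatever scheme you want" idea concrete, here is
a minimal sketch in Python of one possible stripped-down packet.
The layout (a 64-bit "play at" tick, a sample count, then 16-bit
mono PCM) and the names pack_audio/unpack_audio are invented for
illustration; this is not the format actually used in the system
described.

```python
import struct

# Hypothetical wire layout, network byte order:
#   uint64  play_at   system-time tick at which the first sample is due
#   uint16  count     number of samples that follow
#   int16[] samples   mono PCM at a single, pre-agreed sample rate
HEADER = struct.Struct("!QH")


def pack_audio(play_at, samples):
    """Build one packet: fixed header followed by the raw samples."""
    body = struct.pack("!%dh" % len(samples), *samples)
    return HEADER.pack(play_at, len(samples)) + body


def unpack_audio(packet):
    """Recover the play-at tick and the sample list from a packet."""
    play_at, count = HEADER.unpack_from(packet)
    samples = struct.unpack_from("!%dh" % count, packet, HEADER.size)
    return play_at, list(samples)


packet = pack_audio(123456789, [0, 1000, -1000, 32767])
print(len(packet), unpack_audio(packet))
```

With only one sample rate and one data format, the receiving node
needs no negotiation and no metadata parsing; everything else can
live on the server.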

Deliver power to the devices over the network? OK, here's a
standard (PoE). Use that! Ah, but it doesn't allow me to
*control* that power delivery -- it's intended for nodes to
have the power supplied all the time (I surely don't want
all these devices powered up just because someone failed to
consider the possibility of powering them *down*!). And,
PoE only recognizes a device when initially connected. How do
I trick it into supplying some idle current that a "sleeping"
device could use to signal its desire to be powered
up? E.g., if I put a VoIP *phone* on a PoE-capable network
drop, it would be nice to only power up the phone when someone
takes it offhook! I need a comm back-channel that runs in the
"absence" of power...

OK, so we'll have to bend *that* standard to coerce it to
provide the features we want.

Let's see... need to ensure each node has a reference timebase
to synchronize against. And, it needs to be very fine-grained.
NTP is out. But, PTP should work! But, PTP has capabilities in
it that really make little sense in this application. How could
the notion of the master (clock) node possibly change while the
system is operating? Is, suddenly, the network speaker in the
dining room going to be "elected" as the new master clock?
And, the media server will have to synchronize to *it*??
"Yeah, right!" (not)

OK, so we'll bend *that* standard to remove all that extra
unnecessary cruft...

So, you end up with the benefits of lots of "standards" and
the engineering thought that went into them -- without the
*costs* of all of the fluff they imply.

Sure, I can't buy a piece of third party kit and expect it
to talk DIRECTLY to my network speakers, individually. But,
it wouldn't have been able to speak to those speakers
regardless -- it would always have had to speak to whatever
was *driving* them. Whatever piece of *digital* kit that was
sitting in the middle. In my case, that's the media server!
*It* can bear the costs of any/all of that "interoperability"
freeing the "speakers" to concentrate on reproducing *sound*.

If the ear cushions on my headphones wear out, I don't expect
to be able to buy "standardized" ear cushions to replace them.
The headphones are standardized AT THE PHONE PLUG. Anything
*within* the headphone assembly is at the discretion of the
manufacturer -- they could opt to use left-handed screws,
yellow wires (instead of the more common BLACK) for the
"ground/return", an L-Pad for a "volume control", etc.

The same is true of my multimedia system: standards apply
at the *outside* interfaces, only!

Don Y

May 24, 2012, 3:45:37 AM5/24/12

to

Hi Mark,

On 5/22/2012 7:26 PM, MarkK wrote:
> ok
> others have said sync within 1 to a few ms is sufficient, I agree.
> (note 1 ms is about = to 1 foot of sound travel in air)

I'm not sure I agree. Imagine driving a LF amp from one such device
and a HF amp from another. This effectively displaces one or the
other set of drivers (loudspeakers) a foot wrt the other set.
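For a feel of the magnitudes, a small Python sketch of the
arithmetic. The speed of sound (343 m/s) and the 2 kHz crossover
frequency are assumed values chosen only for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 C (assumed)
FEET_PER_METRE = 3.28084


def offset_feet(skew_s):
    """Apparent displacement of one driver set for a given time skew."""
    return skew_s * SPEED_OF_SOUND_M_S * FEET_PER_METRE


def phase_shift_degrees(skew_s, freq_hz):
    """Phase error between the two amps at a given frequency."""
    return 360.0 * freq_hz * skew_s


# Skews of 1 ms, 100 us and 10 us at a hypothetical 2 kHz crossover.
for skew in (1e-3, 100e-6, 10e-6):
    print(skew, offset_feet(skew), phase_shift_degrees(skew, 2000.0))
```

At a crossover, where both driver sets reproduce the same
frequencies, it is the phase figure rather than the distance that
matters: a skew of half a period puts the two sets in antiphase.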

But, I don't have the skills to make that sort of evaluation,
technically (I *did* when I was fresh out of school. But, those
brain cells undoubtedly now contain some bit of useless trivia...)

> you say you can time stamp the packets so they are "displayed" at the right
> time and in sync
>
> and you can have sufficent buffer so you don't run dry
>
> but that brings up 2 questions
>
> 1) what will you use for a common time base at the various nodes.. GPS??

No. The "time" is defined by the server. It need have no bearing
on "current (i.e., wall clock) time". All it needs to be is
consistent across all nodes in the network. For example it
could be thirds of microseconds from "sometime yesterday morning"!

Synchronizing that "current time" is done dynamically with a
servo loop that measures transport delays across the network
(as viewed by each node) and uses these figures to offset the
"actual local time" at which each packet is received (locally).

[This explanation gets tedious because there are several
*different* notions of "time" at play. So, excuse any
hand-waving I use to avoid getting *too* far into the
murky details...]
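To give a flavor of the servo idea, here is a minimal Python
sketch. It assumes a plain proportional correction with a cap on
the step size; the gain and the cap are made-up values, and the
loop actually used may well be more elaborate.

```python
def slew(offset_s, gain=0.1, max_step_s=50e-6):
    """Correction to apply to the local clock this round.

    offset_s is the measured skew (local time minus system time).
    Only a fraction of it is removed per round, and never more than
    max_step_s, so the clock is nudged rather than jumped.
    """
    step = gain * offset_s
    return max(-max_step_s, min(max_step_s, step))


def simulate(initial_offset_s, rounds):
    """Apply the correction repeatedly and record the remaining skew."""
    offset = initial_offset_s
    history = []
    for _ in range(rounds):
        offset -= slew(offset)
        history.append(offset)
    return history


history = simulate(1e-3, 200)
print(history[0], history[-1])
```

Capping the step keeps a single bad measurement from yanking the
local clock, at the price of a slower initial lock.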

An unsolicited message arrives at a particular node from the
"master clock" (the master clock is a node on the network that
is deemed to have *The* best notion of system time). Call the
receiving node a "slave", if you wish.

The slave makes a note of the LOCAL TIME (i.e., using a timebase
that *it* maintains, "*in* itself") at which the message was
received. In this message is a notice saying "the current System
Time is 12:34:56.7777" (or whatever). The slave makes a note of
this, as well.

The slave then sends a request message to the master node
in effect saying "and, what time is it *now*?". But, it
records the precise *local* time at which it sends this
message!

[These timestamps need to *correlate* well with the actual
time the message "hits the PHYSICAL wire". I.e., the
network protocol stack and device driver need to be designed
with this in mind]

The master node replies, "when I received your request, it was
12:34:56.8888" (or whatever). The slave node receives this reply
at some particular local time -- possibly much *later* (not
important!).

Now, the slave node knows that it received the first (sync)
message at a particular local time. It also knows what the
"official" system time was when that message was *sent*. The
difference between that local receive time and the system time
reported *in* the initial message represents the "skew" between
the system time and the local time PLUS THE "real" TIME IT TAKES
FOR THE MESSAGE TO TRAVEL DOWN THE WIRE.

Similarly, the difference between the slave node's local time
at the instant it sent the subsequent *request* for the "more
recent" time and the time that the master node (eventually!)
reported *receiving* it also represents the skew between the
system and local times (though, technically, this is "-skew")
PLUS THE TRANSPORT DELAY.

Subtract the second difference from the first and the transport
delay cancels out leaving "twice the skew". The slave node then knows
that it needs to apply a correction to its local time to
cause the skew to be driven to "0", eventually (you don't
do it instantaneously).

This is predicated on the transport delay from A to B being
the same as the delay from B back to A! (which you can
ensure with careful system design).
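The exchange above boils down to four timestamps. A minimal Python
sketch of the arithmetic follows; the function and parameter names
are invented here, "skew" is taken to mean local time minus system
time, and the transport delay is assumed to be the same in both
directions.

```python
def skew_and_delay(t1_master, t2_slave, t3_slave, t4_master):
    """Estimate clock skew and one-way delay from one exchange.

    t1_master: system time the master stamped into its sync message
    t2_slave:  local time at which the slave received that message
    t3_slave:  local time at which the slave sent its request
    t4_master: system time at which the master received the request
    """
    down = t2_slave - t1_master  # skew + transport delay
    up = t4_master - t3_slave    # -skew + transport delay
    skew = (down - up) / 2       # the delay terms cancel
    delay = (down + up) / 2      # the skew terms cancel
    return skew, delay


# Slave clock 250 ticks ahead of the master, 10 ticks on the wire.
print(skew_and_delay(1000, 1260, 1300, 1060))
```

If the two directions are not symmetric, half of the asymmetry
shows up as an error in the skew estimate, which is why the
symmetry assumption matters.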

A side effect of this is the master knows just how out-of-sync
the slave node is! That information is leaked in the
protocol without having to *add* some mechanism to have the
slave (*each* slave!) explicitly REPORT the "observed skew"
(recall this is likely to be different for each slave node!).

So, the master knows when things are "not yet synced up".
And, can also see when "things seem to be falling apart"!
It can use this information to adjust how often it sends
the sync messages. It can also use it to shut down
nodes that appear to be malfunctioning (either by commanding
them to be silent *or* the more draconian step of removing
power from them!)
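A toy Python sketch of that sort of policy. It assumes the master
already has an estimate of each slave's skew; the thresholds, the
sync intervals and the name next_action are all invented for
illustration.

```python
def next_action(abs_skew_s, locked_s=20e-6, failed_s=5e-3):
    """Decide what the master does about one slave.

    Returns ("mute", None) for a node that looks broken, otherwise
    ("sync", interval_s) with the delay until the next sync message.
    """
    if abs_skew_s >= failed_s:
        return ("mute", None)   # skew is hopeless: silence the node
    if abs_skew_s <= locked_s:
        return ("sync", 8.0)    # locked: occasional sync is enough
    return ("sync", 0.5)        # still converging: sync more often


for skew in (5e-6, 300e-6, 20e-3):
    print(skew, next_action(skew))
```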

> 2) if you have sufficent buffer, you may have excessive latency if this is a
> live sound reenfrcment situation.

In my case, not live sound. To do so, you need to have stricter
guarantees on the reliability of the network itself (not something
I need in my "home" environment).
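The buffer-versus-latency trade-off can be sketched in a few lines
of Python. This is only an illustration under assumed names
(PlayoutBuffer, play_at): packets wait in a queue until their
timestamp comes due, and anything that arrives after its play time
is simply dropped.

```python
import heapq


class PlayoutBuffer:
    """Holds timestamped packets until they are due to be played."""

    def __init__(self):
        self._heap = []
        self.dropped = 0

    def push(self, play_at, payload, now):
        """Queue a packet, or count it as lost if it is already late."""
        if play_at <= now:
            self.dropped += 1
            return
        heapq.heappush(self._heap, (play_at, payload))

    def pop_due(self, now):
        """Return every queued packet whose play time has arrived."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap))
        return due


buf = PlayoutBuffer()
buf.push(100, b"second", now=50)
buf.push(90, b"first", now=50)
buf.push(40, b"too late", now=50)
print(buf.pop_due(95), buf.dropped)
```

The further ahead of "now" the server stamps its packets, the more
network delay the buffer can absorb, and the longer the listener
waits for the sound.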

Note that you can increase latency -- within reason -- depending on
which external events you are trying to remain synchronous with.
E.g., I am now working on a "network display" that allows video to
be similarly "broadcast"/distributed. Obviously, you would
want to ensure the video and audio programs are synchronized.
Much the same as ensuring the audio was synchronized with a
live performance.

However, as the venue grows in size, you realize the audio
and "video" will never be synchronized for all viewers
(unless they are all wearing headphones!) So, it becomes a
matter of degrees -- how unsynchronized can you tolerate?

[Of course, for recorded video, it is easy to keep the two
synchronized -- just ensure each stream is delayed by the same
amount!]

> you might want to invsetigate the MP3 system for video and audio (transports
> stream and program stream) , they include time stamp the packets so the
> video and audio are "displayed" in sync with each other even though they
> travel via different packets
>
> this problem sounds like an analog radio link would be a better solution..

You have the same sorts of problems. Do you share a single
broadcast frequency among all nodes? So, a node 100 ft
further from the transmitter sees its signal ~100ns later
than one proximate to the transmitter. How do you handle
different propagation delays through the radios, etc.?

How do you keep your neighbor from interfering with your
broadcasts? Will you have enough bandwidth (given that
your neighbor may have a similar system -- or, some other
transmitters may share the same frequency allocation)
to support a number of these "programs" concurrently?

> I have seen a big sound (and video) system at the Washingon reflecting pool
> with multiple stacks all fed from a radio link (i could see the antenas) and
> each had a delay so where-ever you were along the length, the sound from all
> the stacks arrived at the same time...

Yes -- but you have to ensure the stacks "further from the source"
can't be heard *by* you. Otherwise, the wavefront is horribly
(and perceptibly) delayed when it arrives at your ears (*after*
the intended wavefront has *passed*!)

> When you were far from the stage, the sound and video were out of sync, but
> that is the nature of the beast.

Yes. So, if you add a fixed delay to everything, you just force
the closest viewers to see a noticeably larger skew between
these two events/activities than they would, otherwise. Folks
further back would see a similar additional offset -- but,
they were already doomed to seeing a large offset, regardless!
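Putting rough numbers on that, as a Python sketch. The speed of
sound (1125 ft/s), the viewer distances and the 50 ms of added
delay are assumptions picked for illustration, and the picture is
treated as arriving instantly.

```python
SPEED_OF_SOUND_FT_S = 1125.0  # assumed, roughly room temperature


def acoustic_delay_ms(distance_ft):
    """Time for sound to travel distance_ft from the stage.

    This is also the delay a stack at that distance would need in
    order to line up with the sound arriving from the stage.
    """
    return 1000.0 * distance_ft / SPEED_OF_SOUND_FT_S


def av_skew_ms(viewer_distance_ft, added_delay_ms=0.0):
    """How far the sound lags the picture for a given viewer."""
    return acoustic_delay_ms(viewer_distance_ft) + added_delay_ms


for distance in (20.0, 500.0):
    before = av_skew_ms(distance)
    after = av_skew_ms(distance, added_delay_ms=50.0)
    print(distance, before, after, after / before)
```

The same added delay multiplies the skew seen by the near viewer
several times over while barely changing it for the far one.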

Don Y

May 24, 2012, 6:22:58 AM5/24/12

to

Hi Bob,

On 5/22/2012 5:39 PM, Soundhaspriority wrote:

>>>>> Humans can distinguish 10 microseconds OR LESS differential from
>>>>> ear to
>>>>> ear. Applying Nyquist's criteria to the problem, this suggests 5
>>>>> microseconds between channels, with an additional guard of 3
>>>>> microseconds, = 2 microseconds.
>>>>
>>>> Do you have a reference for this?
>>>>
>>> Yes: http://www.zainea.com/jorissmithyin98.pdf
>>
>> Excellent, thanks! I'll try to take a look through
>> it later this evening...
>>
>> My problem is having an understanding of the implementation
>> technologies but not of the "application technology" (psychoacoustics,
>> in this case).
>
> Not a problem. Play it safe and over-design. A lot of folks would like
> to get rid of the cables.

Getting the "skew" down is not very difficult in terms of
technology/resources. But, beyond a certain point, you
start running into the idiosyncrasies of the network
*fabric*, itself. I.e., characteristics of the particular
network switches through which packets travel, number and
placement of such switches, etc. Or, you have to declare
rules on which switches can be used, how they can be
deployed, etc.

I've put that on hold while I rethink some architectural
issues of the relationship of nodes and server(s) that
have cropped up when I introduced the *video* counterparts
to the system. "Who talks to what" has significant
impact on the server-side resources and the client-side
implementations.

For example, serving video tends to require far more secondary
storage (disk space) than audio -- video libraries being
larger than audio ones. So, while it might be practical to
use a solid state (read only!) medium to store an entire
audio library AND RUN THE AUDIO SERVER 24/7 without an
annoying fan, etc., doing the same with a video library
would be impractical. You would *want* to power it
down when it wasn't actively serving video!

This suggests audio and video would probably like to
originate from different physical servers. That *could*
mean the audio portion of a "video" source material would
need to be delivered to the audio server for distribution
to the "network speakers" while the video is delivered
directly by the video server to the "network displays".

This might not be an effective implementation :-/

Scott Dorsey

May 24, 2012, 8:40:30 AM5/24/12

to

Frank <fr...@nojunkmail.humanvalues.net> wrote:
>On 23 May 2012 14:30:36 -0400, in 'rec.audio.pro',
>in article <Re: Phase/skew sensitivity>,
>klu...@panix.com (Scott Dorsey) wrote:
>
>>The AES has a committee that is already arguing about
>>trying to standardize audio over ethernet. The LAST thing we need is another
>>competing standard.
>
>Just curious, Scott, but is this AES effort *in addition to* the work
>on AVB (Audio Video Bridging) already done by the IEEE?

YES! Isn't it great to have so many different standards?

>It seems to me that if the AES is going to come up with its own
>standard - one that's different from IEEE AVB - then yes, it's
>probably one standard too many.

Well, we already have maybe a dozen different actual systems that are
being marketed right now for different applications. The sound reinforcement
guys mostly like Cobranet, and a lot of the digital consoles have Cobranet
cards that plug right into them.

The IEEE standard is a nice thing, but Cobranet has been around for nearly
a decade and it's become very common on large touring gigs.

Of course, the installed sound people use a different standard altogether.
And some of the touring guys like the Roland Digital Snake System even though
it does not easily interoperate with non-Roland consoles.

My guess is that in the end after it all falls out, the AES standard will
be a Cobranet spec with an AES rubber stamp on it, but I have been wrong
before about these things.

Scott Dorsey

May 24, 2012, 8:46:13 AM5/24/12

to

In article <jpknv7$4f0$1...@speranza.aioe.org>, Don Y <th...@isnotme.com> wrote:
>On 5/23/2012 11:30 AM, Scott Dorsey wrote:
>> by several people. The AES has a committee that is
>> already arguing about trying to standardize
>
>--^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>Where are the inexpensive, ubiquitous products? :>

The inexpensive ubiquitous products came out years ago. But they don't
interoperate. So the standards bodies come in at the last minute and
try and clean up the mess.

This is how we got the Pin 2 Hot standard, 25 years after the XLR had
become the de facto standard out there.

>Standards (and compliance with them) make sense if:
>- interoperability is a concern
>- the standard, plus whatever tweeks your application needs,
> addresses all of your design criteria (size, cost, performance,
> reliability, functionality, etc.)

Right. I want to be able to plug a Roland amplifier into a Midas console.
I want to be able to take splits from the Midas into a record truck with a
Tascam in it. I can't do that all with ethernet yet. I can do it easily
with analogue XLRs and a little less easily with MADI.

>Sure, I can't buy a piece of third party kit and expect it
>to talk DIRECTLY to my network speakers, individually. But,
>it wouldn't have been able to speak to those speakers
>regardless -- it would always have had to speak to whatever
>was *driving* them. Whatever piece of *digital* kit that was
>sitting in the middle. In my case, that's the media server!
>*It* can bear the costs of any/all of that "interoperability"
>freeing the "speakers" to concentrate on reproducing *sound*.

It's too late, though, folks already have been doing this stuff for well
over a decade. It's a nice idea, but there's already stuff out there doing
it.

Scott Dorsey

May 24, 2012, 8:57:12 AM5/24/12

to

In article <jpko08$4f0$3...@speranza.aioe.org>, Don Y <th...@isnotme.com> wrote:
>On 5/23/2012 7:45 AM, Scott Dorsey wrote:
>> In article<jph8ru$e4r$1...@speranza.aioe.org>, Don Y<th...@isnotme.com> wrote:
>>>
>>> Welcome to the future! :> This project is the outgrowth of a
>>> client's need for an ULF distribution system that currently requires
>>> shipping thousands of pounds of cables to far-off locations.
>>
>> Well, there are plenty of those off the shelf, using ethernet, using
>> circuit-switched protocols on cat-5, and using fibre optics.
>
>Do they allow a foreign application to COEXIST on the medium
>with the *signal*? I.e., if I want to accompany the signal
>with command and control data, can I do so? Can I control
>the phase and magnitude of the signal at *each* receiving
>node with respect to a common time reference? Can I turn
>*off* the signal and just use the medium for data transport?

You can dedicate one channel to MIDI or to RS-232 on the Light Viper. With
the Ethernet systems, they will coexist with other Ethernet traffic including
IP, but you have to be very careful to keep your traffic levels low. It would
be better to use a different physical network entirely if you ask me.

Controlling phase and magnitude is the responsibility of the device at
the end, not the interface. Plug it into a digital console, you will have
gain controls and delays on both channels.

>Can I send (a different) signal back up the cable at the same
>time?

Yes, even with the LightViper.

>Can I route power over the same "cable" (or, do I need
>to carry power *with* each of those end-node devices?) Is
>it as *inexpensive* (or, has someone come up with a
>scheme they can protect with a patent)? Can I buy
>additional supplies "locally"? ("Hmmm, I need another
>500 ft of cable") Can I *discard* those (inexpensive)
>supplies instead of having to bring them back with me
>(i.e., get them through customs, *again*?)

Nope, because power supplies are really the largest and most expensive part
of the system in the pro audio world. With a lot of equipment the power
supply is most of the cost of the gear.

>> There are also some RF solutions available for delayed speaker stacks
>> at big outdoor concerts.
>
>There are lots of solutions to *individual* problems that I
>posed. OTOH, the solution I proposed can be applied to lots
>of *applications*!

What you describe sounds a lot like Cobranet. Look at the Cobranet
ASICs which are available off the shelf and which are easy to integrate into
your products.

>Should I be running optical fiber in the walls and ceilings of
>my home to get audio (and video) to the various locations that
>I want it? Do I then run a *separate* pair of copper conductors
>alongside the fibre so I can distribute *power* to each of
>those nodes? Or, glue wall-warts around the periphery of
>the room to handle each of these devices?

Wall warts are an example of everything wrong with the consumer electronics
world. This is the professional audio world, we do not use wall warts.

And yes, if you want good sound quality, you need to provide power to
amplifiers. Not crappy wall-warts, not a little PoE, but real power.

This is why 70V systems are a big win for distributed sound systems like
that. You send high level audio around, no amps needed at the speakers. Yes,
it costs a fortune to do right because audio transformers that sound good
are very expensive. Quality costs money.

>Should I rely on *anything* RF based in an environment already
>known to be polluted with wireless transmissions? Your WiFi
>has to share a set of channels cooperatively with your neighbor
>(and your microwave oven). In the event of interference, those
>packets can safely be replayed (many!) seconds later. Would you
>tolerate your "HiFi" going silent while it recovers from a
>series of dropped packets? What about your TV -- the image
>freezing each time the network can't deliver enough packets
>*before* they are "needed"?

That's why we use licensed channels and nobody depends on ISM bands for
anything even remotely important.

>That's the difference between the approaches cited -- mine
>fits *each* of these application domains. And, it's dirt cheap!

Is it cheaper than the Cobranet chips?

>So, a 70V distribution amp feeding each "small area"?
>I.e., one for the kitchen, another for the dining room,
>another for each bedroom, living room, garage, back porch,
>etc.?

No, you run one 70V system around. If you need multiple channels, you run
multiple pairs to the switchbox in each room. Gain controls are on the 70V
side. Folks have been doing this since the Navy standardized it in 1940.

>If you think in terms of old technology, you tend to get
>all the capabilities (limitations!) of that old technology!

Get the Cobranet datasheets. How is your system different than Cobranet?

Soundhaspriority

May 24, 2012, 11:18:49 AM

"Don Y" <th...@isnotme.com> wrote in message

news:jpl264$urf$1...@speranza.aioe.org...
> Hi Bob,
>
[snip]

>
> This suggests audio and video would probably like to
> originate from different physical servers. That *could*
> mean the audio portion of a "video" source material would
> need to be delivered to the audio server for distribution
> to the "network speakers" while the video is delivered
> directly by the video server to the "network displays".
>
> This might not be an effective implementation :-/

Synchronization between audio and video is not nearly as touchy as between
audio channels. The literature indicates that the touchiness and precision
required for synchronization of audio streams derives from a specialized
complex of CNS neurons that statistically average the data from large
numbers of cilia, and process the differentials between left and right with
a degree of resolution far beyond that of any of the single parts.

But for correlation between picture and sound, there is no such specialized
neural complex, so the standards are relaxed by many orders of magnitude. For
24 fps film, 10 ms might be considered "perfect." By the same reasoning, 4 ms
might be appropriate for 60 fps material. I suggest you give DTS a call, who
may be aware of a relevant standard.
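
The "same reasoning" above appears to be a scaling of the tolerance with the frame period. A minimal sketch of that arithmetic, assuming the 10 ms figure for 24 fps is the reference point (the function names are illustrative, not from any standard):

```python
def frame_period_ms(fps):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def scaled_sync_tolerance_ms(ref_tolerance_ms, ref_fps, target_fps):
    """Scale an A/V sync tolerance in proportion to the frame period."""
    return ref_tolerance_ms * ref_fps / target_fps


if __name__ == "__main__":
    for fps in (24, 60):
        tol = scaled_sync_tolerance_ms(10.0, 24, fps)
        print(f"{fps} fps: frame period {frame_period_ms(fps):.2f} ms, "
              f"tolerance {tol:.1f} ms")
```

Either way the tolerance is a fixed fraction (under a quarter) of one frame period, so it only shrinks as fast as the frame rate grows.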

Bob Morein
(310) 237-6511

Don Y

May 24, 2012, 2:02:54 PM

Hi Bob,

On 5/24/2012 8:18 AM, Soundhaspriority wrote:
>
> "Don Y" <th...@isnotme.com> wrote in message
> news:jpl264$urf$1...@speranza.aioe.org...

>> This suggests audio and video would probably like to
>> originate from different physical servers. That *could*
>> mean the audio portion of a "video" source material would
>> need to be delivered to the audio server for distribution
>> to the "network speakers" while the video is delivered
>> directly by the video server to the "network displays".
>>
>> This might not be an effective implementation :-/
>
> Synchronization between audio and video is not nearly as touchy as
> between audio channels.

Yes, I understand.

> The literature indicates that the touchiness and
> precision required for synchronization of audio streams derives from a
> specialized complex of CNS neurons that statistically average the data
> from large numbers of cilia, and process the differentials between left
> and right with a degree of resolution far beyond that of any of the
> single parts.
>
> But for correlation between picture and sound, there is no such
> specialized neural complex, so the standards are relaxed by many orders
> of magnitude. For 24 fps film, 10 ms might be considered "perfect." By
> the same reasoning, 4 ms might be appropriate for 60 fps material. I
> suggest you give DTS a call, who may be aware of a relevant standard.

The problem I alluded to isn't synchronization as much as
one of computational complexity.

Imagine if the "video server" had to push the audio portion
of the video recordings *to* the audio server (while simultaneously
delivering the video to the "network display" devices).
The "audio server" now has one or more input channels that have
to be addressed -- in a timely manner! I.e., it now has the
same sort of time constraints that the "network speakers" have
(don't drop packets, don't let them get delayed in your
"incoming" protocol stack, keep track of *when* they need to
get *out* to the network speakers, etc.)

But, the audio server wouldn't just "pass them along". It
could have to mix in other audio (e.g., doorbell annunciator)
and downsample the 48KHz to the 44KHz the audio server
wants to deliver to the network speakers.

[this is *one* possible approach. I am not claiming it is
the RIGHT approach, just showing that there are other
issues involved when you want the audio and video to
coexist -- but in separate boxes]
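
To give a feel for the rate-conversion step, here is a rough sketch that resamples a block by linear interpolation. It assumes "44KHz" means the usual 44.1 kHz, so the ratio is 160:147. The function name is hypothetical, and a real downsampler would low-pass filter first to avoid aliasing; this only shows the bookkeeping.

```python
from fractions import Fraction


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (sketch only: no anti-alias filter)."""
    if len(samples) < 2:
        return list(samples)
    # Source samples advanced per output sample; exact, so no drift builds up.
    step = Fraction(src_rate, dst_rate)
    last = len(samples) - 1
    out = []
    pos = Fraction(0)
    while pos <= last:
        i = int(pos)                      # index of the sample at or before pos
        frac = float(pos - i)             # fractional distance to the next one
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += step
    return out


if __name__ == "__main__":
    block = [float(n) for n in range(481)]        # 10 ms of a ramp at 48 kHz
    converted = resample_linear(block, 48_000, 44_100)
    print(len(block), "->", len(converted), "samples")
```

Even in this toy form, the server has to do this per input channel, in real time, on top of its delivery deadlines.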

And, the audio and video servers would then have to cooperate
in how they are *controlled* (externally). I.e., "stop",
"pause", "next chapter", "frame advance", etc.

Suddenly, the audio server has gone from being in an
authoritative position to a combination of authority
and subjugation! It complicates its design.

An alternative approach might be to let the network speakers
"listen" to two or more "audio programs" and have *them*
mix the signals, locally. This gets the audio server out
of the movie business -- the video server can push the
audio portion of the "movie" out AS IF it was a second
audio server.

[again, this is just *another* approach -- which also might
not be "right"]
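
A rough sketch of what "mix the signals, locally" could look like inside a speaker node, assuming 16-bit samples and a per-program gain. The names and the clamping policy are illustrative assumptions, not a description of the actual firmware:

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_programs(programs, gains):
    """Mix equal-length blocks of 16-bit samples, one gain per program.

    The sum is clamped to the 16-bit range so that loud passages clip
    instead of wrapping around.
    """
    if len(programs) != len(gains):
        raise ValueError("need exactly one gain per program")
    mixed = []
    for frame in zip(*programs):
        total = sum(g * s for g, s in zip(gains, frame))
        mixed.append(max(INT16_MIN, min(INT16_MAX, int(round(total)))))
    return mixed


if __name__ == "__main__":
    music = [1000, 30000]
    movie = [2000, 30000]
    print(mix_programs([music, movie], [1.0, 0.5]))
```

The cost is that every node now needs buffer space and cycles for each program it listens to.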

The point is, dealing with *just* audio is a lot easier to
finalize a system design than having to "mix in" video
support, as well.

But, that's what makes it *fun*! :>

Don Y

May 24, 2012, 2:50:11 PM

Hi Scott,

On 5/24/2012 5:57 AM, Scott Dorsey wrote:

> In article<jpko08$4f0$3...@speranza.aioe.org>, Don Y<th...@isnotme.com> wrote:
>> On 5/23/2012 7:45 AM, Scott Dorsey wrote:
>>> In article<jph8ru$e4r$1...@speranza.aioe.org>, Don Y<th...@isnotme.com> wrote:
>>>>
>>>> Welcome to the future! :> This project is the outgrowth of a
>>>> client's need for an ULF distribution system that currently requires
>>>> shipping thousands of pounds of cables to far-off locations.
>>>
>>> Well, there are plenty of those off the shelf, using ethernet, using
>>> circuit-switched protocols on cat-5, and using fibre optics.
>>
>> Do they allow a foreign application to COEXIST on the medium
>> with the *signal*? I.e., if I want to accompany the signal
>> with command and control data, can I do so? Can I control
>> the phase and magnitude of the signal at *each* receiving
>> node with respect to a common time reference? Can I turn
>> *off* the signal and just use the medium for data transport?
>
> You can dedicate one channel to MIDI or to RS-232 on the Light Viper. With

RS232? MIDI? If I want to push a megabyte of data back to the
server, how many HOURS "after the fact" will I be sitting
around *waiting*?

> the Ethernet systems, they will coexist with other Ethernet traffic including
> IP, but you have to be very careful to keep your traffic levels low. It would
> be better to use a different physical network entirely if you ask me.

Exactly. Even more kit to cart around with you.

> Controlling phase and magnitude is the responsibility of the device at
> the end, not the interface.

Again, because you expect that to be a separate device! I've
decided they can easily coexist in the same bit of hardware
for no additional "space", power, cost, etc.

> Plug it into a digital console, you will have
> gain controls and delays on both channels.
>
>> Can I send (a different) signal back up the cable at the same
>> time?
>
> Yes, even with the LightViper.

At *bits* per second?? Not even KB/s or MB/s?? Imagine wanting to
process the "signal" sent *down* the pipe AT the remote end and then
send the processed signal back *up*. E.g., 2x24b@44KHz. At *each*
such node. Then, imagine wanting to actually do some *number
crunching* on the remote end and send those results up in real
time (for display on a manned "console")
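
For scale, a back-of-the-envelope sketch of the numbers involved. It assumes "44KHz" means 44.1 kHz, uses the standard MIDI serial rate of 31,250 bit/s for comparison, and takes a 100 Mb/s link as a purely hypothetical example while ignoring all packet overhead:

```python
MIDI_BIT_RATE = 31_250          # standard MIDI serial rate, bits per second


def pcm_bit_rate(channels, bits_per_sample, sample_rate_hz):
    """Raw PCM payload rate in bits per second (no packet overhead)."""
    return channels * bits_per_sample * sample_rate_hz


def streams_per_link(link_bit_rate, stream_bit_rate):
    """How many such streams fit on a link, ignoring all overhead."""
    return link_bit_rate // stream_bit_rate


if __name__ == "__main__":
    rate = pcm_bit_rate(2, 24, 44_100)
    print(f"return stream: {rate} bit/s ({rate / 1e6:.2f} Mb/s)")
    print(f"that is {rate / MIDI_BIT_RATE:.0f}x a MIDI link")
    print(f"streams on an assumed 100 Mb/s link: "
          f"{streams_per_link(100_000_000, rate)}")
```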

>> Can I route power over the same "cable" (or, do I need
>> to carry power *with* each of those end-node devices?) Is
>> it as *inexpensive* (or, has someone come up with a
>> scheme they can protect with a patent)? Can I buy
>> additional supplies "locally"? ("Hmmm, I need another
>> 500 ft of cable") Can I *discard* those (inexpensive)
>> supplies instead of having to bring them back with me
>> (i.e., get them through customs, *again*?)
>
> Nope, because power supplies are really the largest and most expensive part
> of the system in the pro audio world. With a lot of equipment the power
> supply is most of the cost of the gear.

So, you can't deploy that "solution" in all of the applications
that I've described.

OTOH, when you want to deploy my system in a "pro audio" environment
you simply upgrade the power supplies!

>>> There are also some RF solutions available for delayed speaker stacks
>>> at big outdoor concerts.
>>
>> There are lots of solutions to *individual* problems that I
>> posed. OTOH, the solution I proposed can be applied to lots
>> of *applications*!
>
> What you describe sounds a lot like Cobranet. Look at the Cobranet
> ASICs which are available off the shelf and which are easy to integrate into
> your products.

I looked at CobraNet ~20 years ago. Long before being approached
with this ULF project. Too much money, too little functionality.
It is targeted *solely* at using the network cable as a speaker
wire.

It's a proprietary product. You pay a premium for the monopoly
that is in place. You can't use more ubiquitous off-the-shelf
products to implement your "solution". Anything additional
that you want to do comes at the expense of extra components
unless Cirrus *happens* to think your application domain is
"common enough" that they are willing to develop silicon for it.

You are forced to live with any bugs in that silicon -- no
alternate vendors to choose from nor put competitive pressure
on the *sole* vendor.

You can't offer the technology to other parties who can
then pursue their own specific application requirements.

You can't "talk CobraNet" from a COTS PC, iPad, etc.

It can't be routed (i.e., *I* can push audio across the country
without requiring anything "special" along the way. CobraNet
would require a custom bridge to repackage the audio under IP)

It's more expensive.

Yes, I've looked at CobraNet. :>

>> Should I be running optical fiber in the walls and ceilings of
>> my home to get audio (and video) to the various locations that
>> I want it? Do I then run a *separate* pair of copper conductors
>> alongside the fibre so I can distribute *power* to each of
>> those nodes? Or, glue wall-warts around the periphery of
>> the room to handle each of these devices?
>
> Wall warts are an example of everything wrong with the consumer electronics
> world. This is the professional audio world, we do not use wall warts.

That ONE APPLICATION is pro audio. The same hardware can be
bundled with a quality power supply, put in a pretty box with
a nice little graphic display and lots of blinking lights and
sold for an outrageous price.

*Or*, it can be bolted to the back of a loudspeaker and
plugged into an EXISTING ethernet jack in an office or
storefront to provide audio to that area.

> And yes, if you want good sound quality, you need to provide power to
> amplifiers. Not crappy wall-warts, not a little PoE, but real power.

And what in my design *requires* the use of a wall wart?
What *requires* the use of PoE?

By contrast, can you repurpose *your* solution to work
in the environments I've described? I'll even let you
redefine the audio specifications to something on a par
with POTS!

*That's* the problem with those solutions. They fit *one*
application domain. If you want to do anything *different*,
they don't help but, rather, *hinder*.

> This is why 70V systems are a big win for distributed sound systems like
> that. You send high level audio around, no amps needed at the speakers. Yes,
> it costs a fortune to do right because audio transformers that sound good
> are very expensive. Quality costs money.
>
>> Should I rely on *anything* RF based in an environment already
>> known to be polluted with wireless transmissions? Your WiFi
>> has to share a set of channels cooperatively with your neighbor
>> (and your microwave oven). In the event of interference, those
>> packets can safely be replayed (many!) seconds later. Would you
>> tolerate your "HiFi" going silent while it recovers from a
>> series of dropped packets? What about your TV -- the image
>> freezing each time the network can't deliver enough packets
>> *before* they are "needed"?
>
> That's why we use licensed channels and nobody depends on ISM bands for
> anything even remotely important.

So, I guess that's yet another application domain that
solution can't address. Unlikely to have folks living in a
high-rise *each* using that "licensed" band to distribute
sound and video around their apartments, condos, etc.

>> That's the difference between the approaches cited -- mine
>> fits *each* of these application domains. And, it's dirt cheap!
>
> Is it cheaper than the Cobranet chips?

The ULF design came in at $15 per node. So, my client can
actually choose to *discard* the devices at the site in
<wherever> rather than deal with the hassles and expenses
(shipping) of getting them back to the US! The home design is
closer to $20 -- since it includes a stereo amplifier *in*
the node. What do the CobraNet chips cost? How much do
you have to add to them to get a *real* solution (not just
a chip in an antistatic bag)? What happens to the cost of
the "non-audio" portions of your solution?

>> So, a 70V distribution amp feeding each "small area"?
>> I.e., one for the kitchen, another for the dining room,
>> another for each bedroom, living room, garage, back porch,
>> etc.?
>
> No, you run one 70V system around. If you need multiple channels, you run
> multiple pairs to the switchbox in each room. Gain controls are on the 70V
> side. Folks have been doing this since the Navy standardized it in 1940.

Where am I putting the "switchbox" in my living room? And each
bedroom? Kitchen? Back porch? etc. I assume each of these
can be controlled remotely via some sort of electronic
connection (i.e., over *a* network). And, of course, they must
be *free* lest they add to the overall cost...

Maybe they even have the ability to "clicklessly" switch between
channels (I suspect it wouldn't let me *fade* one source program
out while its replacement fades *in*)
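
By "fade", I mean something like the following linear crossfade over a block of samples. This is only a sketch with a hypothetical function name; a real implementation might prefer an equal-power curve:

```python
def crossfade(outgoing, incoming):
    """Linearly fade `outgoing` out while `incoming` fades in.

    Both blocks must be the same length; the result starts as pure
    `outgoing` and ends as pure `incoming`.
    """
    n = len(outgoing)
    if n != len(incoming):
        raise ValueError("blocks must be the same length")
    if n < 2:
        return list(incoming)
    out = []
    for i in range(n):
        t = i / (n - 1)                   # ramps from 0.0 to 1.0
        out.append((1.0 - t) * outgoing[i] + t * incoming[i])
    return out


if __name__ == "__main__":
    print(crossfade([100.0] * 5, [0.0] * 5))
```

That is trivial when the switching happens in the digital domain at the node, and not something a passive switchbox on a 70V line can do.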

I.e., welcome to 1940! There's a wire recorder in the back
room for those of you wanting to make a recording commemorating
your visit! :>

>> If you think in terms of old technology, you tend to get
>> all the capabilities (limitations!) of that old technology!
>
> Get the Cobranet datasheets. How is your system different than Cobranet?

See above. :>

Don Y

May 24, 2012, 2:55:37 PM

Hi Scott,

On 5/24/2012 5:46 AM, Scott Dorsey wrote:
> In article<jpknv7$4f0$1...@speranza.aioe.org>, Don Y<th...@isnotme.com> wrote:
>> On 5/23/2012 11:30 AM, Scott Dorsey wrote:
>>> by several people. The AES has a committee that is
>>> already arguing about trying to standardize
>>
>> --^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> Where are the inexpensive, ubiquitous products? :>
>
> The inexpensive ubiquitous products came out years ago. But they don't
> interoperate. So the standards bodies come in at the last minute and
> try and clean up the mess.

That's because you want to be able to have all those little
boxes talk to each other. If, OTOH, they are part of a single
coherent *system* that you talk to as a whole, then this
issue goes away!

> This is how we got the Pin 2 Hot standard, 25 years after the XLR had
> become the de facto standard out there.
>
>> Standards (and compliance with them) make sense if:
>> - interoperability is a concern
>> - the standard, plus whatever tweaks your application needs,
>> addresses all of your design criteria (size, cost, performance,
>> reliability, functionality, etc.)
>
> Right. I want to be able to plug a Roland amplifier into a Midas console.
> I want to be able to take splits from the Midas into a record truck with a
> Tascam in it. I can't do that all with ethernet yet. I can do it easily
> with analogue XLRs and a little less easily with MADI.

And I want to be able to send a "signal" down a wire. Feed
that to a transducer. Observe the output. Do an FFT on
that signal. And convey the characteristics of that back
to a monitoring console where a human operator can decide
how to proceed -- while simultaneously looking at similar
results from *other* transducers.

How are your "standards" going to help me do that?
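
As a rough illustration of the kind of number crunching meant here, the sketch below measures the amplitude and phase of a single test tone in a captured block. It swaps the full FFT for a one-bin DFT in pure Python to keep it short; the function name, sample rate and tone frequency are all hypothetical:

```python
import cmath
import math


def tone_response(samples, sample_rate_hz, tone_hz):
    """Return (amplitude, phase_radians) of the component at tone_hz.

    A single-bin DFT: accurate when the block holds a whole number of
    cycles of the tone; otherwise a window function would be needed.
    """
    n = len(samples)
    acc = 0j
    for k, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * tone_hz * k / sample_rate_hz)
    acc *= 2.0 / n                        # scale so a unit cosine gives 1.0
    return abs(acc), cmath.phase(acc)


if __name__ == "__main__":
    fs, f0 = 1000, 50                     # hypothetical sample rate and tone
    captured = [0.5 * math.cos(2 * math.pi * f0 * k / fs + 0.3)
                for k in range(fs)]
    amp, phase = tone_response(captured, fs, f0)
    print(f"amplitude {amp:.3f}, phase {phase:.3f} rad")
```

Only the two resulting numbers per tone need to travel back up the wire to the console, not the raw samples.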

>> Sure, I can't buy a piece of third party kit and expect it
>> to talk DIRECTLY to my network speakers, individually. But,
>> it wouldn't have been able to speak to those speakers
>> regardless -- it would always have had to speak to whatever
>> was *driving* them. Whatever piece of *digital* kit that was
>> sitting in the middle. In my case, that's the media server!
>> *It* can bear the costs of any/all of that "interoperability"
>> freeing the "speakers" to concentrate on reproducing *sound*.
>
> It's too late, though, folks already have been doing this stuff for well
> over a decade. It's a nice idea, but there's already stuff out there doing
> it.

See above -- and my other post.

If it's been around *so* long, it must have been poorly implemented
or exorbitantly priced -- since you don't *see* it in homes!
So, I guess it must *not* be "too late"...

Soundhaspriority

May 24, 2012, 3:50:29 PM

"Don Y" <th...@isnotme.com> wrote in message

news:jplt4e$8rq$1...@speranza.aioe.org...

A lot of hard thinking is required. In general terms, the question could be:
"Is there always a peer-peer algorithm that is equivalent to a master-slave
algorithm? Can it be proven to be algorithmic?"

I think that, personally, rather than grapple with something so complex, I
would center my thoughts around "pulling" the data, rather than "push." A
master process, executing as a single program would pull the data off the servers
and combine it. The data would then be pushed out deterministically to the
display and audio devices.

Ordinary desktop computers, executing their local guis, have egregious bugs
in their user interfaces these days, related to multithreading. Well
intentioned programmers, lots of money, all the best intentions, and they
seem to have gotten themselves stuck in a hard place.

Bob Morein
(310) 237-6511

Don Y

May 24, 2012, 5:33:50 PM

Hi Bob,

On 5/24/2012 12:50 PM, Soundhaspriority wrote:

>> [again, this is just *another* approach -- which also might
>> not be "right"]
>>
>> The point is, dealing with *just* audio is a lot easier to
>> finalize a system design than having to "mix in" video
>> support, as well.
>>
>> But, that's what makes it *fun*! :>
>
> A lot of hard thinking is required.

Agreed. The "audio only" aspect was simple to implement
as I had already solved that problem for the ULF application
I mentioned. Write some batch converters to massage all the
various file formats present in the music archive into
"myFLAC", move them onto the audio server and you're done!

Want to support "live audio" (e.g., a radio broadcast from
an FM tuner)? Just transcode on the fly.

The mistake was failing to anticipate the problems that
the video containers would pose. And, how different the
desired implementations would *want* to be (i.e., I want
to listen to audio pretty much all day -- from one source
or another; but, video only when I want to *focus* on it!
Silly to keep a video server running -- with terabytes
of "software" spinning -- when I'm not likely to be accessing
it!)

> In general terms, the question could
> be: "Is there always a peer-peer algorithm that is equivalent to a
> master-slave algorithm? Can it be proven to be algorithmic?"

I use a peer to peer algorithm for packet *recovery*.
I.e., if node A drops a packet (not technically possible
unless there is some local noise or a glitch in a network
switch), I don't want that node asking the master to
retransmit the packet. Imagine N such nodes all pestering
the master -- the system falls into catastrophic collapse
as missed packets lead to increased retransmit requests
which bottleneck at *the* master server which causes it
to be more likely to fail to perform its normal function
in a timely way which causes more slaves to complain
which...

So, instead, the master tells *you* which other nodes
are receiving (subscribing to) the same data stream
that you are using. If you detect a missed packet
(i.e., one that isn't available in your buffer when
you would otherwise expect it to be), then you contact
one of these peers knowing that it should have the
packet available.

Since the master has global knowledge of who uses what
*and* which peers are supporting which *other* peers,
it can judiciously spread this "backup" responsibility
around instead of burdening some particular node(s).

Likewise, it can see which nodes are having problems
(i.e. *requesting* retransmits) and use this to diagnose
problems in the fabric.
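
The scheme above can be sketched roughly as follows. Everything here is a simplification under stated assumptions: the class and method names are hypothetical, backups are handed out as a simple ring, a direct look into the peer's buffer stands in for a network request, and how the master actually learns about retransmit requests is not specified above (here it just reads a per-node counter).

```python
class Master:
    """Knows who subscribes to what and hands out backup peers."""

    def __init__(self):
        self.subscribers = {}             # stream name -> list of Node

    def subscribe(self, stream, node):
        self.subscribers.setdefault(stream, []).append(node)

    def assign_backups(self, stream):
        # Ring assignment: each node is backed by the next subscriber,
        # so no single node carries the whole recovery load.
        peers = self.subscribers.get(stream, [])
        for i, node in enumerate(peers):
            node.backup_peer = peers[(i + 1) % len(peers)] if len(peers) > 1 else None

    def troubled_nodes(self, stream, threshold=1):
        # Nodes that keep asking for retransmits hint at fabric problems.
        return [n.name for n in self.subscribers.get(stream, [])
                if n.recovery_requests >= threshold]


class Node:
    def __init__(self, name, master, stream):
        self.name = name
        self.buffer = {}                  # sequence number -> payload
        self.backup_peer = None
        self.recovery_requests = 0
        master.subscribe(stream, self)

    def receive(self, seq, payload):
        self.buffer[seq] = payload

    def packet_for_playout(self, seq):
        """Return packet `seq`, asking the backup peer (not the master) if missing."""
        if seq not in self.buffer:
            self.recovery_requests += 1
            peer = self.backup_peer
            if peer is not None and seq in peer.buffer:
                self.buffer[seq] = peer.buffer[seq]
        return self.buffer.get(seq)       # None means the packet is truly lost
```

The point of the ring is that a burst of losses fans out across the peers instead of converging on the master.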

> I think that, personally, rather than grapple with something so complex,
> I would center my thoughts around "pulling" the data, rather than
> "push." A master process, executing as a single program would pull the data
> off the servers and combine it. The data would then be pushed out
> deterministically to the display and audio devices.

If you pull data, you then have multiple clients trying to
pull the *same* data at *almost* the same time. This means
you consume extra network bandwidth to deliver the same
amount of information.

(Imagine two TV's watching the same program in different rooms)

If you *push* it, you can exploit broadcast and multicast technologies
to allow *one* copy of the data stream onto the wire with multiple
"consumers". It also makes it easier for those consumers to
remain in some fixed time relationship to each other since they all
see "the same data" at "the same time (roughly)".

[Note I am not claiming your approach is impossible. Rather,
there are consequences to it that I've evaluated as net
negatives.]
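
A minimal sketch of the "push" side using IP multicast over UDP. The group address, port and packet layout (a sequence number plus a play-at timestamp in microseconds) are hypothetical choices for illustration, not the actual wire format:

```python
import socket
import struct

GROUP = "239.1.2.3"     # hypothetical administratively scoped multicast group
PORT = 5004             # hypothetical port

# Hypothetical header: 32-bit sequence number, 64-bit play-at time (us).
HEADER = struct.Struct("!IQ")


def pack_audio_packet(seq, play_at_us, payload):
    return HEADER.pack(seq, play_at_us) + payload


def unpack_audio_packet(datagram):
    seq, play_at_us = HEADER.unpack_from(datagram)
    return seq, play_at_us, datagram[HEADER.size:]


def open_sender(ttl=1):
    """Socket for the server: ONE sendto() reaches every subscriber."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    return sock


def open_receiver(group=GROUP, port=PORT):
    """Socket for a speaker/display node that joins the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    membership = struct.pack("4s4s", socket.inet_aton(group),
                             socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return sock
```

The server would call `open_sender().sendto(pack_audio_packet(...), (GROUP, PORT))` once per packet, however many nodes have joined the group.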

> Ordinary desktop computers, executing their local guis, have egregious
> bugs in their user interfaces these days, related to multithreading.
> Well intentioned programmers, lots of money, all the best intentions,
> and they seem to have gotten themselves stuck in a hard place.

Exactly. It's relatively easy to put a PC on a network and
"stream video" (or audio) to it. It can preallocate several
megabytes of buffer to handle dropouts. It doesn't care
if it is reproducing its audio/video in synch with the
same audio/video being streamed into the cubicle next door.
If the OS hiccups because some other activity momentarily
overloads it wrt the audio/video task, the user just shrugs.
It's a *toy* to them!

People trying to design *into* that market then end up with the
worst of all worlds compromising their performance, etc.

Nothankyouverymuch! :>

Frank

May 24, 2012, 6:04:45 PM

On 24 May 2012 08:40:30 -0400, in 'rec.audio.pro',

in article <Re: Phase/skew sensitivity>,
klu...@panix.com (Scott Dorsey) wrote:

>Frank <fr...@nojunkmail.humanvalues.net> wrote:
>>On 23 May 2012 14:30:36 -0400, in 'rec.audio.pro',
>>in article <Re: Phase/skew sensitivity>,
>>klu...@panix.com (Scott Dorsey) wrote:
>>
>>>The AES has a committee that is already arguing about
>>>trying to standardize audio over ethernet. The LAST thing we need is another
>>>competing standard.
>>
>>Just curious, Scott, but is this AES effort *in addition to* the work
>>on AVB (Audio Video Bridging) already done by the IEEE?
>
>YES!

Thank you for your reply. I don't ever work with networked audio, or
video, but I was nonetheless curious about this.

>Isn't it great to have so many different standards?

Obviously! :)

Les Cargill

May 24, 2012, 6:30:50 PM

but that's too easy :) Mount the media drive over the network, and
you're done.

> Ordinary desktop computers, executing their local guis, have egregious
> bugs in their user interfaces these days, related to multithreading.
> Well intentioned programmers, lots of money, all the best intentions,
> and they seem to have gotten themselves stuck in a hard place.
>

There is woefully inadequate separation of concern.

> Bob Morein
> (310) 237-6511
>
>
>

--
Les Cargill

Soundhaspriority

May 24, 2012, 6:49:10 PM

"Don Y" <th...@isnotme.com> wrote in message

news:jpm9ft$8le$1...@speranza.aioe.org...

Don,
Coincidentally, I have been thinking about a problem similar in scope. I
can't find a way to make either push or pull work in isolation. In your
case, I thought there was a single consumer, which caused me to think "pull"
might work for you. With multiple consumers, you want to exploit multicast.
I'll split the difference with you. I don't think there is a completely
deterministic solution, but a combination of push and pull might work. I am
reminded of protocol backups with breakout timers. The consumer will rely
primarily on "push", but if a consumer stream buffer runs down, let it do a
"pull."
My app is not as critical as yours. If it glitches, it glitches.
Pristine real time performance might be difficult.
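
A rough sketch of the push-with-pull-fallback idea, assuming the consumer falls back to a pull when its buffer drops below a low-water mark (a timer could trigger it instead). The class name and the `pull` callback signature are hypothetical:

```python
class StreamBuffer:
    """Consumer-side buffer: fed by push, topped up by pull when it runs low."""

    def __init__(self, low_water, pull):
        self.low_water = low_water
        self.pull = pull                  # callable(start_seq, count) -> [(seq, payload)]
        self.packets = {}
        self.next_seq = 0

    def on_push(self, seq, payload):
        # Ignore packets that arrive after their slot has already played.
        if seq >= self.next_seq:
            self.packets[seq] = payload

    def next_packet(self):
        # Buffer running down: fall back to pulling from the source.
        if len(self.packets) < self.low_water:
            for seq, payload in self.pull(self.next_seq, self.low_water):
                self.on_push(seq, payload)
        payload = self.packets.pop(self.next_seq, None)
        self.next_seq += 1
        return payload                    # None here means an audible glitch
```

In normal operation the pull path never runs, so the multicast benefit is kept and the source only sees pull traffic from consumers that are actually starving.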

Bob Morein
(310) 237-6511

Soundhaspriority

May 24, 2012, 6:52:39 PM

"Les Cargill" <lcarg...@comcast.com> wrote in message
news:jpmcml$o2d$1...@dont-email.me...
> Soundhaspriority wrote:
[snip]>

> but that's too easy :) Mount the media drive over the network, and
> you're done.
>

It depends upon how many consumers he has. I thought there were just to be a
few. If it's a conference venue loaded with A/V augmentation, then I think
Don would like to do something akin to multicasting.

Bob Morein
(310) 237-6511

Les Cargill

May 24, 2012, 8:31:51 PM

That would be a good choice, so long as all the NICs involved
fully support it.

Don Y

May 24, 2012, 10:02:22 PM

Hi Bob,

On 5/24/2012 3:49 PM, Soundhaspriority wrote:

> Coincidentally, I have been thinking about a problem similar in scope. I
> can't find a way to make either push or pull work in isolation. In your
> case, I thought there was a single consumer, which caused me to think
> "pull" might work for you. With multiple consumers, you want to exploit
> multicast.

For *my* (personal) application, I wanted to get rid of all the
bits of audio-video kit that litter the house. All these black
cases that attract dust. All these cables running along
baseboards. Etc.

And, at the same time, allow me to exploit a centralized "media
source" (whether that is canned/prerecorded program material
or over-the-air broadcasts). I.e., if I want to listen to a
news broadcast IN THE BATHROOM, do I have to bring a radio or
television in there to do so? Or, do I have to wait until I have
finished shaving? Or, turn up the volume (waking everyone in
the house) just so I can hear it from another room over the
masking effect of running water?

Similarly, we wanted to do away with the various "blobs"
hanging on the walls -- the doorbell, the thermostat,
the controls for the swamp cooler, etc. While I make
a living FROM technology, I don't like *seeing* it! :>

When you start rethinking your environment, you can see
how many little things clutter it up. E.g., telephones
in several locations -- yet you're always having to *walk*
to one. Remote controls for various bits of kit -- yet
you can never find the *right* one. Etc.

With the ULF project fresh in mind, it was a no-brainer to
imagine how I could repurpose the same technology to
solve my problems.

Run network drops to the four corners of the living room.
Put a powered speaker in each location with a length
of CAT5 connecting it to a jack in the wall -- instead of
a length of zip cord run along the baseboard!

Put a drop behind the valance above the kitchen sink
to feed a pair of speakers hidden in the same area.
Listen to music, news, other broadcasts while preparing
a meal.

Ditto for each bedroom. Dining room. Garage (while servicing
the cars). Front/back porches. etc. Heck, I even ran a drop
up onto the *roof* (so, I can service the swamp cooler mounted
up there and NOT have to keep calling down, "OK, turn it ON,
now so I can see how things work. Wait, wait! Stop! The
belt isn't on properly!!").

Put a couple of drops *in* the ceiling so speakers placed in
those locations can be used for "announcements" (think:
doorbell annunciator, phone ringer, etc.).

Repeat the exercise with video displays in mind.

And VoIP phones.

And security products.

And home automation kit.

And, I don't want to have to run a separate network for multimedia.
Or home automation. Or telephony. Or "regular computers". Cripes,
there's already more than a mile of wire in the walls as it is!

I sized the system so that I could watch 4 different "video
programs" at the same time (i.e., 4 people in 4 different
rooms -- this seems to be about normal for most American
households! <frown> Mom watching one thing in the living
room, Dad another in the den, and each of two nominal children
watching something ELSE in their respective bedrooms!)

Along with the 4 video programs would come 4 sets of audio.

Yet, the phones need to keep working. The Internet connection
still has to give reasonable performance (you might be
streaming video *off* the Internet!). The surveillance
system still has to be able to record activities in the
yard as well as detect motion. etc.

But, you might also want to be able to pipe the same
"program material" throughout the entire house. So
you can have a "party tape" playing and folks hear
it seamlessly regardless of whether they are
standing around the dining room table, seated in the
living room or milling around in the back yard.

When someone comes to the front door, you would want that
to be "announced" somewhere proximate to your location
(certainly not somewhere absent any persons able to hear
it!)

Likewise, you'd like to view the closed circuit video of
that visitor *where* you are - instead of having to walk to
the front door, etc. Or, close the garage door in case
someone left it open (better yet, have the machine close
it for you automatically at the end of the day). Or, alert
you that the compressor in the freezer chest appears to have
failed as the temperature inside it is steadily climbing
(no, I don't want you to send me an email -- I want you
to wake me up, now!)

None of these things have a natural "point of control".
Or, if they do, it varies depending on what you are trying
to do, where and when.

So, it seemed most prudent to drive things from a
central point (conceptually) which could be commanded
(by some other mechanism) to provide specific services
at specific places.

> I'll split the difference with you. I don't think there is a completely
> deterministic solution, but a combination of push and pull might work. I
> am reminded of protocol backups with breakout timers. The consumer will
> rely primarily on "push", but if a consumer stream buffer runs down, let
> it do a "pull."
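
The push-with-pull-fallback idea quoted above could be sketched roughly as follows. This is only an illustration: the `StreamBuffer` class, the `low_water` threshold and the `pull_fn` callback are hypothetical names, not anything described in the thread, and a real consumer would also need sequence numbers to reconcile pulled packets with late pushes.

```python
from collections import deque


class StreamBuffer:
    """Consumer-side buffer: normally filled by server pushes,
    falling back to an explicit pull when it runs low."""

    def __init__(self, low_water, pull_fn):
        self.low_water = low_water   # threshold in packets (assumed value)
        self.pull_fn = pull_fn       # hypothetical callback: request n packets
        self.packets = deque()
        self.pulls = 0               # how many times the fallback fired

    def on_push(self, packet):
        # Normal path: the server pushes packets on its own schedule.
        self.packets.append(packet)

    def next_packet(self):
        # Fallback path: the buffer has run down, so ask for a refill.
        if len(self.packets) <= self.low_water:
            self.pulls += 1
            wanted = self.low_water + 1 - len(self.packets)
            self.packets.extend(self.pull_fn(wanted))
        return self.packets.popleft() if self.packets else None
```

As long as pushes keep the buffer above `low_water`, `pull_fn` is never called; the pull path only fires when delivery has fallen behind.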

I don't know. I'm not really concerned with solving the
"general problem". I want my needs met. <grin> If others
want to take my technology and carry it off into different
directions, they can do so (with my blessings). Someone
might solve a different problem differently -- but using
some of the same technologies.

> My app is not as critical as yours. If it glitches, it glitches.
> Pristine real time performance might be difficult.

I do real-time work for a living. So, that's how I look
at most projects. For example, the idea of streaming a youtube
video to a PC almost makes me gag at the inelegance :-/
OTOH, given the lack of control most folks have over their
PC's and how (well) they operate, this is probably the only
realistic way for them to consume those videos!

<shrug>

Soundhaspriority

May 24, 2012, 11:02:15 PM

"Don Y" <th...@isnotme.com> wrote in message

news:jpmp7h$8v0$1...@speranza.aioe.org...

> Hi Bob,
>
> On 5/24/2012 3:49 PM, Soundhaspriority wrote:
>
>> Coincidentally, I have been thinking about a problem similar in scope. I
>> can't find a way to make either push or pull work in isolation. In your
>> case, I thought there was a single consumer, which caused me to think
>> "pull" might work for you. With multiple consumers, you want to exploit
>> multicast.
>
> For *my* (personal) application, I wanted to get rid of all the
> bits of audio-video kit that litter the house. All these black
> cases that attract dust. All these cables running along
> baseboards. Etc.
>

Don,
You certainly have the vision. With your background, perhaps you can
bull it through in a way that committees cannot. It is certainly the case
that the consumer electronic environment suffers from, to borrow a political
term, "balkanization". The concept of interoperability dominates IT, but has
only made inroads into the consumer environment. Part of this has been due
to the immaturity of IT compared with, for example, the kitchen toaster. It
takes a lot of hidden intelligence to make something complicated appear
simple. It takes even more hidden intelligence to make something fix itself
as easily as a wire that's loose in the clip. Errors in state machines tend
to compound, while errors in analog machines tend to die down, which is why
the consumer and IT have had such a difficult time finding mutual love. For
your own use, this might not be a concern.

I hope you succeed. Please report back.

Bob Morein
(310) 237-6511

Don Y

May 25, 2012, 3:31:22 AM

Hi Bob,

On 5/24/2012 8:02 PM, Soundhaspriority wrote:

> Don,
> You certainly have the vision. With your background, perhaps you can
> bull it through in a way that committees cannot.

It's not really a fair comparison. I can cut whatever corners
I consider unnecessary for my needs, whereas standards
committees try to come up with solutions that are all things
to all people! I can't afford all that.

E.g., should the audio streams be encrypted -- to prevent
someone eavesdropping from stealing material? <shrug>
Perhaps a Standard would provide that sort of support.
In my case, I don't care. I don't protect the content,
just the *protocol* (i.e., I don't want someone to be
able to push "foreign" audio into my speakers)
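
One way to protect the protocol without protecting the content is to authenticate each packet while leaving the samples in the clear. The sketch below is an assumption about how that might look, not the scheme actually used here: the packet layout, the shared `key`, and the `seal`/`open_packet` names are all hypothetical. It uses an HMAC tag plus a sequence number so a speaker can reject forged or replayed packets.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # size of a SHA-256 digest in bytes


def seal(key: bytes, seq: int, samples: bytes) -> bytes:
    """Prefix a sequence number and append an HMAC tag.
    The audio samples themselves are NOT encrypted."""
    header = struct.pack("!Q", seq)
    tag = hmac.new(key, header + samples, hashlib.sha256).digest()
    return header + samples + tag


def open_packet(key: bytes, packet: bytes, last_seq: int):
    """Return (seq, samples) if the packet is authentic and newer
    than last_seq; otherwise return None."""
    if len(packet) < 8 + TAG_LEN:
        return None
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted
    (seq,) = struct.unpack("!Q", body[:8])
    if seq <= last_seq:
        return None  # replayed or stale
    return seq, body[8:]
```

An eavesdropper can still read the samples, but without the key cannot produce a packet that a speaker will accept.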

> It is certainly the
> case that the consumer electronic environment suffers from, to borrow a
> political term, "balkanization". The concept of interoperability
> dominates IT, but has only made inroads into the consumer environment.
> Part of this has been due to the immaturity of IT compared with, for
> example, the kitchen toaster. It takes a lot of hidden intelligence to
> make something complicated appear simple. It takes even more hidden
> intelligence to make something fix itself as easily as a wire that's
> loose in the clip. Errors in state machines tend to compound, while

This is an excellent ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ observation!
I never heard it expressed that simply! It's easier to design
a digital system that WANTS to be unstable (not that it *should*
be unstable!)

> errors in analog machines tend to die down, which is why the consumer
> and IT have had such a difficult time finding mutual love. For your own
> use, this might not be a concern.
>
> I hope you succeed. Please report back.

Thanks. The "plain audio" works. I just haven't decided what the
right way is to handle the "audio from video" aspect of the system.
And, I would hate like hell to commit to buying all the parts
only to discover that I need to change the sampling frequency,
or support a deeper buffer or use pink wires, or...

Also, even though there are features that *I* am not using
in the design, I would like to ensure that they will work
for others who choose to exploit them (e.g., being able to
dynamically adjust the delay in the signal).

<shrug> We'll see... the great thing about doing things
"off the clock" is you don't have a client or customer
breathing down your neck forcing you to "settle" for
implementations that you're not happy with! :>

Jonas Eckerman

May 26, 2012, 8:07:55 AM

On 2012-05-24 12:22, Don Y wrote:

> audio library AND RUN THE AUDIO SERVER 24/7 without an
> annoying fan, etc., doing the same with a video library
> would be impractical. You would *want* to power it
> down when it wasn't actively serving video!

> This suggests audio and video would probably like to
> originate from different physical servers.

To me it suggests that the audio and video could probably originate from
different storage attached to the same server, where the video storage
can be external to the server (using a dedicated gigabit ethernet
connection for example).

When you don't need the video, just unmount and power down the
video storage system.

/J
--
Jonas Eckerman
http://www.truls.org/

Don Y

May 26, 2012, 2:31:48 PM

Hi Jonas,

On 5/26/2012 5:07 AM, Jonas Eckerman wrote:
> On 2012-05-24 12:22, Don Y wrote:
>
>> audio library AND RUN THE AUDIO SERVER 24/7 without an
>> annoying fan, etc., doing the same with a video library
>> would be impractical. You would *want* to power it
>> down when it wasn't actively serving video!
>
>> This suggests audio and video would probably like to
>> originate from different physical servers.
>
> To me it suggests that the audio and video could probably originate from
> different storage attached to the same server, where the video storage
> can be external to the server (using a dedicated gigabit ethernet
> connection for example).

Or firewire, USB3, SCSI, etc.

> When you don't need the video, just unmount and power down the
> video storage system.

The effort required to process and distribute video is
considerably more involved than audio. Rather than
designing *a* box that can handle audio and video
concurrently -- and not use or need the video portion
much of the time (imagine using such a system in audio
ONLY applications!) -- design something that can do each
individual thing, well.

E.g., the video system might want to be able to record video
as well (imagine the video source being an over-the-air
broadcast that you are pushing down the network "live";
yet, also recording to allow things like "pausing live
video"). So, that system has needs that the audio might
not have. Or, on a scale that the audio might not need
to support.

I wouldn't waste any money (personally) ensuring that
I could watch movies during a power outage. On the other
hand, I'd be annoyed if I had to sit in a silent house
until the power returned! I wouldn't want to have to
provide backup power to a system that is *capable* of
processing video in real-time (think: fancier CPU) when
all I really need is something that can push bytes off
a solid state disk (or some other read-only medium)
out to "speakers".

Regardless, though, the video has sound encoded at 48 kHz
while the audio has sound at 44.1 kHz. Do the network
speakers support *both* sample rates? Just one? What
if you want to mix "regular audio" with "audio from video"?
Do you do that in the speakers? Or, in the audio server?
etc.
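
To make the sample-rate question concrete: assuming the audio rate mentioned is the CD-standard 44.1 kHz, the two rates are related by the exact ratio 160/147, so whichever box does the mixing has to convert one stream first. The sketch below uses plain linear interpolation purely for illustration; `resample_linear` is a hypothetical helper, and a real converter would use a proper low-pass (polyphase) filter.

```python
from fractions import Fraction


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (illustrative only)."""
    if not samples:
        return []
    step = Fraction(src_rate, dst_rate)      # source samples per output sample
    n_out = int((len(samples) - 1) / step) + 1
    out = []
    for i in range(n_out):
        pos = i * step                       # exact position in the source
        k = int(pos)                         # sample at or before pos
        frac = float(pos - k)                # distance past sample k
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append(samples[k] * (1.0 - frac) + nxt * frac)
    return out


# 48000/44100 reduces to 160/147: every 147 input samples
# become 160 output samples.
ratio = Fraction(48000, 44100)
```

Doing this once in the server means the speakers only ever see one rate; doing it in the speakers keeps the server simple but puts the conversion cost in every box.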

These are the sorts of questions that muddy an "audio only"
implementation!

Don Y

May 26, 2012, 4:20:48 PM

Hi Bob,

On 5/22/2012 3:01 PM, Soundhaspriority wrote:
>
> "Don Y" <th...@isnotme.com> wrote in message

> news:jpgu5c$i35$1...@speranza.aioe.org...

>>
>> On 5/22/2012 1:09 PM, Soundhaspriority wrote:
>>> Humans can distinguish 10 microseconds OR LESS differential from ear to
>>> ear. Applying Nyquist's criteria to the problem, this suggests 5
>>> microseconds between channels, with an additional guard of 3
>>> microseconds, = 2 microseconds.
>>
>> Do you have a reference for this?
>
> Yes: http://www.zainea.com/jorissmithyin98.pdf

This was an excellent read! Thanks! I had done some research
into spatial sound perception previously. But, just accepted
the issues -- relative delay and frequency responses -- as
"intuitively obvious". It was interesting to see an explanation
of how this could be implemented "in carbon" (vs. silicon!).
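
For scale, the arithmetic in the figures quoted above (10 microseconds, halved, less a 3 microsecond guard) can be set against the sample period at common rates. This is just a worked calculation; the constant and function names are made up for the example, and the 44.1 kHz and 48 kHz rates are assumed.

```python
THRESHOLD_US = 10.0   # interaural difference quoted in the thread
GUARD_US = 3.0        # guard band quoted in the thread

# Half the threshold, minus the guard band: 5 - 3 = 2 microseconds.
budget_us = THRESHOLD_US / 2 - GUARD_US


def sample_period_us(rate_hz):
    """Duration of one sample in microseconds."""
    return 1e6 / rate_hz


def budget_in_samples(rate_hz):
    """The skew budget expressed as a fraction of one sample period."""
    return budget_us / sample_period_us(rate_hz)


for rate in (44100, 48000):
    print(rate, round(sample_period_us(rate), 2), round(budget_in_samples(rate), 3))
```

At either rate the budget is roughly a tenth of one sample period, so holding it means aligning the speakers' sample clocks, not merely agreeing on which sample to play.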

Amazing to consider there are folks who actually think
about these things every day! A single lifetime is far
too short to learn and appreciate all that you might *want*!

<frown>