Segmented-Audio over Websockets to HTML5

Tom Sheffler

Oct 5, 2013, 11:03:01 AM
to autob...@googlegroups.com
We've been experimenting using Autobahn/Websockets to send near
real-time multimedia data from small clients for rendering in HTML5.
This note shares a little bit of our results.  Part of the reason for
this is to see if anyone else is doing anything similar.

HTML5 does not yet define a means for real-time continuous audio
transmission.  When WebRTC is ratified, it will form the basis for
that kind of capability, but it is likely there will remain browser
incompatibilities.  Javascript Media Source extensions are another
promising technology that, when delivered, will enable browsers to
assemble presentations of media (like audio and video) without
discontinuities or artifacts.  The latest spec
shows collaboration between Google, Microsoft and Netflix.  However,
this feature is only available on an experimental basis in a few
browsers.

Even without these emerging specifications, the current HTML5 browser
has interesting capabilities for multimedia presentation.

It has:

- websockets: a transport medium
- javascript: an efficient interpreter
- event scheduling: both in Javascript and AudioContext
- audio rendering: Audio elements, and the AudioContext

If a continuous audio broadcast is broken into a series of chunks, a
browser can re-assemble the sequence of audio chunks for playback with
timing that is close to the original.  The resulting audio sometimes
has artifacts in the form of occasional clicks between the chunks.
However, it turns out that the quality is remarkably good.  The quality is 
browser-dependent, and it is interesting to compare the results across browsers.
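The chunking step can be sketched in a few lines. This is a framework-free illustration, assuming 16-bit mono PCM at 16 kHz; the rates and names are illustrative, not Wazwot's actual parameters:

```python
# Split a continuous PCM byte stream into fixed-duration, numbered
# chunks suitable for WebSocket delivery. Assumes 16-bit mono PCM;
# all constants here are illustrative.

SAMPLE_RATE = 16000        # samples per second (assumed)
BYTES_PER_SAMPLE = 2       # 16-bit PCM
CHUNK_SECONDS = 1.0        # fragment duration

CHUNK_BYTES = int(SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS)

def chunk_pcm(stream: bytes):
    """Yield (sequence_number, chunk) pairs; the final short chunk is kept."""
    for seq, offset in enumerate(range(0, len(stream), CHUNK_BYTES)):
        yield seq, stream[offset:offset + CHUNK_BYTES]
```

The sequence number travels with each chunk so the receiver can detect gaps and schedule playback against the original timeline.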

Testing the capability of HTML5 to reassemble a continuous audio
broadcast required the development of three separate things:

1) a broadcasting application (a program to sample a microphone and
turn it into data for sending via HTTP)

2) a server (for receiving the audio data, potentially transcoding
audio for browsers that need it, and for sending through a websocket
to an HTML5 web-page)

3) a Javascript receiving application (HTML5 + Javascript)


The result of these experiments is the Wazwot iPhone App and its
associated web service.


How does Autobahn Websockets fit in?

Autobahn Websockets provides a Resource interface so that it is easy
to mix Websocket objects into a tree of HTTP Resources.  This
capability simplified the construction of the server, since it needs
to handle many broadcasts simultaneously.  In our implementation, each
broadcast "channel" is its own resource with its own HTML and
Websocket components.
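The per-channel layout described above can be sketched framework-free; plain dicts stand in for the Twisted Resource objects (into which Autobahn's websocket resources would be mixed), and all names here are illustrative:

```python
# Framework-free sketch of the per-channel resource tree: each
# broadcast channel owns both a listener HTML page and a websocket
# endpoint. In the real server these entries would be Twisted
# Resource objects; plain dicts stand in for them here.

def make_channel(name):
    return {
        "index.html": f"<html>player for {name}</html>",  # listener page
        "ws": f"websocket endpoint for {name}",           # audio stream
    }

root = {"channels": {}}

def add_channel(root, name):
    """Attach a new broadcast channel under the shared root."""
    root["channels"][name] = make_channel(name)
    return root["channels"][name]
```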

We originally chose AB because of my familiarity with Twisted and
Python, but also because of some of the Twisted features it embraces.
In particular we needed flow-control (in the form of an
IPushProducer).  Real-time audio transmission over a web socket needs
a means to monitor TCP back-pressure and adjust the stream.  The
Autobahn/Twisted framework made it pretty easy to implement this
requirement.
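The pause/resume contract at the heart of this can be simulated without the framework. The method names follow Twisted's IPushProducer interface (a real implementation would also provide `stopProducing()` and be registered via `transport.registerProducer(producer, True)`); the load-shedding policy shown is an assumption for illustration, not necessarily what Wazwot does:

```python
# Simulation of Twisted's IPushProducer contract: the transport calls
# pauseProducing() when its send buffer backs up (TCP back-pressure)
# and resumeProducing() when it drains. While paused, this sketch
# sheds chunks rather than buffering unboundedly, keeping latency flat.

class AudioPushProducer:
    def __init__(self, send):
        self._send = send        # e.g. the websocket's send function
        self._paused = False
        self.dropped = 0

    def pauseProducing(self):    # invoked by transport on back-pressure
        self._paused = True

    def resumeProducing(self):   # invoked when the send buffer drains
        self._paused = False

    def on_chunk(self, chunk):
        if self._paused:
            self.dropped += 1    # shed load instead of growing latency
        else:
            self._send(chunk)
```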

Try it out:

Wazwot is basically a research project.  It sends continuous audio and
also continuous image frames.  If anyone on this list is interested in
trying it out, send me a note and I'll shoot you an iOS app promo code
for evaluation.




Tobias Oberstein

Oct 5, 2013, 12:04:03 PM
to autob...@googlegroups.com, Tom Sheffler
Hi Tom,

On 05.10.2013 17:03, Tom Sheffler wrote:
> We've been experimenting using Autobahn/Websockets to send near
> real-time multimedia data from small clients for rendering in HTML5.
> This note shares a little bit of our results. Part of the reason for
> this is to see if anyone else is doing anything similar.

Thanks for sharing! This is quite interesting ..

> However, it turns out that the quality is remarkably good. The quality is
> browser-dependent, and it is interesting to compare the results across
> browsers.

Curious: how do browsers stack up regarding quality? Did you test IE10
(desktop and/or WP)?

> Testing the capability of HTML5 to reassemble a continuous audio
> broadcast required the development of three separate things:

What is your fragment size (the size into which you break down the
continuous audio)?

Do you dynamically adjust that?

>
> 1) a broadcasting application (a program to sample a microphone and
> turn it into data for sending via HTTP)

So the upstream isn't WebSocket? Any reasons?

> We originally chose AB because of my familiarity with Twisted and
> Python, but also because of some of the Twisted features it embraces.
> In particular we needed flow-control (in the form of an
> IPushProducer). Real-time audio transmission over a web socket needs
> a means to monitor TCP back-pressure and adjust the stream. The
> Autobahn/Twisted framework made it pretty easy to implement this
> requirement.

This - in particular - is satisfying to hear ;) Since a) the relevance of
flow control and TCP-backpressure adaptation by an app is a topic that
many seem to be unaware of, and b) while the WS protocol was still
cooking at the IETF, some wanted to introduce strange features as
half-baked workarounds, since they didn't understand / couldn't support
arbitrary streaming and backpressure-to-app scenarios, and I've been
fighting hard to avoid these ;)

Twisted of course got it right years ago (providing Producer/Consumer
machinery) and AutobahnPython has embraced that from the very beginning.

For people who wanna read more, here are 2 starters:

http://autobahn.ws/python/tutorials/producerconsumer

https://github.com/tavendo/AutobahnPython/tree/master/examples/websocket/streaming

This even includes an example of sending a continuous,
backpressure-controlled stream of data as a single WS message
(effectively rendering WS a fancy prelude to "raw TCP").

Cheers
/Tobias

Tom Sheffler

Oct 7, 2013, 11:29:22 AM
to Tobias Oberstein, autob...@googlegroups.com
On Sat, Oct 5, 2013 at 9:04 AM, Tobias Oberstein <tobias.o...@gmail.com> wrote:
> Hi Tom,
>
> On 05.10.2013 17:03, Tom Sheffler wrote:
>> We've been experimenting using Autobahn/Websockets to send near
>> real-time multimedia data from small clients for rendering in HTML5.
>> This note shares a little bit of our results.  Part of the reason for
>> this is to see if anyone else is doing anything similar.
>
> Thanks for sharing! This is quite interesting ..
>
>> However, it turns out that the quality is remarkably good.  The quality is
>> browser-dependent, and it is interesting to compare the results across
>> browsers.
>
> Curious: how do browsers stack up regarding quality? Did you test IE10 (desktop and/or WP)?

I've tested IE10 desktop, and it's very, very good.  I don't have access to a WP, but would like to try it.

>> Testing the capability of HTML5 to reassemble a continuous audio
>> broadcast required the development of three separate things:
>
> What is your fragment size (the size into which you break down the continuous audio)?
>
> Do you dynamically adjust that?

It currently chooses a 1-second fragment size.  The reassembly buffer requires a few segments before playback starts, so the resulting audio latency is 4 to 5 seconds.  These parameters give a generally good audio experience.  It would be interesting to develop algorithms that optimize latency or quality as a function of network conditions.
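Those numbers can be sketched as a receive-side buffer: playback is gated on a few queued segments, trading latency for resilience to jitter. Only the parameter values below come from the thread; the class itself is illustrative:

```python
# Receive-side reassembly buffer: playback starts only once a few
# 1-second segments have accumulated. With 4 segments prebuffered,
# queued audio alone accounts for ~4 seconds of the quoted 4-5 second
# end-to-end latency. Class and method names are illustrative.

from collections import deque

FRAGMENT_SECONDS = 1.0   # fragment duration (from the thread)
PREBUFFER_SEGMENTS = 4   # segments held before playback starts

class ReassemblyBuffer:
    def __init__(self):
        self.segments = deque()
        self.playing = False

    def push(self, segment):
        self.segments.append(segment)
        if not self.playing and len(self.segments) >= PREBUFFER_SEGMENTS:
            self.playing = True   # enough audio queued; start playback

    def latency_seconds(self):
        # audio queued ahead of the playhead, in seconds
        return len(self.segments) * FRAGMENT_SECONDS
```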



>> 1) a broadcasting application (a program to sample a microphone and
>> turn it into data for sending via HTTP)
>
> So the upstream isn't WebSocket? Any reasons?

A few.  The application is very asymmetric: the broadcaster is not a browser, and the receiver is.  Websockets is a good way to "push" information to a browser (the packets of a stream), so that's a given.

On the broadcaster side I chose to leverage HTTP/1.1 and the connection pools that most platforms offer.  If I had chosen a websocket upload, the application would have to manage disconnections and retries.  With the HTTP request approach, the connection pools mask many of the network issues.  Data packets do arrive out of order at the server as a result.  Request pipelining is also possible.
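Out-of-order arrival implies a reorder step on the server before fan-out to websocket listeners. A minimal sketch keyed by sequence number (hypothetical names, not the actual Wazwot code):

```python
# Chunks uploaded over pooled HTTP connections can reach the server
# out of order; a small reorder buffer keyed by sequence number
# restores the original order before the chunks are pushed onward.

class ReorderBuffer:
    def __init__(self):
        self._pending = {}
        self._next_seq = 0

    def push(self, seq, chunk):
        """Accept one chunk; return the in-order chunks now releasable."""
        self._pending[seq] = chunk
        released = []
        while self._next_seq in self._pending:
            released.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return released
```

A production version would also time out sequence numbers that never arrive, so one lost upload does not stall the stream.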
 

>> We originally chose AB because of my familiarity with Twisted and
>> Python, but also because of some of the Twisted features it embraces.
>> In particular we needed flow-control (in the form of an
>> IPushProducer).  Real-time audio transmission over a web socket needs
>> a means to monitor TCP back-pressure and adjust the stream.  The
>> Autobahn/Twisted framework made it pretty easy to implement this
>> requirement.
>
> This - in particular - is satisfying to hear ;) Since a) the relevance of
> flow control and TCP-backpressure adaptation by an app is a topic that
> many seem to be unaware of, and b) while the WS protocol was still
> cooking at the IETF, some wanted to introduce strange features as
> half-baked workarounds, since they didn't understand / couldn't support
> arbitrary streaming and backpressure-to-app scenarios, and I've been
> fighting hard to avoid these ;)

This lesson was apparent in the application.  Without backpressure monitoring, we could not control latency properly.


> Twisted of course got it right years ago (providing Producer/Consumer machinery) and AutobahnPython has embraced that from the very beginning.
>
> For people who wanna read more, here are 2 starters:
>
> http://autobahn.ws/python/tutorials/producerconsumer
>
> https://github.com/tavendo/AutobahnPython/tree/master/examples/websocket/streaming
>
> This even includes an example of sending a continuous, backpressure-controlled stream of data as a single WS message (effectively rendering WS a fancy prelude to "raw TCP").
>
> Cheers
> /Tobias


And cheers to you!
-Tom 
