Audio CODECs
Don Y  
Newsgroups: comp.dsp
From: Don Y <t...@isnotme.com>
Date: Thu, 09 Feb 2012 23:09:25 -0700
Local: Fri, Feb 10 2012 1:09 am
Subject: Audio CODECs
Hi,

I'm looking for some pointers concerning the design of lossless
audio (plus "silence") codecs.

I want to deploy these on either end of a packet switched
network (coder at server, decoder at client).  I.e., they
are intended primarily for communication bandwidth reduction.
Push content into coder, pass over network, extract content
via decoder, *consume* (and discard).  I.e., the system makes
the network look like a long "virtual wire".

The decoder needs to be *fast*.  Ideally, suitable for on-the-fly
operation (i.e., having to "expand" an entire frame "in place"
is less desirable than being able to expand it AS CONSUMED).
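
To illustrate what expanding a frame "as consumed" could look like,
here is a minimal sketch in Python (chosen only for brevity; a real MCU
decoder would more likely be C).  It assumes, purely for illustration,
that a frame carries first-order differences.  The names `encode_deltas`
and `decode_as_consumed` are hypothetical and not taken from any codec
mentioned in this thread.

```python
from itertools import islice

def encode_deltas(samples):
    """Toy coder (assumption): send each sample as its difference from the previous one."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas

def decode_as_consumed(deltas):
    """Yield one reconstructed sample per request; the only state kept is one integer."""
    previous = 0
    for delta in deltas:
        previous += delta
        yield previous

samples = [0, 3, 7, 12, 12, 9, 4]
decoder = decode_as_consumed(iter(encode_deltas(samples)))
first_three = list(islice(decoder, 3))   # the consumer pulls only what it needs now
remaining = list(decoder)                # the rest is expanded later, on demand
```

The decoder never holds an expanded frame; it keeps the compressed
input plus a single running value.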

[[[Note:  I am targeting general purpose MCU's, not DSP's!]]]

Smaller frame sizes are better than larger ones (requires
less resources to hold in the client WHILE CONSUMING).  And,
bigger frames mean bigger packets mean supporting fragmentation
and reassembly in the protocol stack, etc.  Alternatively,
makes the data stream more sensitive to dropped fragments.

(Some/much) content can be encoded a priori (e.g., as in
a media server) so the cost of coding or transcoding can
be considerably higher than decoding.  OTOH, it shouldn't
be prohibitively higher precluding any "real-time" use.

The type of source material shouldn't have a dramatic effect on
the efficiency or cost of either coder or decoder (speech, music,
etc. -- don't worry about "white/pink/chartreuse/etc noise")

From some observations of existing CODECs (open and proprietary):

All try to encapsulate a variety of different source formats:
bits per sample, samples per second, seek points, tags, etc.

All try to apply different compression strategies which are
then encoded in the data stream.

Most seem to treat the source material as discrete "sessions"
(song 1, song 2, etc.) instead of an endless *stream* of content.

It appears that most compression gains come from exploiting
the reduced bandwidth of the difference channel.  This only
works if you have two (related) source channels -- i.e., the
compression is less remarkable for mono sources.
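
For reference, the "difference channel" idea can be kept exactly
reversible with integer arithmetic.  The sketch below is one possible
arrangement (similar in spirit to the mid/side option in FLAC); the
function names are hypothetical.  It shows only the channel transform,
not the prediction or entropy coding that would follow.

```python
def to_mid_side(left, right):
    """Replace (left, right) with an average-like 'mid' and a difference 'side'."""
    mid = (left + right) >> 1      # floor of the average; drops one bit
    side = left - right            # same parity as left + right
    return mid, side

def from_mid_side(mid, side):
    """Recover the original pair exactly."""
    total = (mid << 1) | (side & 1)   # restore the bit dropped by the shift
    left = (total + side) >> 1
    right = (total - side) >> 1
    return left, right

pair = (1200, 1187)                  # highly correlated channels
mid, side = to_mid_side(*pair)       # side is small when the channels are similar
restored = from_mid_side(mid, side)
```

The dropped low bit of the sum is recoverable because the sum and the
difference of two integers always have the same parity.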

Coder efficiencies (costs) tend to vary greatly.  Often the
added "expense" results in very little additional gain in
compression (which can't be determined a priori).  While this
isn't significant for "batch" applications (encode, then store
for later distribution), it can be a deal breaker for "live"
content.

So...

In this sort of dedicated application, many of the "features"
of these CODECs are superfluous or redundant.  E.g., you can
probably fix the sample rate and compensate for variations
in source materials in the encoder (this makes it simpler for
the decoder to blindly reproduce that content without concern
for the actual sample rate of the original source).  Ditto
for sample sizes.

But, I'm not sure if you can as easily discard the adaptive
coding (decoding) strategies without knowing more about the
actual signal you will be encountering.

Do certain models/predictors/encodings tend to solve most
of the coding problem -- with the others present to cover
special cases?  E.g., without a difference channel, it
doesn't seem that RLE for the residual would be of much
use (?)
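
As a point of comparison with RLE, one residual encoding used by
several open lossless codecs (FLAC, for instance) is Rice coding,
which favours small magnitudes rather than repeated values.  A minimal
sketch follows.  It builds a string of "0"/"1" characters for
readability, which is an assumption of this example; a real coder
would pack bits.  All names are hypothetical.

```python
def zigzag(value):
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return (value << 1) if value >= 0 else ((-value << 1) - 1)

def unzigzag(code):
    return (code >> 1) if (code & 1) == 0 else -((code + 1) >> 1)

def rice_encode(values, k):
    """Quotient in unary (ones, then a zero), remainder in k plain bits."""
    bits = []
    for value in values:
        code = zigzag(value)
        quotient, remainder = code >> k, code & ((1 << k) - 1)
        bits.append("1" * quotient + "0")
        if k:
            bits.append(format(remainder, "0{}b".format(k)))
    return "".join(bits)

def rice_decode(bits, k, count):
    out, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1                                   # skip the terminating zero
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((quotient << k) | remainder))
    return out
```

The parameter `k` would normally be chosen per frame (or per
partition) by the encoder and sent in the header; decoding needs only
shifts and a bit reader.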

Anything else I've missed as shortcuts to reduce the
complexity of the coder/decoder?  Any risks that these
shortcuts might have lurking inside them?  Pathological
cases that could (realistically) be encountered?

Any suggestions as to developing/acquiring a versatile
test suite with which to gauge performance?  Or, just
pick some of the things that are *likely* to pass down
the wire?

I've implemented this with a couple of different "open"
CODECs and am now trying to determine if there are any
changes that are worthwhile to attempt to improve these
criteria.

Thx!
--don

[Apologies if I don't reply quickly.  I'm rearranging machines
here so my news server is a mess -- and likely to get worse
before it "recovers".  I may try reading news via google just
to keep abreast of anything posted, here. ]


 
Vladimir Vassilevsky  
Newsgroups: comp.dsp
From: Vladimir Vassilevsky <nos...@nowhere.com>
Date: Fri, 10 Feb 2012 10:34:40 -0600
Local: Fri, Feb 10 2012 11:34 am
Subject: Re: Audio CODECs

Don Y wrote:
> I'm looking for some pointers concerning the design of lossless
> audio (plus "silence") codecs.

The design is trivial: backwards adaptive predictor followed by
conventional Huffman coder.
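
A minimal sketch of the backward-adaptive half of that recipe, in
Python for brevity.  It assumes a sign-sign LMS update with integer
weights, which is one common choice and not necessarily what is meant
above; the Huffman stage is omitted.  `ORDER`, `SHIFT` and the
function names are hypothetical.

```python
ORDER = 4    # number of past samples used (assumed)
SHIFT = 8    # weights are fixed-point with 8 fractional bits (assumed)

def _predict(weights, history):
    return sum(w * h for w, h in zip(weights, history)) >> SHIFT

def _adapt(weights, history, error):
    """Sign-sign LMS: nudge each weight by +/-1 according to sign(error) * sign(sample)."""
    if error == 0:
        return
    step = 1 if error > 0 else -1
    for i, h in enumerate(history):
        if h > 0:
            weights[i] += step
        elif h < 0:
            weights[i] -= step

def encode(samples):
    weights, history = [0] * ORDER, [0] * ORDER
    residuals = []
    for x in samples:
        e = x - _predict(weights, history)
        residuals.append(e)
        _adapt(weights, history, e)          # uses only data the decoder will also have
        history = [x] + history[:-1]
    return residuals

def decode(residuals):
    weights, history = [0] * ORDER, [0] * ORDER
    samples = []
    for e in residuals:
        x = e + _predict(weights, history)
        samples.append(x)
        _adapt(weights, history, e)          # mirrors the encoder step for step
        history = [x] + history[:-1]
    return samples
```

No coefficients are transmitted, but the decoder has to repeat the
adaptation for every sample, and a lost residual desynchronizes the
weights until they are reset.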

> I want to deploy these on either end of a packet switched
> network (coder at server, decoder at client).  I.e., they
> are intended primarily for communication bandwidth reduction.

A lossless audio compressor is hardly useful in this scenario, as it does
not guarantee a fixed bandwidth.

> The decoder needs to be *fast*.

Then omit backward adaptation. Transmit forward prediction coefficients
over the channel.
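
A minimal sketch of the forward-adaptive alternative, where the
encoder does the analysis and ships the coefficients with each frame.
Production codecs usually derive per-frame LPC coefficients; this
example swaps in a simpler choice among four fixed polynomial
predictors to stay short.  The frame layout (a dict) and all names are
hypothetical.

```python
CANDIDATES = [[], [1], [2, -1], [3, -3, 1]]   # fixed polynomial predictors, orders 0..3

def _predict(coeffs, history):
    # history[0] is the most recent sample
    return sum(c * h for c, h in zip(coeffs, history))

def _residuals(coeffs, frame):
    history = [0] * len(coeffs)
    out = []
    for x in frame:
        out.append(x - _predict(coeffs, history))
        history = ([x] + history)[:len(coeffs)]
    return out

def encode_frame(frame):
    """Try every candidate, keep the one with the smallest total |residual|."""
    best = min(CANDIDATES, key=lambda c: sum(abs(r) for r in _residuals(c, frame)))
    return {"coeffs": best, "residuals": _residuals(best, frame)}

def decode_frame(packet):
    """The decoder does no analysis: it applies whatever coefficients arrived."""
    coeffs = packet["coeffs"]
    history = [0] * len(coeffs)
    out = []
    for r in packet["residuals"]:
        x = r + _predict(coeffs, history)
        out.append(x)
        history = ([x] + history)[:len(coeffs)]
    return out
```

All of the search cost sits in `encode_frame`; the decoder's inner
loop is a short multiply-accumulate per sample.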

> [[[Note:  I am targeting general purpose MCU's, not DSP's!]]]

Like what, for example?

> Smaller frame sizes are better than larger ones (requires
> less resources to hold in the client WHILE CONSUMING).  And,
> bigger frames mean bigger packets mean supporting fragmentation
> and reassembly in the protocol stack, etc.  Alternatively,
> makes the data stream more sensitive to dropped fragments.

That's irrelevant.
Large buffers will be needed to cope with the variable data rate.

> The type of source material shouldn't have a dramatic effect on
> the efficiency or cost of either coder or decoder (speech, music,
> etc. -- don't worry about "white/pink/chartreuse/etc noise")

The compression ratio is going to be only about 50% or so.
Then why bother about compression at all?

> From some observations of existing CODECs (open and proprietary):

> All try to encapsulate a variety of different source formats:
> bits per sample, samples per second, seek points, tags, etc.

> All try to apply different compression strategies which are
> then encoded in the data stream.

The codecs are LOSSLESS. Once the session is started, you can't drop any
data.

> Most seem to treat the source material as discrete "sessions"
> (song 1, song 2, etc.) instead of an endless *stream* of content.

So you will have to parse the uninterrupted stream from its very beginning,
even if that was last year. If you lose a packet, you are lost.

> So...

[...]

So.

I am tired of your bleat. Stop here and do anything useful other than
spewing nonsense onto the internet.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com


 
brent  
Newsgroups: comp.dsp
From: brent <buleg...@columbus.rr.com>
Date: Fri, 10 Feb 2012 09:18:41 -0800 (PST)
Local: Fri, Feb 10 2012 12:18 pm
Subject: Re: Audio CODECs
On Feb 10, 11:34 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:

> I am tired of your bleat.

Ha ha.  To me, it seems that you live for this kind of "bleat".

 
glen herrmannsfeldt  
Newsgroups: comp.dsp
From: glen herrmannsfeldt <g...@ugcs.caltech.edu>
Date: Fri, 10 Feb 2012 19:36:16 +0000 (UTC)
Local: Fri, Feb 10 2012 2:36 pm
Subject: Re: Audio CODECs

Vladimir Vassilevsky <nos...@nowhere.com> wrote:

(snip)

> The design is trivial: backwards adaptive predictor followed by
> conventional Huffman coder.
>> I want to deploy these on either end of a packet switched
>> network (coder at server, decoder at client).  I.e., they
>> are intended primarily for communication bandwidth reduction.
> A lossless audio compressor is hardly useful in this scenario, as it does
> not guarantee a fixed bandwidth.

(snip)

>> All try to apply different compression strategies which are
>> then encoded in the data stream.
> The codecs are LOSSLESS. Once the session is started, you
> can't drop any data.

Not very useful in the real world, though. Even back to the
beginning of CDs, there is error concealment when error
correction fails.

There used to be stories (maybe still are) of testing CD players
with a CD with a black wedge on it. The wedge blocks the light
for an ever increasing length of time each revolution. You can
then listen, and see at what point the error correction fails,
and how well the concealment sounds. Things like that used to
be in reviews for CD players, but maybe not anymore.

With VoIP, you never know what will happen on the net, in terms
of delayed or lost packets. Some kind of concealment is needed.

-- glen


 
Don Y  
Newsgroups: comp.dsp
From: Don Y <t...@isnotme.com>
Date: Fri, 10 Feb 2012 18:08:28 -0700
Local: Fri, Feb 10 2012 8:08 pm
Subject: Re: Audio CODECs
Hi Vladimir,

On 2/10/2012 9:34 AM, Vladimir Vassilevsky wrote:

> Don Y wrote:

>> I'm looking for some pointers concerning the design of lossless
>> audio (plus "silence") codecs.

> The design is trivial: backwards adaptive predictor followed by
> conventional Huffman coder.

If it was "trivial", one-size-fits-all, someone would have
designed, patented, and commercialized it -- and retired
to some sunny beach to drink "tuna" coladas and watch
cute young things frolic in the surf.

The fact that there are so many different CODECs testifies to
the non-triviality of the task.

>> I want to deploy these on either end of a packet switched
>> network (coder at server, decoder at client). I.e., they
>> are intended primarily for communication bandwidth reduction.

> A lossless audio compressor is hardly useful in this scenario, as it does
> not guarantee a fixed bandwidth.

It doesn't *have* to guarantee a fixed bandwidth.  It just has to
nominally afford some reduction in required bandwidth to offset
the implementation cost.

>> The decoder needs to be *fast*.

> Then omit backward adaptation. Transmit forward prediction coefficients
> over the channel.

>> [[[Note: I am targeting general purpose MCU's, not DSP's!]]]

> Like what, for example?

Like whatever the implementor decides is appropriate!  If
you can't design hardware, you might port it to a PC platform.
If you've got a DSP in an existing product, port it there.
Etc.

Tying an implementation to a particular hardware platform is
"premature optimization".  Figure out what needs to be done
(with your best effort) and use that to determine the minimum
requirements for any hosting platform.

>> Smaller frame sizes are better than larger ones (requires
>> less resources to hold in the client WHILE CONSUMING). And,
>> bigger frames mean bigger packets mean supporting fragmentation
>> and reassembly in the protocol stack, etc. Alternatively,
>> makes the data stream more sensitive to dropped fragments.

> That's irrelevant.
> Large buffers will be needed to cope with the variable data rate.

No.  The model that you choose for the coder (and thus, decoder)
determines the resources that will be needed.

Silly example:

You're going to be passing (pure) sine waves down the wire.
I can encode that as 4 values:  amplitude, frequency, phase
and duration.

The decoder can take those four values and reconstruct an
equivalent sine wave with almost *0* resources at its
disposal.
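
A minimal sketch of that decoder, in Python for brevity.  The 8 kHz
sample rate, the rounding to integers and the name `decode_sine` are
assumptions of this example.  Note that this is a parametric model:
the output is bit-exact only if coder and decoder agree on the
arithmetic.

```python
import math

def decode_sine(amplitude, frequency_hz, phase_rad, duration_s, sample_rate=8000):
    """Regenerate the tone one sample at a time from the four transmitted values."""
    total = int(round(duration_s * sample_rate))
    step = 2.0 * math.pi * frequency_hz / sample_rate   # phase advance per sample
    for n in range(total):
        yield int(round(amplitude * math.sin(step * n + phase_rad)))

# 1 ms of a 1 kHz tone at amplitude 1000: eight samples from four numbers
tone = list(decode_sine(1000, 1000.0, 0.0, 0.001))
```

On a small MCU the `sin` call would probably be replaced by a lookup
table or a recursive oscillator, but the state is still just a phase
counter and a sample count.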

This is why the knowledge of folks with first-hand experience
is worthwhile.  What models work best in which circumstances, etc.
An encoder for speech is not going to be as effective encoding
"music".  (speech has lots of silence).  OTOH, it might work
well encoding a (single) *singer*.

>> The type of source material shouldn't have a dramatic effect on
>> the efficiency or cost of either coder or decoder (speech, music,
>> etc. -- don't worry about "white/pink/chartreuse/etc noise")

> The compression ratio is going to be only about 50% or so.
> Then why bother about compression at all?

Because compressing is cheaper than running another wire or
upgrading the communications fabric to handle higher bandwidths.
Save a dollar on the processor and spend thousands on more cable?

>> From some observations of existing CODECS (open and proprietary):

>> All try to encapsulate a variety of different source formats:
>> bits per sample, samples per second, seek points, tags, etc.

>> All try to apply different compression strategies which are
>> then encoded in the data stream.

> The codecs are LOSSLESS. Once the session is started, you can't drop any
> data.

That doesn't directly depend on the strategy used in doing the
compression.  Rather, that depends on how the protocol handles
errors/dropouts.

OTOH, a CODEC that has to handle music *and* speech might choose
to use different strategies to represent that data based on its
knowledge/examination of the data stream.

>> Most seem to treat the source material as discrete "sessions"
>> (song 1, song 2, etc.) instead of an endless *stream* of content.

> So you will have to parse the uninterrupted stream from its very beginning,
> even if that was last year. If you lose a packet, you are lost.

No.  Only if the coder stores no "absolute state" in the data stream.
This doesn't seem to be the case for the codecs that I've examined.
It would be a burden to implementors for that reason as well as
making the stream "unseekable" (or, only seekable at elevated
expense) -- your decoder would have to process all of the "skipped
over" content in order to accurately track state.
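
A minimal sketch of what "absolute state" in the stream can mean:
each frame starts with one verbatim sample, so the predictor is
re-seeded at every frame boundary and any frame decodes on its own.
The delta predictor, the dict layout and the names are assumptions of
this example, not the format of any codec named in this thread.

```python
def encode_frames(samples, frame_len):
    """Split into frames; store the first sample of each frame verbatim."""
    frames = []
    for start in range(0, len(samples), frame_len):
        chunk = samples[start:start + frame_len]
        deltas = [b - a for a, b in zip(chunk, chunk[1:])]
        frames.append({"first": chunk[0], "deltas": deltas})
    return frames

def decode_frame(frame):
    """Needs nothing from earlier frames: the absolute state travels with the frame."""
    out = [frame["first"]]
    for delta in frame["deltas"]:
        out.append(out[-1] + delta)
    return out

samples = [(n * n) % 50 for n in range(40)]
frames = encode_frames(samples, 8)
third = decode_frame(frames[2])      # decoded without touching frames 0 and 1
```

The price is a few uncompressed values per frame; in exchange, a
dropped packet costs one frame rather than the rest of the stream.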

>> So...

> [...]

> So.

> I am tired of your bleat. Stop here and do anything useful other than
> spewing nonsense onto the internet.

Add me to your kill file.  Or, discipline yourself not to open
any posts with my name on them.  I don't try to "elude" filters.
I always post from the same IP, use the same news service, etc.
I'm sure someone like you should be able to figure out how to
rid yourself of these "unpleasant distractions".

If not, ask one of the kids in the neighborhood to show you how...


 
Don Y  
Newsgroups: comp.dsp
From: Don Y <t...@isnotme.com>
Date: Fri, 10 Feb 2012 19:09:02 -0700
Local: Fri, Feb 10 2012 9:09 pm
Subject: Re: Audio CODECs
Hi Glen,

On 2/10/2012 12:36 PM, glen herrmannsfeldt wrote:

I'm not even asking for the decoder to *fix* those errors,
dropouts, etc.  As long as it can INDICATE where errors
exist so that I can take my own remedial action.

But, the point of my statement ("different compression
strategies") was to elicit comments as to why certain
strategies/encodings are preferable to others AND IN
WHICH CIRCUMSTANCES.  I.e., why isn't a single strategy
employed?  Or, why only <some_number>?

> There used to be stories (maybe still are) of testing CD players
> with a CD with a black wedge on it. The wedge blocks the light
> for an ever increasing length of time each revolution. You can
> then listen, and see at what point the error correction fails,
> and how well the concealment sounds. Things like that used to
> be in reviews for CD players, but maybe not anymore.

> With VoIP, you never know what will happen on the net, in terms
> of delayed or lost packets. Some kind of concealment is needed.

Yes.  Though you can do things to minimize the "discomfort"
of those errors -- within reason.  E.g., reproducing a
signal with periodic dropouts at a "high" frequency (e.g.,
so the signal sounds to be gated off, often) is probably
more annoying than dropping the connection.  Or, muting
until signal delivery is stable enough for "more practical"
use.

If the decoder can indicate problems, "something" can
be done to resolve them in a manner that is appropriate
to the application.
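
One way a decoder front end could INDICATE problems without trying to
fix them is sketched below.  It assumes a hypothetical 6-byte header
per packet (16-bit sequence number plus CRC-32 of the payload); the
header layout and the names `pack_frame` and `receive` are inventions
of this example.  `struct` and `zlib.crc32` are from the Python
standard library.

```python
import struct
import zlib

HEADER = struct.Struct("<HI")   # assumed layout: sequence number, CRC-32 of payload

def pack_frame(seq, payload):
    return HEADER.pack(seq & 0xFFFF, zlib.crc32(payload)) + payload

def receive(packets):
    """Yield (status, value, payload) events; the caller decides how to react."""
    expected = None
    for packet in packets:
        seq, crc = HEADER.unpack_from(packet)
        payload = packet[HEADER.size:]
        if expected is not None and seq != expected:
            # value = number of frames that never arrived (modulo 65536)
            yield ("lost", (seq - expected) & 0xFFFF, None)
        expected = (seq + 1) & 0xFFFF
        if zlib.crc32(payload) != crc:
            yield ("corrupt", seq, None)
        else:
            yield ("ok", seq, payload)
```

The application can then mute, repeat the last frame, or drop the
connection as it sees fit.  The sketch trusts the sequence field
itself, so a damaged header would be misreported as a gap.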


 