Fwd: [codec] questions to be answered

Alexander Chemeris

Nov 4, 2009, 7:45:52 AM

to village-...@googlegroups.com

Hi list,

I'm forwarding an interesting part of a conversation from the IETF's codec
mailing list here - I think it may be interesting for village-telco
participants. I've included three e-mails in this forward, so you need
to read this mail bottom-up.

The [codec] mailing list is an IETF pre-WG effort aimed at creating a new
audio codec for IP-based interactive applications. It has attracted a lot of
attention from both the open-source community (Jean-Marc included)
and proprietary codec developers (Skype, Spirit, etc.), who are
making an effort to publish some of their codecs as RFCs and are
ready to contribute to the new codec's development.

I also believe it would be great to compactly formulate village-telco
specific requirements for the voice codec and post them there. This
task has been in my queue for quite some time, and if anyone can help
me formulate them and participate in the following discussion on the codec
list, I would greatly appreciate it.

PS The second CODEC BoF will be held during the upcoming IETF 76, and
as usual, everyone will be able to participate remotely via audio streaming
and Jabber. It will take place 13:00-15:00 on November 12, 2009.
https://datatracker.ietf.org/meeting/76/agenda.html

---------- Forwarded message ----------
From: Michael Knappe <mkn...@juniper.net>
Date: Tue, Nov 3, 2009 at 21:50
Subject: Re: [codec] questions to be answered
To: "b...@brianrosen.net" <b...@brianrosen.net>, "ron.ev...@gmail.com"
<ron.ev...@gmail.com>, "mram...@cisco.com" <mram...@cisco.com>,
"ho...@uni-tuebingen.de" <ho...@uni-tuebingen.de>, "co...@ietf.org"
<co...@ietf.org>

The language sensitivity was an empirical observation from early VoIP
deployments I was involved with in international markets. We always made
an attempt to stay below 100 ms if at all possible. On large
deployments, the 5 percent 'outliers' become very important in terms
of customer acceptance of a new technology (new at the time). You could
counter that since the early days of VoIP, cell phone quality has
shifted the delay acceptability bar as well, with longer delays
becoming more the norm. What I really like about the proposed codec
activity is that it moves internet communications toward being the
high-quality choice (frequency range, number of channels, etc.) rather
than the low-cost quality compromise that VoIP began its life as.

The 150 ms 'knee' was exactly that, with the impact on conversation
growing steadily until you hit the 'CB radio' half-duplex asymptote
past the 400-600 ms range. Agreed that below 150 ms, the quality
impact curve levels out quickly for the average listener/speaker.
You raise an interesting point about argumentative speech and
low delay: involved multi-party conference communications should keep
delay as low as possible.

Codec algorithmic delay is of course just a fraction of the end-to-end
delay equation, with the relative contribution of network
quality (both absolute delay and jitter) increasing as you optimize
codec delay. This is beyond the scope of the proposed WG, but worth keeping in
mind when assessing the impact of codec delay on its own.
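
A minimal sketch of the budgeting point above, assuming a simple additive model of one-way delay (codec algorithmic delay + network delay + jitter buffer) checked against the 150 ms rule of thumb discussed in this thread. The DelayBudget class and every component value are hypothetical illustrations, not figures from the discussion; the sketch only shows that shrinking codec delay raises the network's share of what remains.

```python
from dataclasses import dataclass

BUDGET_MS = 150.0  # rule-of-thumb one-way "knee" discussed in this thread


@dataclass
class DelayBudget:
    """Hypothetical additive model of one-way mouth-to-ear delay."""
    codec_algorithmic_ms: float  # codec frame size plus look-ahead (assumed)
    network_ms: float            # absolute one-way network delay (assumed)
    jitter_buffer_ms: float      # playout buffer sized to absorb jitter (assumed)

    def total_ms(self) -> float:
        return self.codec_algorithmic_ms + self.network_ms + self.jitter_buffer_ms

    def network_share(self) -> float:
        """Fraction of the end-to-end delay caused by the network (delay + jitter)."""
        return (self.network_ms + self.jitter_buffer_ms) / self.total_ms()

    def within_budget(self, budget_ms: float = BUDGET_MS) -> bool:
        return self.total_ms() <= budget_ms


# Same hypothetical network, two codecs with different algorithmic delay.
slow = DelayBudget(codec_algorithmic_ms=40.0, network_ms=80.0, jitter_buffer_ms=40.0)
fast = DelayBudget(codec_algorithmic_ms=10.0, network_ms=80.0, jitter_buffer_ms=40.0)

for name, b in (("slow codec", slow), ("fast codec", fast)):
    print(f"{name}: {b.total_ms():.0f} ms, within budget: {b.within_budget()}, "
          f"network share: {b.network_share():.0%}")
# slow codec: 160 ms, within budget: False, network share: 75%
# fast codec: 130 ms, within budget: True, network share: 92%
```

With the network held fixed, cutting 30 ms of codec delay brings the call under the knee, but the network now accounts for most of the remaining delay, so further codec tuning buys little.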

Mike

----- Original Message -----
From: Brian Rosen <b...@brianrosen.net>
To: Michael Knappe; ron.ev...@gmail.com <ron.ev...@gmail.com>;
mram...@cisco.com <mram...@cisco.com>; ho...@uni-tuebingen.de
<ho...@uni-tuebingen.de>; co...@ietf.org <co...@ietf.org>
Sent: Tue Nov 03 12:09:24 2009
Subject: Re: [codec] questions to be answered

Could you cite the research for that?

We did some work in that space and discovered there were no cultural biases;
the timing was a factor in having arguments, interruptions, or fast changes
of speaker, where turn-taking behavior is less controlled than
usual. With delay longer than 150 ms, arguments weren't possible naturally,
and behavior suffered significantly. We only tested a limited range of
cultural differences, but they included US/Canada, Europe and Asia.
Granted, arguments in, say, many Asian environments were much less common,
but our work suggested that it still mattered a whole lot. Of course, if the
entire interaction is one way (a lecture with no questions), then delay is
not important as long as lip sync is achieved.

We did not find any advantage in having delay less than around 150 ms. Of
course, there is a range of human responses, so that 150 ms is a 95% or so
threshold. We found the effect of delay is a cliff: once you go over the
edge, you can't converse normally (under some amount of stress), and making
it, say, 250 ms vs. 400 ms didn't help much. We noticed no difference from
100 ms down to effectively zero.

Lip sync was an issue, but once the audio delay was factored out, we did not
notice any delay issues for the video itself until it got really big (up
around 1/2 sec). Then you see some artifacts.

Our research (it was at Marconi, not where I am presently) was not
published, but it was pretty thorough. We also concluded that delay was
much more important than image quality: in our experiments, a QCIF image
with low delay was overwhelmingly preferred over a very high-quality
image with lots of delay.

Brian

On 11/3/09 1:46 PM, "Michael Knappe" <mkn...@juniper.net> wrote:

> Agreed, thanks.
>
> One quick note on the 150 ms end-to-end requirement. That is a general
> rule-of-thumb (and ITU-specified) requirement for interactive speech
> responsiveness and ease of conversation, but one that, digging deeper,
> depends on language, cultural, and use-case factors. There are conference
> situations that are more forgiving of longer delays, and conversely language
> types that require tighter delay to avoid emotional misinterpretation of
> verbal acknowledgements.
>
> Mike
>