At 37 bits/sec, a Wideband (up to 32 kHz) Speech Codec

Nimo

Sep 12, 2011, 2:57:45 AM

Hi there,

After a long time, I'm back in my group.

Well, I'll keep this to the point. If you have any doubts,
I'm always here to help you.

// A New Speech Codec Based upon Advanced Tensor Basis & Galerkin Techniques //

ALGORITHM   BITRATE   MOS   QUALITY       SUBJECTIVE OPINION   DELAY
***         37 bps    5     Transparent   imperceptible        1/10 s

I'm getting CD quality "speech" at 37 bits / sec.

A checkmate to the G-series codecs, AMR, Speex, etc.

Any ideas? I can go out with this technology immediately, I mean as
fast as possible.

Important links:

Academic / Research
http://www.ircc.iitb.ac.in/IRCC-Webpage/patent273.jsp

Commercial
http://www.voiceage.com/index.php
http://www.sipro.com/

greetings
so long
nimo
This one is for Thomas: please shoot.

Jim Leonard

Sep 13, 2011, 10:52:19 AM

On Sep 12, 1:57 am, Nimo <azeez...@gmail.com> wrote:
> I'm getting CD quality "speech" at 37 bits / sec.

Functional example please?

Nimo

Sep 13, 2011, 10:36:59 PM

Sorry, I didn't get you...?

so long
nimo

tom st denis

Sep 14, 2011, 7:33:58 AM

Decoder + sample compressed stream, please.

Tom

Industrial One

Sep 14, 2011, 9:01:24 AM

There is no possible way speech can be encoded at any recognizable
quality at only 37 kbps unless it was a text-to-speech routine because
37 kbps is barely enough to even encode flowing text losslessly.

Industrial One

Sep 14, 2011, 9:32:21 AM

It appears I misspoke. That paragraph of mine above is exactly 200
bytes and takes about 14-15 seconds to recite aloud; that is about 14
bytes/s, or 112 bits/s. Just where the hell do you expect to store the
very complex information such as my voice, intonation, how often I
pause, how fast I talk, etc.?
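The text-rate arithmetic above can be checked in a few lines. A minimal sketch, taking the post's own figures (a 200-byte paragraph read aloud in 14 to 15 seconds) and assuming plain 8-bit text with no compression:

```python
# Bit rate of uncompressed text read aloud, using the figures from the post above.
TEXT_BYTES = 200      # size of the quoted paragraph, per the post
BITS_PER_BYTE = 8     # assumption: plain 8-bit text, no compression


def text_bitrate(num_bytes: int, seconds: float) -> float:
    """Bits per second needed to send num_bytes of plain text in `seconds`."""
    return num_bytes * BITS_PER_BYTE / seconds


for seconds in (14, 14.5, 15):
    print(f"{seconds:>4} s -> {text_bitrate(TEXT_BYTES, seconds):.1f} bits/s")
```

The result, roughly 107 to 114 bits/s, is about three times the claimed 37 bits/s before any information about the voice itself is added.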

Nimo

Sep 14, 2011, 9:51:47 AM

Wait, baby, I'm getting a US patent, and then the game starts..

Jim Leonard

Sep 14, 2011, 10:12:06 AM

On Sep 14, 8:51 am, Nimo <azeez...@gmail.com> wrote:
>
> > decoder + sample compressed stream please.
>

> wait baby, getting US patent and then game starts..

I think you'll save yourself a lot of grief and heartache if you
create a decompressor + sample compressed stream *before* you bother
with patents. Until you do, you have nothing worth protecting with a
patent.

Alex Mizrahi

Sep 14, 2011, 11:00:37 AM

>> There is no possible way speech can be encoded at any recognizable
>> quality at only 37 kbps unless it was a text-to-speech routine because
>> 37 kbps is barely enough to even encode flowing text losslessly.
>
> It appears I mispoke. That paragraph of mine above is exactly 200
> bytes and takes about 14-15 seconds to recite aloud, that is about 14
> bytes/s or 112 bits/s. Just where the hell do you expect to store the
> very complex information such as my voice, intonation, how often I
> pause, how fast I talk etc.?

We can put it another way: 37 bits/second gives you 2^37, about 137*10^9,
possible different seconds of speech. Does that match the number of
different sounds a human can make in a second?
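The 137*10^9 figure is just 2^37, the number of distinct bit patterns a 37-bit budget can produce. A minimal sketch of that count; the helper name `distinct_messages` is illustrative only:

```python
CLAIMED_BPS = 37  # bits per second, from the subject line


def distinct_messages(bits_per_second: int, seconds: int = 1) -> int:
    """How many different streams of the given length the bit budget can tell apart."""
    return 2 ** (bits_per_second * seconds)


one_second = distinct_messages(CLAIMED_BPS)
print(f"{one_second:,}")                       # 137,438,953,472
print(f"about {one_second / 1e9:.0f} x 10^9")  # about 137 x 10^9
```

Every possible second of output from such a codec has to be one of these patterns, whatever the decoder does with them.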

Nimo

Sep 14, 2011, 9:50:56 AM

1. When a distinguished but elderly scientist states that something is
   possible, he is almost certainly right. When he states that something
   is impossible, he is very probably wrong.
2. The only way of discovering the limits of the possible is to venture
   a little way past them into the impossible.
3. Any sufficiently advanced technology is indistinguishable from magic.

Clarke's three laws.

Wait a few days (yes, not weeks, just days). ICQ is going to be
history...

Earl_Colby_Pottinger

Sep 14, 2011, 3:22:30 PM

Clarke's laws are overridden by the idiom 'Fool me once, shame on
you; fool me twice, shame on me'. Lame claims like yours always fail
- ALWAYS!

If you really had something, why did you not prepare it in advance and
have it ready before announcing it? If you only needed days, why not
wait those few days before posting?

Like all the lamers before you, you hoped to get people praising you
over something you never had. I predict that a week from now you will
have some weak excuse why you can't release a test suite. Lamers
like you do that all the time, year after year.

Words are just hot air, working code is KING!

Willem

Sep 14, 2011, 3:39:52 PM

Earl_Colby_Pottinger wrote:
) On Sep 14, 9:50 am, Nimo <azeez...@gmail.com> wrote:
)> 1. When a distinguished but elderly scientist states that something is
)> possible,
)> he is almost certainly right. When he states that something is
)> impossible, he is very probably wrong.
)> 2.The only way of discovering the limits of the possible is to venture
)> a little way past them into the impossible.
)> 3.Any sufficiently advanced technology is indistinguishable from
)> magic.
)>
)> Clarke's three laws.
)>
)> wait few days(yes, not weeks just daysl. ICQ is going to be in
)> history ..)
)
) Clarke's Laws are over-ridden by the Idiom of 'Fool me once, shame on
) you; fool me twice, shame on me', lame claims like your's always fail
) - ALWAYS!
)
) Why if you really had something did you not prepare in advance and
) have it ready before announcing it? If you only needed days, why not
) wait those few days be posting.
)
) Like all the lamers before you, you hoped to get people praising you
) over something you never had. I predict that a week from now you will
) have some weak excuse why you can't release a test suite. Lamers
) like you always do that all the time, year after year.
)
) Words are just hot air, working code is KING!

Speech compression at 37 kbps seems quite plausible, although
the number 37 seems a bit arbitrary. I would expect 32 or 40.

ADPCM speech compression does 16, 24, 32 or 40 kbps, for example.
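For context, ITU-T G.726 ADPCM codes 8 kHz narrowband speech at 16, 24, 32 and 40 kbit/s, i.e. a whole number of bits (2 to 5) per sample, which is why 37 looks arbitrary. A minimal sketch of that conversion, assuming the 8 kHz sampling rate used by G.726:

```python
SAMPLE_RATE_HZ = 8_000                      # narrowband telephone speech, as used by G.726
G726_RATES_BPS = (16_000, 24_000, 32_000, 40_000)


def bits_per_sample(rate_bps: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Bits spent on each audio sample at a given bit rate."""
    return rate_bps / sample_rate_hz


for rate in G726_RATES_BPS:
    print(f"{rate // 1000} kbit/s -> {bits_per_sample(rate):g} bits/sample")

# A 37 kbit/s rate would need a fractional 4.625 bits per sample at 8 kHz.
print(f"37 kbit/s -> {bits_per_sample(37_000):g} bits/sample")
```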

I'd worry more about the 'CD quality' claim, which is rather more
subjective.

SaSW, Willem

Peter Schepers

Sep 14, 2011, 3:47:22 PM

On 14/09/2011 3:39 PM, Willem wrote:
> Speech compression at 37kbps seems quite plausible, although
> the number 37 seems a bit arbitrary. I would expect 32 or 40.

But the claim is 37 _bits_ per second (as the subject header says), not
kbps.

PS

Jim Leonard

Sep 14, 2011, 4:26:01 PM

On Sep 14, 8:50 am, Nimo <azeez...@gmail.com> wrote:
> Clarke's three laws.

Ah yes, the wonderful and talented science FICTION writer.

Did you offer the above as proof your claims are fiction?

Industrial One

Sep 14, 2011, 7:45:04 PM

Given there are 6 billion people on the planet, each of whom has their
own unique voice, that a few words can fit into one second, that there
are 50,000 common words in English alone (just one language out of
many), and that there are limitless different combinations of
intonation, pauses, slurs and stutters, slowing down/speeding up
speech, I would say hell yeah, there are way more than 137 billion
different possible combinations in one second of speech.
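The word-count part of this argument can be put in numbers. A rough sketch, assuming (as the post does) a 50,000-word vocabulary, and additionally assuming every word is equally likely, which overstates the cost since real text is far from uniform:

```python
import math

VOCABULARY = 50_000   # "common words in English", per the post
CLAIMED_BPS = 37

# Cost of naming one word if all words were equally likely (an upper-bound style assumption).
bits_per_word = math.log2(VOCABULARY)

for words_per_second in (1, 2, 3):
    needed = words_per_second * bits_per_word
    verdict = "fits in" if needed <= CLAIMED_BPS else "exceeds"
    print(f"{words_per_second} words/s -> {needed:.1f} bits/s ({verdict} {CLAIMED_BPS} bits/s)")
```

This covers only which words are said; none of the voice, intonation or timing is included.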

Text is not even possible to losslessly encode at 37 bps in typical
cases and you believe it can be done with audio? Put the crackpipe
down, nigga. You's hallucinatin'.

glen herrmannsfeldt

Sep 14, 2011, 10:08:27 PM

Industrial One <industr...@hotmail.com> wrote:

(snip on audio voice compression to 37bits/s)

> Given there are 6 billion people on the planet each whom have their
> own unique voice and that a few words can fit into one second, where
> there are 50,000 common words in English alone, just one language out
> of many and that there are limitless different combinations of
> intonation, pauses, slurs and stutters, slowing down/speeding up
> speech I would say hell yeah there are way more than 137 billion
> different possible combinations in one second of speech.

Well, it only takes 33 bits to describe which of the 6 billion
people is speaking. If some bits in the beginning describe the
voice of the person speaking, those bits don't have to be resent
for every word. So 37 is a little low, but if you only indicate
phonemes, having previously sent the specifics of the speaker's voice,
it could be pretty low.
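The 33-bit figure is ceil(log2(6 billion)). A minimal sketch of that index-size calculation, using Python's exact integer `bit_length` rather than floating-point logarithms; the helper name `index_bits` is illustrative:

```python
def index_bits(num_items: int) -> int:
    """Fewest whole bits giving each of num_items a distinct code, i.e. ceil(log2 n)."""
    if num_items < 1:
        raise ValueError("need at least one item")
    return (num_items - 1).bit_length()


speakers = 6_000_000_000
bits = index_bits(speakers)
print(bits)              # 33
print(f"{2 ** bits:,}")  # 8,589,934,592 codes available
```

The index only identifies a speaker; it assumes the receiver already holds a description of every voice.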

> Text is not even possible to losslessly encode at 37 bps in typical
> cases and you believe it can be done with audio? Put the crackpipe
> down, nigga. You's hallucinatin'.

Some people speak (and read) slower than others.

-- glen

Industrial One

Sep 15, 2011, 10:28:06 AM

On Sep 15, 2:08 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Industrial One <industrial_...@hotmail.com> wrote:
>
> (snip on audio voice compression to 37bits/s)
>
> > Given there are 6 billion people on the planet each whom have their
> > own unique voice and that a few words can fit into one second, where
> > there are 50,000 common words in English alone, just one language out
> > of many and that there are limitless different combinations of
> > intonation, pauses, slurs and stutters, slowing down/speeding up
> > speech I would say hell yeah there are way more than 137 billion
> > different possible combinations in one second of speech.
>
> Well, it only takes 33 bits to describe which of the 6 billion
> people is speaking.

You don't get it. 6 billion is not an upper limit, there could be 600
billion tomorrow and they would still have distinct voices. You can't
compress contents just by indexing it for the same reason that you
can't compress all 750,000 existing movies in the world to 20 bits.
You would have to include the library containing the contents. In this
case, you would need 6 billion 22 kHz audio samples. That's about a
264 GB library + the 37 bits per second for whatever I wanna compress.
Oh wait, it won't recognize the 6,000,000,001st person born tomorrow
because his voice profile isn't in the library. Damn! Back to where
we started, with a 22 kHz mono .WAV recording, which gives you the
freedom to record whatever you want perfectly because it doesn't care
about the content, only asks for a measly 352 kilobits of info per
second.
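The storage and bit-rate figures in this post can be reproduced with a short calculation. A minimal sketch, assuming 16-bit mono PCM at 22,050 Hz and, as a hypothetical library layout, one second of reference audio per person. Under that assumption the library would be about 264.6 TB; the post's 264 GB figure would correspond to only about 44 bytes (roughly 1 ms) of audio per person.

```python
SAMPLE_RATE_HZ = 22_050     # "22 kHz", per the post
BITS_PER_SAMPLE = 16        # assumption: 16-bit mono PCM
SPEAKERS = 6_000_000_000
SECONDS_PER_SPEAKER = 1.0   # hypothetical: one second of reference audio per person

pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE            # raw mono .WAV bit rate
bytes_per_speaker = SECONDS_PER_SPEAKER * pcm_bps / 8
library_bytes = SPEAKERS * bytes_per_speaker

print(f"PCM bit rate: {pcm_bps:,} bits/s")            # PCM bit rate: 352,800 bits/s
print(f"library: {library_bytes / 1e12:,.1f} TB")     # library: 264.6 TB
```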

> If some bits in the beginning describe the
> voice of the person speaking, those bits don't have to be resent
> for every word. So 37 is a little low, but if you only indicate

A voice profile would probably be at least 100 KB, just for the
characteristics of the vocal cords.

> phonemes, and previously the specifics of the voice of the specific
> person, it could be pretty low.

You would still be missing the intonation info of the person so they
would end up sounding like a robotic text-to-speech program like
Microsoft Sam.

> > Text is not even possible to losslessly encode at 37 bps in typical
> > cases and you believe it can be done with audio? Put the crackpipe
> > down, nigga. You's hallucinatin'.
>
> Some people speak (and read) slower than others.
>
> -- glen

At 37 bps it would take 43 seconds to read 40 words, that's about 3/4
of a second per syllable. Nobody except a retard talks that slow.
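The 43-second figure follows from the 200-byte paragraph measured earlier in the thread: 1,600 bits sent at 37 bits/s. A quick check, assuming the text is sent uncompressed at 8 bits per byte:

```python
TEXT_BYTES = 200      # the paragraph measured earlier in the thread
CLAIMED_BPS = 37

seconds_needed = TEXT_BYTES * 8 / CLAIMED_BPS
print(f"{seconds_needed:.1f} s at {CLAIMED_BPS} bits/s")  # 43.2 s at 37 bits/s

# Compared with the 14-15 s it reportedly takes to read the paragraph aloud.
print(f"{seconds_needed / 14.5:.1f}x slower than speaking it")
```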

glen herrmannsfeldt

Sep 15, 2011, 1:43:00 PM

Industrial One <industr...@hotmail.com> wrote:

(snip, and previous snip, on audio voice compression to 37bits/s)

>> > Given there are 6 billion people on the planet each whom have their
>> > own unique voice and that a few words can fit into one second, where
>> > there are 50,000 common words in English alone, just one language out
>> > of many and that there are limitless different combinations of
>> > intonation, pauses, slurs and stutters, slowing down/speeding up
>> > speech I would say hell yeah there are way more than 137 billion
>> > different possible combinations in one second of speech.

>> Well, it only takes 33 bits to describe which of the 6 billion
>> people is speaking.

> You don't get it. 6 billion is not an upper limit, there could be 600
> billion tomorrow and they would still have distinct voices. You can't
> compress contents just by indexing it for the same reason that you
> can't compress all 750,000 existing movies in the world to 20 bits.

You do have to be careful as to what the problem is. You can
compress the movies down if I happen to live next to a video store.
Then you only need enough bits to tell me which DVD to grab.
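The video-store case is dictionary coding: when sender and receiver already hold the same library, only an index needs to travel. A toy sketch of that idea; the class `SharedLibraryCodec` and the movie titles are hypothetical placeholders standing in for a store of 750,000 titles:

```python
def index_bits(library_size: int) -> int:
    """Bits needed to name one entry of a library both sides already hold."""
    return max(1, (library_size - 1).bit_length())


class SharedLibraryCodec:
    """Toy codec: encoder and decoder hold the same list; only an index is sent."""

    def __init__(self, library: list[str]):
        self.library = library
        self.position = {item: i for i, item in enumerate(library)}
        self.width = index_bits(len(library))

    def encode(self, item: str) -> str:
        # Raises KeyError for anything not already in the shared library.
        return format(self.position[item], f"0{self.width}b")

    def decode(self, bits: str) -> str:
        return self.library[int(bits, 2)]


# Hypothetical stand-in for a store holding 750,000 titles.
store = [f"movie-{i:06d}" for i in range(750_000)]
codec = SharedLibraryCodec(store)
code = codec.encode("movie-123456")
print(codec.width, code, codec.decode(code))
```

The 20-bit code works only because the full disc is already on the shelf; `encode` fails for any title outside the shared list.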

> You would have to include the library containing the contents. In this
> case, you would need 6 billion 22 khz audio samples. That's about a
> 264 GB library + the 37 bits per second for whatever I wanna compress.
> Oh wait, it won't recognize the 6,000,000,001st person born tomorrow

OK, let's ignore the OP's 37 b/s and consider how close one can
come with how many bits.

>> If some bits in the beginning describe the
>> voice of the person speaking, those bits don't have to be resent
>> for every word. So 37 is a little low, but if you only indicate

> A voice profile would probably be at least 100 KB, just for the
> characteristics of the vocal cords.

That sounds a little larger than I would have suggested, but it
depends on how close you want to get. I will guess that you can
get close enough for someone to recognize the person with less.

>> phonemes, and previously the specifics of the voice of the specific
>> person, it could be pretty low.

> You would still be missing the intonation info of the person so they
> would end up sounding like a robotic text-to-speech program like
> Microsoft Sam.

Or like Watson on Jeopardy! (rerun last night, if you missed it).

So add some more bits for intonation.

> At 37 bps it would take 43 seconds to read 40 words, thats about 3/4
> of a second per syllable. Nobody except a retard talks that slow.

But it isn't off by a huge factor. Also, you can still do
ordinary text compression on it.

You can cache the vocal tract characteristics for future calls, too.
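Sending the voice description once, or caching it across calls, can be expressed as an amortized bit rate. A minimal sketch, taking the 100 KB profile size estimated earlier in the thread (read here as 100,000 bytes, an assumption) plus 37 bits/s of per-second data; the call lengths are arbitrary examples:

```python
PROFILE_BYTES = 100_000    # voice-profile size estimated in the thread (assumed decimal KB)
STREAM_BPS = 37            # per-second payload once the profile is known


def amortized_bps(total_seconds: float, profile_bytes: int = PROFILE_BYTES,
                  stream_bps: float = STREAM_BPS) -> float:
    """Average bits/s when a one-time voice profile is spread over total_seconds of speech."""
    return profile_bytes * 8 / total_seconds + stream_bps


for minutes in (1, 10, 60, 600):
    print(f"{minutes:>4} min of speech -> {amortized_bps(minutes * 60):,.1f} bits/s")
```

The longer the profile is reused, the closer the average gets to the per-second rate, but only for voices the receiver has already been sent.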

-- glen

Industrial One

Sep 15, 2011, 4:21:06 PM

On Sep 15, 5:43 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Industrial One <industrial_...@hotmail.com> wrote:
>
> (snip, and previous snip, on audio voice compression to 37bits/s)
>
> >> > Given there are 6 billion people on the planet each whom have their
> >> > own unique voice and that a few words can fit into one second, where
> >> > there are 50,000 common words in English alone, just one language out
> >> > of many and that there are limitless different combinations of
> >> > intonation, pauses, slurs and stutters, slowing down/speeding up
> >> > speech I would say hell yeah there are way more than 137 billion
> >> > different possible combinations in one second of speech.
> >> Well, it only takes 33 bits to describe which of the 6 billion
> >> people is speaking.
> > You don't get it. 6 billion is not an upper limit, there could be 600
> > billion tomorrow and they would still have distinct voices. You can't
> > compress contents just by indexing it for the same reason that you
> > can't compress all 750,000 existing movies in the world to 20 bits.
>
> You do have to be careful as to what the problem is. You can
> compress the movies down if I happen to live next to a video store.
> Then you only need enough bits to tell me which DVD to grab.

Doesn't change the fact that you've compressed nothing. The DVDs
remain 4.7 gigs.

> > You would have to include the library containing the contents. In this
> > case, you would need 6 billion 22 khz audio samples. That's about a
> > 264 GB library + the 37 bits per second for whatever I wanna compress.
> > Oh wait, it won't recognize the 6,000,000,001st person born tomorrow
>
> OK, lets ignore the OP's 37b/s and consider how close one can
> come with how many bits.

You're still operating from a wrong premise. When you really get down
to it you'll just end up where you started and realize that
losslessly you can only compress it by half and end up with 176 kbps.

Think about it, how many samples per second do you need to reproduce
high-quality sound for speech and catch even the highest-pitched queer
voice? 22,050, that's already a bitrate in the kilobits. How many bits
do you need per sample for a faithful amplitude resolution that will
represent every possible loud or quiet element? 16 bits. That's 352,800
bits per second to index which of the possible 2^352800 combos our
specific recorded sound is. These are some reality numbers for you, mang.
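The numbers in the two paragraphs above can be checked directly. A minimal sketch, assuming 16-bit mono PCM at 22,050 Hz and the post's 2:1 lossless ratio; the size of 2^352800 is reported as a decimal digit count computed with logarithms, so no 100,000-digit integer is ever printed:

```python
import math

SAMPLE_RATE_HZ = 22_050
BITS_PER_SAMPLE = 16
LOSSLESS_RATIO = 2          # "compress it by half", per the post

raw_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(f"raw PCM: {raw_bps:,} bits/s")                                  # raw PCM: 352,800 bits/s
print(f"after 2:1 lossless: {raw_bps / LOSSLESS_RATIO:,.0f} bits/s")  # after 2:1 lossless: 176,400 bits/s

# 2**raw_bps distinct one-second signals; count its decimal digits via log10.
digits = math.floor(raw_bps * math.log10(2)) + 1
print(f"2**{raw_bps} has {digits:,} decimal digits")                  # 106,204 digits
```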

There are many many people out there who you will never meet or listen
to any of their speeches so naturally you wouldn't give a shit if your
audio library couldn't compress their spoken words, but the
fact remains that they do exist, and 2^352800 potentially exist, not
2^37. The minute your compressor discriminates, it fails.

> >> phonemes, and previously the specifics of the voice of the specific
> >> person, it could be pretty low.
> > You would still be missing the intonation info of the person so they
> > would end up sounding like a robotic text-to-speech program like
> > Microsoft Sam.
>
> Or like Watson on Jeopardy! (rerun last night, if you missed it).
>
> So add some more bits for intonation.

That would be a hell of a lot of bits. Intonation is highly complex,
context-dependent and for the most part unique to each person. That
Watson robot isn't even the best example of robotic speech as his
voice was clearly programmed with modern techniques to make him sound
as natural as possible, his intonation is noticeably reduced but not
completely lacking.

> > At 37 bps it would take 43 seconds to read 40 words, thats about 3/4
> > of a second per syllable. Nobody except a retard talks that slow.
>
> But it isn't off by a huge factor. Also, you can still do
> ordinary text compression on it.

Last I recall, 7-zip with maximum settings only compresses text by
about half. 37 bps is a reading speed of less than one word per
second, and even two words per second is slow bordering on legally
retarded.
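The reading-speed argument can be checked roughly. A minimal sketch, assuming an average of 6 bytes per English word (about 5 letters plus a space, a rule of thumb rather than a figure from the thread) and the 2:1 text compression ratio mentioned in the post:

```python
BYTES_PER_WORD = 6        # assumption: ~5 letters plus a space
CLAIMED_BPS = 37


def words_per_second(bps: float, compression_ratio: float = 1.0) -> float:
    """Words/s a channel can carry for text compressed by compression_ratio."""
    bits_per_word = BYTES_PER_WORD * 8 / compression_ratio
    return bps / bits_per_word


print(f"uncompressed: {words_per_second(CLAIMED_BPS):.2f} words/s")      # uncompressed: 0.77 words/s
print(f"2:1 compressed: {words_per_second(CLAIMED_BPS, 2):.2f} words/s")  # 2:1 compressed: 1.54 words/s
```

Even with 2:1 compression the channel carries only about 1.5 words per second of bare text, with nothing left over for voice or intonation.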

> You can cache the vocal tract characteristics for future calls, too.

Irrelevant.

Thomas Richter

Sep 18, 2011, 8:08:44 AM

On 12.09.2011 08:57, Nimo wrote:
> Hi there,
>
> After a long time, back to my group.
>
> well, will keep the stuff to the point, if you have any doubts,
> I'm always here to help you..
>

> // A New Speech Codec Based upon Advanced Tensor Basis & Galerkin Techniques //
>
> ALGORITHM   BITRATE   MOS   QUALITY       SUBJECTIVE OPINION   DELAY
> ***         37 bps    5     Transparent   imperceptible        1/10 s
>
> I'm getting CD quality "speech" at 37 bits / sec.
>
>
> A checkmate to G.series stuff, AMR, speex etc.
>
>
> Any Ideas, immediately, I mean as fast as possible I can go out with
> this technology.

Does this refer to the link you gave? I don't see a MOS of 5 on this patent.

But allow me to make a couple of comments:

- If you provide a MOS score, you should state specifically what task
you gave the observers: understand the words (text-to-speech would
suffice), get the pronunciation (vocoding possible), or recognize
the speaker?

- The MOS scores are very sloppy: no error bars, and it is unclear how
many observers took part in the experiments.
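A MOS with error bars takes only a few lines to report. A minimal sketch using a normal-approximation 95% confidence interval (1.96 standard errors); the ratings are hypothetical numbers made up for illustration, not data from any test of this codec:

```python
import math
import statistics


def mos_with_ci(ratings: list[int], z: float = 1.96) -> tuple[float, float, float]:
    """Mean opinion score and an approximate 95% confidence interval (normal approximation)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 absolute category rating scale")
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, mean - half_width, mean + half_width


# Hypothetical ratings from 12 listeners for one test condition.
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4]
mos, low, high = mos_with_ci(ratings)
print(f"MOS {mos:.2f} (95% CI {low:.2f} to {high:.2f}, n={len(ratings)})")
```

A MOS of exactly 5 is only possible if every listener gave the top rating. For small panels a Student t quantile in place of 1.96 gives a wider, more honest interval.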

What is "CD speech" quality? The quality of a vocoder can be very high,
yet it might not be what was asked for. The G-series codecs are for
natural speech representation in a non-parametric way, allowing one to
identify the speaker, etc. Of course parametric coding is known, and
text-to-speech is known, but still, if I make a phone call, I would be
very disappointed if all I got were a vocoder I talk to, even if its
quality is very high, and thus none of these techniques has ended the
ITU-T G-series of standards.

Thus, at the very least, I can accuse the original "inventors" of making
very unscientific comparisons, or of not telling me what the actual
intent of the work is.

Greetings,
Thomas