My best guess at this point is going system-on-a-chip, exemplified by
Cell, Xbox360, or merging the GPU with a few CPUs. Revisiting on-chip
RAM might be promising too. But what more interesting options are
there?
(Can someone _please_ come up with an effective way to resume clock
scaling? :-)
Best,
Thomas
--
Thomas Lindgren
"Ever tried. Ever failed. No matter. Fail again. Fail better."
Most common workloads will struggle to make effective use of even
4 cores. Oh, yes, the bloatware will fill them up, but the end result
won't be any more useful or run faster.
Very, very few workloads can make use of 8 or more general-purpose cores
without much better memory subsystems than we have at present. Quite a
lot would be much better off with two general purpose cores and a SIMD
('vector', SSE-on-steroids etc.) core - it would be every bit as useful
and a lot simpler.
|> My best guess at this point is going system-on-a-chip, exemplified by
|> Cell, Xbox360, or merging the GPU with a few CPUs. Revisiting on-chip
|> RAM might be promising too. But what more interesting options are
|> there?
Well, yes, but that won't help with performance.
|> (Can someone _please_ come up with an effective way to resume clock
|> scaling? :-)
Wrong group. Try alt.miracle.working.
Regards,
Nick Maclaren.
Over the longer term, there is no need (Begin:: tongue in cheek)
Consider a futuristic machine: Using 200-odd pins to support a DDR-III
interface running at 1600 MHz. Consider, further, a multiplexing
device between this DRAM and the on-die Memory Controller. So, now we
have the DRAM pins on the CPU die running in the 4 GHz range and can
achieve a peak DRAM BW of ~50 GBytes/sec per socket.
Consider, further, a 4-socket system, and ask the question: "How many
processors will this system support"?
If you want minimal degradation of the individual CPUs (e.g. near
linear scaling) current 3 GHz CPUs need about 10 GBytes/sec of DRAM BW/
CPU (and delivered in a low latency manner--but I digress). So, with a
4-socket system that enables 200 GBytes/sec of peak DRAM BW, one can
feed ONLY 20 CPUs (or 5 core threads per die).
If you are willing to tolerate moderate degradation of the individual
CPUs (e.g. good scaling) current 3 GHz CPUs need about 3 GBytes/sec of
DRAM BW/CPU. So with a 4-socket system that enables 200 GBytes/sec of DRAM
BW, one can feed a maximum of 60-odd CPUs (or 15 per die).
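As a rough check on the arithmetic above, here is a minimal Python sketch. The bandwidth figures are the ones quoted in the post; the function name cpus_fed is purely illustrative.

```python
def cpus_fed(sockets, gb_per_sec_per_socket, gb_per_sec_per_cpu):
    """Return (total CPUs, CPUs per socket) that peak DRAM bandwidth can feed."""
    total_bw = sockets * gb_per_sec_per_socket        # 4 x 50 = 200 GBytes/sec
    total_cpus = int(total_bw // gb_per_sec_per_cpu)  # whole CPUs only
    return total_cpus, total_cpus // sockets

near_linear = cpus_fed(4, 50, 10)   # 10 GBytes/sec per CPU
good_scaling = cpus_fed(4, 50, 3)   # 3 GBytes/sec per CPU
print(near_linear, good_scaling)
```

The second case comes out a little above the rounded "60-odd (15 per die)" figure, which is consistent with the post rounding down.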
Note: at present no interconnect can move this much data around, so
lots of pins will have to be thrown in this direction, and improving
fabric pin speeds is paramount. And then, one still has to apply data-
compression to the fabric pins.
So, given the current pin limitations of 1000-odd pins/package, there
is no need to contemplate systems with more than 60-odd current-
performance processors and chips with more than 15 (or so).
We are reaching the point where silicon technology is enabling an
exponential increase in the number of cores on a die, but at best we
can expect a linear increase in the number of pins. So, pretty soon we
should feel the impact of hitting this brick wall at full speed......
Take away the pins used for the interconnect and dedicate them to DRAM
and that one-packaged-part system can support only 30 CPUs. Hardly a
way forward.
What are the options::
A) One option is to increase the size of the biggest on-die cache to
keep the required DRAM BW constant as the number of CPU cores grows
exponentially. In order that each CPU core has about the same L-big
footprint, when one doubles the number of cores on a die, one needs to
double the size of the L-big. But in order to keep the DRAM BW
constant, the old rule of thumb is that one needs to quadruple the cache
size to halve the miss rate. So to double the number of CPUs and have
each run as efficiently as before, the L-big cache size has to go up
8X for every doubling of the core count! With a 2MB cache being about
the size of the current big cores, and the current L-big size at
2 MBytes, this is obviously unscalable.
B) the other option is to double the number of pins each processor
generation. This is obviously unmanufacturable.
C) learn to make money playing professional golf.
(/end: Tongue in cheek)
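The growth rate in option A can be sketched in a few lines of Python. This only restates the rule of thumb from the post (2x to keep the per-core footprint, 4x to halve the miss rate); the 2 MB starting point is the figure given there, and the function name is made up for illustration.

```python
def l_big_sizes_mb(start_mb, doublings):
    """Cache size needed after each doubling of the core count."""
    sizes = [start_mb]
    for _ in range(doublings):
        # 2x to keep each core's footprint, 4x to halve the miss rate
        sizes.append(sizes[-1] * 2 * 4)
    return sizes

sizes = l_big_sizes_mb(2, 4)
print(sizes)
```

Four doublings of the core count take the 2 MB cache into the gigabyte range, which is the sense in which the option does not scale.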
Back to your question: Can we get back to frequency scaling?
A) Yes, but only if we get rid of all those features that burn power
faster than they deliver performance. In the past, and on this NG, I
have described a simplified design point where all speculation and
out-of-orderness is removed from a CPU core. The end result is a core that
performs at half of the rate of the great big core, but occupies only
8% of the die footprint of that big core. In addition, it operates at
2 Watts/core, down from the 35 W/core of the big ones.
B) Once the core gets down into the 2 W/core range, one can apply power
to drive this core from 3 GHz towards 6 GHz (depending upon how
power-constrained the design is). Even at this point, it will
substantially underperform the big core by 25% (while consuming only
8-ish Watts).
C) So, frequency scaling can be done, but 1) not with the cores as
they are today, 2) you are still going to have to learn to program
them en masse, but
D) You won't ever have to learn to program more than about 64.
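To put the numbers in A) and B) side by side, here is a small Python sketch. The figures come from the text above, with two assumptions of mine: "underperform by 25%" is read as 0.75 of the big core's performance, and "8-ish Watts" is taken as exactly 8.

```python
# Relative performance, power in Watts, and die area relative to the big core.
BIG        = {"perf": 1.00, "watts": 35.0, "area": 1.00}
SMALL_3GHZ = {"perf": 0.50, "watts": 2.0,  "area": 0.08}
SMALL_6GHZ = {"perf": 0.75, "watts": 8.0,  "area": 0.08}  # assumed reading

def perf_per_watt(core):
    return core["perf"] / core["watts"]

def cores_per_big_core_area(core):
    return BIG["area"] / core["area"]

for name, core in [("big", BIG), ("small@3GHz", SMALL_3GHZ), ("small@6GHz", SMALL_6GHZ)]:
    print(name, round(perf_per_watt(core), 4), round(cores_per_big_core_area(core), 1))
```

Under these assumptions the small core wins on performance per Watt at either clock, and roughly a dozen of them fit in the area of one big core.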
Mitch
There are some very common workloads that stick out. The simple
forwarding of ip packets given a large enough routing table is one
of these.
>Very, very few workloads can make use of 8 or more general-purpose cores
>without much better memory subsystems than we have at present. Quite a
>lot would be much better off with two general purpose cores and a SIMD
>('vector', SSE-on-steroids etc.) core - it would be every bit as useful
>and a lot simpler.
I sit just in the middle of some processing that can utilise lots
of processing on service processors. It is as simple as transcoding
of sound. Multiple 32-way off-board solutions are used to feed
the monster today.
I would be very happy to see code improvements, but some of the
better organisations in the world have hit the wall in terms of
reducing cpu time.
>|> My best guess at this point is going system-on-a-chip, exemplified by
>|> Cell, Xbox360, or merging the GPU with a few CPUs. Revisiting on-chip
>|> RAM might be promising too. But what more interesting options are
>|> there?
>
>Well, yes, that that won't help with performance.
>
>|> (Can someone _please_ come up with an effective way to resume clock
>|> scaling? :-)
>
>Wrong group. Try alt.miracle.working.
-- mrr
In practice, I think it will still struggle :-(
The problem is moving enough data, including access to the routing table,
to justify the cores. While I agree that a large number of slow cores
is as good as a smaller number of fast ones (Sun have that right), I
don't see that CPU per se is the bottleneck.
|> >Very, very few workloads can make use of 8 or more general-purpose cores
|> >without much better memory subsystems than we have at present. Quite a
|> >lot would be much better off with two general purpose cores and a SIMD
|> >('vector', SSE-on-steroids etc.) core - it would be every bit as useful
|> >and a lot simpler.
|>
|> I sit just in the middle of some processing that can utilise lots
|> of processing on service processors. It is as simple as transcoding
|> of sound. Multiple 32-way off-board solutions are used to feed
|> the monster today.
Yes, but that is exceptional. You have explained why it is, and I
accept that, but the fact remains that it is a real outlier.
Regards,
Nick Maclaren.
[moving ip packets]
>
>In practice, I think it will still struggle :-(
>
>The problem is moving enough data, including access to the routing table,
>to justify the cores. While I agree that a large number of slow cores
>is as good as a smaller number of fast ones (Sun have that right), I
>don't see that CPU per se is the bottleneck.
Even the route lookups become a problem in their own right once we
move to 500k+ routes, and ipv6 and all of that. This will hammer
a b-tree or similar with lookups.
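For readers who have not seen one, a longest-prefix-match lookup can be sketched as below. Note that this is a plain binary trie rather than the b-tree mentioned above, it handles IPv4 only, and the class and method names are invented for the example. The point is that every lookup is a chain of dependent memory accesses.

```python
import ipaddress

class RouteTrie:
    """Toy binary trie for IPv4 longest-prefix match."""

    def __init__(self):
        self.root = {}  # node: {"0": child, "1": child, "hop": next_hop}

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["hop"] = next_hop

    def lookup(self, address):
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node, best = self.root, self.root.get("hop")
        for b in bits:              # up to 32 dependent pointer chases
            node = node.get(b)
            if node is None:
                break
            best = node.get("hop", best)  # remember the longest match so far
        return best

table = RouteTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
print(table.lookup("10.1.2.3"), table.lookup("10.2.0.1"), table.lookup("192.168.1.1"))
```

Real routers use much more compact structures, but the access pattern (many small, dependent reads per packet) is the part that stresses the memory system.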
>|> >Very, very few workloads can make use of 8 or more general-purpose cores
>|> >without much better memory subsystems than we have at present. Quite a
>|> >lot would be much better off with two general purpose cores and a SIMD
>|> >('vector', SSE-on-steroids etc.) core - it would be every bit as useful
>|> >and a lot simpler.
>|>
>|> I sit just in the middle of some processing that can utilise lots
>|> of processing on service processors. It is as simple as transcoding
>|> of sound. Multiple 32-way off-board solutions are used to feed
>|> the monster today.
>
>Yes, but that is exceptional. You have explained why it is, and I
>accept that, but the fact remains that it is a real outlier.
It is just the fuel of the future of the telephone industry. Everything
is going to be transcoded a few times once it hits the Internet, or the
mobile network. And "normal" phone lines will turn VoIP on the inside.
The cpu burners have only just started. The mobile industry is moving
from GSM06.10 FR and 06.20 HR which work at 20 and 40 mips to GSM 06.60 EFR
and GSM 06.90 AMR which need hundreds of mips per channel. The IP crowd
tends to go for iLBC and G.729. iLBC works at around 200 mips or 60 mflops,
while G.729 can work over a wide range, from 40 mips to 200 mflops.
The whole industry is reeling under all this signal processing. It is
determining the shape of 3G phones and their huge batteries.
It may have been an outlier, but it is becoming important.
My task for the next week is to lay out and document how to power and
ventilate racks full of these processors.
-- mrr
> A bit of speculation here. I would expect that the performance gains
> from multicore start tapering off at 8-32 cores per chip (or even
> sooner). That would mean we will be running out of steam fairly
> quickly (5 years or so?). So, my question is, what will be next?
Well, I work on something that bashes memory really hard, floating point
secondarily, and can make reasonable use of a few threads, say four.
What I'd like to see is ludicrously enormous on-chip caches: 64MB,
256MB, or even bigger. They should be quite buildable in a few years,
and the ability to get large amounts of code and data into cache at the
same time could speed things up quite a lot. It's a brute-force solution
to the memory performance barrier, but when you're dealing with billions
of transistors, a certain amount of brute force may be appropriate.
--
John Dallman, j...@cix.co.uk, HTML mail is treated as probable spam.
If it's NP-complete, more hardware is not going to improve the
performance much.
This sounds like an interesting problem. Can you supply more details?
It depends on the application - some will nicely use 128 cores, some
will not even gain anything from 2.
> |> (Can someone _please_ come up with an effective way to resume clock
> |> scaling? :-)
>
> Wrong group. Try alt.miracle.working.
I don't think the odds are that bad.
The problem isn't that it is not possible to make transistors faster
than silicon transistors.
Gallium Arsenide is faster.
And Indium Arsenide Phosphide is faster yet.
The problem is that it isn't yet possible to make very big and complex
chips out of those materials; a PDP-8 with a very high clock rate will
still be beaten by a 360/195 with a much slower one. And the *first*
Pentiums were a lot like the 360/195 architecturally.
But the big news today is Hafnium! Because that will allow making
silicon chips with smaller circuits, by supplying a more effective
insulating material that works with existing silicon CMOS technology.
So we will get another little dribble of clock scaling while we wait
for the other materials to move up from thousands of transistors to
millions.
John Savard
Instead of going multicore, I had thought of making a core big and
complex - and SIMD/vector instructions was one of the major additions.
Apparently I wasn't totally out to lunch.
http://www.quadibloc.com/arch/arcint.htm
of course.
John Savard
Sounds enormous today, yes.
But by the time we can get them, your needs will probably have gone yet
further and those 256MB caches will again seem just too small.
Stefan
> We are reaching the point where silicon technology is enabling an
> exponential increase in the number of cores on a die, but at best we
> can expect a linear increase in the number of pins. So, pretty soon we
> should feel the impact of hitting this brick wall at full speed......
>
> Take away the pins used for the interconnect and dedicate them to DRAM
> and that one-packaged-part system can support only 30 CPUs. Hardly a
> way forward.
If the hardware can't avoid making some parts of "memory" run at
significantly different speeds (relative to any given processor), then it
might as well expose the difference and allow programmers to explicitly
partition problems between fast (on-chip, or as good as) storage (for
which 32-bit addressing might *always* be sufficient) and slow storage
(which, like IPv6 addressing, might benefit from a 128-bit address space
even if it never approaches 2**64 bytes in total size).
So option (D) is to back off the switch to 64-bit addressing, since what
is really needed is a mixed model with both 32-bit (near) and 128-bit
(far) pointers.
Damn few can use 128 cores with any projected memory subsystem; the
latter becomes the bottleneck for almost all real workloads long
before that.
|> > |> (Can someone _please_ come up with an effective way to resume clock
|> > |> scaling? :-)
|> >
|> > Wrong group. Try alt.miracle.working.
|>
|> I don't think the odds are that bad.
|>
|> The problem isn't that it is not possible to make transistors faster
|> than silicon transistors.
|>
|> Gallium Arsenide is faster.
|>
|> And Indium Arsenide Phosphide is faster yet.
|>
|> ...
Yeah. Five times faster. About four years of scaling - for a change
that will take a decade to introduce.
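The "five times is about four years" estimate can be checked with a line of arithmetic. The doubling periods of 18 and 24 months below are my assumptions about historical scaling, not figures from this thread.

```python
import math

def years_of_scaling(speedup, doubling_period_years):
    """Years of exponential scaling that a one-off speedup is worth."""
    return math.log2(speedup) * doubling_period_years

fast = years_of_scaling(5, 1.5)   # assuming a doubling every 18 months
slow = years_of_scaling(5, 2.0)   # assuming a doubling every 24 months
print(round(fast, 1), round(slow, 1))
```

Either assumption lands within about a year of the four-year figure.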
You didn't ask about faster transistors, but about resuming scaling.
There are a LOT of technologies that can deliver a small factor of
the former, but none that I know of can deliver the latter.
Regards,
Nick Maclaren.
a) Are you hoping the multi-core thing will go away?
or
b) Looking for something totally different that will require
changing yet again all the software that hadn't even been
fully adapted to multi-core by that point?
>
> My best guess at this point is going system-on-a-chip, exemplified by
> Cell, Xbox360, or merging the GPU with a few CPUs. Revisiting on-chip
> RAM might be promising too. But what more interesting options are
> there?
>
> (Can someone _please_ come up with an effective way to resume clock
> scaling? :-)
>
a)
I think the question is premature since multi-core isn't really here
yet. Having a hardware feature that is present but not being exploited
doesn't count as being here. It's Intel's problem still. And it will
take time for software to adapt. Software has an unbelievable amount of
inertia. Parallelizing a few low hanging apps doesn't count.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
Well, there was quite a bit of publicity about how Intel and IBM
announced breakthroughs in the use of Hafnium to produce insulators
that would allow smaller silicon circuits, so some scaling has also
been made possible, apparently.
John Savard
> C) So, frequency scaling can be done, but 1) not with the cores as
> they are today, 2) you are still going to have to learn to program
> them en masse. but
>
> D) You won't ever have to learn to program more than about 64.
You see massive ||ism going nowhere at all, or are you referring just to our
lifespan with it?
>
> Mitch
>
Regards
Klaus
While 64MB on-chip could even be feasible this year by means of 65nm eDRAM
MCP from a manufacturing point of view, I doubt the total addressable market
would justify such a solution. Maybe others could comment on whether the
cache controllers of current designs could make (good) use of it, and also
on whether a Torrenza solution would make sense? (processor-surrogate
comes to mind)
Regards
Klaus
> Thomas Lindgren wrote:
> > A bit of speculation here. I would expect that the performance gains
> > from multicore start tapering off at 8-32 cores per chip (or even
> > sooner). That would mean we will be running out of steam fairly
> > quickly (5 years or so?). So, my question is, what will be next?
>
> a) Are you hoping the multi-core thing will go away?
> or
> b) Looking for something totally different that will require
> changing yet again all the software that hadn't even been
> fully adapted to multi-core by that point?
Neither.
> > My best guess at this point is going system-on-a-chip, exemplified by
> > Cell, Xbox360, or merging the GPU with a few CPUs. Revisiting on-chip
> > RAM might be promising too. But what more interesting options are
> > there? (Can someone _please_ come up with an effective way to resume
> > clock
> > scaling? :-)
> >
>
> a)
Am I reading some weird prickliness on your side, or is it just my
imagination? I'd prefer to express my opinions myself.
> I think the question is premature since multi-core isn't really here
> yet. Having a hardware feature that is present but not being exploited
> doesn't count as being here. It's Intel's problem still. And it will
> take time for software to adapt. Software has an unbelievable amount of
> inertia. Parallelizing a few low hanging apps doesn't count.
Hardware designers work on a different schedule. From what you write,
it sounds like developers are having trouble with exploiting even
today's 2-4 core CPUs, but for the sake of argument, let's assume a
bit of headroom. Even if we get one or two more doublings in
cores/chip before running out of steam, I would still expect computer
architects to already be looking at alternatives to just increasing
the number of cores, because they want to use the real estate
productively and they will soon start building the successor chips.
The inertia of software just makes scaling up the number of cores a
less attractive option.
> On Mar 7, 10:30 am, Thomas Lindgren <***********@*****.***> wrote:
> > (Can someone _please_ come up with an effective way to resume clock
> > scaling? :-)
>
/.../
> Back to your question: Can we get back to frequency scaling?
>
> A) Yes, but only if we get rid of all those features that burn power
> faster than they deliver performance. In the past, and on this NG, I
> have described a simplified design point where all speculation and
> out-of-orderness is removed from a CPU core. The end result is a core that
> performs at half of the rate of the great big core, but occupies only
> 8% of the die footprint of that big core. In addition, it operates at
> 2Watts/core down from the 35W/core of the big ones.
>
> B) Once the core gets down into the 2W/core range, one can apply power
> to drive this core from 3 GHz towards 6 GHz (depending upon how much
> power the design is constrained). Even at this point, it will
> substantially underperform the big core by 25% (while consuming only 8-
> ish Watts).
>
> C) So, frequency scaling can be done, but 1) not with the cores as
> they are today, 2) you are still going to have to learn to program
> them en masse. but
>
> D) You won't ever have to learn to program more than about 64.
OK, interesting.
The main problem seems to be getting things on/off chip, the old von
Neumann bottleneck. Can the problem of limited pin bandwidth feasibly
be overcome or ameliorated by something exotic, like optics to the
chip? (I have no idea whether it's practical to integrate a
transceiver.)
Or could one reduce the number of pins by pulling the whole system
onto the chip and 'just' connect relatively low-bandwidth peripherals
more or less directly?
> > Back to your question: Can we get back to frequency scaling?
>
> > A) Yes, but only if we get rid of all those features that burn power
> > faster than they deliver performance. In the past, and on this NG, I
> > have described a simplified design point where all speculation and
> > out-of-orderness is removed from a CPU core. The end result is a core that
> > performs at half of the rate of the great big core, but occupies only
> > 8% of the die footprint of that big core. In addition, it operates at
> > 2Watts/core down from the 35W/core of the big ones.
>
> > B) Once the core gets down into the 2W/core range, one can apply power
> > to drive this core from 3 GHz towards 6 GHz (depending upon how much
> > power the design is constrained). Even at this point, it will
> > substantially underperform the big core by 25% (while consuming only 8-
> > ish Watts).
>
> 1) Do you assume an optimization of A) and B) was design-paradigm for Power
> 6?
Cannot comment.
> > C) So, frequency scaling can be done, but 1) not with the cores as
> > they are today, 2) you are still going to have to learn to program
> > them en masse. but
>
> > D) You won't ever have to learn to program more than about 64.
>
> You see massive ||ism going nowhere at all, or are you referring just to our
> lifespan with it?
I am referring to the basic problem that the performance gained through
massive parallelism has little to do with the count of cores or
threads and a lot to do with the bandwidths existing outside of the
processor core complexes.
Mitch
> This sounds like an interesting problem. Can you supply more details?
Sadly, no. I can speak here much more freely if I don't reveal who I
work for - not defence or anything like that, just one of relatively few
companies that do this specific thing. Some manufacturers have worked
out that I'm the same John Dallman as they've dealt with and haven't got
offended, but my employers are sometimes kind of cautious.
I should point out that I'm not expecting to fit the whole dataset into
that 256MB cache. That's enough for medium-complicated stuff, but we
have been supporting 64-bit platforms for years. Every so often I like
to say "We consider 32-bit addressing to be the hallmark of legacy
technology" to a manufacturer who hasn't quite grasped the idea that
individual users, on workstations, can need more than 4GB of RAM for a
single process, and want it at a sane price.
The customers build monolithic models until either their machines get
too slow - a very fuzzy criterion, but significant swappage is always
"too slow" - or they hit a hard limit. With 64-bit processors now a
commonplace, the hard limits have mainly gone away, and we get bug
reports with multiple gigs of supporting data.
>
> (Can someone _please_ come up with an effective way to resume clock
> scaling? :-)
>
Photons. Also solves the memory bandwidth problem.
Robert.
The inertia of software will make changing to any new direction in
computing architecture a less attractive option. Especially if your
business plan depends on rapid adoption of the new architecture.
I don't think adoption of multi-core architecture is so much a
software developer's problem as a business model problem. It's just
that software vendors don't feel as urgent a need to exploit multi-core
as Intel would like.
Amen.
64MB on-chip memory and 16-32 processors would solve some of the
problems I face very nicely. Even 32MB and 4 processors will make
a nice headway.
-- mrr
There are several problems in the marketplace that can benefit from
that solution. Video recoding, all audio transcoding, packet forwarding
and filtering, and some parts of video encoding all work on relatively
small amounts of data at any one time. Even a full IPv4 table fits
in L2 cache.
>So option (D) is to back off the switch to 64-bit addressing, since what
>is really needed is a mixed model with both 32-bit (near) and 128-bit
>(far) pointers.
The fast memory only needs some tagging. A fast-mmap call to tell the
OS we want it, and fmalloc/ffree in user space.
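To make the idea concrete, here is a toy sketch of what a user-space fmalloc/ffree pair might look like over a fixed fast region. Everything here is hypothetical: no such OS call exists that I know of, a bytearray stands in for the on-chip memory, and the allocator is a simple first-fit free list.

```python
class FastArena:
    """Toy first-fit allocator over a fixed region standing in for on-chip memory."""

    def __init__(self, size):
        self.memory = bytearray(size)  # stand-in for the region a fast-mmap would return
        self.free = [(0, size)]        # sorted list of (offset, length) holes
        self.used = {}                 # offset -> allocated length

    def fmalloc(self, nbytes):
        for i, (off, length) in enumerate(self.free):
            if length >= nbytes:
                if length == nbytes:
                    del self.free[i]
                else:
                    self.free[i] = (off + nbytes, length - nbytes)
                self.used[off] = nbytes
                return off
        return None  # fast region full: caller falls back to ordinary memory

    def ffree(self, off):
        self.free.append((off, self.used.pop(off)))
        self.free.sort()
        merged = [self.free[0]]
        for o, n in self.free[1:]:     # coalesce adjacent holes
            po, pn = merged[-1]
            if po + pn == o:
                merged[-1] = (po, pn + n)
            else:
                merged.append((o, n))
        self.free = merged

arena = FastArena(1024)
a = arena.fmalloc(512)
b = arena.fmalloc(512)
overflow = arena.fmalloc(1)
arena.ffree(a)
arena.ffree(b)
```

The interesting design question is the failure path: because the fast region is small and fixed, every caller needs a sensible fallback when fmalloc returns nothing.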
Another alternative is to expose the difference to the OS, and let
the OS page between fast on-chip memory and DRAM paging space; and
use the existing code for locking in ram etc.
If the paging load of such a scheme is too great we can give the
processor a hardware assist similar to the one we have today in
cache handling.
-- mrr
Mine are pretty linear, but with a high baseline.
Assume for a minute that a quarter of the world's fixed-line
telephony is going to move to VoIP using links with strong
bandwidth constraints.
Current codecs use 20-40 mips. Next generation codecs give
a lot better sound in adverse conditions, at a cost of up to
200 mips per call.
Such calls are often recoded any number of times along the call
path. 10 encodings and decodings are not uncommon. Let us settle for
4 as an average. You cannot reasonably expect both ends of
a call to use the same encoding either, as the encodings are
exploding in number.
That is 300 million landlines, times 4 times 200 mips.
Or 40 million dual Xeons, just for the carrier recoding.
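Spelled out in Python, using only the figures above. The per-machine number is derived from them, not stated in the post, and the sum treats every line as busy at once.

```python
landlines = 300_000_000       # a quarter of the world's fixed lines, per the post
recodings_per_call = 4        # average number of recodings along the call path
mips_per_recoding = 200       # next-generation codec cost

total_mips = landlines * recodings_per_call * mips_per_recoding
dual_xeons = 40_000_000
mips_per_dual_xeon = total_mips / dual_xeons   # implied capacity of one machine

print(total_mips, mips_per_dual_xeon)
```

So the 40 million figure implies about 6000 mips of usable codec throughput per dual Xeon.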
The code that executes is 150-300k in size, and the data is
less than 1k at a time.
And we haven't started at video recoding yet. Or the signal
processing for the radio links. This is a similar problem,
well bounded, parallelises well, but still gobbles computation
power.
-- mrr
Any chance of the organisations concerned getting their act together,
standardising on a small number of codecs, designed at least
secondarily for computational efficiency, and encoding/decoding
only at entry/exit?
Nah. Nick, don't be naive.
Any ballsed-up design can consume an arbitrary amount of resources
(insert your unfavourite example here). There is no point in just
providing more parallelism, because it will simply be soaked up by
the bloatware. And then future generations will use 2,000 mips, and
need 100 encodings and decodings :-(
This is precisely what has happened with GUIs, though the primary
resources they waste are memory, cache and bandwidth - but the
increasing number of context switches is becoming a major pain
again. Parallelism would help with the latter, a bit.
A massive increase in parallelism may solve your problem in the short
term, but will not help in the long term. The ONLY long term solution
is to sort out the protocols.
You may call me old and cynical. I am.
Regards,
Nick Maclaren.
Just enable it. Most modern systems support that, today. The size
of physical memory is transparent to the applications, and there is
no difficulty in running a 32-bit application on a system with vastly
more memory.
|> If the paging load of such a scheme is too great we can give the
|> processor a hardware assist similar to the one we have today in
|> cache handling.
IBM do, and have done for many decades.
Regards,
Nick Maclaren.
A) it was a 360/91 or a 370/195, there was no 360/195
B) both of these 91/195 machines were Tomasulo-scheduled (e.g.
reservation station) machines,
C) the *first* Pentium was a 2-wide in-order superscalar processor
with an on-die coprocessor interface to a medium performance FPU.
D) The Pentium Pro, on the other hand, has lots of the characteristics
of the Tomasulo-scheduled 370/195.
So, it was the *SECOND* Pentiums that had the characteristics you
describe.
> I don't think adoption of multi-core architecture is so much a
> software developer's problem as a business model problem.
Have you tried adapting any large existing codes to multiple threads?
It is quite a significant software development problem. There is a
business issue too: the single-threaded codes haven't stopped working
and there are always new things wanted in them, but adapting code to
multi-threaded is very often hard.
> It's just that software vendors don't feel as urgent a need to exploit
> multi-core as fast as Intel would like.
Chip marketing people have to sell people on supporting multi-core,
because that's what they have to sell. One of their major tactics is to
tell everyone, over and over again, that it isn't hard. They're
marketing people: they're used to dealing with motivational problems,
not technical ones. The idea that it really is hard no matter what you
believe isn't something that marketing departments want to face up to.
Me, I have a weird mental twist. The more that marketing people tell me
things, the less I believe them.
It's hard in the sense that you'll have to change a lot of code to
use multi-threaded design patterns. But those patterns are well
known. Maybe not by the employees at some of these firms because
knowing multi-threading isn't one of the more important skills to
have on your resume. In fact it's the least important. You can
have no threading skills at all and have no problem finding employment
if you have the hot du jour skills, webby 2.0 stuff and the like.
>
>
>>It's just that software vendors don't feel as urgent a need to exploit
>>multi-core as fast as Intel would like.
>
>
> Chip marketing people have to sell people on supporting multi-core,
> because that's what they have to sell. One of their major tactics is to
> tell everyone, over and over again, that it isn't hard. They're
> marketing people: they're used to dealing with motivational problems,
> not technical ones. The idea that it really is hard no matter what you
> believe isn't something that marketing departments want to face up to.
>
> Me, I have a weird mental twist. The more that marketing people tell me
> things, the less I believe them.
>
This was in an email I got from Intel.
Show Us Your Threads — win an Intel® Core™ 2 Duo iMac*
You express yourself through solving problems and developing code, don't you?
Show your peers your passion for fashion. Share your own unique sense of style.
Pictures of you in clothes that were fashion "must haves" or clothes that make
you (or others) smile. We know that some of you only wear free shirts — show
us your best tradeshow attire! Prizes will be awarded weekly to the entry with
the most community votes. The contest ends on March 26, 2007 with a grand
prize of an Intel® Core™ 2 Duo processor-based Apple iMac* with a 20" screen.
Join in the fun and show us your threads! Read on ›
At first I thought it was another coding contest, hopefully less insane
than the TopCoder one. But programmer fashion? They're hoping that
people go out and buy Intel multi-core computers because they saw pictures
of people wearing cool clothing?
Oh, yes, there was.
Although a few of the later 360/195s shipped actually came with black-
painted fronts *like* the 370/195 after they got around to adding the
new 370 instructions to the hardwiring.
John Savard
Er, no. Firstly, while some patterns are known, it is not known how
to convert many important operations into them.
Secondly, it turns out to be VERY hard to convert most serial codes
to parallel, and it often ends up being easier to redesign from
scratch. Thirdly, it is VERY hard to program parallelism correctly,
and the methods that make it easier to add parallelism usually end
up making it harder to do it correctly :-(
Regards,
Nick Maclaren.
If I recall, it was somewhere between those positions. My memory says
that it started out being called a 360/195 but sort-of turned into the
370/195. I am pretty certain that this was more happenstance than
planned.
Whatever the name, neither the 360/91 nor the 370/165 was really part
of the 360/370 series, but more dead-end offshoots.
Regards,
Nick Maclaren.
Grrk. While I agree with most of your points, they rather make
light of an intrinsically hard problem. Remember that a lot of the
global data is hidden inside 'library' functions, objects, files and
what-nots that are used globally. Designing them so that they are
efficiently thread-safe and yet provide the desired (serial) global
interface is not easy, or even always possible.
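A small Python sketch of the kind of hidden global state in question. The "library" function and its names are invented for illustration. The lock makes it correct, but it also serialises every caller, which is exactly the efficiency problem.

```python
import threading

_next_id = 0                 # hidden global state inside the "library"
_lock = threading.Lock()

def new_id():
    """Serial-looking interface: hand out 1, 2, 3, ... to any caller."""
    global _next_id
    with _lock:              # correct, but every thread queues up here
        _next_id += 1
        return _next_id

def worker(out, n):
    for _ in range(n):
        out.append(new_id())

results = []
threads = [threading.Thread(target=worker, args=(results, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), len(set(results)))
```

Giving each thread its own counter would remove the contention, but then the IDs no longer form the single global sequence that the serial interface promised.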
Regards,
Nick Maclaren.
I think we talk past each other here.
I am unaware that current cpu's can have the contents of on-chip
memory put there under program control. All I see are caches.
>|> If the paging load of such a scheme is too great we can give the
>|> processor a hardware assist similar to the one we have today in
>|> cache handling.
>
>IBM do, and have done for many decades.
Do you have a pointer to a description ?
-- mrr
Many of them can enable explicit cache control, but I don't know of
any operating systems that allow it for unprivileged applications.
But your point is primarily terminological, anyway. If I were to
call the cache the memory, the memory the backing store, a cache line
a page, and the cache loading mechanism hardware paging support, in
what way would I have not met your request?
Seriously.
Regards,
Nick Maclaren.
Will not happen, and for good technical reasons.
We will see telephony delivered over a multitude of physical
media. The codec is an important part of the media adaption,
because we push technology; especially radio; very far here.
There are conscious, sane decisions about throwing compute
power at the problems. It is by far the best, cheapest and
most reliable solution, but it has some scalability challenges.
This is a complex problem that computers are very good at.
For getting better sounding mobile phones, AMR is a very
sensible move. It is a step on the path to integrate all the
mobile codecs into a meta-codec that can adapt to conditions,
so you can have toll quality sound on a good link at 17 kilobits
with 40 mips used on the codec, or you can have near-toll
quality on a somewhat lossy link at 9 kilobits with 250 mips
applied.
On the internet losses are not such a big problem, but
jitter is. There G.729 makes all sense. It also fits nicely
over medium to good quality WiFi links.
G.729 was designed as a meta-codec that only describes the
physical LPC model and its digital representation at various
bit rates. The encodings and decodings can cut corners and
use low-end FFT's as an approximation to 15 mips, or they can
do the whole shebang with floating point and use 200 mflops.
They will interoperate.
For good links where you want better-than-toll quality,
G.722 and iLBC are the choices. Then we have the old 'POTS'
codecs, Alaw/Ulaw and ADPCM in various shapes. These will
not go away for another hundred years.
You may be able to negotiate interoperable codecs from end
to end. This happens in around 15% of all cases, when you
call between similar media. For the rest, transcoding is
done. If you call from Skype to a mobile the call is probably
coded from the PC hardware's SLIN to Speex, then recoded
to G.729 as it hits voip infrastructure, recoded to Alaw when
it transits the phone network, and recoded to AMR if it is a
3G phone.
The G.729 to alaw to AMR can probably be done away with,
so the path becomes SLIN - SPEEX - G729 - AMR.
These codecs follow the infrastructure they run on. They
are all sane choices. Having CPU usage because of transcoding
needs is the smallest technical challenge.
Video will always have different media and resolutions.
Full transcoding is probably not worthwhile, but a mid-level
transcoding e.g. vector remapping to a new resolution is
doable inside H.263 with around 50 mips of processing power.
All of these standards are designed to be implementable with
little extra data stored. They are parallelisable at the outset.
The code for these codecs is from 100k to a megabyte in size, so
it all fits in a modern L2 cache. Data is piped directly in
and out, and may not even touch memory.
People asked what we would all use the parallel mips for.
Now you know one more application.
If anyone would like to have a look at code you are free
to do so, except for the G729 applications which are mired
in a mess of licenses.
Improving GSM 06.10 encoding by 10% will extend the battery life
of 500 million mobiles by several hours each. The power
savings alone are large.
-- mrr
Oh, yes, but bits is bits. Once a sound has been converted to bits,
the only reason to recode is to reduce the number of bits transferred
or when converting back to analogue. You can't reasonably claim that
you need a different compression technique because the bits are being
transmitted over a different medium!
If you DO claim that, please explain. I am not interested in why
it IS done that way but why it MUST be done that way. I.e. the
mathematics and science, not the historical and current practice.
|> There are conscious, sane decisions about throwing compute
|> power at the problems. It is by far the best, cheapest and
|> most reliable solution, but it has some scalability challenges.
|>
|> This is a complex problem that computers are very good at.
Sorry, that is wrong. It is a complex problem that computers WERE
very good at. The killer is that scalability of raw performance
with time is dead, and that now scalability comes with massive
increases in power utilisation. You have heard of global warming :-(
|> On the internet losses are not such a big problem, but
|> jitter is. There G.729 makes all sense. It also fits nicely
|> over medium to good quality WiFi links.
I have selected this point out. Once you have gone digital, you
don't have loss problems - you get error and jitter problems only.
I fail to see that there is a massive difference between any form
of digital transmission. If there is, I should like to know why
(see above).
|> You may be able to negotiate interoperable codecs from end
|> to end. This happens in around 15% of all cases, when you
|> call between similar media. For the rest, transcoding is
|> done. ...
All right. Let's assume that the codec depends on the initial
medium. I fail to see why any more than two recodings are needed.
One from the sender to the transport, and the other from the
transport to the receiver. As I understand it, all modern
transports are going digital, and bits are bits.
Again, I am asking about the mathematics and science, not the
historical and current practice. I quite accept that people do
it that way, because that is the way that they do it. And you
and I can't change that with a snap of our fingers!
Regards,
Nick Maclaren.
But some stuff *is* possible and there's relatively little interest in
exploiting that even.
There's a company out there whose product is a debugger for multi-threaded
programs. There's an option to check memory accesses against validly
allocated memory. They use an AVL tree for bounds checking. Even with
an rwlock, they're getting scalability problems. There are lock-free
solutions to this problem but you have to be aware of the state of the
art. Apparently the skills necessary to implement a multi-threaded
program debugger don't require being aware of the state of the art.
And/or the pain factor isn't high enough yet.
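
For anyone wondering what "getting the readers off the rwlock" can look
like, here is a minimal sketch. It is not that company's code and not
necessarily the lock-free scheme meant above; it is a plain copy-on-write
snapshot, which only pays off when lookups vastly outnumber malloc/free.
The class and method names are hypothetical, and it leans on CPython
making a single reference assignment atomic.

```python
import bisect
import threading


class AllocationMap:
    """Read-mostly table of live (start, size) blocks.

    Readers never take a lock: they grab the current immutable snapshot
    and binary-search it. Writers serialise among themselves and publish
    a fresh snapshot by swapping one reference (copy-on-write).
    """

    def __init__(self):
        self._snapshot = ()                 # sorted tuple of (start, size)
        self._write_lock = threading.Lock()

    def add(self, start, size):
        with self._write_lock:
            blocks = list(self._snapshot)
            bisect.insort(blocks, (start, size))
            self._snapshot = tuple(blocks)  # publish the new version

    def remove(self, start):
        with self._write_lock:
            self._snapshot = tuple(b for b in self._snapshot if b[0] != start)

    def is_valid(self, addr, length=1):
        snap = self._snapshot               # one read; a snapshot never mutates
        # Index of the last block whose start is <= addr.
        i = bisect.bisect_right(snap, (addr, float("inf"))) - 1
        if i < 0:
            return False
        start, size = snap[i]
        return addr + length <= start + size
```

Writes cost O(n) here, so a real tool with heavy allocation traffic would
want something smarter; the point is only that the read path needs no lock.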
i got called into participate in looking at doing dual i-stream
370/195 i.e. dual i-stream, dual PSW, dual set of registers, etc
... but same/single pipeline, work in the pipeline had single bit A/B
flag as to which i-stream the work belonged to. project never
actually shipped a product. the issue was that most codes ran about
half peak thruput because most branches would drain the pipeline. dual
i-stream had some probability of keeping pipeline filled.
past posts with any mention of smp activity (including 195 dual
i-stream)
http://www.garlic.com/~lynn/subtopic.html#smp
comment that I remember (vaguely, long ago and far away) being told
was that biggest transition from 360/195 to 370/195 was some of the
new fangled instruction retry that was showing up in 370s. claim was
that 195 had large number of components and along with the rate that
195 operated, the mean-time-between (soft) failure of some component
was on the order of ? (small number of) weeks (2-8?), which could be
masked by new fangled hardware instruction retry logic (besides the
couple new instructions announced with original 370 ... predating any
of the 370 virtual memory announcement).
SJR continued to operate 370/195 service running MVT until '78
(when it was replaced with 168-3 & MVS)
To: wheeler
Date: 12/18/78 08:33:18
lynn--
Now that the /195 is gone, vm has acquired another printer (address
001) for class y output. Somehow, no one bothered to do a sysgen to
add the new printer. XXXXXX knows about this, but i doubt he knows
what to do. Could you see to it that the gen is done?
Now for some good news, the vm 168 has been approved along with the
additional head count. Question now is, when will it arrive? Mgmnt
trying to establish a firm ship date (tentative date is in
July). Obviously, they are trying to move up that date.
... snip ...
And for a whole lot of topic drift, SJR had been running a vm 158 in
the datacenter and the computer science group had an additional vm
145.
http://www.garlic.com/~lynn/subtopic.html#systemr
From: wheeler
Date: 01/17/80 08:12:39
There are/were three different projects. A long time ago and far away,
some people from YKT come up to CSC and we worked with them on the 'G'
updates for CP/67 (running on a 67) which was to provide virtual 4-way
370 support. Included as part of that was an additional 'CPU' (5-way)
for debug purposes which was a super set of the other four and could
selectively request from CP that events from the 4-way complex be sent
to the super CPU for handling (basically it was going to be a debug
facility). All of this was going to be for supporting the design &
debug of a new operating system. When FS started to get into trouble
the YKT project was canceled (something like 50 people at its peak)
and the group was moved over to work on FS in hopes that they might
bail it out.
2) One or two people from that project escaped and got to SJR. Here,
primarily Vera Watson, wrote initial R/W shared segment support for VM
(turns out it was a simple subset of the 'G' updates) in support of
System R (relational data base language). Those updates are up and
running and somewhat distributed (a couple customers via joint studies
and lots of internal sites).
3) Independent of that work, another group at SJR has a 145 running a
modified VM with extremely enhanced modified timer event support. The
objective is to allow somebody to specify what speed CPU and how fast
the DASD are for a virtual machine operator test. VM calculates how
fast the real CPU & DASD are and 'adjusts' the relative occurrence of
all events so that I/O & timer interrupts occur after the appropriate
number of instructions have been executed virtually. These changes do
not provide multiple CPU support.
... snip ...
Turns out that at least one of the other people involved in the "G"
cp67 activity was 195 engineer (and I believe was the vector that got
us involved in the 195 dual i-stream stuff, again, long ago and far
away)
and of course lots of past posts mentioning FS (future system)
http://www.garlic.com/~lynn/subtopic.html#futuresys
and of course by the time of the above email, Vera had already found
her resting place on Annapurna ... old post
http://www.garlic/~lynn/2002g.html#60 Amiga Rexx
other references:
http://lomaprieta.sierraclub.org/lp0012_TheSeventies.html
http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Vera.html
> Apparently the skills necessary to implement a multi-threaded
> program debugger don't require being aware of the state of the art.
> And/or the pain factor isn't high enough yet.
Can you recommend a decent book or other source on the actual state of
the art? I've tried reading Sun's research papers on lockless
algorithms, and concluded that either I was misunderstanding them utterly,
or they'd redefined their problems to the point where the solution wasn't
applicable to anything else.
Nick, if you have someone there who could provide consultancy, I'd be
interested.
> Nick, if you have someone there who could provide consultancy, I'd be
> interested.
Hang on, I'm overcommitting there. I don't have the authority to hire
consultants, or set the priorities for implementing more or better
multi-threading. If there is someone who can provide consultancy, I can
try to get those things to happen, but I can't do them on my own
authority.
Don't worry. I read it as you would like to discuss that, with no
commitment. I have sent Email.
Regards,
Nick Maclaren.
Seriously Nick, this is exactly what I've suggested at least a couple of
times here on c.arch:
A modern cpu is effectively, from a programming viewpoint, close to
identical to a 20-30 year old mainframe. This includes tape drives which
we call hard disks today. This is of course just a way to pretend that
they can be used for random access.
Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
And then you have h264 which can require nearly unbounded history in the
form of reference frames, as well as hundreds of cabac contexts. :-(
With 1080p each luminance frame is 2 MB, so a simple interpolation
between two frames requires 4 MB source and 2 MB target cache memory.
This is _not_ easy to fit today. :-(
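
The arithmetic behind those figures, as a small sketch. It assumes one
byte per luminance sample (8-bit luma), which is my assumption and not
stated above; the function names are made up for illustration.

```python
WIDTH, HEIGHT = 1920, 1080   # 1080p luminance plane
BYTES_PER_SAMPLE = 1         # assumption: 8-bit luma


def luma_frame_bytes(width=WIDTH, height=HEIGHT, bps=BYTES_PER_SAMPLE):
    """Size in bytes of one luminance frame."""
    return width * height * bps


def interpolation_working_set(n_sources=2, n_targets=1):
    """Bytes of source and target data touched by one interpolation."""
    frame = luma_frame_bytes()
    return n_sources * frame, n_targets * frame


src, dst = interpolation_working_set()
print(luma_frame_bytes(), src, dst)   # 2073600 4147200 2073600
```

So roughly 2 MB per frame, and about 6 MB of working set for one
two-frame interpolation.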
Fortunately most video transcoding does not have to back out to the
DCT stage, and can stay at the vector layer and just rearrange vectors.
This is why people stick to H.263, but some recoding is needed when
you have different resolutions at the endpoints.
This is only an order of magnitude worse than audio transcoding.
-- mrr
You commit the fallacy of thinking that bits are perfect.
Transmission systems are not perfect, even if they are digital.
Using WiFi as an example. We make a perfectly good IP packet
inside the IP layer, and present it to the MAC layer for transmission.
It is chopped up, encoded, and transmitted using oodles of computational
cycles to drive the transmitter while listening at the receiver to
compensate for all tha various radio issues.
Even more processing is used in the receiveing end; canceling out
echoes from multipath, recovering bit forms directly out of thin air
with just a little whiff of RF in it.
Lots of the little cell packets are lost, but WiFi is very good
at retransmitting. This generates jitter.
At the other end there is a decoder that wants to take the packet,
decode it and play it for a human. This is a hard realtime process.
If the sound slot is not there by the time it is to be played, you
get clipping.
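
To make the "hard realtime" point concrete, here is a toy playout
schedule. The 20 ms frame length, the fixed playout delay and the
function name are assumptions for illustration only; real receivers use
adaptive jitter buffers.

```python
FRAME_MS = 20   # assumed frame duration


def playout_schedule(arrivals, playout_delay_ms, frame_ms=FRAME_MS):
    """Classify each frame as 'played', 'late' or 'lost'.

    arrivals maps sequence number -> arrival time in ms, with frame 0
    sent at t=0. Frame n must be in hand by n*frame_ms + playout_delay_ms
    or its slot is played without it.
    """
    if not arrivals:
        return []
    result = []
    for seq in range(max(arrivals) + 1):
        deadline = seq * frame_ms + playout_delay_ms
        t = arrivals.get(seq)
        if t is None:
            status = "lost"      # never arrived
        elif t <= deadline:
            status = "played"
        else:
            status = "late"      # arrived, but its slot has already gone
        result.append((seq, status))
    return result


print(playout_schedule({0: 30, 1: 55, 2: 130, 4: 125}, playout_delay_ms=60))
```

A late frame is as useless to the listener as a lost one, which is why
retransmission alone does not solve the problem.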
So, instead of throwing more radio computation on the problem we
make concessions for lossiness in the codec. Instead of transmitting
samples of amplitude, we make a model of the sound image the ear picks
up. There are four different such models.
These models have advantages in that they don't sound as bad when
packets arrive too late or not at all. We can do packet loss concealment.
See ITU-T G.729B or ETSI/3GPP GSM 06.90 for a thorough treatise
on the subject.
They have different characteristics. GSM and other cell radio
have lots of small, lost cells. So the codec has the parameters
rigged so it works well under those circumstances. G.729 is more
adapted to jitter, so if a packet arrives a little late it does
not affect playback so much.
Both of these keep working under pretty degraded network connections.
They work way beyond the point where you can expect to have a
functioning TCP connection.
In short, this is why GSM and VoIP work at all on adverse links like
GSM in the streets of Manhattan, or VoIP to your friend in Russia.
30% packet loss, 300ms jitter and GSM keeps giving legible speech.
12% packet loss and 700ms jitter, and G.729 is still with you.
Then there are two old coding methods for old style phone networks,
without much compression. DECT phones are among these, using ADPCM.
And then there is iLBC that is mostly compression of a wideband signal.
It is popular inside companies because it sounds really clear and
good. It should. It uses a lot of GSM's representation tricks, but
does it on a wideband signal with full floating point, and largely
does it without packet loss.
>If you DO claim that, please explain. I am not interested in why
>it IS done that way but why it MUST be done that way. I.e. the
>mathematics and science, not the historical and current practice.
No, we could have our AMPS and NMT phones back.
>|> There are conscious, sane decisions about throwing compute
>|> power at the problems. It is by far the best, cheapest and
>|> most reliable solution, but it has some scalability challenges.
>|>
>|> This is a complex problem that computers are very good at.
>
>Sorry, that is wrong. It is a complex problem that computers WERE
>very good at. The killer is that scalability of raw performance
>with time is dead, and that now scalability comes with massive
>increases in power utilisation. You have heard of global warming :-(
For the handsets Moore's law is still holding. The state of the
art are 400MHz ARM-9s.
>
>|> On the internet losses are not such a big problem, but
>|> jitter is. There G.729 makes all sense. It also fits nicely
>|> over medium to good quality WiFi links.
>
>I have selected this point out. Once you have gone digital, you
>don't have loss problems - you get error and jitter problems only.
>I fail to see that there is a massive difference between any form
>of digital transmission. If there is, I should like to know why
>(see above).
Digital packets can get lost just as easily as analog packets do.
You just have a lot better chance to regenerate a working stream.
You must unthink TCP. TCP retransmits. It is unusable in hard
real time. Audio is the hardest real time there is, next to paying
out salaries. So you must regenerate and conceal packet loss
from what you have.
Such devices have been sitting next to your ear for a few decades
already. They are just getting much better at it with more processing
power.
>
>|> You may be able to negotiate interoperable codecs from end
>|> to end. This happens in around 15% of all cases, when you
>|> call between similar media. For the rest, transcoding is
>|> done. ...
>
>All right. Let's assume that the codec depends on the initial
>medium. I fail to see why any more than two recodings are needed.
>One from the sender to the transport, and the other from the
>transport to the receiver. As I understand it, all modern
>transports are going digital, and bits are bits.
>
>Again, I am asking about the mathematics and science, not the
>historical and current practice. I quite accept that people do
>it that way, because that is the way that they do it. And you
>and I can't change that with a snap of our fingers!
Be careful what you ask for. I can dig up the background papers
from ITU and ETSI. Unless you have a serious need to go there
you do want to stay away from it.
But, if you can improve the GSM coding speed by 10% you will
be seriously rich.
-- mrr
> This was in a email I got from Intel.
> Show Us Your Threads — win an Intel® Core™ 2 Duo iMac*
>
> You express yourself through solving problems and developing code, don't
> you?
> Show your peers your passion for fashion. Share your own unique sense of
> style.
> Pictures of you in clothes that were fashion "must haves" or clothes that
> make
> you (or others) smile. We know that some of you only wear free shirts —
> show
> us your best tradeshow attire!
[...]
> At first I thought it was another coding contest
[...]
WTF is up with that? What happens if you're a good programmer that happens to
have a "crappy" sense of so-called:
" 'current' fashion trends "
??? Is this for real or SPAM? WTF?!
[...]
> They're hoping that people go out and buy Intel multi-core computers
> because
> they saw pictures of people wearing cool clothing?
Has to be SPAM....!
:^O
Why not post on comp.programming.threads??? If your question cannot be
answered there, well, send me an e-mail.
http://groups.google.com/group/comp.arch/msg/5fc331fcae43bf42
http://groups.google.com/group/comp.arch/msg/0a7d9ea4cff0b0ce
Here is something else you can use:
http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/65c9c2673682d4cf
http://appcore.home.comcast.net/vzoom/round-1.pdf
http://appcore.home.comcast.net/vzoom/round-2.pdf
The memory allocator I invented here:
http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855
Is "state-of-the-art", well, IMHO at least... BTW, can you find a 100%
lock-free distributed multi-threaded 100% software-based and user-space
general purpose allocator algorithm that can beat my algorithm? The
"current" implementation of Hoard is not going to do it...
OK, why can't you use obvious digital techniques, like fancy ECC or just
RAID style extra packets?
If you often get bursts of dropped packets, bursts long enough that
using horizontal redundancy to cover them would introduce unacceptable
delay, then I'll accept your premise about needing a different style of
encoding.
> But, if you can improve the GSM coding speed by 10% you will
> be seriously rich.
Improve on a given HW platform (asm/microcode/fpga?) or in general by
improving the high-level (C presumably?) reference codec?
Yup. You, me and people a lot more eminent than either of us! I can't
remember which Big Name I first saw make the statement, but it was a
LONG time ago, and it was so obviously right that I adopted his position.
Regards,
Nick Maclaren.
You are making a mistake to think that a bit can be anything other
than 0 or 1!
|> Transmission systems are not perfect, even if they are digital.
That is very true, and I said just that.
What I understand you to be claiming is that there is no option but
to convert back to analogue and to a different digital format half a
dozen or more times per trip. I don't believe that, but I do believe
that it is done that way.
|> Both of these keep working under pretty degraded network connections.
|> They work way beyond the point where you can expect to have a
|> functioning TCP connection.
Nuts. I have had working TCP connexions with 80% packet loss, and ones
with tens of seconds of jitter. It wasn't even too unpleasant to work
in traditional line mode, either, but end-to-end full-screen and GUIs
were unusable. I have used TCP over a 9600 baud acoustic modem that
had dropped to 4800 because the line was so noisy and lossy.
|> 30% packet loss, 300ms jitter and GSM keeps giving legible speech.
|> 12% packet loss and 700ms jitter, and G.729 is still with you.
And there are ancient techniques to swap loss for jitter and jitter
for loss, that have been standard practice for 3/4 of a century now.
There is VERY rarely any need to convert back into analogue in the
middle of a transmission.
|> >Sorry, that is wrong. It is a complex problem that computers WERE
|> >very good at. The killer is that scalability of raw performance
|> >with time is dead, and that now scalability comes with massive
|> >increases in power utilisation. You have heard of global warming :-(
|>
|> For the handsets Moore's law is still holding. The state of the
|> art are 400MHz ARM-9s.
Damn Moore's Law - that merely talks about real estate. If you mean
Not-Moore's Law (i.e. clock speed), you assuredly AREN'T going to
see 4 GHz handsets in 5 years time. No way, Jose. The fire risk,
for a start. You aren't even going to see 10 times as many cores
in most devices, though that is technically easier.
|> You must unthink TCP. TCP retransmits. It is unusable in hard
|> real time. Audio is the hardest real time there is, next to paying
|> out salaries. So you must regenerate and conceal packet loss
|> from what you have.
Damn TCP. Its retransmission algorithm is a crock. There are many
other ways to deliver reliable and semi-reliable transmission without
converting to and from analogue (except at the end points).
|> Be careful what you ask for. I can dig up the background papers
|> from ITU and ETSI. Unless you have a serious need to go there
|> you do want to stay away from it.
If they are like most of the similar ones I have seen, they will be
hiding the fact that the industry is heading in the wrong direction
behind a mass of details of how hard it is to progress in the
direction they have chosen.
|> But, if you can improve the GSM coding speed by 10% you will
|> be seriously rich.
Quite. And, if you can point out that they could achieve their
objectives much better and more simply by changing direction, you
will be ignored. That is the way of the world.
Regards,
Nick Maclaren.
I didn't see much about practical implementation on the syllabus,
and saw precisely nothing about even designing for debuggability.
Regards,
Nick Maclaren.
Done already. GSM predates the very fancy ECCs and relies on a row/col ECC,
but Edge, 3G and WiFi use very sophisticated ECCs. The sophistication
of radio signal processing is very high, and they use all the tricks in the
book. 3G recovers all the multipath signals it can get its hands on, and
reconstructs the cells independently. Therefore 10ms of speech may bounce
off building A and the next 10ms bounce off building B.
The basic problem is that by the time you get the parity packet of the
raid it is too late. The audio is already played. Ideally you shouldn't
have a RTT of more than 140 milliseconds, because the human conversation
tends to get broken up. When you deal with quantisations of 20 ms you
have 100ms left for transmission, jitter and retransmits. If you do
three-way raid5 the raid packets trail the first audio by 60ms, and
that alone is just on the edge of the playout window; to say nothing
of transmission and jitter. GSM does fall back to 280 ms in bad conditions.
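
A back-of-envelope version of that budget, as a sketch. Two things are my
reading rather than anything stated above: that the 20 ms framing delay is
paid twice per round trip (which is how 140 ms becomes 100 ms), and that a
three-way group means the parity cannot leave until three 20 ms frames
have been produced. The names are hypothetical.

```python
RTT_BUDGET_MS = 140   # round-trip figure quoted above
FRAME_MS = 20         # quantisation (frame) length quoted above


def remaining_budget_ms(rtt=RTT_BUDGET_MS, frame=FRAME_MS, framing_hops=2):
    """Time left for transmission, jitter and retransmits.

    framing_hops=2 is an assumption: one frame of delay in each direction.
    """
    return rtt - framing_hops * frame


def parity_lag_ms(group_frames=3, frame=FRAME_MS):
    """How far the parity packet trails the first audio of its group."""
    return group_frames * frame


left = remaining_budget_ms()
lag = parity_lag_ms()
print(left, lag, lag / left)   # 100 60 0.6
```

On these assumptions the parity lag alone eats 60% of what is left, before
any network delay is counted.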
This is hard real-time. 10% packet loss in a GSM network is a normal
state of affairs if you are in a car/train and is moving. WiFi signal
process more (that is why it is so power hungry) so packet loss is
lower, but the radio noise and processing is still there.
GSM 06.60 EFR and G.729E/F/G use some "raid-like" tricks, in that packets
do encode some state for the previous packet, so a better playout can
be achieved even with a missing packet. G.729 also handles the case where a packet
arrives a little late, while a part of the audio has been played.
This is often the case on the Internet.
It all plays games with the human ear, and conceals the dropouts by
playing the same sound pattern slightly attenuated over time until
told otherwise by incoming packets. Since speech is mostly phonemes
at a rate of 2-3 per second this works pretty well.
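
The concealment idea in its crudest form, as a sketch: repeat the last
good frame, a bit quieter each time, until real data shows up again. This
is not the G.729 or GSM algorithm (those work on the model parameters, not
on raw samples); the attenuation factor and function name are invented
for illustration.

```python
def conceal(frames, attenuation=0.7):
    """Fill gaps in a frame sequence by repeating the last good frame.

    frames is a list of sample lists, with None marking a missing frame.
    Each consecutive missing frame is played `attenuation` times quieter
    than the one before, so a long dropout fades out instead of buzzing.
    """
    out = []
    last = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last, gain = frame, 1.0        # good data: reset the fade
            out.append(list(frame))
        elif last is None:
            out.append([])                 # nothing to repeat yet
        else:
            gain *= attenuation
            out.append([s * gain for s in last])
    return out


print(conceal([[100, -100], None, None, [10, 20]], attenuation=0.5))
```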
>If you often get bursts of dropped packets, bursts long enough that
>using horizontal redundancy to cover them would introduce unacceptable
>delay, then I'll accept your premise about needing a different style of
>encoding.
>
>> But, if you can improve the GSM coding speed by 10% you will
>> be seriously rich.
>
>Improve on a given HW platform (asm/microcode/fpga?) or in general by
>improving the high-level (C presumably?) reference codec?
I was thinking of algorithmic performance over what has already been
done. The reference codec is made for clarity of understanding and is
not very efficient; especially the GSM 06.10 one. The tu-berlin one
is the one used as a baseline.
Standards and reference code are available on the websites of the
ITU, ETSI and 3GPP. Only ETSI permits unlimited distribution; ITU and
3GPP require that you download it yourself and don't redistribute.
-- mrr
Well, the design error in the above is easy to spot.
RAID techniques are completely and utterly inappropriate for such uses,
and always obviously were. The first statement of your last paragraph
summarises it nicely - if a technique can't reach its design objective
even on paper, then it is obviously unsuited for the purpose. Even for
disk uses, the defects of RAID have been known for 3-4 decades, and it
is blindingly obvious that they will affect real-time audio much worse.
The obvious type of ECC for a real-time stream takes its ideas from
sequential sampling (statistics). Each packet includes an ECC for
itself and its N previous packets, preferably designed so that multiple
successive slight failures still allow recovery.
Note that, given your figures, that would allow trivial, perfect
recovery for up to one packet in 7 lost, and I am quite certain that a
competent design could handle 50% loss, provided that at least 4 packets
in every 7 get through. Yes, that would need double the bandwidth of
a raw audio stream. It's all good 1920s stuff.
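
A minimal sketch of the structure being described: every packet carries
its own payload plus redundancy over the previous N payloads, so a repair
can happen as soon as the next packet arrives instead of waiting for the
end of a group. Plain XOR parity is used purely because it is short; it is
my stand-in, it only repairs one loss per window, and the heavier loss
rates claimed above would need a stronger code. It does double the
bandwidth. Function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(payloads, n):
    """Pair each payload with the XOR of the previous n payloads."""
    zero = bytes(len(payloads[0]))
    packets = []
    for i, p in enumerate(payloads):
        window = payloads[max(0, i - n):i]
        packets.append((p, reduce(xor_bytes, window, zero)))
    return packets


def decode(received, n):
    """Recover payloads from a list of (payload, parity) or None (lost).

    A lost payload is rebuilt when some later packet's window contains
    it as the only unknown. Unrecoverable slots stay None.
    """
    out = [pkt[0] if pkt else None for pkt in received]
    for i, pkt in enumerate(received):
        if pkt is None:
            continue
        lo = max(0, i - n)
        missing = [j for j in range(lo, i) if out[j] is None]
        if len(missing) == 1:
            known = [out[j] for j in range(lo, i) if out[j] is not None]
            out[missing[0]] = reduce(xor_bytes, known, pkt[1])
    return out
```

Usage: encode a stream with `encode(payloads, 3)`, replace any one packet
in four with `None`, and `decode(received, 3)` returns the original
payloads; two adjacent losses in the same window are left as `None`.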
BUT TO REPEAT: the above is referring to the underlying engineering
problem and not the politics. Nobody is denying that it would be a
major upheaval, and even proposing such a design would cause a major
political storm.
Regards,
Nick Maclaren.
Er, roughly - those figures don't quite match each other :-)
Regards,
Nick Maclaren.
No, the signal is not converted back to analogue. It changes
digital representations. At the very worst fallback mode it changes
to a representation called Slinear, which is a 15-bit series of
sound samples. Used a lot by sound chips.
All codecs try to go direct, i.e. stay within the
GSM LPC model, but simulate and recode for another codec. You
don't have to do that for G.729, the basic model and representation
is the same.
Or you may have to code between LPC models, ITU to GSM, and then recode for
e.g. G.729 (or G.728, or G.723.1; or even the wideband G.722).
Your prayers about reducing the codecs have mostly been heard, though.
The codecs in use are mainly:
Slinear : The 15-bit format the sound chips like. 112 kilobits.
G.711 A-law and ulaw : the mainstay of the old phone system. 56 or 64 kilobits.
    Semi-logarithmic coding of 13-bit samples into 7 or 8 bits.
Various ADPCM (G.726 and G.727) : DECT and other old-style phone systems. 16-40 kilobits.
    The dialects are usually just on the frame level, i.e. confusion
    between big and little endians etc. Like a compressed A-law.
G.729 : Everywhere on the Internet. 4.3-12 kilobits. Handles jitter _very_ well,
    i.e. handles occasional dropouts of 20-60 ms audio very well.
    Is a meta-codec that describes a sound model and its data representation.
GSM : many representations, 3.8-17.3 kilobits. Handles cell dropout _very_ well,
    i.e. handles many non-contiguous 10ms drops well. Has also become a
    meta-codec with the appearance of the successor AMR.
There is a lot of work ongoing to reconcile the ITU (G.729 etc) and GSM/3GPP
(06.10 etc) LPC models, and bring out effective recoders between the formats.
Otherwise encodings to/from G.711 (old phone system) and Slinear (sound chips)
are performed often. ADPCM is a compression that works on a Law-like signal,
think of it as compressed A-law.
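
For readers who have not met "semi-logarithmic coding": a sketch of the
textbook continuous mu-law curve with mu = 255. Real G.711 uses a
segmented piecewise-linear approximation of this curve, so the code below
illustrates the idea and is not a bit-exact codec. The 8 kHz sampling
rate is the standard telephony rate; the function names are mine.

```python
import math

MU = 255             # mu-law parameter
SAMPLE_RATE = 8000   # telephony sampling rate in Hz


def mulaw_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def encode8(x):
    """Quantise the compressed value to an 8-bit code (0..255)."""
    return round((mulaw_compress(x) + 1) / 2 * 255)


def decode8(code):
    """Turn an 8-bit code back into a sample in [-1, 1]."""
    return mulaw_expand(code / 255 * 2 - 1)


def bitrate(bits_per_sample, rate=SAMPLE_RATE):
    """Bits per second for a fixed-width sample stream."""
    return bits_per_sample * rate


print(bitrate(8), bitrate(7))   # 64000 56000
```

The log curve spends most of the 8-bit codes on quiet samples, which is
where the ear is fussiest, so small signals survive quantisation much
better than they would with 8 linear bits.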
>|> Both of these keep working under pretty degraded network connections.
>|> They work way beyond the point where you can expect to have a
>|> functioning TCP connection.
>
>Nuts. I have had working TCP connexions with 80% packet loss, and ones
>with tens of seconds of jitter. It wasn't even too unpleasant to work
>in traditional line mode, either, but end-to-end full-screen and GUIs
>were unusable. I have used TCP over a 9600 baud acoustic modem that
>had dropped to 4800 because the line was so noisy and lossy.
But your TCP can retransmit, right ? You are not running hard
realtime applications like audio on top of it. You do not get garbage
on your screen when the packets are delayed 100ms or lost.
Even for streaming purposes TCP degrades visibly at 3% packet loss.
>|> 30% packet loss, 300ms jitter and GSM keeps giving legible speech.
>|> 12% packet loss and 700ms jitter, and G.729 is still with you.
>
>And there are ancient techniques to swap loss for jitter and jitter
>for loss, that have been standard practice for 3/4 of a century now.
>There is VERY rarely any need to convert back into analogue in the
>middle of a transmission.
Yep, signals never go to analogue, except at playback to a human.
And even then they probably go via Slinear on the way to the chip
that post-processes it to sound a bit nicer.
>|> >Sorry, that is wrong. It is a complex problem that computers WERE
>|> >very good at. The killer is that scalability of raw performance
>|> >with time is dead, and that now scalability comes with massive
>|> >increases in power utilisation. You have heard of global warming :-(
>|>
>|> For the handsets Moore's law is still holding. The state of the
>|> art are 400MHz ARM-9s.
>
>Damn Moore's Law - that merely talks about real estate. If you mean
>Not-Moore's Law (i.e. clock speed), you assuredly AREN'T going to
>see 4 GHz handsets in 5 years time. No way, Jose. The fire risk,
>for a start. You aren't even going to see 10 times as many cores
>in most devices, though that is technically easier.
Handsets see a revolution already. 400 MHz ARM-9s are manna from
heaven for designers used to 16 MHz 8-bitters. But they DID squeeze
DSP-assisted GSM into the latter. With 400 MHz and a real nice 32-bitter
the world is at their doors. They use it for WiFi, AMR and lots of
other stuff. Even TV on the mobile.
This poses a real challenge on the back-end server farms that get
the GSM cells, the IP frames with G.729, POTS frames with G711 and
possibly ADPCM frames from DECT.
Yes, there are DECT SIP gateways that deliver the DECT cells directly
over RTP, and avoid going to G.711 over ISDN or to analogue. This
makes the server transcoding work larger, not smaller. But, when such
a DECT phone talks to another DECT phone it will have end to end
ADPCM from handset to handset. (Planet is one brand of such phones).
You can get the GSM cells out of GSM networks with the right peering
agreements. They come packaged in V.110 inside ISDN.
You already have G.711 from the ISDN lines, and G.729 in VoIP.
To get from here to there you must have a codec arbitrator of last resort
inside a network, one that offers to handle everything. This arbitrator becomes
a huge beast that needs stacks of processor cards.
>|> You must unthink TCP. TCP retransmits. It is unusable in hard
>|> real time. Audio is the hardest real time there is, next to paying
>|> out salaries. So you must regenerate and conceal packet loss
>|> from what you have.
>
>Damn TCP. Its retransmission algorithm is a crock. There are many
>other ways to deliver reliable and semi-reliable transmission without
>converting to and from analogue (except at the end points).
>
>|> Be careful what you ask for. I can dig up the background papers
>|> from ITU and ETSI. Unless you have a serious need to go there
>|> you do want to stay away from it.
>
>If they are like most of the similar ones I have seen, they will be
>hiding the fact that the industry is heading in the wrong direction
>behind a mass of details of how hard it is to progress in the
>direction they have chosen.
>
>|> But, if you can improve the GSM coding speed by 10% you will
>|> be seriously rich.
>
>Quite. And, if you can point out that they could achieve their
>objectives much better and more simply by changing direction, you
>will be ignored. That is the way of the world.
Which direction should that be, precisely? Use more processing power
on the radio, where they are pretty close to a brick wall of exploding
computing complexity already? Code G.711 everywhere?
You want to keep the codecs as long as possible. This makes greater
demands on the core, not smaller.
The calls that go VoIP-VoIP or GSM-GSM or G.711-G.711 mostly bypass the
VoIP carrier. The VoIP carrier has to be there as glue.
-- mrr
I should have said "what is effectively analogue".
|> All codecs try to go direct, i.e. stay within the
|> GSM LPC model, but simulate and recode for another codec. You
|> don't have to do that for G.729, the basic model and representation
|> is the same.
Simulating and recoding is (as far as the data flow is concerned)
equivalent to converting via analogue.
|> >|> Both of these keep working under pretty degraded network connections.
|> >|> They work way beyond the point where you can expect to have a
|> >|> functioning TCP connections.
|> >
|> >Nuts. I have had working TCP connexions with 80% packet loss, and ones
|> >with tens of seconds of jitter. It wasn't even too unpleasant to work
|> >in traditional line mode, either, but end-to-end full-screen and GUIs
|> >were unusable. I have used TCP over a 9600 baud acoustic modem that
|> >had dropped to 4800 because the line was so noisy and lossy.
|>
|> But your TCP can retransmit, right? You are not running hard
|> realtime applications like audio on top of it. You do not get garbage
|> on your screen when the packets are delayed 100ms or lost.
Read what you posted again. I was responding to that.
And, yes, you do get garbage under those circumstances, if you are
using a GUI - because `protocols' like X fail horribly if packets get
out of sequence (which includes ones getting delayed for long enough
for the user to get another interaction in before they are processed).
|> Even for streaming purposes TCP degrades visibly at 3% packet loss.
I said that its recovery was a crock.
|> >Quite. And, if you can point out that they could achieve their
|> >objectives much better and more simply by changing direction, you
|> >will be ignored. That is the way of the world.
|>
|> Which direction should that be, precisely? Use more processing power
|> on the radio, where they are pretty close to a brick wall of exploding
|> computing complexity already? Code G.711 everywhere?
Use appropriate techniques for 21st century telephone technology, and
stop attempting to prop up archaic methods.
It won't happen - any more than it does in other areas of IT :-(
Regards,
Nick Maclaren.
> Note that, given your figures, that would allow trivial, perfect
> recovery for up to one packet in 7 lost, and I am quite certain that a
> competent design could handle 50% loss, provided that at least 4 packets
> in every 7 get through. Yes, that would need double the bandwidth of
> a raw audio stream. It's all good 1920s stuff.
But is that the best use of precious bandwidth?
--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark
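The quoted claim of trivial recovery for one lost packet in seven can be illustrated with a single XOR parity packet sent after every six data packets. This is only a minimal sketch of that one idea: the group size of six plus one, the function names and the assumption of equal-length packets are all illustrative choices, not anything specified in the thread, and the stronger "at least 4 in every 7" case would need a proper erasure code rather than this scheme.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(data_packets: List[bytes]) -> List[bytes]:
    """Return the group with one parity packet (XOR of all data) appended."""
    parity = reduce(xor_bytes, data_packets)
    return data_packets + [parity]


def recover(group: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data packets when at most one packet in the group is lost.

    A lost packet is represented by None. The last slot is the parity packet.
    """
    missing = [i for i, p in enumerate(group) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one loss per group")
    if missing:
        present = [p for p in group if p is not None]
        # XOR of everything that did arrive equals the one packet that did not.
        rebuilt = reduce(xor_bytes, present)
        group = group[:missing[0]] + [rebuilt] + group[missing[0] + 1:]
    return group[:-1]  # drop the parity packet, keep the data
```

With six data packets per parity packet the overhead is one seventh of the transmitted stream, which is far below the doubling mentioned for the 50% loss case.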
As far as audio is concerned, bandwidth has not been a problem for a
long time. Video is another matter.
The human ear cannot take in more than a very small information rate.
It is hard to encode sounds efficiently, because the ear is a VERY
non-digital device, but you still don't need much by video and data
standards.
Regards,
Nick Maclaren.
Perhaps I should repeat what I said a few postings back, as it may not
be clear which comment refers to this.
I accept that we may need a large collection of weird techniques for
a variety of handset to exchange circumstances. I am not convinced,
given how much CPU power you can put in a small space, but let's not
argue that one. Assume that we do.
I do NOT accept that more than one protocol is needed for transmission
between exchanges, except in VERY unusual circumstances; almost all
of your arguments for multiple codecs refer to the handset-exchange
link and archaic methods of exchange-exchange transmission. My points
about the recovery being a solved problem refer to the end-to-end
exchange-exchange stages.
That would mean that at most two recodings were needed for all normal
connexions. I cannot see any technical justification for more, though
I accept that there are historical ones, and political ones that block
getting there.
Regards,
Nick Maclaren.
> WTF is up with that? What happens if you're a good programmer that
> happens to have a "crappy" sense of so-called:
>
> " 'current' fashion trends "
> ??? Is this for real or SPAM? WTF?!
I got it too. If it was spam, it was a pretty convincing imitation of
the layout and style of the mails that Intel sends out to registered
developers.
> http://appcore.home.comcast.net/vzoom/round-1.pdf
> http://appcore.home.comcast.net/vzoom/round-2.pdf
Sadly, "patent pending" on the footer of your PDFs isn't encouraging.
When one has no idea of what the terms & conditions for using it might
be, and reckons some US lawyers might well try to make a case for
infringement if we'd read the papers and then implemented anything in
vaguely the same field, the logical course of action for a commercial
organisation is to avoid reading it. Of course, I may be being paranoid,
but US intellectual property law is pretty aggressive.
I am not replying to any particular point Morten has made but to thank
him for the solid, fact filled exposition of the problems that he, and
others doing what he does, face. I find it very interesting to learn
about things I know very little about and find comp.arch a good source
for that, especially threads like this one.
I don't expect that Nick is necessarily right about his points, but his
posts do serve to force further exposition, and for that I am grateful. :-)
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
Humm.. Yeah, you're right. Well, what do you think of the "teaching strategy"
I briefly outlined here:
http://groups.google.com/group/comp.arch/msg/0a7d9ea4cff0b0ce
It's all about the implementation...
;^)
Chris should just follow standard corporate practices and not
publicize the fact that the algorithms have patents applied for.
If you want to avoid patent entanglements, you need to stick to
algorithms published before 1990. Unless you're just worried
about triple damages, e.g. $50 million is ok but $150 million
is not.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
No. I suggest you go read the documents in question.
The recoding inside the GSM family does not do filtering, hysteresis,
FFT or prediction. Some renormalisation of polynomials is done to
fit the new encodings. You will still use 20+ MIPS going between e.g. HR and
EFR, but not the 140+ MIPS expected of a full decode/encode.
Between G.729 and the GSM family it gets trickier, but it is still doable
with ~50 MIPS, not the expected 200 for a full decode/encode.
The large challenge is how to represent the holes in the other encoding
domain. That is where you may have to back out and do prediction and FFT
again.
All of this is a long way from the old GSM 06.10 that our ears have heard so
many times.
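To put the figures in the post above side by side, here is a small worked comparison. It treats the quoted "20+", "140+", "~50" and "200" as point values, and the 1000 MIPS per-card budget used in the note below is a purely hypothetical number chosen to make the arithmetic visible.

```python
# MIPS per call, taken from the post above and treated as point values.
COSTS = {
    "HR <-> EFR": {"direct": 20, "full": 140},
    "G.729 <-> GSM": {"direct": 50, "full": 200},
}


def saving(direct: int, full: int) -> float:
    """Fraction of the full decode/encode cost avoided by direct recoding."""
    return 1 - direct / full


def calls_per_budget(budget_mips: int, cost_mips: int) -> int:
    """Whole number of simultaneous calls that fit into a MIPS budget."""
    return budget_mips // cost_mips


for name, c in COSTS.items():
    print(name, round(saving(c["direct"], c["full"]), 2))
```

On a hypothetical 1000 MIPS card, `calls_per_budget(1000, 20)` gives 50 direct HR/EFR recodings against `calls_per_budget(1000, 140)` giving 7 full decode/encode passes.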
>|> >|> Both of these keep working under pretty degraded network connections.
>|> >|> They work way beyond the point where you can expect to have a
>|> >|> functioning TCP connections.
>|> >
>|> >Nuts. I have had working TCP connexions with 80% packet loss, and ones
>|> >with tens of seconds of jitter. It wasn't even too unpleasant to work
>|> >in traditional line mode, either, but end-to-end full-screen and GUIs
>|> >were unusable. I have used TCP over a 9600 baud acoustic modem that
>|> >had dropped to 4800 because the line was so noisy and lossy.
>|>
>|> But your TCP can retransmit, right? You are not running hard
>|> realtime applications like audio on top of it. You do not get garbage
>|> on your screen when the packets are delayed 100ms or lost.
>
>Read what you posted again. I was responding to that.
>
>And, yes, you do get garbage under those circumstances, if you are
>using a GUI - because `protocols' like X fail horribly if packets get
>out of sequence (which includes ones getting delayed for long enough
>for the user to get another interaction in before they are processed).
??? TCP will deliver in sequence, ordered; or not at all.
>|> Even for streaming purposes TCP degrades visibly at 3% packet loss.
>
>I said that its recovery was a crock.
>
>|> >Quite. And, if you can point out that they could achieve their
>|> >objectives much better and more simply by changing direction, you
>|> >will be ignored. That is the way of the world.
>|>
>|> Which direction should that be, precisely? Use more processing power
>|> on the radio, where they are pretty close to a brick wall of exploding
>|> computing complexity already? Code G.711 everywhere?
>
>Use appropriate techniques for 21st century telephone technology, and
>stop attempting to prop up archaic methods.
>
>It won't happen - any more than it does in other areas of IT :-(
Let's see; the major work for AMR was done 2000-2002, and was finalised
in 2003, with a small "fix release" in 2005.
G.729 was originally done in 1995-96, first version (of the "meta-codec")
in 1996; and new encode/decode methods (all interoperable without recoding)
yearly since then.
Hardly ancient stuff. Then someone comes up with ideas like G.727 and G.728
dreamt up by mathematicians. Probably brilliant ideas, but they never took off.
If you have better ideas for encoding sound, the world would like to hear them.
These problems are _hard_, and lots of time is spent on the problems.
-- mrr
Let's see; a simple case with no "extra" encodings. Just the ones that
make technical sense.
I call home on a GSM link from a location with good coverage, and start
to move away from the base station.
The sound chip in the handset outputs Slinear. More or less raw samples,
but with some filtering and peak dampening. Just now you are riding the
train in a tunnel, so the base station has negotiated you to a GSM HR
06.20 codec at 4.3 kilobits. The base station started the negotiation
with you at 13.8 kilobits when you were at the station placing the call,
and this is what has been entered into the location register of the
mobile operator. To hide this infraction, and not spend time interrupting
the call, the base station now transcodes. It stays inside the GSM
domain and just renormalises the GSM polynomials.
The number was looked up in the number database as belonging to a VoIP
operator and, as luck would have it, the mobile operator has a link there.
The GSM cells are stuffed into V.110 trunks, and picked up at the
other end. The VoIP operator's software has identified this customer
as having abysmal broadband quality, so a transcode to G.729 is done
to compensate.
This particular customer uses a SIP enabled DECT base station. So the
call is transcoded yet again, and transits DECT space as ADPCM, and
is again transcoded into Slinear at the handset and delivered to a
sound chip.
That is a pretty normal scenario on a VoIP to GSM call. In today's
world there would probably have been a G.711 link between the operators,
but that is ancient technology. In this case, one of the operators got
to use their native format, so there are only 5 transcodings.
In real life there would be 6.
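The count of five can be checked by writing the call path down as a list of codecs and counting the hops where the codec changes. This is one reading of the scenario above, not a specification: the labels (in particular `GSM-registered` for the codec entered in the location register) and the helper names are made up for illustration.

```python
from typing import List

# One interpretation of the call path described in the post.
CALL_PATH = [
    "SLIN",            # sound chip in the GSM handset
    "GSM-HR",          # radio link after renegotiation in the tunnel
    "GSM-registered",  # codec the location register expects; base station recodes
    "G.729",           # VoIP operator compensating for poor broadband
    "ADPCM",           # DECT air interface
    "SLIN",            # sound chip in the DECT handset
]


def count_transcodings(path: List[str]) -> int:
    """Number of hops at which the codec changes."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)


def with_g711_interconnect(path: List[str], after: str = "GSM-registered") -> List[str]:
    """Insert a G.711 inter-operator link directly after the given codec."""
    i = path.index(after) + 1
    return path[:i] + ["G.711"] + path[i:]


print(count_transcodings(CALL_PATH))
print(count_transcodings(with_g711_interconnect(CALL_PATH)))
```

Under this reading the native-format path has five codec changes, and adding the G.711 link between the operators gives the six mentioned for real life.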
All of these things happen all the time, and the systems keep
negotiating all the time.
Over a few beers, we had four people gathered who had intimate
knowledge of how GSM, Internet and VoIP operation worked in "our"
networks. We decided to count out how many computers were involved
in a call like the one just described. We got well past 200.
-- mrr
With the idea that you balance power on the processor and the radio section to
achieve constant perceived quality at the lowest draw on the mobile's battery?
Is that cross-balancing already done? Are there things you can have the
non-power-constrained fixed infrastructure do that relieves the mobiles?
> You may be able to negotiate interoperable codecs from end
> to end. This happens in around 15% of all cases, when you
> call between similar media. For the rest, transcoding is
> done. If you call from Skype to a mobile the call is probably
> coded from the PC hardware's SLIN to Speex, then recoded
> to G.729 as it hits voip infrastructure, recoded to Alaw when
> it transits the phone network, and recoded to AMR if it is a
> 3G phone.
>
> The G.729 to alaw to AMR can probably be done away with,
> so the path becomes SLIN - SPEEX - G729 - AMR.
I hear noises from Deutsche Telekom that they are going to convert their
fixed-line infrastructure to VoIP in the mid-term, to save a lot of money,
they say.
> Video will always have different media and resolutions.
> Full transcoding is probably not worthwhile, but a mid-level
> transcoding e.g. vector remapping to a new resolution is
> doable inside H.263 with around 50 mips of processing power.
I know video better than audio - one of the major problems is that video
transcoding is almost always non-injective, i.e., has the potential to lose
information that later stages cannot get back. Is that also a problem in your
area?
> Improving GSM 06.10 encoding by 10% will extend the battery life
> of 500 million mobiles by several hours each. The power
> savings alone are large.
But the manufacturers might object 8-)...in my experience, one of the major
reasons for getting a new handset is that the old one's battery will no longer
take a charge, and getting a spare is well-nigh impossible or unaffordable,
compared to getting a new one. If you now extend the battery life...
Jan
You allude to it later - but the important point compared to other content
(including video) is that you have severe limits on round-trip delay. So the
usual trick of caching at the receiver to ride out drop-outs etc. is out.
Jan
Sounds like it's not quite at the point of using a speech-tuned MP3-based
model of the human auditory system and only sending those parameters, which
should be able to cover most drop-outs even better.
Jan
The optimisation is normally tilted more towards having an operational network.
All the important parameters, i.e. frequency, timing slots, transmit power, codec etc.,
are chosen by the base station, or possibly by a back-end computer controlling
the base station.
Mobiles near base stations are told to transmit at low power and use standard,
simple codecs. As conditions deteriorate the handsets get louder and are
told to use other codecs, if they are able to. They may get moved around
to better frequencies as well. All of this goes on while you are talking.
>
>> You may be able to negotiate interoperable codecs from end
>> to end. This happens in around 15% of all cases, when you
>> call between similar media. For the rest, transcoding is
>> done. If you call from Skype to a mobile the call is probably
>> coded from the PC hardware's SLIN to Speex, then recoded
>> to G.729 as it hits voip infrastructure, recoded to Alaw when
>> it transits the phone network, and recoded to AMR if it is a
>> 3G phone.
>>
>> The G.729 to alaw to AMR can probably be done away with,
>> so the path becomes SLIN - SPEEX - G729 - AMR.
>
>I hear noises from Deutsche Telekom that they are going to convert their
>fixed-line infrastructure to VoIP in the mid-term, to save a lot of money,
>they say.
As I said, a cluefully implemented VoIP network over good infrastructure
is a lot better than a classic ISDN network.
>> Video will always have different media and resolutions.
>> Full transcoding is probably not worthwhile, but a mid-level
>> transcoding e.g. vector remapping to a new resolution is
>> doable inside H.263 with around 50 mips of processing power.
>
>I know video better than audio - one of the major problems is that video
>transcoding is almost always non-injective, i.e., has the potential to lose
>information that later stages cannot get back. Is that also a problem in your
>area?
Audio transcodings lose data all the time, they are even worse; sometimes
they introduce "false positives"; i.e. generate phonemes that never were
in the original stream. You have to push the mentioned LPC codecs pretty
far for this to happen, though. G.723.1 is a lot worse in this respect.
>
>> Improving GSM 06.10 encoding by 10% will extend the battery life
>> of 500 million mobiles by several hours each. The power
>> savings alone are large.
>
>But the manufacturers might object 8-)...in my experience, one of the major
>reasons for getting a new handset is that the old one's battery will no longer
>take a charge, and getting a spare is well-nigh impossible or unaffordable,
>compared to getting a new one. If you now extend the battery life...
No, the manufacturers want to squeeze as much functionality as they can
into the limited battery life.
-- mrr
But there is an approximate local correspondence in the various frames
involved, isn't there? So you need to be able to push three streams with a
context of, say, 128 kB each through a pipe (analogous to blocking when doing
a matrix multiply). That should be quite doable.
Jan
Ah, similar to MS bloatware consuming all advances in PC computational power 8-)?
Jan
As long as you don't have too much motion and/or too many reference
(source) frames, that works. OTOH you also need to implement the
frame-wide filtering functions, which are real cache-busters. (Like a
vertical filter across all block boundaries.)
So, what do we do instead with the additional transistors that Moore's
law gives us? Higher integration would be one obvious answer, and AMD
is going there with CPU/GPU combos on a chip. But GPUs eat memory
bandwidth even more happily than CPU cores do (with current high-end
cards having 67-86GB/s nominal bandwidth (or is it twice as much
(DDR)?)), so by adding a GPU, the number of CPU cores on the die that
can be used beneficially is reduced quite a bit. Integrating other
peripherals requires pins that can then not be used for connecting to
RAM, so this is another option that would reduce the number of useful
cores on the die. Still, these approaches would be ways to produce
computers at lower cost (instead of more capable computers), so they
will be taken.
But on the high-performance front, the only thing I see that die area
can be used for that reduces the pin/bandwidth demands rather than
increasing them is cache.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
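The argument in the post above is a bandwidth budget: whatever off-chip bandwidth a GPU or other on-die unit consumes is no longer available to feed CPU cores. A minimal sketch of that arithmetic follows. The 50 GB/s per socket and 10 GB/s per core used in the note below echo figures quoted earlier in the thread; the function name and the GPU shares are placeholders, and real designs would not split bandwidth this cleanly.

```python
def usable_cores(socket_bw_gbs: float, per_core_bw_gbs: float,
                 other_bw_gbs: float = 0.0) -> int:
    """Cores that can be fed at full rate from what is left of the socket bandwidth.

    other_bw_gbs is the share taken by a GPU or other integrated unit.
    """
    if per_core_bw_gbs <= 0:
        raise ValueError("per-core bandwidth must be positive")
    remaining = max(socket_bw_gbs - other_bw_gbs, 0.0)
    return int(remaining // per_core_bw_gbs)


for gpu_share in (0, 20, 67):
    print(gpu_share, usable_cores(50, 10, gpu_share))
```

With these placeholder numbers a GPU taking 20 GB/s drops the socket from five well-fed cores to three, and a GPU with the full 67 GB/s appetite of a high-end card would leave nothing at all, which is why cache is singled out as the one use of die area that lowers the demand.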
[...]
Humm, that particular practice seems a bit too devious... Explain idea;
don't mention patent; person A reads the idea; person B witnesses the act;
person A implements the idea and uses it commercially; person A finds out a
year down the line that he has been using something that is patented, for
which he has NO license to legally use it... That can be bad!
:^0
First things first... Can they prove you read the whole document? What if
you just read the abstract and did not read the claims and/or the patent
teachings?
> Of course, I may be being paranoid,
> but US intellectual property law is pretty aggressive.
It's aggressive. However, what if somebody makes a trivial change to your algorithm,
and patents it such that it blocks you from making any improvements or even
using your stuff?
Bad...
:^(...
>
> A bit of speculation here. I would expect that the performance gains
> from multicore start tapering off at 8-32 cores per chip (or even
> sooner). That would mean we will be running out of steam fairly
> quickly (5 years or so?). So, my question is, what will be next?
>
> My best guess at this point is going system-on-a-chip, exemplified by
> Cell, Xbox360, or merging the GPU with a few CPUs. Revisiting on-chip
> RAM might be promising too. But what more interesting options are
> there?
>
This might be the way it goes for the next couple years towards
heterogeneous computing:
http://www.edn.com/index.asp?layout=article&articleid=CA6423621&partner=enews
http://www.drccomputer.com/pdfs/DRC_RPU110_datasheet.pdf
While most of such solutions will never make it onto the chip, let
alone into the die, those easy enough to make use of for
vanilla-computing could make it.
> (Can someone please come up with an effective way to resume clock
> scaling? :-)
Tongue in cheek, yes: Plenty of room to scale - ASICs. ;-).
>
> Best,
> Thomas
--
Regards
Klaus
> First things first... Can they prove you read the whole document?
> What if you just read the abstract and did not read the claims and/or
> the patent teachings?
Can I prove that I did not? I'm afraid that my attitude to US law has
become that I want nothing to do with it that I can reasonably avoid. Far
too often, it seems to be a question of who has the most money and/or
the best press relations. I'm very nearly as cynical about the law of my
native country, by the way.
> First things first... Can they prove you read the whole document?
Irrelevant. Even if you've never heard of that patent, you'd be infringing
if you happen to use the same technique.
You're confusing it with copyright.
Stefan
>Irrelevant. Even if you've never heard of that patent, you'd be infringing
>if you happen to use the same technique.
Relevant. In the US at least, there are different penalties for knowing
and unknowing infringement. And you don't have to cite prior art that
you don't know about, but you must cite prior art that you do know
about.
For these reasons, it is typical for Valley law firms to counsel
engineers to not read patents.
-- g
Doesn't help at all. The USPTO granted way too many patents for things that
were common practice back then - they even grant patents for new business
models on old ideas. RIM had to pay half a billion dollars to NTP for a
patent the USPTO themselves were about to invalidate. The patent NTP had
was about sending mail to the receiver instead of polling from the sender.
I.e. they patented SMTP, something clearly falling into the "algorithms
published before 1990".
My advice: If you get caught violating a patent, and care about the money,
hire a gun (i.e. some ex-KGB agent from Russia), and make sure no heir
survives. Way cheaper ;-).
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
Well, yes, but ....
If you teach kiddies one technique without making it very clear that it
is one of dozens and, more importantly, teaching them the practical
techniques needed, they will think that it is the be-all and end-all
of that subject.
Regards,
Nick Maclaren.
We are ENTIRELY at cross-purposes, because we are using the same words
(both in perfectly reasonable ways) with no overlap of meaning. I am
not going to bore people by trying to explain yet again.
|> >Read what you posted again. I was responding to that.
|> >
|> >And, yes, you do get garbage under those circumstances, if you are
|> >using a GUI - because `protocols' like X fail horribly if packets get
|> >out of sequence (which includes ones getting delayed for long enough
|> >for the user to get another interaction in before they are processed).
|>
|> ??? TCP will deliver in sequence, ordered; or not at all.
A single TCP stream will, yes. The problem is that there ISN'T a single
stream, even when X uses TCP - and it usually uses a variety of different
methods, including human coordination.
|> >Use appropriate techniques for 21st century telephone technology, and
|> >stop attempting to prop up archaic methods.
|> >
|> >It won't happen - any more than it does in other areas of IT :-(
|>
|> Let's see; the major work for AMR was done 2000-2002, and was finalised
|> in 2003, with a small "fix release" in 2005.
Sigh. You don't seem to have been around the block as often as I have.
The fact that the work was done recently doesn't mean that it was done
appropriately for the current AND FUTURE requirements. GOOD standards
are designed for the requirements of the period following their actual
use, which can be tricky.
So many people seem to use the motto "We deliver solutions to yesterday's
problems tomorrow."
Regards,
Nick Maclaren.
I have chosen this as an example of the mistake - and, make no mistake,
it IS a mistake.
The above approach will doubtless improve quality when the database is
correct, but it will be very unreliable. Firstly, such classifications
are very load-dependent, and may be true for most of the time, but not
all of it. Secondly, a lot of operators use temporary alternative
routing, often hired from other ones, which can completely change the
characteristics without warning.
The following is why I say that it is a mistake:
You have explained a phenomenon I have observed for some time, and
could not understand. I have severe hearing loss (55-60dB up to
2,000 cycles, dropping to 100+ dB at 8,000), compensated in childhood
by using alternative neural pathways, so that I am not apparently
deaf. Hence I am very sensitive to some kinds of loss, but NOT the
ones that most people are! I am VERY sensitive to timing issues,
for example, but not at all to high-frequency interference.
Telephone acoustic quality improved, almost everywhere, from the 1950s
to 1980s, but then started to degrade again. Now, it is back to the
1960s in quality, and I quite often get calls I can't hear at all or
have to ask the caller to retry. Most of the time, just recalling
(no change of telephones or number) cures the problem. Speaking to
people with normal hearing, they have the same problem, but not as
frequently as I do (which is perhaps one call in ten, even landline
to landline within the UK and closely linked countries).
Given your explanation, that is almost certainly due to the recoding
tricks not matching the transmission characteristics. God alone knows
the details, but this failure mode is ubiquitous among systems that
over-optimise for specific cases, and rely on case recognition to get
the best solution. In a very complex environment, it is almost always
better to go for a robust solution that is second-best or at least
acceptable in almost all cases.
Schedulers, anyone? :-)
Regards,
Nick Maclaren.
Under US law you can be sued for doing nothing or doing anything. That
doesn't mean you shouldn't have some strategy. My point is that avoiding
reading patents and sticking to published papers doesn't help you avoid
patent infringement since during the nineties almost every academic and
researcher has started patenting everything they do. The only thing
reading patents does is possibly expose you to triple damages. But for
most of us, 3 X infinity is the same as 1 X infinity. Actually, 0 X
infinity, the cost of defending yourself, is too much.
The problem I see is that telephones today have quite different microphone
sensitivity, and while good phones today can compensate for that with an
adjustable speaker loudness, it's horrible in telephone conferences - one
person is barely audible, and the other deafening, and the conferencing
systems seem to be unable to compensate for that. I haven't seen adjustable
microphones as well (apart from VoIP softphones).
Before we got a VoIP system imposed by our headquarters, we tried some
different options, and found that a softphone with headset and speex codec
(high-quality setting) provided the best quality. What we got was Avaya
phones with a G.711 codec, relatively poor sound quality (worse than our
old ISDN based telephone, and much worse than speex), and quite often we
have dropouts when telephoning with headquarters (there the telephone
data tunnels via a VPN to us - no chance to get QoS routing on that).
BP> I haven't seen adjustable microphones as well (apart from VoIP
BP> softphones).
Snom SIP phones have adjustable microphone gain. You have to go to
advanced settings though, so it is not something you can easily change
while in a conference.
/Benny
That is true, and it applies to both the volume and frequency response
of both the microphone and speaker. But it isn't the issue I was talking
about. Even those with volume control often distort badly when turned
up to the volume I need.
I really do mean that person A calls person B and the 'line' is ghastly.
He hangs up and recalls, and the 'line' is fine - NO change of handsets,
telephone numbers or methods of connexion. The bad 'lines' occur about
one time in ten, more-or-less according to a Poisson process.
That was very common in the 1950s and 1960s, and the cause was genuinely
bad lines. It improved vastly up to the 1980s, and started to degrade
again - in my view, in 2007 it has got about as bad as it was in the late
1960s - I have been extremely puzzled as to why for a couple of decades,
until Morten explained it.
Regards,
Nick Maclaren.
> On Mar 8, 7:30 am, "Klaus Fehrle" <klausfeh...@freenet.de> wrote:
> > "MitchAlsup" <MitchAl...@aol.com> wrote in message
>
> > > C) So, frequency scaling can be done, but 1) not with the cores as
> > > they are today, 2) you are still going to have to learn to program
> > > them en masse. but
> >
> > > D) You won't ever have to learn to program more than about 64.
> >
> > You see massive ||ism going nowhere at all, or are you referring
> > just to our lifespan with it?
>
> I am referring to the basic problem that the performance gained through
> massive parallelism has little to do with the count of cores or
> threads and a lot to do with bandwidths existing outside of the
> processors' core complexes.
>
> Mitch
I see the bottleneck. However I'd think it can - and should - be
widened significantly by going optical instead of using pins for signal
i/o down the road.
Regards
Klaus
For the last couple of computers I have upgraded or replaced, upgrading
the processor forced me to upgrade the RAM as well in order to take full
advantage of the new processor. Having to upgrade the RAM every time I
upgrade the processor got me thinking.
What would be nice is if some large companies started competing to
design a new architecture that would combine the processor and memory
units like in the Connection Machine. It kind of makes sense,
especially when you look at pictures of processor dies and see that
about a third to half of the chip area is now cache RAM. The next step
would be designing a high speed interconnect that would allow routing and
communication between processors to be efficient. Proposals have been
made in the past, but when MPP fell out of favor for cluster computing,
the funding for research into MPP architectures that could have benefited
multi-core/multiprocessor designs ended.
I imagine a machine that could be the crossbreeding of a Connection
Machine and SGI Origin2000 but smaller, something I could fit under my
desk. I could plug in modules that were a combination of memory and
processors to expand my computer as easily as I plug in memory DIMMs
now.
If such a machine were to be built, the next problem would be software.
Right now there is a reluctance to switch to Vista because a lot of
existing software is known not to run under it. If
a new machine with a radical new architecture were to be designed, then
new compilers to take advantage of it would need to be written (only
logical), along with a new OS kernel and virtual machines to support
previous operating systems.
When you think about it you begin to realize why nobody has undertaken
it - nobody wants to foot the bill. Maybe the government will step in,
like they did with HDTV, and jump-start it.
On Mar 7, 12:30 pm, Thomas Lindgren <***********@*****.***> wrote:
> A bit of speculation here. I would expect that the performance gains
> from multicore start tapering off at 8-32 cores per chip (or even
> sooner). That would mean we will be running out of steam fairly
> quickly (5 years or so?). So, my question is, what will be next?
>
> My best guess at this point is going system-on-a-chip, exemplified by
> Cell, Xbox360, or merging the GPU with a few CPUs. Revisiting on-chip
> RAM might be promising too. But what more interesting options are
> there?
>
> (Can someone _please_ come up with an effective way to resume clock
> scaling? :-)
>
> Best,
> Thomas
> --
> Thomas Lindgren
>
> "Ever tried. Ever failed. No matter. Fail again. Fail better."
That, by itself and even ignoring the fact that the USPTO granted boatloads of
patents that have prior art, shows how broken the US legal - I hesitate to use
the word "justice" - system really is.
Jan
Yes, like the government is so adept at defining technology direction.
Actually you pretty much described Blue Gene, at least at a conceptual
level.
The problem with this as a desktop/deskside model is there would have to be
customers and software.
If you have a plan for those, the Sand Hill Road boys would take care of
you.
--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions."
Build, not design. The model was thrust upon us because there was no
other model at the time that could compute what was needed for the
hydrogen bomb no matter how many computers you put in a room with
slide rules and pads of paper. It was an ingenious concept that has
taken us far.
Your point falls apart on a secondary level because today, we
integrate memory on chips. In fact, it's L1, L2 and L3 cache.
Embedding the core with memory has been done, although you could scale
it up by replacing the static RAM with dynamic. But still, that's not
going to change things dramatically.
The principal reason for the von Neumann architecture's popularity is
programmability. It's (relatively) easy to program. The problem with other
architectures, be it the Connection Machine or heterogeneous multicore
architectures, is that programming them is painful. Not just painful,
but pulling-your-finger-nails-off-their-nail-beds-while-having-root-
canal-without-local-anesthetic-and-being-nailed-to-a-cross-upside-down
painful. People follow the path of least resistance, and the
conceptually easy nature of algebraic languages reinforces von Neumann
popularity.
It's possible to go down a rathole and talk about why language class
foo, such as functional languages, is better. Personally, I do think
functional languages are better. But they're not easy for less-than-
savants to grok. Try explaining the concept of a Haskell monoid
sometime to a Java programmer and you'll feel reality hitting you hard
in the head with a clue-by-four.
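For the record, the idea itself is small: a monoid is a type with an associative combining operation and an identity element. Below is a minimal sketch, written in Python rather than Haskell so it can be run anywhere. The Monoid class and the helper names are made up for illustration and do not come from any library.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity element plus an associative binary operation."""
    empty: T
    combine: Callable[[T, T], T]

    def concat(self, items: Iterable[T]) -> T:
        # Fold the items together, starting from the identity element.
        return reduce(self.combine, items, self.empty)


def chunked_concat(m: Monoid, items: List, chunk: int):
    """Fold each chunk separately, then fold the partial results.

    Associativity is what makes this equal to m.concat(items), so each
    chunk could in principle be handed to a different core.
    """
    parts = [m.concat(items[i:i + chunk]) for i in range(0, len(items), chunk)]
    return m.concat(parts)


# Two illustrative instances: integer addition and string concatenation.
sum_monoid = Monoid(0, lambda a, b: a + b)
str_monoid = Monoid("", lambda a, b: a + b)

if __name__ == "__main__":
    print(sum_monoid.concat([1, 2, 3, 4]))
    print(chunked_concat(str_monoid, ["von", " ", "Neu", "mann"], 2))
```

The chunked version is where this ties back to the thread: because the operation is associative, the grouping does not matter, which is exactly the property a parallel reduction needs.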
-scooter