
Anyone not switching to CUDA?


fft1976

Jul 22, 2009, 1:05:46 PM
I'm curious: is anyone not switching all their new projects to CUDA
and why? It seems to me that CPUs are quickly becoming irrelevant in
scientific computing.

Nicolas Bonneel

Jul 22, 2009, 4:39:11 PM
fft1976 wrote:

> I'm curious: is anyone not switching all their new projects to CUDA
> and why? It seems to me that CPUs are quickly becoming irrelevant in
> scientific computing.

multi-core programming is not really recent. Also, GPU programming has
already been around for several years, and CUDA is not the only way to
code on the GPU. Finally, Larrabee is being released soon, and I see no
reason to rewrite all your code in a heavy way to make it run on the GPU
when Larrabee might require far fewer changes.

--
Nicolas Bonneel
http://www-sop.inria.fr/reves/Nicolas.Bonneel/

fft1976

Jul 22, 2009, 7:23:13 PM
On Jul 22, 1:39 pm, Nicolas Bonneel
<nicolas.bonneel.asuppri...@wanadoo.fr> wrote:
> fft1976 wrote:

>
> > I'm curious: is anyone not switching all their new projects to CUDA
> > and why? It seems to me that CPUs are quickly becoming irrelevant in
> > scientific computing.
>
> multi-core programming is not really recent. Also, coding on GPU already
> has several years, and CUDA is not the only way to code on GPU.
> Finally, Larrabee is being released soon, and I would see no reason to
> change all your code in a heavy way to make it run on GPU while Larrabee
> might require much fewer changes.


Frankly, I doubt Larrabee will be anywhere near the 1.7 TFLOPS for $500
that we can have with CUDA _today_ (for a certain fairly wide class of
scientific problems).

BGB / cr88192

Jul 22, 2009, 8:33:26 PM

"fft1976" <fft...@gmail.com> wrote in message
news:b539239d-c599-4139...@m3g2000pri.googlegroups.com...

On Jul 22, 1:39 pm, Nicolas Bonneel
<nicolas.bonneel.asuppri...@wanadoo.fr> wrote:
> fft1976 wrote:

>
> > I'm curious: is anyone not switching all their new projects to CUDA
> > and why? It seems to me that CPUs are quickly becoming irrelevant in
> > scientific computing.
>
> multi-core programming is not really recent. Also, coding on GPU already
> has several years, and CUDA is not the only way to code on GPU.
> Finally, Larrabee is being released soon, and I would see no reason to
> change all your code in a heavy way to make it run on GPU while Larrabee
> might require much fewer changes.

<--


Frankly, I doubt Larrabee will be anywhere near 1.7 TFLOPS for $500
that we can have with CUDA _today_ (for a certain fairly wide class of
scientific problems)

-->

I don't use CUDA, but I also don't do scientific computing...

I do some limited GPU-based processing, but mostly this is done through
OpenGL and fragment shaders.


I might be willing to code for the GPU if it were provided in a manner
similar to GLSL, for example as a kind of OpenGL add-on or companion
library. However, making my code dependent on an IMO questionable
preprocessor and compiler toolset? Probably not, even for the
performance benefit...

now, it probably seems like it would be less convenient to compile via a
library-based interface, but in my experience it is actually far more
convenient, even for things like ASM (which is traditionally statically
assembled and linked against an app).

though maybe counter-intuitive, the inconvenience of using a library
interface to compile and link the code, and of accessing the
dynamically-compiled code through function pointers, turns out IME to be
smaller than that of having to deal with an external compiler/assembler
and getting its output linked in the traditional way...
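for what it's worth, a rough sketch of the workflow I mean - generate
source, compile it at runtime, load it, and call the result through a
function pointer. the file names and the shelled-out "cc" invocation are
only placeholders, not any particular toolchain (and on glibc you'd link
with -ldl):

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
    /* write a trivial "kernel" to a placeholder path */
    FILE *f = fopen("/tmp/kern.c", "w");
    if (!f) return 1;
    fprintf(f, "double scale(double x) { return 2.0 * x; }\n");
    fclose(f);

    /* compile it into a shared object at runtime (placeholder command) */
    if (system("cc -shared -fPIC -o /tmp/kern.so /tmp/kern.c") != 0)
        return 1;

    /* load it and fetch the entry point as a function pointer */
    void *h = dlopen("/tmp/kern.so", RTLD_NOW);
    if (!h) return 1;
    double (*scale)(double) = (double (*)(double))dlsym(h, "scale");
    if (!scale) return 1;

    printf("scale(21.0) = %f\n", scale(21.0));   /* prints 42.000000 */

    dlclose(h);
    return 0;
}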

of course, maybe my sense of "convenience" is skewed, who knows...


Nicolas Bonneel

Jul 22, 2009, 9:59:29 PM
fft1976 wrote:

> On Jul 22, 1:39 pm, Nicolas Bonneel
> <nicolas.bonneel.asuppri...@wanadoo.fr> wrote:
>> fft1976 wrote:

>>
>>> I'm curious: is anyone not switching all their new projects to CUDA
>>> and why? It seems to me that CPUs are quickly becoming irrelevant in
>>> scientific computing.
>> multi-core programming is not really recent. Also, coding on GPU already
>> has several years, and CUDA is not the only way to code on GPU.
>> Finally, Larrabee is being released soon, and I would see no reason to
>> change all your code in a heavy way to make it run on GPU while Larrabee
>> might require much fewer changes.
>
>
> Frankly, I doubt Larrabee will be anywhere near 1.7 TFLOPS for $500
> that we can have with CUDA _today_ (for a certain fairly wide class of
> scientific problems)

This is not really the point. You cannot heavily modify your whole
project on the basis of a technology which is currently being seriously
endangered. I guess Larrabee will also provide a much better, and
possibly larger, cache.
GPUs also suffer from rounding of denormalized numbers to zero (which is
fine for graphics, but maybe not for scientific computing).
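To make the flush-to-zero point concrete, a minimal host-side sketch
(the FLT_MIN arithmetic below is purely illustrative):

#include <stdio.h>
#include <float.h>

int main(void)
{
    float a = 1.5f * FLT_MIN;   /* a small but normal float              */
    float b = FLT_MIN;          /* smallest normal float, ~1.18e-38      */
    float d = a - b;            /* exact result is the denormal FLT_MIN/2 */

    /* With IEEE gradual underflow d != 0, so "a != b" and "a - b != 0"
       agree. On hardware that flushes denormal results to zero (older
       GPUs in single precision, or a CPU with FTZ enabled), d becomes
       exactly 0 even though a != b - harmless in graphics, less so in
       numerical code. */
    printf("a = %g  b = %g  a - b = %g\n", a, b, d);
    return 0;
}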
Bear in mind that to get good performance with GPUs, you really have to
re-think your whole project (even if it was already parallelized on the
CPU), just because of the GPU cache.

With 32 cores, each with vector instructions handling 16 single-precision
numbers at a time, I guess you can indeed get performance similar to
current GPUs, but with much less re-coding involved.

If you target broad distribution of your application, you're stuck anyway
with graphics cards which do not support CUDA (while they still support
shaders).
So, unless by "project" you mean a 6-month research project, I don't see
any argument for porting tens or hundreds of thousands of lines of code
to CUDA.

Phil Carmody

Jul 23, 2009, 2:54:57 AM
Nicolas Bonneel <nicolas.bonne...@wanadoo.fr> writes:
> GPU also suffer from rounding of denormalized numbers to zero (which
> is fine for graphics, but maybe not for scientific computing).

They also use floats rather than doubles, which are useless for
scientific computing. I notice that the ones that do support
doubles do so at a tenth of the performance of floats, which is
roughly the hit you'd get if you wrote your own multi-precision
floats.
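For reference, a minimal sketch of that "roll your own" idea - the
classic float-float / two-sum trick, which squeezes roughly 48 bits of
mantissa out of pairs of singles. Purely an illustration of the
technique (and it assumes the compiler is not reassociating
floating-point expressions), not a tuned or complete library:

/* Knuth's two-sum: the rounded sum plus its exact rounding error. */
__device__ float2 two_sum(float a, float b)
{
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return make_float2(s, e);
}

/* "float-float" addition: each value is a (hi, lo) pair with
   |lo| <= ulp(hi)/2, giving roughly 48 bits of effective mantissa
   out of single-precision hardware. */
__device__ float2 ff_add(float2 x, float2 y)
{
    float2 s  = two_sum(x.x, y.x);
    float  e  = s.y + x.y + y.y;      /* accumulate the low-order parts */
    float  hi = s.x + e;              /* renormalise back into (hi, lo) */
    float  lo = e - (hi - s.x);
    return make_float2(hi, lo);
}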

Then again, I tend to be a bit of a Kahan-ite when it comes
to FPUs.
Phil
--
If GML was an infant, SGML is the bright youngster far exceeds
expectations and made its parents too proud, but XML is the
drug-addicted gang member who had committed his first murder
before he had sex, which was rape. -- Erik Naggum (1965-2009)

nm...@cam.ac.uk

Jul 23, 2009, 3:47:16 AM
In article <h48b4n$q1o$1...@news.albasani.net>,

BGB / cr88192 <cr8...@hotmail.com> wrote:
>
>
>I might be willing to code for the GPU if it were provided in a means
>similar to GLSL, for example, as a kind of OpenGL add-on or companion
>library, however, to make my code dependent on an IMO questionable
>preprocessor and compiler toolset, probably not, even for the performance
>benefit...

That was one of my two points. They are working on it. Watch this
space.


Regards,
Nick Maclaren.

Pfenniger Daniel

Jul 23, 2009, 6:23:44 AM

I don't, because the kind of scientific computing I practice is based
on long-term (10-20 year) software development. For the same reason,
numerical software like Lapack is still based on a slowly evolving
language like Fortran, which fits the need for long-term stability.
Converting this base into a more modern language will still take years.

Also, CUDA is a proprietary technology from a single company that is
unfriendly to the notion of open-source software, so the technology has
little chance of surviving long. All long-term projects (like, for
instance, TeX and LaTeX) must be based on open standards free of
restrictions such as software patents.

CUDA is good for short-term projects like games and software closely
tied to the hardware (such as graphics); after 5 years such software is
obsolete anyway.


BGB / cr88192

Jul 23, 2009, 1:54:11 PM

"Phil Carmody" <thefatphi...@yahoo.co.uk> wrote in message
news:873a8nq...@kilospaz.fatphil.org...

> Nicolas Bonneel <nicolas.bonne...@wanadoo.fr> writes:
>> GPU also suffer from rounding of denormalized numbers to zero (which
>> is fine for graphics, but maybe not for scientific computing).
>
> They also use floats rather than doubles, which are useless for
> scientific computing. I notice that the ones that do support
> doubles do so at a tenth of the performance of floats, which is
> roughly the hit you'd get if you wrote your own multi-precision
> floats.
>
> Then again, I tend to be a bit of Kahan-ite when it comes
> to FPUs.

probably this would depend a lot on the type of "scientific computing", as
there are likely some types of tasks where speed matters a whole lot more
than accuracy...

Paul Hsieh

Jul 23, 2009, 2:44:59 PM
On Jul 22, 11:54 pm, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:

> Nicolas Bonneel <nicolas.bonneel.asuppri...@wanadoo.fr> writes:
> > GPU also suffer from rounding of denormalized numbers to zero (which
> > is fine for graphics, but maybe not for scientific computing).
>
> They also use floats rather than doubles, which are useless for
> scientific computing. I notice that the ones that do support
> doubles do so at a tenth of the performance of floats, which is
> roughly the hit you'd get if you wrote your own multi-precision
> floats.
>
> Then again, I tend to be a bit of Kahan-ite when it comes
> to FPUs.

Sure, sure ... but don't mistake the forest for the trees. If nVidia
is really serious about CUDA they will be forced to provide real IEEE-754
doubles just like Intel or anyone else. I am sure that this is already
in their pipeline of features slated for a future product.

nVidia *wants* this to be used for supercomputing applications, and
the TOP500 list guy refuses to accept any results from something that
isn't 64-bit. The economic motivation should be sufficient to
guarantee that they support it in the future. And if they don't, write
a benchmark showing that it's slower than advertised, publicize
your benchmark widely, and watch nVidia respond.
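In that spirit, a minimal sketch of such a benchmark - a dependent FMA
loop per thread, timed with CUDA events. The grid size and iteration
count are arbitrary, and with enough threads in flight this gives only a
rough lower bound on sustained double-precision FMA throughput, not any
advertised peak:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void dp_fma(double *out, int iters)
{
    double a = 1.0 + threadIdx.x * 1e-9, b = 1.5, c = 0.5;
    for (int k = 0; k < iters; ++k)
        a = a * b + c;                  /* one double-precision FMA each  */
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;   /* keep result live */
}

int main()
{
    const int blocks = 120, threads = 256, iters = 1 << 20;
    double *d_out;
    cudaMalloc((void **)&d_out, blocks * threads * sizeof(double));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventRecord(t0);
    dp_fma<<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    double flops = 2.0 * blocks * threads * (double)iters;  /* FMA = 2 flops */
    printf("double precision: %.1f GFLOPS\n", flops / (ms * 1e-3) / 1e9);

    cudaFree(d_out);
    return 0;
}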

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

nm...@cam.ac.uk

Jul 23, 2009, 2:55:16 PM
In article <a0069717-ece6-43ef...@u16g2000pru.googlegroups.com>,

Paul Hsieh <webs...@gmail.com> wrote:
>
>Sure, sure ... but don't mistake the forest from the trees. If nVidia
>is really serious about CUDA they will be forced to make real IEEE-754
>doubles just like Intel or anyone else. I am sure that this is already
>in their pipeline of features slated for a future product.

I asked them that question, and their non-answer made it pretty clear
that it is.


Regards,
Nick Maclaren.

Phil Carmody

Jul 23, 2009, 3:38:14 PM

They already have doubles in their Tesla line. Their spokespeople have
also already admitted that they'll make professional (i.e. Tesla)
customers pay for the advanced silicon until there's demand from the
commodity market, and that's the direction they're expecting, and
planning, to move in.

nm...@cam.ac.uk

Jul 23, 2009, 3:45:01 PM
In article <87ljmfn...@kilospaz.fatphil.org>,

Phil Carmody <thefatphi...@yahoo.co.uk> wrote:
>
>They already have doubles in their Tesla line. Their spokespeople have
>also already admitted that they'll make professional (i.e. Tesla)
>customers pay for the advanced silicon until there's demand from the
>commodity market, and that's the direction they're expecting, and
>planning, to move in.

Er, have you any idea how slow they are? Real programmers seem to
agree that a Tesla is 3-4 times faster than a quad-core Intel CPU
in double precision, which doesn't really repay the huge investment
in rewriting in CUDA.

I believe that a future version of the Tesla will improve the speed
of double precision considerably.


Regards,
Nick Maclaren.

Rui Maciel

Jul 23, 2009, 9:11:46 PM
Nicolas Bonneel wrote:

> multi-core programming is not really recent. Also, coding on GPU already
> has several years, and CUDA is not the only way to code on GPU.

Is there currently a better way to perform computations on a graphics card?


>> Finally, Larrabee is being released soon, and I would see no reason to
> change all your code in a heavy way to make it run on GPU while Larrabee
> might require much fewer changes.

I believe that "might" and "soon" are vague terms without any relevant basis to work from, not to mention
that virtually all technical information regarding Larrabee is nothing more than marketing cruft.

Knowing that, if we are considering "soon" releases, then I believe that OpenCL is much more worthy of
attention.


Rui Maciel

Phil Carmody

Jul 24, 2009, 2:21:42 AM
nm...@cam.ac.uk writes:
> In article <87ljmfn...@kilospaz.fatphil.org>,
> Phil Carmody <thefatphi...@yahoo.co.uk> wrote:
>>
>>They already have doubles in their Tesla line. Their spokespeople have
>>also already admitted that they'll make professional (i.e. Tesla)
>>customers pay for the advanced silicon until there's demand from the
>>commodity market, and that's the direction they're expecting, and
>>planning, to move in.
>
> Er, have you any idea how slow they are?

Yes. Message-ID: <873a8nq...@kilospaz.fatphil.org>

> Real programmers seem to
> agree that a Tesla is 3-4 times faster than a quad-core Intel CPU
> in double precision, which doesn't really repay the huge investment
> in rewriting in CUDA.

I'm glad to know that this newsgroup is blessed by the presence of
someone familiar with the priorities of every single company, university,
and programmer in the world. We are truly honoured.

> I believe that a future version of the Tesla will improve the speed
> of double precision considerably.

Technology will improve in the future. Wow, I'm so jealous of
your remarkable insights.

Les Neilson

Jul 24, 2009, 4:09:10 AM

"BGB / cr88192" <cr8...@hotmail.com> wrote in message
news:h4a844$d4g$1...@news.albasani.net...

Surely you mean precision rather than accuracy?
Isn't it better to get accurate (correct) results, albeit slowly, than to
get inaccurate (wrong) results really fast?

But yes, it does depend on the type of scientific computing - it depends
on the range of values covered and the precision to which they are
sensitive before errors become significant.
When constructing the Channel Tunnel (digging from both ends
simultaneously), the overall distances were large, but they still had to
ensure that the two bores met without too much of a "step difference" in
elevation or in the horizontal plane. I understand that when they did
meet, the centre points (as measured with lasers) were "off" by only a
few millimetres.

Les

user923005

Jul 24, 2009, 4:46:12 AM
On Jul 23, 12:45 pm, n...@cam.ac.uk wrote:
> In article <87ljmfnqah....@kilospaz.fatphil.org>,

> Phil Carmody  <thefatphil_demun...@yahoo.co.uk> wrote:
>
>
>
> >They already have doubles in their Tesla line. Their spokespeople have
> >also already admitted that they'll make professional (i.e. Tesla)
> >customers pay for the advanced silicon until there's demand from the
> >commodity market, and that's the direction they're expecting, and
> >planning, to move in.
>
> Er, have you any idea how slow they are?  Real programmers seem to
> agree that a Tesla is 3-4 times faster than a quad-core Intel CPU
> in double precision, which doesn't really repay the huge investment
> in rewriting in CUDA.
>
> I believe that a future version of the Tesla will improve the speed
> of double precision considerably.

I see it like this:
CUDA is nice for some set of problems.
To use CUDA you need to use special CUDA hardware and the special CUDA
compiler. This computer that I am writing this post from has a CUDA
graphics card. But it won't perform better than a normal CPU for most
math operations. If you want the blazing speed, you need one of those
fancy pants CUDA cards that has a giant pile of CPUs[1]. At some
point the fancy pants CUDA card will no longer be faster than other
kinds of hardware, so you will need to make a choice at that point in
time to upgrade the CUDA card or use some other hardware. You can't
use recursion with CUDA. Moving the data on and off the card is a
problem unless there is a huge amount of crunching going on in the GPU
pile.

It's kind of like the Deep Blue chess computer. When it was created,
nothing came remotely close to its chess compute power. But hardware
advances exponentially, so soon a $2000 desktop system will be as
fast or faster.

I think CUDA is neat. We have the fast CUDA cards in house here and
the CUDA compiler and we have done some fooling around with it to try
to figure out what we can do with it.
But if you do make some CUDA solutions, your customers will also need
the CUDA cards, unless you only plan to run them in-house.

In short, CUDA is cool, but it's not a panacea. No hardware solution
will ever be the ideal solution because hardware ages. Prophecy is
tricky for us humans. We tend to get it wrong in the long run, most
of the time.

[1] Not every CUDA card is a GeForce GTX 280M.

bartc

Jul 24, 2009, 5:21:48 AM

"Les Neilson" <l.ne...@nospam.co.uk> wrote in message
news:jKWdnS-Xa6Jp9_TX...@eclipse.net.uk...

I doubt speed was too important in that case. The tunnel took several years
to construct.

And probably a precision nearer 32-bit than 64-bit would have sufficed.

--
Bart

Nicolas Bonneel

Jul 24, 2009, 5:34:15 AM
Rui Maciel wrote:

> Nicolas Bonneel wrote:
>
>> multi-core programming is not really recent. Also, coding on GPU already
>> has several years, and CUDA is not the only way to code on GPU.
>
> Is there currently a better way to perform computations on a graphics card?

Except for very fancy computations, most of what you can do with CUDA
you can also do using vertex and fragment programs.
Summing 2 matrices is done by rasterizing a quad and calling a fragment
program, passing the 2 textures as input and outputting the pixel's
color as the sum of the 2 texture values. I sometimes find this approach
more natural than using CUDA, since I'm used to it.
On gpgpu.org you'll find tons of publications for more complicated
operations which were done way before CUDA was released.
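To make the comparison concrete, here are minimal sketches of both
versions of that matrix sum - an old-style GLSL fragment program (as a
string constant) and the equivalent CUDA kernel. All setup (textures,
quad, memory transfers, launch) is omitted in both cases, so treat these
as illustrations only:

/* fragment-program version: one fragment per matrix element, A and B
   bound as textures, the summed element written as the fragment colour. */
static const char *sum_fs =
    "uniform sampler2D A, B;                                  \n"
    "void main() {                                            \n"
    "    vec2 uv = gl_TexCoord[0].xy;                         \n"
    "    gl_FragColor = texture2D(A, uv) + texture2D(B, uv);  \n"
    "}                                                        \n";

/* CUDA version of the same operation: one thread per element. */
__global__ void mat_add(float *C, const float *A, const float *B, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] + B[i];
}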

>
>> Finally, Larrabee is being released soon, and I would see no reason to
>> change all your code in a heavy way to make it run on GPU while Larrabee
>> might require much fewer changes.
>
> I believe that "might" and "soon" are vague terms that don't have a relevant base to work on, not to mention
> that virtually all technical information regarding larrabee is nothing more than marketing cruft.

Some companies already have prototypes, and I heard about early 2010 for
the public release, although I'm not sure.


> Knowing that, if we are considering "soon" releases then I believe that OpenCL is much more worthy of
> attention.

Indeed. This is another good argument against porting every app to CUDA.

James Kuyper

Jul 24, 2009, 6:09:09 AM
Les Neilson wrote:
>
> "BGB / cr88192" <cr8...@hotmail.com> wrote in message
> news:h4a844$d4g$1...@news.albasani.net...
>>
>> "Phil Carmody" <thefatphi...@yahoo.co.uk> wrote in message
>> news:873a8nq...@kilospaz.fatphil.org...
>>> Nicolas Bonneel <nicolas.bonne...@wanadoo.fr> writes:
>>>> GPU also suffer from rounding of denormalized numbers to zero (which
>>>> is fine for graphics, but maybe not for scientific computing).
>>>
>>> They also use floats rather than doubles, which are useless for
>>> scientific computing. I notice that the ones that do support
>>> doubles do so at a tenth of the performance of floats, which is
>>> roughly the hit you'd get if you wrote your own multi-precision
>>> floats.
>>>
>>> Then again, I tend to be a bit of Kahan-ite when it comes
>>> to FPUs.
>>
>> probably this would depend a lot on the type of "scientific
>> computing", as there are likely some types of tasks where speed
>> matters a whole lot more than accuracy...
>>
>
> Surely you mean precision rather than accuracy ?
> Isn't it better to get accurate (correct) results, albeit slowly, than
> to get inaccurate (wrong) results really fast ?

I think you're interpreting "accuracy" in a different way than BGB
did. It's not a binary choice between correct and incorrect; it's a
quantity describing the difference between the calculated result and the
actual value. You can't expect high accuracy from a calculated value
unless it also has high precision, but it's the accuracy which
actually matters.

A certain degree of inaccuracy is unavoidable in most cases of interest,
and it's a legitimate question to ask what level of inaccuracy is
acceptable. Sometimes, it is indeed the case that a less accurate
result, obtained more quickly, is much more useful than a more accurate
result, obtained more slowly.

There are some things that can be calculated, in principle, with
infinite accuracy; but only at the expense of infinite amounts of
computation time - if, as you suggest, inaccuracy should always be
avoided, then there would be no basis for deciding when to stop
improving the accuracy of such calculations.

Some real scientific calculations require results accurate to 20 or so
decimal places. Other calculations would remain useful even if they were
off by several orders of magnitude. Most calculations fall somewhere in
between those two extremes. Whether a computation can be done with
acceptable accuracy using single precision, or whether it requires
double precision (or even more!), depends upon the purpose of the
calculation. No one rule covers all cases.
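A small, concrete illustration of that point in the single-vs-double
context: naively summing ten million copies of 0.1f in single precision
drifts visibly from the exact 1,000,000, while Kahan's compensated
summation recovers most of the lost accuracy without resorting to
doubles. The term count is arbitrary, and the code must be compiled
without fast-math or the compensation gets optimised away:

#include <stdio.h>

int main(void)
{
    const long n = 10000000;     /* 1e7 terms of 0.1: exact sum is 1e6
                                    (up to the representation error of 0.1f) */
    float naive = 0.0f;
    float sum = 0.0f, c = 0.0f;  /* Kahan: running sum + compensation        */
    for (long i = 0; i < n; ++i) {
        naive += 0.1f;           /* rounding error grows once the sum is big */
        float y = 0.1f - c;
        float t = sum + y;
        c = (t - sum) - y;       /* the part of y that was just rounded off  */
        sum = t;
    }
    printf("naive: %f  kahan: %f  exact: 1000000\n", naive, sum);
    return 0;
}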

fj

Jul 24, 2009, 6:10:22 AM
On 24 juil, 11:21, "bartc" <ba...@freeuk.com> wrote:
> "Les Neilson" <l.neil...@nospam.co.uk> wrote in message
>
> news:jKWdnS-Xa6Jp9_TX...@eclipse.net.uk...
>
>
>
>
>
> > "BGB / cr88192" <cr88...@hotmail.com> wrote in message
> >news:h4a844$d4g$1...@news.albasani.net...
>
> >> "Phil Carmody" <thefatphil_demun...@yahoo.co.uk> wrote in message
> >>news:873a8nq...@kilospaz.fatphil.org...

Not sure:
- tunnel length: about 50 km (37 km under the sea)
- precision to reach in the computations: 5 mm (probably less)

=> at least 7 orders of magnitude to cover

As one easily loses 2 orders because of computations, single precision
(8 significant digits) seems not enough.
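Putting a number on that: the spacing between adjacent single-precision
values near 50 km, expressed in metres, is already about 4 mm, so
millimetre-level bookkeeping over the full length is marginal at best.
A quick, purely illustrative check (link with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    float  len  = 50000.0f;   /* ~50 km expressed in metres */
    float  ulpf = nextafterf(len, INFINITY) - len;
    double ulpd = nextafter((double)len, INFINITY) - (double)len;

    printf("spacing near 50 km, single: %g m\n", ulpf);  /* ~0.0039 m  */
    printf("spacing near 50 km, double: %g m\n", ulpd);  /* ~7.3e-12 m */
    return 0;
}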

>
> --
> Bart

nm...@cam.ac.uk

Jul 24, 2009, 6:21:38 AM
In article <fb6d4c7c-19b1-42e0...@q11g2000yqi.googlegroups.com>,
>=> at least 7 orders of magnitude to cover

>
>As one easily loses 2 orders because of computations, single precision
>(8 significant digits) seems not enough.

Even if IEEE 754 delivered 8 significant digits, which it doesn't :-)
It's just below 7.

Regards,
Nick Maclaren.

fj

Jul 24, 2009, 6:41:55 AM
On 24 juil, 12:21, n...@cam.ac.uk wrote:
> In article <fb6d4c7c-19b1-42e0-9e36-d05c4571a...@q11g2000yqi.googlegroups.com>,

>
> fj  <francois.j...@irsn.fr> wrote:
> >On 24 juil, 11:21, "bartc" <ba...@freeuk.com> wrote:
>
> >> And probably a precision nearer 32-bit than 64-bit would have sufficed.
>
> >Not sure :
> >- tunnel length about 50km (37 under the sea)
> >- precision to reach in computations : 5mm (probably less)
>
> >=> at least 7 orders of magnitude to cover
>
> >As one easily loses 2 orders because of computations, single precision
> >(8 significant digits) seems not enough.
>
> Even if IEEE 754 delivered 8 significant digits, which it doesn't :-)
> It's just below 7.

Oops, you are right! But as I rarely use single-precision values
(except for graphics), I had forgotten that... Another explanation,
which I prefer to forget immediately: Alzheimer's :-(

>
> Regards,
> Nick Maclaren.

Rui Maciel

Jul 23, 2009, 9:51:21 PM
Nicolas Bonneel wrote:

> This is not really the point. You cannot change your whole projects with
> heavy modifications on the basis of a technology which is currently
> being seriously endangered.

On what exactly do you base your "seriously endangered" allegation? And won't Larrabee also force those who
wish to adopt it to "heavily modify" their projects?


> I guess Larrabee will also provide much
> better, and possibly larger cache.

According to what? And compared to what?


> GPU also suffer from rounding of denormalized numbers to zero (which is
> fine for graphics, but maybe not for scientific computing).

It is reported that CUDA's double precision is fully IEEE-754 compliant and I believe that all deviations
from that standard are explicitly listed in the CUDA programming guide.


> Think that to get good performance with GPUs, you really have to
> re-think your whole project (even if it was already parallelized on CPU)
> just because of the GPU cache.

Why do you believe that larrabee's case will be any different?


> With 32cores featuring vector instructions of 16 single precision
> numbers at the same time, I guess you can indeed get performances
> similar to current GPUs. But with much less involved re-coding.

As there are no Larrabee benchmarks available, and until a while ago Larrabee was nothing more than a press
release and a PowerPoint presentation, on what exactly do you base your performance claims? And have you
looked into the C++ Larrabee prototype library? The examples listed there are more convoluted and
unreadable than anything present in CUDA's programming guide.


> If you target broad diffusion of your application, you're stuck anyway
> with graphic cards which do not support CUDA (while they still support
> shaders).

Why exactly do you believe that you should not "target broad diffusion" if you chose to program for a piece
of hardware that has nearly 75% of the market share and why exactly do you believe that, instead of
targeting the 75% market share, you are better off targeting a piece of hardware that doesn't even exist?


> So, except if by "project" you mean a 6 month research project, I don't
> find any argument to port tens/hundreds of thousands lines of code to
> CUDA.

And instead you believe it is better to base your "projects" on non-existent, unproven hardware? That
advice doesn't seem sound at all.


Rui Maciel

Nicolas Bonneel

Jul 24, 2009, 7:00:27 AM
James Kuyper wrote:

> Les Neilson wrote:
>> Surely you mean precision rather than accuracy ?
>> Isn't it better to get accurate (correct) results, albeit slowly, than
>> to get inaccurate (wrong) results really fast ?
>
> I thinking you're interpreting "accuracy" in a different way than BGB
> did. It's not a binary choice between correct and incorrect, it's a
> quantity describing the difference between the calculated result and
> actual value. You can't expect high accuracy from a calculated value
[...]

>
> Some real scientific calculations require results accurate to 20 or so
> decimal places. Other calculations would remain useful even if they were
> off by several orders of magnitude. Most calculations fall somewhere in
> between those two extremes. Whether a computation can be done with
> acceptable accuracy using single precision, or whether it requires
> double precision (or even more!), depends upon the purpose of the
> calculation. No one rule covers all cases.

No, precision and accuracy are different. When you say "scientific
calculations require results accurate to 20 or so decimal places", you
mean *precise* to 20 decimal places.
If the true expected value should be 0 and your algorithm is converging
toward 0, it is accurate, even if you end up with 0.5 because you didn't
run enough iterations (in which case it is not precise enough).
But if your algorithm is converging toward 1, then the algorithm is not
accurate, even if at the end you also get 0.5 because you didn't run
enough iterations (in which case you're both imprecise and inaccurate).

pdpi

Jul 24, 2009, 8:19:47 AM
On Jul 24, 9:09 am, "Les Neilson" <l.neil...@nospam.co.uk> wrote:
> Surely you mean precision rather than accuracy ?
> Isn't it better to get accurate (correct) results, albeit slowly, than to
> get inaccurate (wrong) results really fast ?

I trust that this isn't what he meant, but if your inaccurate results
are unbiased, then the optimal method to get the best accuracy in the
least time might be to simply make loads of measurements, average
them, and let the law of large numbers work its magic.
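A throwaway illustration of that idea: a crude Monte Carlo estimate of
pi from unbiased samples - each sample is terrible on its own, but the
average improves roughly like 1/sqrt(N). Entirely a toy (rand() used
only for brevity):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    srand(12345);
    for (long n = 1000; n <= 10000000; n *= 100) {
        long hits = 0;
        for (long i = 0; i < n; ++i) {
            double x = (double)rand() / RAND_MAX;   /* one noisy "measurement" */
            double y = (double)rand() / RAND_MAX;
            if (x * x + y * y <= 1.0)
                ++hits;
        }
        /* the averaged estimate tightens as N grows */
        printf("N = %8ld   pi ~ %f\n", n, 4.0 * (double)hits / n);
    }
    return 0;
}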

nm...@cam.ac.uk

Jul 24, 2009, 9:07:39 AM
In article <4a69139a$0$27140$a729...@news.telepac.pt>,
Rui Maciel <rui.m...@gmail.com> wrote:

>Nicolas Bonneel wrote:
>
>> GPU also suffer from rounding of denormalized numbers to zero (which is
>> fine for graphics, but maybe not for scientific computing).
>
>It is reported that CUDA's double precision is fully IEEE-754 compliant
>and I believe that all deviations from that standard are explicitly
>listed in the CUDA programming guide.

Firstly, it is pure dogma (and scientific nonsense) to say that hard
underflow is worse for scientific computing. Numerically, it has
both advantages and disadvantages over soft underflow.

Secondly, the latest version of IEEE 754 allows hard underflow as an
option, anyway, so doing that is no longer a breach of its standard.


Regards,
Nick Maclaren.

Nicolas Bonneel

Jul 24, 2009, 9:02:06 AM
Rui Maciel wrote:

> Nicolas Bonneel wrote:
>
>> This is not really the point. You cannot change your whole projects with
>> heavy modifications on the basis of a technology which is currently
>> being seriously endangered.
>
> Where exactly do you base your "seriously endangered" allegation?

On the release of a competing product by Intel. I said endangered, not
dead.

> And won't larrabee also force those who
> wish to adopt it to "heavily modify" their projects?

I was hoping not (see 3 items below).

>> I guess Larrabee will also provide much
>> better, and possibly larger cache.
>
> According to what? And compared to what?

By better cache I mean a coherent cache (compared to the GPU, and
according to their spec).


>> GPU also suffer from rounding of denormalized numbers to zero (which is
>> fine for graphics, but maybe not for scientific computing).
>
> It is reported that CUDA's double precision is fully IEEE-754 compliant and I believe that all deviations
> from that standard are explicitly listed in the CUDA programming guide.

That was for single precision.

But CUDA doubles are much slower than singles (and even doubles are not
*fully* compliant: round-to-nearest-even is the only supported rounding
mode for some operations).


>> Think that to get good performance with GPUs, you really have to
>> re-think your whole project (even if it was already parallelized on CPU)
>> just because of the GPU cache.
>
> Why do you believe that larrabee's case will be any different?

I hope Larrabee will be able to handle simple recursion, which is a very
GPU-specific limitation (ok, this one is not due to the cache). Cache
coherence also simplifies the writing of parallel algorithms
(synchronization etc.). Larrabee also uses the standard x86 instruction
set, which should let any existing compiler target it.

Otherwise, if they don't manage to make it better in general, they have
already lost the war, so I think they are clever enough to offer better
features.
But of course, if you believe that it will be at the same time:
- less readable
- with no better features
- less performant
- with the same amount of re-coding involved
then I wonder why they even had the idea to release a new product!

>> With 32cores featuring vector instructions of 16 single precision
>> numbers at the same time, I guess you can indeed get performances
>> similar to current GPUs. But with much less involved re-coding.
>
> As there are no larrabee benchmarks available and until a while ago larrabee was nothing more than a press
> release and a powerpoint presentation, where exactly do you base your performance claims?

It was also the subject of a Siggraph paper more than a year ago.

> And have you
> looked into the C++ Larrabee prototype library? The examples listed there are more convoluted and
> unreadable than anything present in CUDA's programming guide.

Indeed, I just checked, and I was hoping for more readability :s

>> If you target broad diffusion of your application, you're stuck anyway
>> with graphic cards which do not support CUDA (while they still support
>> shaders).
>
> Why exactly do you believe that you should not "target broad diffusion" if you chose to program for a piece
> of hardware that has nearly 75% of the market share and why exactly do you believe that, instead of
> targeting the 75% market share, you are better off targeting a piece of hardware that doesn't even exist?

Where have you seen this 75% number? CUDA is NVIDIA-only, and only for
"recent" graphics cards (the 8 series and later). I think the time when
people changed their hardware every year has passed.
Recall that you can still run pixel shaders on a much broader range of
graphics cards.

>> So, except if by "project" you mean a 6 month research project, I don't
>> find any argument to port tens/hundreds of thousands lines of code to
>> CUDA.
>
> And instead you believe it is better to base your "projects" on non-existing, non-proven hardware? That
> advice doesn't seem to be sound by far.

No: base very long-term projects on Fortran/C++ on CPUs, and base
projects intended for broad distribution on pixel shaders (as has been
done so far - do you know of any public project done in CUDA?). CUDA is
fine for short-term research projects where you want to get your
teraflop before anyone else.

James Kuyper

Jul 24, 2009, 9:19:26 AM
Nicolas Bonneel wrote:
> James Kuyper wrote:

...
>> Some real scientific calculations require results accurate to 20 or so
>> decimal places. Other calculations would remain useful even if they
>> were off by several orders of magnitude. Most calculations fall
>> somewhere in between those two extremes. Whether a computation can be
>> done with acceptable accuracy using single precision, or whether it
>> requires double precision (or even more!), depends upon the purpose of
>> the calculation. No one rule covers all cases.
>
> no, precision and accuracy are different. When you say: "scientific
> calculations require results accurate to 20 or so decimal places", you
> mean *precise* to 20 decimal places.

No, a result of 0.12345678901234567890 that is precise to 20 decimal
places, but which differs from the true value by 0.003, is accurate to
only about 3 decimal places; in most cases I don't care about the 17
extra digits of precision, I'm worried about the 3 digits of accuracy.
Double precision floating point can give you extra precision compared to
single precision, but if there's inaccuracy in the measurements or the
algorithm, the extra precision won't necessarily do you any good.

> If the true expected value should be 0, and your algorithm is converging
> toward 0 it is accurate, even if you get at the end 0.5 because you
> didn't run enough iterations (in which case it is not precise enough).
> But if your algo is converging toward 1 then the algo is not accurate.

Accuracy of an algorithm is meaningful only in the sense that you're
actually talking about the accuracy of the results of the algorithm. The
algorithm itself is correct or incorrect. For an iterative algorithm,
the complete description of the algorithm has to include the stopping
criteria. If it calculates, with a precision of 100 decimal places, a
value of 1 when the correct value is 0, then it's an incorrect but very
precise algorithm that will produce very inaccurate results.

> Even if at the end you also get 0.5 because you didn't run enough
> iterations (in which case you're both inprecise and inaccurate).

If the algorithms' stopping criteria indicate termination at that point,
then they both produce results with an accuracy of 0.5.

Ron Shepard

Jul 24, 2009, 11:44:55 AM
In article <yxiam.824$646...@nwrddc01.gnilink.net>,
James Kuyper <james...@verizon.net> wrote:

> No, a result of 0.12345678901234567890 that is precise to 20 decimal
> places, but which differs from the true value by 0.003 is accurate to
> only 3 decimal places, and in most cases I don't care about the 17 extra
> digits of precision, I'm worried about the 3 digits of accuracy.

I don't have much to contribute to the accuracy and precision
discussion, but I wanted to point out that there are indeed many
problems in which those 20 digits might be important even though the
result is accurate to only three digits. This is related to the
general problem of solving an accurate model approximately, or
solving an approximate model precisely. Scientists need to be able
to do both.

In my field of chemistry, for example, it is possible to compute
bond dissociation energies accurate to about 1 kcal/mole from
individual total energies that are accurate to no better than 1000
kcal/mole (or worse, for larger molecules). This is because for
that particular property, the approximate mathematical model that is
used results in large cancellations of errors.

$.02 -Ron Shepard

Phil Hobbs

Jul 25, 2009, 11:20:45 AM
Phil Carmody wrote:
> Nicolas Bonneel <nicolas.bonne...@wanadoo.fr> writes:
>> GPU also suffer from rounding of denormalized numbers to zero (which
>> is fine for graphics, but maybe not for scientific computing).
>
> They also use floats rather than doubles, which are useless for
> scientific computing. I notice that the ones that do support
> doubles do so at a tenth of the performance of floats, which is
> roughly the hit you'd get if you wrote your own multi-precision
> floats.
>
> Then again, I tend to be a bit of Kahan-ite when it comes
> to FPUs.
> Phil

Depends on the kind of scientific computing you mean. Floats are great
for, e.g., FDTD, where the errors keep flowing out through the edges of
the boundary and so have little chance to grow.

Cheers

Phil Hobbs

Dr Ivan D. Reid

Jul 25, 2009, 7:35:10 PM
On Fri, 24 Jul 2009 01:46:12 -0700 (PDT), user923005 <dco...@connx.com>

wrote in <65a777e1-c04c-4005...@i18g2000pro.googlegroups.com>:
> On Jul 23, 12:45 pm, n...@cam.ac.uk wrote:
>> In article <87ljmfnqah....@kilospaz.fatphil.org>,
>> Phil Carmody <thefatphil_demun...@yahoo.co.uk> wrote:

>> >They already have doubles in their Tesla line. Their spokespeople have
>> >also already admitted that they'll make professional (i.e. Tesla)
>> >customers pay for the advanced silicon until there's demand from the
>> >commodity market, and that's the direction they're expecting, and
>> >planning, to move in.

>> Er, have you any idea how slow they are? Real programmers seem to
>> agree that a Tesla is 3-4 times faster than a quad-core Intel CPU
>> in double precision, which doesn't really repay the huge investment
>> in rewriting in CUDA.

Tests I've done lately suggest that there is little, if any, real
performance gain for our particular code on the latest 1U 2xTesla /
2x4-core Nehalem machine compared to doing it entirely on the CPU. We're
still limited by some serial code, so we need a CPU to feed each GPU
instance (at present our 4Kx4K FFTs can run 5 instances on a 4 GB
Tesla). I'm hoping to turn some of our serial code into a CUDA kernel in
the next few days -- if that works we'll still be dominated by the file
output, I'm afraid.
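For anyone curious what that workload looks like, a minimal sketch of
one 4K x 4K single-precision complex-to-complex transform with cuFFT -
the in-place data array alone is about 128 MB, which is consistent with
fitting a handful of instances on a 4 GB card. Batching, streams and
error handling are all omitted, so this is only an outline:

#include <cstdio>
#include <cufft.h>
#include <cuda_runtime.h>

int main()
{
    const int N = 4096;                     /* 4K x 4K transform            */
    cufftComplex *d_data;                   /* 4096*4096*8 bytes = 128 MB   */
    cudaMalloc((void **)&d_data, sizeof(cufftComplex) * N * N);
    /* ... fill d_data from the host with cudaMemcpy ... */

    cufftHandle plan;
    cufftPlan2d(&plan, N, N, CUFFT_C2C);
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);  /* in-place forward */
    cudaDeviceSynchronize();                /* wait before using the result */

    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}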

>> I believe that a future version of the Tesla will improve the speed
>> of double precision considerably.

> I see it like this:
> CUDA is nice for some set of problems.
> To use CUDA you need to use special CUDA hardware and the special CUDA
> compiler. This computer that I am writing this post from has a CUDA
> graphics card. But it won't perform better than a normal CPU for most
> math operations. If you want the blazing speed, you need one of those
> fancy pants CUDA cards that has a giant pile of CPUs[1]. At some
> point the fancy pants CUDA card will no longer be faster than other
> kinds of hardware, so you will need to make a choice at that point in
> time to upgrade the CUDA card or use some other hardware. You can't
> use recursion with CUDA. Moving the data on and off the card is a
> problem unless there is a huge amount of crunching going on in the GPU
> pile.

I'm running seti@home on an Acer Aspire Revo PC (Windows 7 64-bit
release candidate). It has the nVidia ION GPU and chipset. Up until a
problem surfaced with the latest Patch Tuesday update (a process called
taskhome.exe now periodically starts using 100% of a hyperthread), the
16-core GPU was turning in about 120% of the performance of the two
threaded CPU instances combined.



> It's kind of like the Deep Blue chess computer. When it was created,
> nothing came remotely close to the chess compute power. But hardware
> adavances exponentially, so soon a $2000 desktop system will be as
> fast or faster.

The Revo cost £160, plus £34 for a 4 GB memory upgrade.



> I think CUDA is neat. We have the fast CUDA cards in house here and
> the CUDA compiler and we have done some fooling around with it to try
> to figure out what we can do with it.
> But if you do make some CUDA solutions, your customers will also need
> the CUDA cards unless you only plan to run it in house.

> In short, CUDA is cool, but it's not a panacea. No hardware solution
> will ever be the ideal solution because hardware ages. Prophecy is
> tricky for us humans. We tend to get it wrong in the long run, most
> of the time.

I can't disagree with that. Given the above, it appears that
the CPU manufacturers might be raising the bar more quickly than nVidia.

--
Ivan Reid, School of Engineering & Design, _____________ CMS Collaboration,
Brunel University. Ivan.Reid@[brunel.ac.uk|cern.ch] Room 40-1-B12, CERN
KotPT -- "for stupidity above and beyond the call of duty".

Rui Maciel

Jul 26, 2009, 10:49:54 AM
Nicolas Bonneel wrote:

>> Where exactly do you base your "seriously endangered" allegation?
>
> On the release of a concurrent product by Intel. I said endangered, not
> dead.

Intel also makes graphics cards and that hasn't endangered other graphics card makers.

<snip/>

>>> GPU also suffer from rounding of denormalized numbers to zero (which is
>>> fine for graphics, but maybe not for scientific computing).
>>
>> It is reported that CUDA's double precision is fully IEEE-754 compliant
>> and I believe that all deviations from that standard are explicitly
>> listed in the CUDA programming guide.
>
> This was for single.

Yet, is that relevant or even technically meaningful? And, as Larrabee doesn't exist beyond PowerPoint
presentations and press releases, how can we be sure it is compliant with IEEE-754?


> But CUDA doubles are much slower than singles. (and even double are not
> *fully* compliant - rounding-to-nearest-even is the only supported
> rounding for some operations).

It was reported that a specific Tesla model suffered from considerably lower performance when dealing with
doubles. Nonetheless, a problem affecting a specific product doesn't mean the entire architecture is
flawed.

Other than that, isn't the "float vs double" performance issue something that is present in virtually all
platforms? In fact, although Intel has yet to present any objective, verifiable information on Larrabee, the
company has been avoiding mentioning anything regarding Larrabee's double-precision performance.

So claiming that a certain platform offers lower double performance compared to floats does not in
any way mean that some other platform is any better, particularly when that product doesn't yet exist and
there is no objective data on it.


>>> Think that to get good performance with GPUs, you really have to
>>> re-think your whole project (even if it was already parallelized on CPU)
>>> just because of the GPU cache.
>>
>> Why do you believe that larrabee's case will be any different?
>
> I hope Larrabee will be able to handle a simple recursion which is very
> a GPU specific limitation (ok, this is not due to the cache). The cache
> coherence also simplifies the writing of parallel algos (synchronization
> etc.). Larrabee also has the standard x86 instruction set which should
> make it compile on any existing compiler.
>
> Otherwise, if they don't manage to make it better in general, they
> already lost the war, so I think they are clever enough to offer better
> features.
> But of course, if you believe that it will be at the same time:
> - less readable
> - no better features
> - less performant
> - same re-coding involvement
> then, I'm wondering why they had the idea to release a new product!

There is money to be made in that market and Intel wants a piece of it[1]. That's a very good reason to put
any product in the market no matter how bad it is.

On the other hand, blindly believing in the unproven features and capabilities of a platform that doesn't
(yet) exist beyond PowerPoint presentations and press releases doesn't seem like a good idea, especially
when the company behind that product is making sure it generates enough hype to get its revenue foot
firmly in that market.

Moreover, criticizing pretty much the only tangible, concrete technology that has been released in favor of
a baseless marketing campaign doesn't make it any better.


[1] http://channel.hexus.net/content/item.php?item=12233

>>> With 32cores featuring vector instructions of 16 single precision
>>> numbers at the same time, I guess you can indeed get performances
>>> similar to current GPUs. But with much less involved re-coding.
>>
>> As there are no larrabee benchmarks available and until a while ago
>> larrabee was nothing more than a press release and a powerpoint
>> presentation, where exactly do you base your performance claims?
>
> It was also optionnally a Siggraph paper more than a year ago.

I believe the Siggraph paper is, to this day, the main reference for what Larrabee may be. Yet it simply
describes the architecture and then presents some performance numbers produced by simulations that were
based on estimates. Moreover, they express their measured performance not in any tangible, meaningful way
but in something they call "Larrabee units", which makes it impossible to make any meaningful comparison
with any GPGPU tech already available.

The paper in question is available at:
http://software.intel.com/file/18198/


<snip />

>>> If you target broad diffusion of your application, you're stuck anyway
>>> with graphic cards which do not support CUDA (while they still support
>>> shaders).
>>
>> Why exactly do you believe that you should not "target broad diffusion"
>> if you chose to program for a piece of hardware that has nearly 75% of
>> the market share and why exactly do you believe that, instead of
>> targeting the 75% market share, you are better off targeting a piece of
>> hardware that doesn't even exist?
>
> Where have you seen this 75% number ? CUDA is only for NVIDIA, and only
> for "recent" graphic cards (>= 8 serie). I think the time where people
> changed their hardware every year has passed.

I've searched for the article that stated those numbers but couldn't find it. Instead I've found an
article[1] that states a 73% market share compared to Intel's 27%, which doesn't seem to be trustworthy.
Other articles claim that NVidia's market share was, in Q1 2009, 67% [2].

[1] http://www.streetinsider.com/Corporate+News/NVIDIA+%28NVDA%29+Has+73%25+Market+Share+in+GPU+vs.+Intels+27%25/3536299.html
[2] http://www.neoseeker.com/news/10596-nvidia-market-share-shows-slight-improvement-over-q4-2008/


> Recall that you can still do pixels shaders on a much broader range of
> graphic cards.

The CUDA-enabled hardware also offers support for OpenCL, which is a good way to get a uniform means of
targeting a broad range of systems, including CUDA-capable ones.


>>> So, except if by "project" you mean a 6 month research project, I don't
>>> find any argument to port tens/hundreds of thousands lines of code to
>>> CUDA.
>>
>> And instead you believe it is better to base your "projects" on
>> non-existing, non-proven hardware? That advice doesn't seem to be sound
>> by far.
>
> no, to base very long term projects on Fortran/C++ on CPUs and to base
> projects for broad diffusion on pixel shaders (as it was done so far -
> do you know of any public project done in CUDA?).

As far as I know, Autodesk supports CUDA in some of their products:
http://finance.yahoo.com/news/Autodesk-Leverages-NVIDIA-GPU-prnews-3312834929.html?x=0&.v=1

Rui Maciel

Nicolas Bonneel

Jul 26, 2009, 8:37:33 PM
Rui Maciel wrote:

> Nicolas Bonneel wrote:
>
>>> Where exactly do you base your "seriously endangered" allegation?
>> On the release of a concurrent product by Intel. I said endangered, not
>> dead.
>
> Intel also makes graphics cards and that hasn't endangered other graphics cards makers.

Yes, but without introducing a new technology so far. Any Intel
graphics card plugged into your computer can run your previous shaders,
and no particularly new instructions were added.
So, as you say later, in that case they just wanted to get into the
existing market with existing products.

>>>> GPU also suffer from rounding of denormalized numbers to zero (which is
>>>> fine for graphics, but maybe not for scientific computing).
>>> It is reported that CUDA's double precision is fully IEEE-754 compliant
>>> and I believe that all deviations from that standard are explicitly
>>> listed in the CUDA programming guide.
>> This was for single.
>
> Yet, is that relevant or even technically meaningful? And, as larrabee doesn't exists beyond powerpoint
> presentations and press releases, how can we be sure it is compliant with IEEE-754?

It is, since it's based on an existing Pentium architecture that is
IEEE compliant. They extend this architecture with a new instruction
set, but they are used to making things compliant. Unlike NVidia.

Also, as I mentioned earlier, prototypes have already been released.
This is not "just a Siggraph paper" anymore (or even, as you put it,
just a PowerPoint presentation!).
If you happen to read French, the only link I found confirming that is:
http://www.canardpc.com/news-35623-canard_pc_vous_en_dit_plus_sur_larrabee.html


>
>
>> But CUDA doubles are much slower than singles. (and even double are not
>> *fully* compliant - rounding-to-nearest-even is the only supported
>> rounding for some operations).
>
> It was reported that a specific Tesla model suffered from considerably lower performance when dealing with
> doubles. Nonetheless, a problem affecting a specific product doesn't mean the entire architecture is
> flawed.

Not many graphics cards handle doubles (and in particular, many fewer
than the "75% market share"!). According to their documentation, there
are 9 graphics cards to date which support doubles, including 2 Teslas
and 3 Quadros.
On the website
http://ixbtlabs.com/articles3/video/gt200-part1-p2.html
you can see that all 10 (?) of their double-precision graphics cards
have a peak performance of 90 gigaflops, similar to a simple 8-core Xeon
CPU. So this is not specific to the Teslas. In fact, among all the
processing units in an NVidia graphics card, only a small subset compute
in double precision.

>
> Other than that, isn't the "float Vs double" performance issue something that is present in virtually all
> platforms?

not in current CPUs. Current CPUs compute floats at approximately the
same speed as doubles.

> In fact, although Intel is yet to present any objective, verifiable information on larrabee, the
> company has been avoiding mentioning anything regarding larrabee's doubles performance.
>
> So claiming that a certain platform offers a lower doubles performance when compared to floats does not in
> any way mean that some other platform is any better, particularly when that product doesn't yet exist and
> there is no objective data on it.

Larrabee is based on a Pentium CPU, and all current CPUs support
doubles at no extra cost.
They add Vector Processing Units which handle either 8 float64 or 16
float32 values, so double precision will divide the performance of these
units by 2. Not by 10!

I'm not criticizing. I'm just answering the somewhat loaded question
"anyone not switching to CUDA?", which reads as if it were obvious that
CUDA is the only and best choice for all projects.

Larrabee is not a PowerPoint presentation anymore, and you can google
its architecture and performance.
For example,
http://www.tomshardware.com/news/intel-larrabee-nvidia-geforce,7944.html
talks about performance similar to a GTX 285.
There are even some numbers here (still in French, sorry):
http://forum.hardware.fr/hfr/Hardware/2D-3D/topic-unique-larrabee-sujet_819816_1.htm


>>>> With 32cores featuring vector instructions of 16 single precision
>>>> numbers at the same time, I guess you can indeed get performances
>>>> similar to current GPUs. But with much less involved re-coding.
>>> As there are no larrabee benchmarks available and until a while ago
>>> larrabee was nothing more than a press release and a powerpoint
>>> presentation, where exactly do you base your performance claims?
>> It was also optionnally a Siggraph paper more than a year ago.
>
> I believe the Siggraph paper is, up to this day, the main reference of what larrabee may be. Yet, it simply
> describes the architecture and then presents some perforance numbers produced by simulations that were made
> based on estimates. More, they base they measured performance not on any tangible, meaningful way but on
> something they call "larrabee units", which makes it impossible to make any meaningful comparison with any
> gpgpu tech already available.
>
> The paper in question is available at:
> http://software.intel.com/file/18198/

Mm... I know about this paper; their Siggraph talk was scheduled during
the same session as my own talk, so I couldn't attend theirs. :s


>
>>>> If you target broad diffusion of your application, you're stuck anyway
>>>> with graphic cards which do not support CUDA (while they still support
>>>> shaders).
>>> Why exactly do you believe that you should not "target broad diffusion"
>>> if you chose to program for a piece of hardware that has nearly 75% of
>>> the market share and why exactly do you believe that, instead of
>>> targeting the 75% market share, you are better off targeting a piece of
>>> hardware that doesn't even exist?
>> Where have you seen this 75% number ? CUDA is only for NVIDIA, and only
>> for "recent" graphic cards (>= 8 serie). I think the time where people
>> changed their hardware every year has passed.
>
> I've searched for the article that stated those numbers but couldn't find it. Instead I've found an
> article[1] that states a 73% market share compared to intel's 27%, which doesn't seem to be trustworthy.
> Other articles claim that NVidia's market share was, in Q1 2009, 67% [2]
>
> [1] http://www.streetinsider.com/Corporate+News/NVIDIA+%28NVDA%29+Has+73%25+Market+Share+in+GPU+vs.
> +Intels+27%25/3536299.html
> [2] http://www.neoseeker.com/news/10596-nvidia-market-share-shows-slight-improvement-over-q4-2008/

The first link forgets about ATI/AMD.
The second reports 67% of the total desktop market.

However, I can also find some other random links:
Wikipedia: http://en.wikipedia.org/wiki/Nvidia
=> reports 64.64%
http://www.edn.com/article/CA6654751.html
=> reports 30.6%
http://news.cnet.com/8301-13924_3-10152768-64.html
=> reports about 23%

This still says nothing about CUDA-enabled graphics cards, which are
still a small subset of NVidia graphics cards, and which do /not/
represent the widest range of graphics cards present in our computers.
And if you restrict it to double-capable graphics cards, the share drops
much lower still.


>>>> So, except if by "project" you mean a 6 month research project, I don't
>>>> find any argument to port tens/hundreds of thousands lines of code to
>>>> CUDA.
>>> And instead you believe it is better to base your "projects" on
>>> non-existing, non-proven hardware? That advice doesn't seem to be sound
>>> by far.
>> no, to base very long term projects on Fortran/C++ on CPUs and to base
>> projects for broad diffusion on pixel shaders (as it was done so far -
>> do you know of any public project done in CUDA?).
>
> As far as I know, autodesk supports CUDA in some of their products
> http://finance.yahoo.com/news/Autodesk-Leverages-NVIDIA-GPU-prnews-3312834929.html?x=0&.v=1

Autodesk targets professionals, OK. However, I haven't seen any
integration of CUDA in 3ds Max or Maya, both from Autodesk... they would
clearly benefit from higher performance!
And I wonder why even CryEngine 3 doesn't use CUDA...

Victor Eijkhout

Jul 28, 2009, 2:16:52 PM
Rui Maciel <rui.m...@gmail.com> wrote:

> Other than that, isn't the "float Vs double" performance issue something
> that is present in virtually all platforms?

No. Most chips have a peak speed (achievable, almost, with the Linpack
benchmark) of one double operation per clock tick, or 2 or 4 of them,
depending on the number of available fp units. Some chips can double
that speed with single precision (don't ask me for details; I never use
single precision); others at least benefit from the fact that you can
pump twice as many singles through the same memory bandwidth.

This is not at all the same as the double-precision penalty of GPUs,
which have the single-precision real as their basic computing unit.

Victor.
--
Victor Eijkhout -- eijkhout at tacc utexas edu

stefa...@yahoo.com

Aug 1, 2009, 7:16:19 PM
On Jul 22, 10:05 am, fft1976 <fft1...@gmail.com> wrote:
> I'm curious: is anyone not switching all their new projects to CUDA
> and why? It seems to me that CPUs are quickly becoming irrelevant in
> scientific computing.

The GPU is blazingly fast only for algorithms that execute the same
instruction simultaneously on a massive array of numbers - aka SIMD
(Single Instruction, Multiple Data). A multi-core Intel/AMD CPU is a
MIMD machine (Multiple Instructions, Multiple Data); therefore,
algorithms with inherently divergent code paths run more efficiently on
MIMD, and SIMD-friendly algorithms are a better fit for today's GPU. The
CUDA interface pretends to be MIMD, but that is VERY misleading: once
the threads of a warp are out of sync, performance drops to the speed of
a single thread - a dramatic slowdown.

Ray tracing is one example where a MIMD architecture is required to run
efficiently. The best multi-core CPU ray-tracing implementations
dramatically outperform the best GPU ray tracing for large numbers of
triangles, and the bigger the number, the bigger the gap. GPU fans tend
to compare a single Core 2 with GeForce 285 ray tracing, and as a rule
the CPU code is just a port of the GPU shaders written by the authors of
the GPU ray-tracing implementations; however, what is optimal for CPU
ray tracing is not optimal for the GPU, and vice versa. Ray-tracing
experts know these GPU/SIMD limitations all too well, so NVIDIA worries
about its position in tomorrow's 3D market, where ray tracing is going
to rule.

Larrabee is going to be a kind of MIMD/SIMD hybrid closer to clean MIMD,
and the G300 is a kind of hybrid closer to SIMD. Actually, even today's
G200 is not pure SIMD - it has several independent SIMD units - but it
is still really good only for SIMD-friendly algorithms.
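A toy CUDA kernel makes the warp point concrete: the two branches below
are taken by threads of the same 32-thread warp, so the hardware
serialises them and the cheap lanes wait for the expensive ones. The
numbers and the made-up "expensive path" are purely illustrative.

__global__ void divergent(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    if (threadIdx.x % 2 == 0) {
        /* even lanes: one multiply */
        out[i] = in[i] * 2.0f;
    } else {
        /* odd lanes: an artificially expensive path */
        float x = in[i];
        for (int k = 0; k < 1000; ++k)
            x = x * 1.0000001f + 0.5f;
        out[i] = x;
    }
    /* Both branches live in the same warp, so the even lanes sit idle
       while the odd lanes loop: effective throughput approaches that of
       the slow path alone - the "drops to the speed of a single thread"
       effect described above. */
}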

Stefan
