glReadPixels performance


Antonio Bleile

Aug 21, 2004, 9:56:25 AM
Hi All,

I have written a test application which just glReadPixels
the RGB values from an offscreen buffer. I've run the test
application on various machines with different AGP speeds,
different RAM and different CPUs. Surprisingly, almost
all machines were equally fast when it came to glReadPixels
(except the one which had AGPx2). All the tests have been
done under Linux with NVIDIA's latest driver. All tested
machines' video boards had a Quadro4 something chipset.

So my benchmark tells me that the glReadPixels command
processes every pixel somehow on the GPU; that's IMHO
the only explanation (I observed that glReadPixels
performance scales linearly with the number of pixels read
back). I know that glReadPixels scales and biases pixels
according to what was set by glPixelTransferi() (btw. setting
anything other than the default values kills the performance).
And I know that glReadPixels might swap bytes depending
on OS or machine architecture. I've read on this NG that you
should store GL_BGR under Windows for performance reasons,
but that makes no difference under Linux. So I really
wonder where the bottleneck is. Could it be a poor
implementation in the driver? How else (some X commands?) can I get
my RGB values out of a GL context? My problem is that
I can't read HDTV resolutions (e.g. 1920x1080) in realtime
(25fps) out of my video board. In a few words:
How can I improve my glReadPixels performance under Linux?

Thank you for your comments,

Toni

--
for mail, mirror: ed.lausivksa@elielb

fungus

Aug 21, 2004, 10:29:26 AM
Antonio Bleile wrote:
> ....All the tests have been

> done under Linux with NVIDIA's latest driver. All tested
> machines videoboards had a quadro4 something chipset.
> Surprisingly almost all machines were equally
> fast when it came to glReadPixels

Why the "surprise"? They're all similar...

> on OS or machine architecture. I've read on this NG that you
> should store GL_BGR under windows for performance reasons
> but that makes no difference under Linux. So I really
> wonder where the bottleneck is. Could it be a poor
> implementation in the driver?

Are you reading RGB or RGBA? The important thing
isn't the byte order, it's matching what's in the
frame buffer to your destination.

> My problem is that I can't read HDTV resolutions
> (e.g. 1920x1080) in realtime (25fps) out of my
> video board.

The AGP bus (and those graphics cards) simply
aren't designed for that sort of read performance
(200MB/second). AGP is all about writing data,
not reading it.

> How can I improve my glReadPixels performance
> under Linux?

Your requirements are well into the "professional"
category so you might need some "professional"
hardware to meet them. SGI makes some nice little
machines which are designed for this sort of work.

The up-and-coming PCI Express bus might also be
able to do it.


--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.

Governments, like diapers, should be changed often,
and for the same reason.

Antonio Bleile

Aug 21, 2004, 10:41:07 AM
fungus wrote:
> Antonio Bleile wrote:
> > ....All the tests have been
> > done under Linux with NVIDIA's latest driver. All tested
> > machines videoboards had a quadro4 something chipset.
> > Surprisingly almost all machines were equally
> > fast when it came to glReadPixels
>
> Why the "surprise"? They're all similar..

I'm free to feel surprised whenever I want to.

>> on OS or machine architecture. I've read on this NG that you
>> should store GL_BGR under windows for performance reasons
>> but that makes no difference under Linux. So I really
>> wonder where the bottleneck is. Could it be a poor
>> implementation in the driver?
>
>
> Are you reading RGB or RGBA? The important thing
> isn't the byte order it's matching what's in the
> frame buffer to your destination.

GL_RGBA made no difference.

> > My problem is that I can't read HDTV resolutions
> > (e.g. 1920x1080) in realtime (25fps) out of my
> > video board.
>
> The AGP bus (and those graphics cards) simply
> aren't designed for that sort of read performance
> (200MB/second). AGP is all about writing data,
> not reading it.

I ruled out the AGP as a bottleneck because AGPx4
made no difference compared to AGPx8. But I might be wrong here.

> > How can I improve my glReadPixels performance
> > under Linux?
>
> Your requirements are well into the "professional"
> category so you might need some "professional"
> hardware to meet them. SGI makes some nice little
> machines which are designed for this sort of work.

You are probably right. But I really want to avoid
returning to SGI. And first of all I have to find
out where the problem is before I buy an overpriced
SGI box.

> The up-and-coming PCI Express bus might also be
> able to do it.

Yea maybe, but first of all I want to understand what's
going on in my pipeline.... So you suggest AGP read performance
is the problem?

Cheers,

fungus

Aug 21, 2004, 11:57:08 AM
Antonio Bleile wrote:
> So you suggest AGP read performance
> is the problem?
>

Yes.

> I ruled out the AGP as a bottleneck because AGPx4
> made no difference compared to AGPx8. But I might be wrong here.

That's the write speed, not the read speed.
Reading is generally much slower.

You don't say what performance you're actually
getting but I doubt it's even close to what you
need. If it's in a whole different ballpark then
no amount of fiddling is going to produce a miracle.
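
One way to put a number on "what performance you're actually getting" is to time a batch of reads and divide bytes by seconds. The following is only a minimal sketch of such a harness: `measure_rate` and `fake_read` are hypothetical names, and `fake_read` is a plain memory copy standing in for the real glReadPixels call so that the sketch runs without a GL context.

```python
import time

def measure_rate(read_frame, frame_bytes, frames=100):
    """Call read_frame() `frames` times and return the average bytes per second."""
    start = time.perf_counter()
    for _ in range(frames):
        read_frame()
    elapsed = time.perf_counter() - start
    return frame_bytes * frames / elapsed

# One 1920x1080 RGB frame with unsigned bytes, as in the original question.
frame_bytes = 1920 * 1080 * 3
source = bytearray(frame_bytes)

def fake_read():
    # Hypothetical placeholder: copies the frame in system memory.
    # In a real test this would be the glReadPixels call on the current context.
    return bytes(source)

rate = measure_rate(fake_read, frame_bytes, frames=20)
needed = frame_bytes * 25  # 25 fps target from the original post
print(f"measured {rate / 1e6:.0f} MB/s, needed {needed / 1e6:.0f} MB/s")
```

With a real readback in place of `fake_read`, comparing the two printed figures shows whether the result is merely short of the target or in a different ballpark altogether.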

JB West

Aug 22, 2004, 12:42:14 PM

"Antonio Bleile" <dont.use....@foo.com> wrote in message
news:cg7ka9$rb0$06$1...@news.t-online.com...

You should just about be able to do what you want. Read rates of 30-40
million pixels/sec are possible, but this is at the extreme edge of this
technology, and you'll have to be very, very careful about pixel format
(probably BGR -- forget the "A"), definitely unsigned bytes, and you might
need to look at pixel buffer objects or other extensions to get what you
need. This is getting up in the range of 200 Mbytes/second through the AGP
bus, close to the upper practical limit on the design. You need to have
professional-quality motherboards, make sure your AGP BIOS settings are the
best, your AGP aperture is the best, you have the very latest chipset
drivers, and you pay attention to every detail in your pipeline, and you
are using a professional (QUADRO or FIREGL), not consumer-grade card. If
you are doing something with the results, you probably need to multi-thread
your app and run on an MP machine.

Note that getting a sustained 200 MB/second from your disks is similarly (or
worse) hard to do on PCs -- you really need to know what you are doing, and
have well-matched hardware.

There's an old thread on this newsgroup; search for

"Major improvement in glReadPixels on ATI Radeon 9700 Pro"

For more details.

-jbw


Philipp Klaus Krause

Aug 22, 2004, 1:07:36 PM
fungus schrieb:

> Antonio Bleile wrote:
>
>> So you suggest AGP read performance
>> is the problem?
>>
>
> Yes.
>

On the opengl.org forums the consensus seemed to be that higher
readback rates are possible with AGP, that the ATI and Nvidia
drivers were the problem, and that 3Dlabs Wildcat VP cards were better
at glReadPixels. There are no free Linux drivers (in either
sense of free) for these cards though, and I don't know
how fast it is on the third-party Linux drivers.

Philipp Klaus Krause

fungus

Aug 22, 2004, 1:41:47 PM
Philipp Klaus Krause wrote:
>
> On the opengl.org forums consensus seemed to be that higher
> readback rates are possible with AGP

Ok, I'm not an expert on this, but I thought AGP
read speed was AGPx1, i.e. 133MB/sec. He's trying
to read 200MB/sec.

> ...and the ATI and Nvidia
> drivers were the problem

I doubt that the driver writers are deliberately
crippling their drivers.

Maybe the chips are incapable of reading video
memory under DMA (there's no need for this on
consumer cards) so reads are being done pixel
-by-pixel in software.

Antonio Bleile

Aug 23, 2004, 4:15:27 AM

The funny thing is that GL_RGB, GL_RGBA or even GL_BGR makes no
difference in performance. I get a measured bandwidth from ~136MB/s
to ~180MB/s. Probably 4 bytes/pixel are transferred and the A component
is discarded after the read? Where are the hardware gurus out there
who can tell me about AGP read bandwidth? I can't find any definite answer
on the web.
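
To relate the measured ~136MB/s to ~180MB/s to the 25 fps target, the bandwidth can be converted into full frames per second. This is a rough sketch only; it assumes MB means 10^6 bytes, and `achievable_fps` is a hypothetical helper name. The 4 bytes/pixel case covers the guess that the A component crosses the bus and is discarded afterwards.

```python
FRAME_PIXELS = 1920 * 1080  # one HDTV frame

def achievable_fps(bandwidth_mb_s, bytes_per_pixel):
    """Frames per second that a given readback bandwidth can sustain."""
    bytes_per_frame = FRAME_PIXELS * bytes_per_pixel
    return bandwidth_mb_s * 1e6 / bytes_per_frame

for bandwidth in (136, 180):          # measured range from the post, in MB/s
    for bytes_per_pixel in (3, 4):    # RGB, or RGBA if alpha is transferred too
        fps = achievable_fps(bandwidth, bytes_per_pixel)
        print(f"{bandwidth} MB/s at {bytes_per_pixel} bytes/pixel: {fps:.1f} fps")
```

At 3 bytes/pixel this works out to roughly 22 to 29 frames per second, so the measured range straddles the 25 fps requirement.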

> and you might need to look at pixel buffer objects or other extensions to
> get what you need.

Well I doubt they can improve performance of the AGP bus (assuming it
is the bottleneck).

I guess I have to go for a PCIX board....

Regards and thank you,

Antonio Bleile

Aug 23, 2004, 4:54:19 AM
fungus wrote:
> Philipp Klaus Krause wrote:
>
>>
>> On the opengl.org forums consensus seemed to be that higher
>> readback rates are possible with AGP
>
>
> Ok, I'm not an expert on this, but I thought AGP
> read speed was AGPx1, i.e. 133MB/sec. He's trying
> to read 200MB/sec.

I actually need 1920x1080x3x25 ~= 150MB/s. And I get
pretty close to it....
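
The requirement for 1920x1080 at 3 bytes/pixel and 25 fps can be worked out explicitly. A small sketch, where `required_rate` is a hypothetical helper name; it prints the result both in MB (10^6 bytes) and MiB (2^20 bytes), since the two units give noticeably different figures at this size.

```python
def required_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to read back every frame."""
    return width * height * bytes_per_pixel * fps

rgb = required_rate(1920, 1080, 3, 25)   # RGB, unsigned bytes
rgba = required_rate(1920, 1080, 4, 25)  # RGBA, if alpha is read as well

print(f"RGB : {rgb / 1e6:.1f} MB/s ({rgb / 2**20:.1f} MiB/s)")
print(f"RGBA: {rgba / 1e6:.1f} MB/s ({rgba / 2**20:.1f} MiB/s)")
```

For RGB this gives 155.5 MB/s, or 148.3 MiB/s, which matches the ~150 figure; RGBA would need about 207 MB/s.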

Regards and thank you,

Thorsten Lange

Aug 23, 2004, 4:59:01 AM
Antonio Bleile <dont.use....@foo.com> writes:

> [...] All tested


> machines videoboards had a quadro4 something chipset.

[...]


> So I really wonder where the bottleneck is.

It's the hardware you are using. Until now nVidia hardware was hardly
optimized for read back operations.

> My problem is that
> I can't read HDTV resolutions (e.g. 1920x1080) in realtime
> (25fps) out of my video board. In a few words:
> How can I improve my glReadPixels performance under Linux?

Use a nv40-based graphics card - any 6800 board will do (even a
consumer board). The read back operations are much faster on this
hardware.

Bye,
Thorsten

Antonio Bleile

Aug 23, 2004, 6:35:20 AM
Thorsten Lange wrote:
> Antonio Bleile <dont.use....@foo.com> writes:
>
>
>>[...] All tested
>>machines videoboards had a quadro4 something chipset.
>
> [...]
>
>>So I really wonder where the bottleneck is.
>
>
> It's the hardware you are using. Until now nVidia hardware was hardly
> optimized for read back operations.

Yea, gamers never use it, so obviously no optimization here...

>>My problem is that
>>I can't read HDTV resolutions (e.g. 1920x1080) in realtime
>>(25fps) out of my video board. In a few words:
>>How can I improve my glReadPixels performance under Linux?
>
>
> Use a nv40-based graphics card - any 6800 board will do (even a
> consumer board). The read back operations are much faster on this
> hardware.

I'm just concerned about PCIX + Linux + Nvidia. Anybody here
successfully running this config?

Thorsten Lange

Aug 23, 2004, 8:28:29 AM
Antonio Bleile <dont.use....@foo.com> writes:

> I'm just concerned about PCIX

That's "PCI-Express" not "PCI-X", I believe? IMHO you don't need
PCI-Express for your application, a 6800-based AGP card should do the
trick.

> + Linux + Nvidia. Anybody here
> successfully running this config?

Apart from PCI-Express, this is exactly my configuration - I am
rendering a PAL video stream in real time and 1080p does not need
that much more bandwidth, if you consider that I have to read back
frames (@50fps) even if my output is field-wise.

Bye,
Thorsten

Antonio Bleile

Aug 24, 2004, 3:59:57 AM
Thorsten Lange wrote:
> Antonio Bleile <dont.use....@foo.com> writes:
>
>
>>I'm just concerned about PCIX
>
>
> That's "PCI-Express" not "PCI-X", I believe? IMHO you don't need

I had to learn that PCIX and PCI-Express are not the same thing.
Unfortunately I have a mainboard with only PCIX and there are no PCIX
graphic boards to buy.

> PCI-Express for your application, a 6800-based AGP card should do the
> trick.
>
>
>>+ Linux + Nvidia. Anybody here
>>successfully running this config?
>
>
> Apart from PCI-Express, this is exactly my configuration - I am
> rendering a PAL video stream in real time and 1080p does not need
> that much more bandwidth, if you consider that I have to read back
> frames (@50fps) even if my output is field-wise.

To get this straight: You glReadPixels 1920x1080x3x50 bytes/s out
of your AGP bus? That would mean that you read ~297MB/s. That's
beyond the theoretical maximum bandwidth of AGP read operations
(according to what others posted here...).

I'm confused.

Regards,

Thorsten Lange

Aug 24, 2004, 7:36:59 AM
Antonio Bleile <dont.use....@foo.com> writes:

> To get this straight: You glReadPixels 1920x1080x3x50 bytes/s out
> of your AGP bus?

I read 720x576x4x50 bytes/s out of my AGP bus, and I have good reasons
to believe that this is not the end - each frame takes approx. 1.5 ms
to read.

> That would mean that you read ~297MB/s

The figures above lead to a transfer rate in the 1 GB/s range.

And yes, that's just glReadPixels.

Bye,
Thorsten

Antonio Bleile

Aug 24, 2004, 11:59:51 AM
Thorsten Lange wrote:
> Antonio Bleile <dont.use....@foo.com> writes:
>
>
>>To get this straight: You glReadPixels 1920x1080x3x50 bytes/s out
>>of your AGP bus?
>
>
> I read 720x576x4x50 bytes/s out of my AGP bus, and I have good reasons
> to believe that this is not the end - each frame takes approx. 1.5 ms
> to read.


720x576x4x50/s ~= 79MB/s. That's no magic.

Please try to read 1920x1080x3 in realtime....


>>That would mean that you read ~297MB/s
>
>
> The figures above lead to a transfer rate in the 1 GB/s range.

??

> And yes, that's just glReadPixels.

I just don't get it into my head....

JB West

Aug 24, 2004, 12:08:57 PM

"Antonio Bleile" <dont.use....@foo.com> wrote in message
news:cgeshu$vba$04$1...@news.t-online.com...

I believe the 200MB/sec figure isn't correct for AGP8X.
Certainly not for downloads, and 8x is considerably different
from previous AGP bus specs. The Chromium newsgroup
reports 600 Mbytes/sec readback rates (net) on ATI hardware.
Indeed, you might consider using Chromium in your application --
they have spent a lot of time on the readback SPU, and it's used
heavily (really heavily) in real-world applications.

-jbw


fungus

Aug 24, 2004, 12:17:27 PM
Thorsten Lange wrote:
>
> I read 720x576x4x50 bytes/s out of my AGP bus
> ...

> The figures above lead to a transfer rate in the 1 GB/s range.
>

Did you use Windows software calculator for that?

My CASIO calculator says: 720x576x4x50 = 82,944,000.

Ok, the CASIO uses hardware not software for the
math, but you're still a bit on the high side.
Have you tried disabling the frame sync while you
do the calculation?

yooyo

Aug 24, 2004, 7:20:40 PM
> I can't read HDTV resolutions (e.g. 1920x1080) in realtime
> (25fps) out of my video board. In a few words:
> How can I improve my glReadPixels performance under Linux?
>

Try to use PBO (pixel buffer objects) or PDR (pixel data range). Read
some docs about these extensions on
http://developer.nvidia.com/object/nvidia_opengl_specs.html
These extensions provide async read/write from/to GPU memory, which
can let you do some work on the CPU while reading pixels from the framebuffer.

yooyo


Thorsten Lange

Aug 25, 2004, 1:01:58 PM
Antonio Bleile <dont.use....@foo.com> writes:

> 720x576x4x50/s ~= 79MB/s. That's no magic.
>
> Please try to read 1920x1080x3 in realtime....

Done.

Well, actually it was 1920x1080x4, since I cannot change my
application that easily to drop the key channel. ;-)

>>>That would mean that you read ~297MB/s
>> The figures above lead to a transfer rate in the 1 GB/s range.
>
> ??

I am rendering @ 50 fps, but that is an arbitrary limitation (which
has its cause in the fact that I am rendering a PAL stream). The
rendering itself is faster and the read back is *much* faster. The
actual read operation is done in 1.5 ms.

I might still be reading back @ 50 fps if the read back was using up
20 ms, but then there would be no time to render anything that could
be read. Thus, the glReadPixels has to be faster, and if it is faster
the transfer rate has to be higher than 79 MB/s.

That's why I did my transfer rate calculation with 666 fps (1.5 ms
frame time) - because that is the rate that has to be there if the
read is to be done in the 1.5 ms measured.

BTW: The 1920x1080x4 read back took 10 ms (approx. 800 MB/s). Your
1920x1080x3x25 app should run fine on an NV40, I guess.
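
The difference between the sustained rate (bytes per frame times frames per second) and the rate during the read itself (bytes per frame divided by the measured read time) can be sketched as follows. `burst_rate` is a hypothetical helper name, and the read times are the approximate 1.5 ms and 10 ms figures quoted above.

```python
def burst_rate(width, height, bytes_per_pixel, read_seconds):
    """Bytes per second while a single read of one frame is in progress."""
    return width * height * bytes_per_pixel / read_seconds

# Sustained rate: one PAL RGBA frame, 50 times per second.
sustained = 720 * 576 * 4 * 50

# Rate during the read: one frame in the measured read time.
pal = burst_rate(720, 576, 4, 0.0015)   # 1.5 ms per PAL frame
hd = burst_rate(1920, 1080, 4, 0.010)   # 10 ms per 1920x1080 frame

print(f"PAL sustained @ 50 fps : {sustained / 1e6:.1f} MB/s")
print(f"PAL during a 1.5 ms read: {pal / 1e6:.0f} MB/s")
print(f"HD during a 10 ms read  : {hd / 1e6:.0f} MB/s")
```

The sustained figure is about 83 MB/s, while the rates during the read come out at roughly 1.1 GB/s and 829 MB/s, which is where the "1 GB/s range" and "approx. 800 MB/s" numbers come from.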

Bye,
Thorsten

Thorsten Lange

Aug 25, 2004, 1:03:30 PM
fungus <open...@SOCKSartlum.com> writes:

> Did you use Windows software calculator for that?

No. I was using xcalc. ;-)

> My CASIO calculator says: 720x576x4x50 = 82,944,000.

Please see my follow-up to Antonio's posting for my explanation.

Bye,
Thorsten

Antonio Bleile

Sep 22, 2004, 6:06:50 AM

Hi,

you seem to be right.... I finally got a GeForce 6800 GT
and it does read back ~890MB/s (AGPx8). Unbelievable.

Cheers and thanks for your comments,

fungus

Sep 22, 2004, 7:06:14 AM
Antonio Bleile wrote:
>
> you seem to be right.... I finally got a GeForce 6800 GT
> and it does read back ~890MB/s (AGPx8). Unbelievable.
>


That's good to know. This sort of question comes
up a lot so now there's an answer.

I must do a quick benchmark of my Radeon X800
which arrived a couple of days ago.
