I have written a test application which just uses glReadPixels
to read the RGB values from an offscreen buffer. I've run the test
application on various machines with different AGP speeds,
different RAM and different CPUs. Surprisingly, almost
all machines were equally fast when it came to glReadPixels
(except the one which had AGPx2). All the tests have been
done under Linux with NVIDIA's latest driver. All tested
machines' video boards had some Quadro4 chipset.
So my benchmark tells me that the glReadPixels command
processes every pixel somehow on the GPU; that's IMHO
the only explanation (I observed that glReadPixels
performance scales linearly with the number of pixels read
back). I know that glReadPixels scales and biases pixels
according to what was set by glPixelTransferi() (btw. setting
anything other than the default values kills the performance).
And I know that glReadPixels might swap bytes depending
on OS or machine architecture. I've read on this NG that you
should store GL_BGR under Windows for performance reasons
but that makes no difference under Linux. So I really
wonder where the bottleneck is. Could it be a poor
implementation in the driver? How else (some X commands?) can I get
my RGB values out of a glContext? My problem is that
I can't read HDTV resolutions (e.g. 1920x1080) in realtime
(25fps) out of my video board. In a few words:
How can I improve my glReadPixels performance under Linux?
Thank you for your comments,
Toni
--
for mail, mirror: ed.lausivksa@elielb
Why the "surprise"? They're all similar...
> on OS or machine architecture. I've read on this NG that you
> should store GL_BGR under windows for performance reasons
> but that makes no difference under Linux. So I really
> wonder where the bottleneck is. Could it be a poor
> implementation in the driver?
Are you reading RGB or RGBA? The important thing
isn't the byte order; it's matching what's in the
frame buffer to your destination.
> My problem is that I can't read HDTV resolutions
> (e.g. 1920x1080) in realtime (25fps) out of my
> video board.
The AGP bus (and those graphics cards) simply
isn't designed for that sort of read performance
(200 MB/second). AGP is all about writing data,
not reading it.
> How can I improve my glReadPixels performance
> under Linux?
Your requirements are well into the "professional"
category so you might need some "professional"
hardware to meet them. SGI makes some nice little
machines which are designed for this sort of work.
The up-and-coming PCI Express bus might also be
able to do it.
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
Governments, like diapers, should be changed often,
and for the same reason.
I'm free to feel surprised whenever I want to.
>> on OS or machine architecture. I've read on this NG that you
>> should store GL_BGR under windows for performance reasons
>> but that makes no difference under Linux. So I really
>> wonder where the bottleneck is. Could it be a poor
>> implementation in the driver?
>
>
> Are you reading RGB or RGBA? The important thing
> isn't the byte order; it's matching what's in the
> frame buffer to your destination.
GL_RGBA made no difference.
> > My problem is that I can't read HDTV resolutions
> > (e.g. 1920x1080) in realtime (25fps) out of my
> > video board.
>
> The AGP bus (and those graphics cards) simply
> isn't designed for that sort of read performance
> (200 MB/second). AGP is all about writing data,
> not reading it.
I ruled out AGP as a bottleneck because AGPx4
made no difference compared to AGPx8. But I might be wrong here.
> > How can I improve my glReadPixels performance
> > under Linux?
>
> Your requirements are well into the "professional"
> category so you might need some "professional"
> hardware to meet them. SGI makes some nice little
> machines which are designed for this sort of work.
You are probably right. But I really want to avoid
returning to SGI. And first of all I have to find
out where the problem is before I buy an overpriced
SGI box.
> The up-and-coming PCI Express bus might also be
> able to do it.
Yea maybe, but first of all I want to understand what's
going on in my pipeline.... So you suggest AGP read performance
is the problem?
Cheers,
Yes.
> I ruled out the AGP as a bottleneck because AGPx4
> made no difference to AGPx8. But I might be wrong here.
That's the write speed, not the read speed.
Reading is generally much slower.
You don't say what performance you're actually
getting but I doubt it's even close to what you
need. If it's in a whole different ballpark then
no amount of fiddling is going to produce a miracle.
You should just about be able to do what you want. Read rates of 30--40
million pixels/sec are possible, but this is at the extreme edge of this
technology, and you'll have to be very, very careful about pixel format
(probably BGR -- forget the "A"), definitely unsigned bytes, and you might
need to look at pixel buffer objects or other extensions to get what you need.
This is getting up in the range of 200 Mbytes/second through the AGP bus,
close to the upper practical limit of the design. You need a
professional-quality motherboard; make sure your AGP BIOS settings are the
best, your AGP aperture is the best, you have the very latest chipset
drivers, you pay attention to every detail in your pipeline, and you are
using a professional (QUADRO or FIREGL), not consumer-grade, card. If you are
doing something with the results, you probably need to multi-thread your app
and run on an MP machine.
Note that getting a sustained 200 MB/second from your disks is similarly (or
worse) hard to do on PCs -- you really need to know what you are doing, and
have well-matched hardware.
There's an old thread on this newsgroup; search for
"Major improvement in glReadPixels on ATI Radeon 9700 Pro"
for more details.
-jbw
On the opengl.org forums the consensus seemed to be that higher
readback rates are possible with AGP, that the ATI and Nvidia
drivers were the problem, and that the 3Dlabs Wildcat VP cards were better
at glReadPixels. There are no free Linux drivers (in either
sense of free) for these cards though, and I don't know
how fast it is on the third-party Linux drivers.
Philipp Klaus Krause
Ok, I'm not an expert on this, but I thought AGP
read speed was AGPx1, i.e. 133 MB/sec. He's trying
to read 200 MB/sec.
> ...and the ATI and Nvidia
> drivers were the problem
I doubt that the driver writers are deliberately
crippling their drivers.
Maybe the chips are incapable of reading video
memory under DMA (there's no need for this on
consumer cards) so reads are being done
pixel-by-pixel in software.
The funny thing is that GL_RGB, GL_RGBA or even GL_BGR makes no
difference in performance. I get a measured bandwidth from ~136MB/s
to ~180MB/s. Probably 4 bytes/pixel are transferred and the A component
is discarded after the read? Where are the hardware gurus out there
who can tell us about AGP read bandwidth? I can't find any definite answer
on the web.
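To put those measured rates against the 25 fps target, here is a rough sketch of how long one 1920x1080 RGB frame would take to read back. It assumes that "MB" means 10^6 bytes and that readback time is simply frame size divided by sustained bandwidth; the function name is made up for illustration.

```python
def read_time_ms(width, height, bytes_per_pixel, bandwidth_bytes_per_s):
    """Milliseconds needed to move one frame at a given sustained bandwidth."""
    frame_bytes = width * height * bytes_per_pixel
    return 1000.0 * frame_bytes / bandwidth_bytes_per_s

# Time available per frame at 25 fps.
FRAME_BUDGET_MS = 1000.0 / 25

if __name__ == "__main__":
    # Assumption: the measured figures are in units of 10^6 bytes per second.
    for rate_mb in (136, 180):
        t = read_time_ms(1920, 1080, 3, rate_mb * 1_000_000)
        print(f"{rate_mb} MB/s: {t:.1f} ms per frame (budget {FRAME_BUDGET_MS:.0f} ms)")
```

Under these assumptions the low end (about 45.7 ms) misses the 40 ms budget, and the high end (about 34.6 ms) leaves only a few milliseconds per frame for rendering.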
> and you might need to look at pixel buffer objects or other extensions to
> get what you need.
Well I doubt they can improve performance of the AGP bus (assuming it
is the bottleneck).
I guess I have to go for a PCIX board....
Regards and thank you,
I actually need 1920x1080x3x25 ~= 150 MB/s. And I get
pretty close to it....
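The requirement quoted here can be checked with a few lines of arithmetic. This is only a sketch: it assumes 3 bytes per pixel (RGB, unsigned bytes) at 25 fps, and the helper name is hypothetical.

```python
def required_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to read back every frame."""
    return width * height * bytes_per_pixel * fps

if __name__ == "__main__":
    rate = required_rate(1920, 1080, 3, 25)
    print(f"{rate} bytes/s")
    print(f"{rate / 1e6:.2f} MB/s (10^6 bytes)")
    print(f"{rate / 2**20:.2f} MiB/s (2^20 bytes)")
```

That comes to 155,520,000 bytes/s, which is about 155.5 MB/s in decimal units or about 148.3 in binary units, so the "~150" figure holds either way.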
Regards and thank you,
> [...] All tested
> machines videoboards had a quadro4 something chipset.
[...]
> So I really wonder where the bottleneck is.
It's the hardware you are using. Until now nVidia hardware was hardly
optimized for read back operations.
> My problem is that
> I can't read HDTV resolutions (e.g. 1920x1080) in realtime
> (25fps) out of my video board. In a few words:
> How can I improve my glReadPixels performance under Linux?
Use a nv40-based graphics card - any 6800 board will do (even a
consumer board). The read back operations are much faster on this
hardware.
Bye,
Thorsten
Yea, gamers never use it, so obviously no optimization here...
>>My problem is that
>>I can't read HDTV resolutions (e.g. 1920x1080) in realtime
>>(25fps) out of my video board. In a few words:
>>How can I improve my glReadPixels performance under Linux?
>
>
> Use a nv40-based graphics card - any 6800 board will do (even a
> consumer board). The read back operations are much faster on this
> hardware.
I'm just concerned about PCIX + Linux + Nvidia. Anybody here
successfully running this config?
> I'm just concerned about PCIX
That's "PCI-Express" not "PCI-X", I believe? IMHO you don't need
PCI-Express for your application, a 6800-based AGP card should do the
trick.
> + Linux + Nvidia. Anybody here
> successfully running this config?
Apart from PCI-Express, this is exactly my configuration - I am
rendering a PAL video stream in real time and 1080p does not need
that much more bandwidth, if you consider that I have to read back
frames (@50fps) even if my output is field-wise.
Bye,
Thorsten
I had to learn that PCIX and PCI-Express are not the same thing.
Unfortunately I have a mainboard with only PCIX and there are no PCIX
graphics boards to buy.
> PCI-Express for your application, a 6800-based AGP card should do the
> trick.
>
>
>>+ Linux + Nvidia. Anybody here
>>successfully running this config?
>
>
> Apart from PCI-Express, this is exactly my configuration - i am
> rendering a PAL video stream in real time and 1080p does not need
> that much more bandwidth, if you consider that I have to read back
> frames (@50fps) even if my output is field-wise.
To get this straight: You glReadPixels 1920x1080x3x50 bytes/s out
of your AGP bus? That would mean that you read ~297MB/s. That's
beyond the theoretical maximum bandwidth of AGP read operations
(according to what others posted here...).
I'm confused.
Regards,
> To get this straight: You glReadPixels 1920x1080x3x50 bytes/s out
> of your AGP bus?
I read 720x576x4x50 bytes/s out of my AGP bus, and I have good reasons
to believe that this is not the end - each frame takes approx. 1.5 ms
to read.
> That would mean that you read ~297MB/s
The figures above lead to a transfer rate in the 1 GB/s range.
And yes, that's just glReadPixels.
Bye,
Thorsten
720x576x4x50/s ~= 79MB/s. That's no magic.
Please try to read 1920x1080x3 in realtime....
>>That would mean that you read ~297MB/s
>
>
> The figures above lead to a transfer rate in the 1 GB/s range.
??
> And yes, that's just glReadPixels.
I just can't get it into my head....
I believe the 200 MB/sec figure isn't correct for AGP8X.
Certainly not for downloads, and 8x is considerably different
from previous AGP bus specs. The Chromium newsgroup
reports 600 Mbytes/sec readback rates (net) on ATI hardware.
Indeed, you might consider using Chromium in your application --
they have spent a lot of time on the readback SPU, and it's used
heavily (really heavily) in real-world applications.
-jbw
Did you use Windows software calculator for that?
My CASIO calculator says: 720x576x4x50 = 82,944,000.
Ok, the CASIO uses hardware not software for the
math, but you're still a bit on the high side.
Have you tried disabling the frame sync while you
do the calculation?
Try using PBO (pixel buffer objects) or PDR (pixel data range). Read
some docs about these extensions at
http://developer.nvidia.com/object/nvidia_opengl_specs.html
These extensions provide asynchronous reads and writes from/to GPU memory, and
they can let you do other work on the CPU while pixels are read from the framebuffer.
yooyo
> 720x576x4x50/s ~= 79MB/s. That's no magic.
>
> Please try to read 1920x1080x3 in realtime....
Done.
Well, actually it was 1920x1080x4, since I cannot change my
application that easily to drop the key channel. ;-)
>>>That would mean that you read ~297MB/s
>> The figures above lead to a transfer rate in the 1 GB/s range.
>
> ??
I am rendering @ 50 fps, but that is an arbitrary limitation (which
has its cause in the fact that I am rendering a PAL stream). The
rendering itself is faster and the read back is *much* faster. The
actual read operation is done in 1.5 ms.
I might still be reading back @ 50 fps if the read back was using up
20 ms, but then there would be no time to render anything that could
be read. Thus, the glReadPixels has to be faster, and if it is faster
the transfer rate has to be higher than 79 MB/s.
That's why I did my transfer rate calculation with 666 fps (1.5 ms
frame time) - because that is the rate that has to be there if the
read is to be done in the 1.5 ms measured.
BTW: The 1920x1080x4 read back took 10 ms (approx. 800 MB/s). Your
1920x1080x3x25 app should run fine on an NV40, I guess.
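The difference between the two ways of counting (bytes per second averaged over the whole frame period, versus bytes moved during the read itself) can be sketched as follows. The frame sizes and timings are the ones quoted in this thread; the function name is made up, and the timings are taken as given rather than re-measured.

```python
def effective_rate(width, height, bytes_per_pixel, read_time_s):
    """Bytes per second while the read is actually in progress."""
    return width * height * bytes_per_pixel / read_time_s

if __name__ == "__main__":
    # PAL frame with 4 bytes/pixel, read in about 1.5 ms.
    pal = effective_rate(720, 576, 4, 0.0015)
    # 1920x1080 frame with 4 bytes/pixel, read in about 10 ms.
    hd = effective_rate(1920, 1080, 4, 0.010)
    # Averaged over a full second at 50 fps, the PAL stream is far lower.
    pal_average = 720 * 576 * 4 * 50

    print(f"PAL burst:   {pal / 1e6:.0f} MB/s")
    print(f"HD burst:    {hd / 1e6:.0f} MB/s")
    print(f"PAL average: {pal_average / 1e6:.1f} MB/s")
```

With those inputs the PAL read works out to roughly 1.1 GB/s and the HD read to roughly 830 MB/s, while the 50 fps average is only about 83 MB/s (about 79 in units of 2^20 bytes).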
Bye,
Thorsten
> Did you use Windows software calculator for that?
No. I was using xcalc. ;-)
> My CASIO calculator says: 720x576x4x50 = 82,944,000.
Please see my follow-up to Antonio's posting for my explanation.
Bye,
Thorsten
Hi,
you seem to be right.... I finally got a GeForce 6800 GT
and it does read back ~890MB/s (AGPx8). Unbelievable.
Cheers and thanks for your comments,
That's good to know. This sort of question comes
up a lot so now there's an answer.
I must do a quick benchmark of my Radeon X800
which arrived a couple of days ago.