Nvidia Drivers Windows 10 64 Bits


Emmanuel Des Meaux

Aug 4, 2024, 2:58:45 PM
to guitistiben
This release includes support for the Video Decode and Presentation API for Unix-like systems (VDPAU) on most GeForce 8 series and newer add-in cards, as well as motherboard chipsets with integrated graphics that have PureVideo support based on these GPUs.

Use of VDPAU requires installation of a separate wrapper library called libvdpau. Please see your system distributor's documentation for information on how to install this library. More information can be found at


Under Xinerama, VDPAU performs all operations other than display on a single GPU. By default, the GPU associated with physical X screen 0 is used. The environment variable VDPAU_NVIDIA_XINERAMA_PHYSICAL_SCREEN may be used to specify a physical screen number, and then VDPAU will operate on the GPU associated with that physical screen. This variable should be set to the integer screen number as configured in the X configuration file. The selected physical X screen must be driven by the NVIDIA driver.


VDPAU is specified as a generic API - the choice of which features to support, and the performance levels of those features, is left up to individual implementations. The details of NVIDIA's implementation are provided below.


Note that VdpBitmapSurfaceCreate's frequently_accessed parameter directly controls whether the bitmap data will be placed into video RAM (VDP_TRUE) or system memory (VDP_FALSE). If the bitmap data cannot be placed into video RAM when requested due to resource constraints, the implementation will automatically fall back to placing the data into system RAM.


The exact set of supported VdpDecoderProfile values depends on the GPU in use. Appendix A, Supported NVIDIA GPU Products, lists which GPUs support which video feature set. An explanation of each video feature set may be found below. When reading these lists, please note that VC1_SIMPLE and VC1_MAIN may be referred to as WMV, WMV3, or WMV9 in other contexts. Partial acceleration means that VLD (bitstream) decoding is performed on the CPU, with the GPU performing IDCT and motion compensation. Complete acceleration means that the GPU performs all of VLD, IDCT, and motion compensation.


Partial acceleration. The NVIDIA VDPAU implementation does not support flexible macroblock ordering, arbitrary slice ordering, redundant slices, data partitioning, SI slices, or SP slices. Content utilizing these features may decode with visible corruption.


GPUs with VDPAU feature set E support an enhanced error concealment mode which provides more robust error handling when decoding corrupted video streams. This error concealment is on by default, and may have a minor CPU performance impact in certain configurations. To disable this, set the environment variable VDPAU_NVIDIA_DISABLE_ERROR_CONCEALMENT to 1.


Note that all GPUs with VDPAU feature sets H and above, except GPUs with this note, support VDP_DECODER_PROFILE_VP9_PROFILE_2. Please check the "VDPAU information" page in nvidia-settings for the list of supported profiles.


Note that codec support may vary by product manufacturer and region. For further details, please consult the documentation provided by the Add-In Card manufacturer or system manufacturer of your product.


In order for either VDP_VIDEO_MIXER_FEATURE_DEINTERLACE_TEMPORAL or VDP_VIDEO_MIXER_FEATURE_DEINTERLACE_TEMPORAL_SPATIAL to operate correctly, the application must supply at least 2 past and 1 future field to each VdpVideoMixerRender call. If those fields are not provided, the video mixer will fall back to bob de-interlacing.


Both regular de-interlacing and half-rate de-interlacing are supported. Both have the same requirements in terms of the number of past/future fields required. Both modes should produce equivalent results.


In order for VDP_VIDEO_MIXER_FEATURE_INVERSE_TELECINE to have any effect, one of VDP_VIDEO_MIXER_FEATURE_DEINTERLACE_TEMPORAL or VDP_VIDEO_MIXER_FEATURE_DEINTERLACE_TEMPORAL_SPATIAL must be requested and enabled. Inverse telecine has the same requirement on the minimum number of past/future fields that must be provided. Inverse telecine will not operate when "half-rate" de-interlacing is used.


While it is possible to apply de-interlacing algorithms to progressive streams using the techniques outlined in the VDPAU documentation, NVIDIA does not recommend doing so. One is likely to introduce more artifacts due to the inverse telecine process than are removed by detection of bad edits, etc.


The resolution of VdpTime is approximately 10 nanoseconds. At some arbitrary point during system startup, the initial value of this clock is synchronized to the system's real-time clock, as represented by nanoseconds since Jan 1, 1970. However, no attempt is made to keep the two time-bases synchronized after this point. Divergence can and will occur.


Whenever a presentation queue is created, the driver determines whether the overlay method may ever be used, based on system configuration, and whether any other application already owns the overlay. If overlay usage is potentially possible, the presentation queue is marked as owning the overlay.


Whenever a surface is displayed, the driver determines whether the overlay method may be used for that frame, based on both whether the presentation queue owns the overlay, and the set of overlay usage limitations below. In other words, the driver may switch back and forth between overlay and blit methods dynamically. The most likely cause for dynamic switching is when a compositing manager is enabled or disabled, and the window becomes redirected or unredirected.


When TwinView is enabled, the blit method can only sync to one of the display devices; this may cause tearing corruption on the display device to which VDPAU is not syncing. You can use the environment variable VDPAU_NVIDIA_SYNC_DISPLAY_DEVICE to specify the display device to which VDPAU should sync. You should set this environment variable to the name of a display device, for example "CRT-1". Look for the line "Connected display device(s):" in your X log file for a list of the display devices present and their names. You may also find it useful to review Chapter 12, Configuring Multiple Display Devices on One X Screen, and the section on Ensuring Identical Mode Timings in Chapter 18, Programming Modes.


A VdpPresentationQueue allows a maximum of 8 surfaces to be QUEUED or VISIBLE at any one time. This limit is per presentation queue. If this limit is exceeded, VdpPresentationQueueDisplay blocks until an entry in the presentation queue becomes free.


This documentation describes the capabilities of the NVIDIA VDPAU implementation. Hardware performance may vary significantly between cards. No guarantees are made, nor implied, that any particular combination of system configuration, GPU configuration, VDPAU feature set, VDPAU API usage, application, video stream, etc., will be able to decode streams at any particular frame rate.


System performance (raw throughput, latency, and jitter tolerance) can be affected by a variety of factors. One of these factors is how the client application uses VDPAU; i.e., the number of surfaces allocated for buffering, order of operations, etc.


NVIDIA GPUs typically contain a number of separate hardware modules that are capable of performing different parts of the video decode, post-processing, and display operations in parallel. To obtain the best performance, the client application must attempt to keep all these modules busy with work at all times.


Consider the decoding process. At a bare minimum, the application must allocate one video surface for each reference frame that the stream can use (2 for MPEG or VC-1, a variable stream-dependent number for H.264) plus one surface for the picture currently being decoded. However, if this minimum number of surfaces is used, performance may be poor, because back-to-back decodes of non-reference frames will need to be written into the same video surface. This requires that decode of the second frame wait until decode of the first has completed: a pipeline stall.


Further, if the video surfaces are being read by the video mixer for post-processing and eventual display, this will "lock" the surfaces for even longer, since the video mixer needs to read the data from the surface, which prevents any subsequent decode operations from writing to the surface. Recall that when advanced de-interlacing techniques are used, a history of video surfaces must be provided to the video mixer, thus necessitating that even more video surfaces be allocated.


Next, consider the display path via the presentation queue. This portion of the pipeline requires at least 2 output surfaces: one that is being actively displayed by the presentation queue, and one being rendered to for subsequent display. As before, using this minimum number of surfaces may not be optimal. For some video streams, the hardware may only achieve real-time decoding on average, not for each individual frame. Using compositing APIs to render on-screen displays, graphical user interfaces, etc., may introduce extra jitter and latency into the pipeline. Similarly, system-level issues such as scheduler algorithms and system load may prevent the CPU portion of the driver from operating for short periods of time. All of these potential issues may be solved by allocating more output surfaces, and queuing more than one outstanding output surface into the presentation queue.


The reason for using more than the minimum number of video surfaces is to ensure that the decoding and post-processing pipeline is not stalled, and hence is kept busy for the maximum amount of time possible. In contrast, the reason for using more than the minimum number of output surfaces is to hide jitter and latency in various GPU and CPU operations.


The choice of exactly how many surfaces to allocate is a resource usage vs. performance trade-off: allocating more than the minimum number of surfaces will increase performance, but use proportionally more video RAM, which may cause allocations to fail. This could be particularly problematic on systems with a small amount of video RAM. A robust application would automatically adjust to this by initially allocating the bare minimum number of surfaces (failures being fatal), then attempting to allocate more and more surfaces, provided the allocations keep succeeding, up to the suggested limits above.
