Crc Error Nvidia

Violet Schoneman

Aug 5, 2024, 12:10:50 AM

to langpacymbi

Whenever I try to load Enscape, I get the error message below and then Revit crashes. I was using the latest version of Revit 2021 (21.1.21.45) and the latest version of Enscape (3.5.6+204048). I have gone through all the options, following the steps on switching the graphics card in use and downloading the newest GPU drivers, with no luck. I have an NVIDIA T1000 8GB graphics card and I downloaded the latest driver (31.0.15.3799, dated 2023-12-01). I contacted my company's IT and they were also unable to solve the issue. I should also mention that I have even tried making a new Revit model with just a single wall to test whether it works, since most forums say the geometry may be too heavy, but it still doesn't work.

I found this thread ((Resolved) WARNING: Enscape crashes with the latest NVIDIA drivers) and thought perhaps there was again an issue between the NVIDIA driver and Enscape. One of my co-workers is using the NVIDIA driver from 2023-05-24 version 31.0.15.3598 with the NVIDIA T1000 graphics card and Enscape 3.5.0.107264. He is able to use Enscape with no issue. When I tried to download the older version of Enscape and the older driver I still had the same issue.

Regarding your co-worker using the same card, please keep in mind that he is using a version that is outdated at this point and does not include all the latest features and bug fixes. Perhaps he has also already disabled ray tracing via the General Settings?

On top of that ideally also check out the linked articles at the bottom of the response regarding "Performance Considerations" and "Memory Considerations" in case you haven't yet as they include even more tips and details on how to circumvent this in the future.

Acer ConceptD, driver 530.30.02, CUDA version 12.1, NVIDIA GeForce RTX 3080, running Xorg. If I step away from my desk for more than 15 minutes, I come back to a frozen system. Getting extremely frustrated.

I am still on 515.105.01 and can confirm this bug exists on this version, but only when I play on Steam.

I was going to upgrade to 530 because @amrits said this bug would be fixed there, but it seems it isn't fixed yet, since someone posting after that reported still experiencing these issues.

Would also love any news/to help fix this issue if I can. I seem to be having the exact symptoms of richard.decal, in that my Lenovo P15 Gen 2 running Ubuntu 20.04.6 has recently started experiencing external monitors connected via lenovo dock freezing. I was running 535, downgraded to 525 and still see the nvidia-modeset: ERROR: GPU:0: Idling display engine timed out error. It seems to have lessened/stopped the freezing, but there are other performance bugs present in this version that I would like to avoid.

I've tried a few things, like purging and reinstalling the driver or using an older one, with no luck. I've also tried downloading and running the .run driver from the NVIDIA website, but it failed to install.

I experienced this error when trying to reduce the load on my Nvidia card by selecting my internal graphics within nvidia-settings -> PRIME Profiles -> Select GPU -> Intel (not NVIDIA). This did not have the desired effect as it's not possible to use CUDA on the NVIDIA card without the NVIDIA profile enabled.

I had the same issue and a combination of these posts worked for me:

How to inspect the currently used Nvidia driver version and switch it to another alternative?

How do I know which NVIDIA driver I need?

I did the following:

apt-cache search nvidia | grep -P '^nvidia-(driver-)?[0-9]+\s'

This gave me a list of several drivers, so I installed them one by one until I found the right one:

sudo apt-get install nvidia-driver-XXX

After that, I could see my NVIDIA GPU in hardinfo, dkms status, and ubuntu-drivers devices (something I couldn't before).

However, nvidia-settings and nvidia-smi didn't work (which meant that the driver wasn't properly loaded).
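The package search and the "is the driver actually loaded" check described above can be sketched as two small shell helpers. This is only a sketch under assumptions, not the poster's exact procedure: both function names are hypothetical, the filter uses grep -E with a POSIX character class in place of the -P pattern quoted above, and the second helper assumes the kernel module shows up under the name nvidia in lsmod output.

```shell
# Hypothetical helper: reads `apt-cache search nvidia` output on stdin and
# prints only the versioned driver package names (e.g. nvidia-driver-535).
filter_nvidia_drivers() {
    grep -E '^nvidia-(driver-)?[0-9]+[[:space:]]' | awk '{print $1}' | LC_ALL=C sort -u
}

# Hypothetical helper: reads `lsmod` output on stdin and succeeds only if a
# module named exactly "nvidia" is listed (assumed module name).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}
```

Typical use would be `apt-cache search nvidia | filter_nvidia_drivers` to get the candidate list, and `lsmod | nvidia_module_loaded && echo "nvidia module loaded"` after installing a candidate and rebooting. If the package is installed but the module is not loaded, that matches the situation where nvidia-smi and nvidia-settings fail.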

I also have this issue on my Windows 10 + Arch Linux machine. It seems to be a bug in the NVIDIA driver, as discussed in this link. Some NVIDIA cards don't have a USB Type-C interface, yet the driver still tries to load its i2c driver. Hopefully this problem will be fixed in the next version of the driver.

This kind of error, in the scenario you described, is often related to too much load on the system as a whole, with render calls to OpenGL timing out and causing exceptions. One way to try to fix this is to make sure that whenever you are using apps like the Adobe suite or Premiere Pro, your GPU is in no way throttled by Windows.

Yes, the new version of Premiere could cause this. With dedicated Hardware support in apps like Premiere Pro it occasionally happens that either the GPU driver or the new feature set causes problems. There is always very thorough testing of interoperability during Beta stages of both our drivers and Adobe software, but sometimes bugs get unnoticed because of last minute changes.

So at this stage the best suggestion I can give is to revert either CC to an earlier version, if that is possible, or the NVIDIA driver (you can find older drivers through the Advanced Search), and to look out for any new release on either front. Chances are that the issue will be fixed in the next versions.

Beside this I would also suggest contacting Adobe support, which you should be able to with an official Adobe CC license. They might have more information on maybe other users having the same issue or that they might already be in contact with NVIDIA directly. That last is something I am not able to check.

I also think that the problem is from Adobe, because I am testing the PC with different advanced software, such as Autodesk Maya and software for visual effects and I have never had any problems.

I will contact Adobe to try to resolve the problem.

Thank you very much for your availability and kindness.

I have just noticed this on kernel 5.14.1 on Ubuntu 21.04 as well. My guess is that if you install the mainline kernel 5.14.16 using the Ubuntu mainline GUI kernel installation tool (which installs kernels from the Ubuntu mainline kernel website), then this might go away for you.

One thing I can say for sure is that, for me, it occurs if and only if a monitor (I use three, not sure if that is relevant) is being woken from sleep. I will try a newer kernel; I am just pointing out that the absence of errors in the first few seconds of the boot log does not indicate whether this issue is present.

EDIT: @anon22950299 Found this other post suggesting that downgrading xserver to an older version solves the issue: [SOLVED] 5.15.8, something's off [not kernel problem] / Kernel & Hardware / Arch Linux Forums

[drm:drm_new_set_master [drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership

This message does appear for me in 495.46 (in fact two at a time) (Optimus laptop with intel comet lake cpu and rtx 3060), but does not seem to cause any problems so I just ignore it.
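To check whether either of the two messages quoted in this thread shows up after a monitor wake rather than only at boot, a small filter over the kernel log can help. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the patterns are simply taken from the two lines quoted above (nvidia-modeset: ERROR and the [nvidia-drm] tag), so other driver messages will not be caught.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints only the
# lines that look like the two NVIDIA messages quoted in this thread.
# `|| true` keeps the function from failing when nothing matches.
nvidia_display_errors() {
    grep -E 'nvidia-modeset: ERROR|\[nvidia-drm\]' || true
}
```

It could be fed from `sudo dmesg | nvidia_display_errors` or `journalctl -k -b | nvidia_display_errors`, run once right after boot and again after waking the monitors, to compare the two.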

This document explains what Xid messages are, and is intended to assist system administrators, developers, and FAEs in understanding the meaning behind these messages as an aid in analyzing and resolving GPU-related problems.

nvidia-smi is a command-line program that installs with the NVIDIA driver. It reports basic monitoring and configuration data about each GPU in the system. nvidia-smi can list ECC error counts (Xid 48) and indicate if a power cable is unplugged (Xid 54), among other things. Please see the nvidia-smi man page for more information. Run nvidia-smi -q for basic output.
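Since nvidia-smi -q prints many "Name : value" lines, it can be handy to pull out a single field. The helper below is a minimal sketch with a hypothetical name; it assumes the output uses that "Name : value" layout and that the field of interest (for example Driver Version) appears with exactly that label, which may differ between driver releases.

```shell
# Hypothetical helper: reads `nvidia-smi -q` style text on stdin and prints
# the value of the first line whose label (text before the first colon,
# with surrounding whitespace trimmed) equals $1.
smi_field() {
    awk -F ':' -v key="$1" '
        {
            name = $1
            sub(/^[[:space:]]+/, "", name)
            sub(/[[:space:]]+$/, "", name)
        }
        name == key {
            val = $0
            sub(/^[^:]*:[[:space:]]*/, "", val)   # drop the label, keep the rest
            print val
            exit
        }
    '
}
```

For example, `nvidia-smi -q | smi_field 'Driver Version'` would print the installed driver version if the label matches, which makes it easier to compare against a co-worker's working setup.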

NVIDIA Data Center GPU Manager (DCGM) is a suite of tools for managing and monitoring NVIDIA datacenter GPUs in cluster environments. It includes active health monitoring, comprehensive diagnostics, system alerts and governance policies including power and clock management. DCGM diagnostics is a health checking tool that can check for basic GPU health, including the presence of ECC errors, PCIe problems, bandwidth issues, and general problems with running CUDA programs.

nvidia-bug-report.sh is a script that installs with the NVIDIA driver. It collects debug logs and command outputs from the system, including kernel logs and logs collected by the NVIDIA driver itself. The command should be run as root (for example, sudo nvidia-bug-report.sh).

File a bug with NVIDIA, including output of the command nvidia-bug-report.sh. Refer to the document GPU Debug Guidelines for guidance on gathering additional information to provide to NVIDIA and troubleshooting common Xid causes.
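Because the script needs root, a small guard can avoid collecting an incomplete report by accident. This is a sketch only: both function names are hypothetical, and the wrapper simply assumes nvidia-bug-report.sh is on the PATH, as it normally is after a driver install.

```shell
# Hypothetical helper: succeeds only when the given numeric user ID is 0 (root).
is_root_uid() {
    [ "$1" -eq 0 ]
}

# Hypothetical wrapper: refuses to run unless invoked as root, then calls the
# script named in the text (assumed to be on PATH).
collect_nvidia_bug_report() {
    if ! is_root_uid "$(id -u)"; then
        echo "nvidia-bug-report.sh should be run as root (try sudo)" >&2
        return 1
    fi
    nvidia-bug-report.sh
}
```

The script typically writes a compressed log (nvidia-bug-report.log.gz) into the current directory; that file is what gets attached to the bug report.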

This event is logged for general user application faults. Typically this is an out-of-bounds error where the user has walked past the end of an array, but could also be an illegal instruction, illegal register, or other case.

This event is logged when a fault is reported by the MMU, such as when an illegal address access is made by an applicable unit on the chip. Typically these are application-level bugs, but can also be driver bugs or hardware bugs.

This event is logged when a fault is reported by the DMA controller which manages the communication stream between the NVIDIA driver and the GPU over the PCI-E bus. These failures primarily involve quality issues on PCI, and are generally not caused by user application actions.

This event is logged when the user application aborts and the kernel driver tears down the GPU application running on the GPU. Control-C, GPU resets, sigkill are all examples where the application is aborted and this event is created.

On GPUs that support row remapping, starting with NVIDIA Ampere architecture GPUs, these events provide details on row remapper activity. For more information on row remapper Xids, refer to -gpu-mem-error-mgmt/index.html#row-remapping.

On earlier GPUs that support dynamic page retirement, these events provide details on dynamic page retirement activity. For more information on dynamic page retirement Xids, refer to -page-retirement/index.html.

This event may indicate a hardware failure with the link itself, or may indicate a problem with the device at the remote end of the link. For example, if a GPU fails, another GPU connected to it over NVLink may report an Xid 74 simply because the link went down as a result.
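To find out which Xid events a machine has actually logged, the numeric code can be pulled out of the kernel log. The sketch below uses hypothetical function names and assumes log lines of the form "NVRM: Xid (PCI:0000:01:00): 74, ..."; the exact prefix can vary by driver version. The lookup covers only the three Xid numbers that this text names explicitly (48, 54, 74) and deliberately does not guess at the others.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints the numeric
# Xid code from each line of the assumed form
#   "... NVRM: Xid (PCI:0000:01:00): 74, ..."
extract_xids() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: short notes for the Xid numbers mentioned in this text.
describe_xid() {
    case "$1" in
        48) echo "ECC error (see nvidia-smi ECC counts)" ;;
        54) echo "possible unplugged power cable" ;;
        74) echo "NVLink error, possibly caused by the remote device" ;;
        *)  echo "not covered here; see NVIDIA's Xid documentation" ;;
    esac
}
```

A possible usage is `sudo dmesg | extract_xids | sort -u | while read -r x; do echo "Xid $x: $(describe_xid "$x")"; done`, which lists each distinct code once with its note.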