Dpmra.exe


Karoline Oum

Aug 4, 2024, 6:59:46 PM
to golfruzamud
Disable real-time monitoring on the protected server - Disable the real-time monitoring of dpmra.exe, which is located in the folder C:\Program Files\Microsoft Data Protection Manager\DPM\bin. Also disable real-time monitoring for the C:\Program Files\Microsoft Data Protection Manager\DPM folder on the protected server.

Configure anti-virus software to delete the infected files on protected servers and the DPM server - To prevent data corruption of replicas and recovery points, configure the antivirus software to delete infected files, rather than automatically cleaning or quarantining them. Automatic cleaning and quarantining might cause the antivirus software to modify files, making changes that DPM can't detect.


We use DPM 2019 to protect 16 Windows Servers, and for the last year it has been running with no issues. I have recently encountered a problem where DPM reports inconsistent replicas for numerous servers (currently 8 servers with 11 datasources). This leads me to run consistency checks, which consistently fail. The failure is linked to DPMRA.EXE crashing, which writes an Application Error to the Application Event Log (Event ID 1000) citing DPMRA.exe as the Faulting Application and KERNELBASE.dll as the Faulting Module Name (see screenshot below):


This first happened at the beginning of last month. I copied dpmra.exe over from another DPM server, which did not help. After a long period of troubleshooting, I eventually restored the DPM database, which appeared to resolve the issue, but I am reluctant to go down this path again now that the problem has recurred.


I have tried throttling the DPM clients to 70Mbps and then to 50Mbps, but this hasn't helped (the DPM server has a 10Gb NIC, but the servers have 1Gb connections), and all clients are on the local LAN. The problem coincides with Windows update week, but I'm not convinced this is the cause: in both cases it started 1 to 4 days after the servers were rebooted, and I would have expected it to start much sooner.


The DPMRACurr.errlog files are not much help as they log so many benign errors, it's really hard to work out what is an issue and what isn't. What I have seen recurring in the logs are entries as shown below:


I have looked online and can find nothing relating to this. There are also multiple DPMRACurr.errlog..Crash files in the log directory, one for each time dpmra.exe crashes (when it does, it does not crash the DPM console, it just fails the job).


Does anyone have any suggestions, or has anyone seen the same issue before? I am going around in circles at the moment and need to fix the underlying issue rather than just restoring the database, as it seems to be a recurring incident.


As an update to this, I have stopped all jobs from running (or being able to run) during the troubleshooting process and used PowerShell to view the logs in real time on both the DPM server and the clients while running a consistency check, one at a time, on the previously failed jobs. From this process I noticed that the Watson error coincides with the DPMRA.exe-related Application Error in the event log, but the backup job does not fail immediately. After cancelling the failed job, I removed the datasource from its Protection Group, importantly ensuring no disk or tape data was retained, and continued this process until all offending datasources had been identified and removed (I had 2 which needed removing). After this, the remaining jobs could all be resolved, and once all the failed jobs had run successfully I could add the two problem datasources back into their Protection Groups.
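For anyone who wants to watch a log in real time without PowerShell, here is a minimal Python sketch of the same idea. It is only an illustration: the function names are my own, the log path is whatever file you choose to watch, and UTF-8 decoding is an assumption that may not match the actual encoding of the DPM logs.

```python
import os
import time
from typing import List, Tuple


def read_new_lines(path: str, offset: int = 0) -> Tuple[List[str], int]:
    """Return the complete lines appended to `path` since byte `offset`,
    together with the offset to pass in on the next call."""
    size = os.path.getsize(path)
    if size < offset:
        # The file shrank (rotated or truncated), so start from the beginning.
        offset = 0
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Consume only up to the last newline so that a half-written line
    # is picked up on the next call instead of being split in two.
    end = data.rfind(b"\n") + 1
    # Assumption: the log is UTF-8 compatible; undecodable bytes are replaced.
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path: str, poll_seconds: float = 1.0) -> None:
    """Print new lines as they are appended. Runs until interrupted."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            print(line)
        time.sleep(poll_seconds)
```

Calling `follow(...)` on the agent log while a single consistency check runs should make it easier to see which entries appear just before the crash.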


This has worked around the problem for me and I now have a functioning DPM server, but as to why the datasources have caused this problem, I am still no wiser. This is the second time in 2 months this has happened, and I don't know if it was the same datasources that caused the previous problem (a DPM database restore was how I got around it last time), but I will monitor this and check it next month to see if it happens again.


If anyone has any ideas as to why this may be occurring, or even how to investigate the cause further, that would be great. Also, please let me know if anyone else experiences this problem. I have worked with DPM for a number of years now and, since DPM 2010, have not had many issues, certainly not ones as disruptive as this. I do find the DPM logs unintuitive: they record a lot of warnings and errors as part of normal behaviour, which makes using them for troubleshooting quite hard at times. They are also not the easiest to decipher, which adds further frustration.
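One possible way to cut through that noise is to group log lines by a "signature" with numbers and hex values masked out, then look only at the signatures that occur rarely. The sketch below is a generic Python illustration of that idea and assumes nothing about the real errlog format; the function names and the sample lines in the usage note are made up.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hex values and decimal numbers usually differ between otherwise identical
# messages (thread IDs, error codes, timestamps), so mask them out.
_VOLATILE = re.compile(r"0x[0-9A-Fa-f]+|\d+")


def signature(line: str) -> str:
    """Reduce a log line to its repeating shape."""
    return _VOLATILE.sub("#", line).strip()


def rare_lines(lines: Iterable[str], max_count: int = 2) -> List[str]:
    """Return the lines whose signature occurs at most `max_count` times."""
    lines = list(lines)
    counts = Counter(signature(line) for line in lines)
    return [line for line in lines if counts[signature(line)] <= max_count]
```

For example, given three lines of the form `WARNING retry N` and a single `ERROR code 0x1F`, calling `rare_lines(lines, max_count=1)` keeps only the `ERROR` line. Whether the rare lines are actually the interesting ones still needs a human eye.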


I had the same thing happen, and it has been happening since DPM 2019 UR4.



I thought I had isolated it to certain replicas containing files or folders with paths longer than 250 characters.
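If you want to check a protected volume for the same thing, a rough Python sketch like the following lists every file or folder whose full path exceeds a given length. The function name is my own, and the 250-character default simply mirrors the figure above; it is not a documented DPM limit.

```python
import os
from typing import Iterator, Tuple


def find_long_paths(root: str, limit: int = 250) -> Iterator[Tuple[int, str]]:
    """Yield (length, full_path) for every file or folder under `root`
    whose full path is longer than `limit` characters."""
    for folder, subfolders, files in os.walk(root):
        for name in subfolders + files:
            full_path = os.path.join(folder, name)
            if len(full_path) > limit:
                yield len(full_path), full_path
```

Usage would be something like `for length, path in find_long_paths("D:\\Shares"): print(length, path)`, where the root folder is only an example. One caveat: `os.walk` silently skips folders it cannot list, so on Windows very deep trees may be under-reported unless long-path support is enabled or the root is given in `\\?\` extended-length form.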



Usually, stopping protection of those members mitigates the issue, while putting those members back into protection causes DPMRA.exe to crash every 15 minutes (according to the Event Viewer's Application log).
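To confirm a regular crash interval like that, you can export the Application Error events and measure the gaps between the DPMRA.exe entries. The Python sketch below assumes you already have each event as a (timestamp, message text) pair, however you exported it; the function names are my own, and the only thing taken from the event itself is the "Faulting application name:" label in the message text.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Label used in the text of an Application Error (Event ID 1000) entry.
_FAULTING_APP = re.compile(r"Faulting application name:\s*([^,\r\n]+)", re.IGNORECASE)


def dpmra_crash_times(events: Iterable[Tuple[datetime, str]]) -> List[datetime]:
    """Return the sorted timestamps of events whose faulting application is dpmra.exe."""
    times = []
    for logged_at, message in events:
        match = _FAULTING_APP.search(message)
        if match and match.group(1).strip().lower() == "dpmra.exe":
            times.append(logged_at)
    return sorted(times)


def crash_gaps(times: List[datetime]) -> List[timedelta]:
    """Return the time elapsed between each consecutive pair of crashes."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

If the gaps come out at a near-constant value, that points at a scheduled job (for example a synchronization) re-triggering the crash rather than a random failure.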



I've also had this happen when using one DPM server for secondary protection of a primary server's protected items (or in some cases, just the database of the primary server).



Either way, it's bad design to have a backup destination crash your entire backup product.


Our database contains 9 different files with the filename dpmra.exe. You can also check the most widely distributed variants of dpmra.exe. These files most often belong to the product Microsoft System Center Data Protection Manager 2007 and were most often developed by Microsoft Corporation. They most often carry the description DPM REPLICATION AGENT. This is an executable file. You can find it running in Task Manager as the process dpmra.exe.


I can confirm that there are no file system errors or file corruption as reported by Chkdsk, and we can open and write to the same files both from the original disk location and from the DPM disk backup.


Now my question: Is it possible to create a manual replica (for example with the help of external USB drives) for dedup-enabled volumes, or does the replica creation process have to be triggered from within the DPM console so that all data is transferred over the network?


I did tell it to retain the old data. This may be (Hostname) listed below in the database. The new PG created was as hostname.domainname. Even after removing the PG this entry remains in the database listed below.


I am trying to use DPM 2012 R2 RU2 to back up the VMs that run on a 4-node Server 2012 R2 Hyper-V cluster. The guests are stored on SMB 3.0 shares on a 2-node Server 2012 R2 Scale-Out File Server cluster. When I go to expand the SOFS group in DPM I get the error:


We are protecting a deduplicated volume of a few terabytes to a primary DPM server, and then to a secondary one. I have noticed that the secondary DPM server has allocated and used significantly more disk for the replica volume than the primary has. Why would this be? It almost looks like the secondary has protected the data in an unoptimised state (I have checked that the dedupe role is installed on the DPM servers).


I have DPM running at our main office. We back up a remote office server with 1.5TB of data over a VPN connection. It was working fine until our local storage pool drives suffered physical failure. That is all fixed now, but I had to recreate the protection groups. The initial backup of the remote server was slow, and when it finished the replica was marked inconsistent. I ran a consistency check, which took a really long time and got interrupted several times. Then it finally looked like it had finished with no errors. I tried to create a replication point and it failed, saying the replica was not consistent. I started the consistency check over again. This time it ran and counted up to 2,034.12MB in the Data Transferred column. It has now been stuck at that number for 30 hours but is still listed as in progress.


Looking at dpmra.exe with Process Monitor on the protected computer, I could see file names in the log while data was being transferred. Now that it has stopped counting, all I have seen for the last 30 hours is registry open, close and query key operations. A sample from the Process Monitor log is below. Does anyone know if this is normal, or if something is stuck, or if there is some other problem?


I am having trouble getting my file backup to run after an upgrade from DPM 2010 to DPM 2012 SP1. Because of the upgrade, every single protected data source entered a consistency check. This is expected behaviour, and most of these consistency checks have finished without problems. I have a single agent left that constantly fails. Almost immediately after a consistency check is started, the agent throws the following event in the application log of the client server.


We have a protected 2012 R2 server that has Data Deduplication enabled. DPM tried to create a recovery point and ran out of disk space. When I checked to see what had happened, the job had transferred over 43GB of data. This is way outside the norm for this protection group. Upon investigation I found that a scheduled weekly defrag had run. I assume this to be the source of the churn. Can anyone provide any guidance on the interaction of these processes? The scheduled defrag was enabled by default. I assume we'll need to disable this? I have never defragged my VMs, as it would cause massive churn for the VM backups in DPM. But this was not a VM backup, just a backup of files on a deduplicated volume.
