MD5 digest not same File as attributes

159 views
Skip to first unread message

Samuel

unread,
Feb 8, 2024, 9:33:03 AM2/8/24
to bareos-users

Hi,

I occasionally run into the following warning during copy and migration jobs:
Warning: dird/catreq.cc:608 MD5 digest not same File=103468 as attributes=103467

The two indices change with each occurrence, but the file and attribute index are always off by one.

Does anybody know what this is about?

Thanks and best regards,
Samuel

Sebastian Sura

unread,
Feb 20, 2024, 6:18:03 AM2/20/24
to bareos...@googlegroups.com

Hi Samuel,

would it be possible to run bscan with the bsr of that job and the -r/--list-records option set ?
The output should look like

bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=-4 Stream=1 len=164
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=1 Stream=1 len=182
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=1 Stream=29 len=4250
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=1 Stream=1998 len=80
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=1 Stream=40 len=16
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=2 Stream=1 len=183
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=2 Stream=29 len=10008
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=2 Stream=1998 len=80
bscan: stored/bscan.cc:494-0 Record: SessId=1 SessTim=1708427201 FileIndex=2 Stream=40 len=16

This would help us greatly in trying to debug this issue.

Kind Regards
Sebastian Sura


Am 08.02.24 um 15:33 schrieb 'Samuel' via bareos-users:
nto the following warning during copy and migration jobs:
Warning: dird/catreq.cc:608 MD5 digest not sa
-- 
 Sebastian Sura                  sebasti...@bareos.com
 Bareos GmbH & Co. KG            Phone: +49 221 630693-0
 https://www.bareos.com
 Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
 Komplementär: Bareos Verwaltungs-GmbH
 Geschäftsführer: Stephan Dühr, Jörg Steffens, Philipp Storz

Samuel

unread,
Feb 23, 2024, 6:56:17 AM2/23/24
to bareos-users
Hi Sebastian,

After looking at the logs some more I noticed that all jobs with this warning have in common that they're using compression (LZ4) and autoxflate on replication.
Perhaps this combination is still not entirely fixed yet. I'll try disabling autoxflate in the coming days.

The entire result of bscan is in the attachment. However here's the last part of it where bscan seems to abort (?):
$ sudo -u bareos bscan -b md5_digest_error_copy_only.bsr --list-records TapeDevice2 2>&1 | tee records.txt
...
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=23 Stream=2 len=65536
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=23 Stream=2 len=65536
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=23 Stream=2 len=65536
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=23 Stream=2 len=22528
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=23 Stream=3 len=16
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=24 Stream=1 len=100
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=25 Stream=1 len=83
bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936 FileIndex=-5 Stream=94145 len=193
bscan: stored/bscan.cc:681-0 Could not find SessId=3480 SessTime=1707222936 for EOS record.
Records would have been added or updated in the catalog:
      0 Media
      1 Pool
      0 Job
      0 File
      0 RestoreObject
23-Feb 12:05 bscan JobId 0: Releasing device "TapeDevice2" (/dev/tape/by-id/scsi-35000e111c71ac0bf-nst).

I also tried restoring the copy which ends in error.

Most files are restored successfully except for one file which happens to have FileIndex=2 (corresponding to the `attributes=2` in the warning?):
Enter SQL query: select path,name from file,path where jobid=94146 and fileindex=2 and file.pathid=path.pathid;
+-----------------------------------+--------------------------+
| path                              | name                     |
+-----------------------------------+--------------------------+
| /var/backup/old/2024-02-07_23:30/ | backup_www_1100CC.tar.gz |
+-----------------------------------+--------------------------+

A full restore of the original non-copy backup jobs works fine, as do restores of other copy jobs on tape of filesets that don't use compression.
The original job that was copied (93682) doesn't exist anymore; I think it was an always-incremental consolidated incremental backup which got consolidated again the following day.

Best regards,
Samuel
records.txt.zip
md5_digest_error_copy_only.bsr
copy_job_log.txt
restore_job_log.txt

Sebastian Sura

unread,
Feb 23, 2024, 7:15:15 AM2/23/24
to bareos...@googlegroups.com

Hi Samuel

thanks for the gathering this info.  Ill look into the bscan issue as well though i think this might be a known issue.
Regardless, regarding your actual issue: The bscan output shows that File 3 is missing its attributes (it has no stream 3).
This confuses the director as it never got told that file 3 now started getting backed up and this is why you get the warning
message from the director.

The same is happening during the restore: the filedaemon never got told that now a new file was started (because it never got the attribute stream)
so it basically merged both File 2 and File 3 into one file.  Even with this in mind, i think the restore should have caught that and issues a warning.
Ill look into why this did not happen.

If you add up all the data records (stream=2) with fileid=2 and fileid=3, you will get 738212473 bytes, which is exactly what the filedaemon reported
as size for file 2.  Since the log contains the size it expected i imagine that you could manually restore file 3 by splitting file 2 into two.

Can you check which file fileid 3 corresponds to ? Is it a special kind of file or just a normal one ?

Ill try to see if i can reproduce your issue in the copy system test.  Do you do a local copy (so copy to the same sd) or a remote one?

Kind Regards

Sebastian Sura

Am 23.02.24 um 12:56 schrieb 'Samuel' via bareos-users:
--
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bareos-users/e766e8c6-5b58-424f-816f-f7c5d7252180n%40googlegroups.com.

Sebastian Sura

unread,
Feb 23, 2024, 7:16:32 AM2/23/24
to bareos...@googlegroups.com

I meant to say File 3 has no stream 1, sorry for the confusion!

Am 23.02.24 um 13:15 schrieb Sebastian Sura:

Samuel

unread,
Feb 23, 2024, 8:02:28 AM2/23/24
to bareos-users
Thanks for looking into it!

Hm, there seems to be no file with fileindex=3 in the database for the copy:
Enter SQL query: select * from file where jobid=94146 and fileindex=3;
No results to list.

All files that are being backed up are just regular files, nothing special.

The copy runs locally on the director's host from its SD to the same SD.

Here's file 2 and 3 of today's consolidated incremental backup which will be copied tomorrow and most likely lead to the same warning again as it has for the last few days (for this specific fileset it always warns about `File=3 attributes=2`):

Enter SQL query: select *, decode_lstat(lstat) from file where jobid=94511 and fileindex=2;
+-------------+-----------+-------+---------+----------+--------+--------+--------+----------------------------------------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------+
| fileid      | fileindex | jobid | pathid  | deltaseq | markid | fhinfo | fhnode | lstat                                                          | md5                    | name                     | decode_lstat                                                                                 |
+-------------+-----------+-------+---------+----------+--------+--------+--------+----------------------------------------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------+
| 484,575,603 |         2 | 94511 | 910,901 |        0 |      0 | 0      | 0      | P0D CgA4 IGk B A i A sADp5 BAA FgAo BlxAt1 BlxASF BlxVYH A A d | D/VouduY5TF4KFacjRE7Hw | backup_www_1100CC.tar.gz | (64771,655416,33188,1,0,34,0,738212473,4096,1441832,1707346805,1707345029,1707431431,0,0,29) |
+-------------+-----------+-------+---------+----------+--------+--------+--------+----------------------------------------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------+

Enter SQL query: select *, decode_lstat(lstat) from file where jobid=94511 and fileindex=3;
+-------------+-----------+-------+---------+----------+--------+--------+--------+---------------------------------------------------------------+------------------------+-----------------------------+-------------------------------------------------------------------------------------------+
| fileid      | fileindex | jobid | pathid  | deltaseq | markid | fhinfo | fhnode | lstat                                                         | md5                    | name                        | decode_lstat                                                                              |
+-------------+-----------+-------+---------+----------+--------+--------+--------+---------------------------------------------------------------+------------------------+-----------------------------+-------------------------------------------------------------------------------------------+
| 484,575,604 |         3 | 94511 | 910,901 |        0 |      0 | 0      | 0      | P0D CgA5 IGk B A i A BA9gA BAA IHw BlxAt8 BlxASH BlxVYH A A d | J9h+OdR/6XCqhqpJdJvYtw | backup_databases_1100CC.tar | (64771,655417,33188,1,0,34,0,17029120,4096,33264,1707346812,1707345031,1707431431,0,0,29) |
+-------------+-----------+-------+---------+----------+--------+--------+--------+---------------------------------------------------------------+------------------------+-----------------------------+-------------------------------------------------------------------------------------------+


Restore of this job (jobid=94511) works just fine:

23-Feb 13:44 bareos-sd JobId 94559: Releasing device "FileDevice-ReadOnly-0005" (/backup_1/bareos).
23-Feb 13:44 bareos-dir JobId 94559: Max configured use duration=82,800 sec. exceeded. Marking Volume "AI-Consolidated-1771" as Used.
23-Feb 13:44 bareos-dir JobId 94559: Bareos bareos-dir 23.0.2~pre32.0a0e55739 (31Jan24):
  Build OS:               Ubuntu 20.04.5 LTS
  JobId:                  94559
  Job:                    Restore-Files.2024-02-23_13.43.47_38
  Restore Client:         "bareos-fd" 23.0.2~pre32.0a0e55739 (31Jan24) Ubuntu 20.04.5 LTS,ubuntu
  Start time:             23-Feb-2024 13:43:49
  End time:               23-Feb-2024 13:44:20
  Elapsed time:           31 secs
  Files Expected:         33
  Files Restored:         33
  Bytes Restored:         9,143,514,060
  Rate:                   294952.1 KB/s
  FD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Bareos binary info:     Bareos community build (UNSUPPORTED): Get professional support from https://www.bareos.com
  Job triggered by:       User
  Termination:            Restore OK

I'll also try restoring its copy from tape once it has been copied.

Best regards,
Samuel

Samuel

unread,
Feb 26, 2024, 5:32:34 AM2/26/24
to bareos-users
Hi again.

So, as expected, the warning occurred again when job 94511 was copied (but also two new warnings for this specific job):
Warning: dird/catreq.cc:608 MD5 digest not same File=3 as attributes=2
Warning: dird/catreq.cc:608 MD5 digest not same File=18 as attributes=17
Warning: dird/catreq.cc:608 MD5 digest not same File=23 as attributes=22

The restore of the copy also errored in the same manner as before:
 2024-02-26 10:43:50 bareos-fd JobId 95784: Error: findlib/attribs.cc:381 File size of restored file /tmp/bareos-restores/var/backup/old/2024-02-07_23:30/backup_www_1100CC.tar.gz not correct. Original 738212473, restored 442105856.
 2024-02-26 10:44:10 bareos-fd JobId 95784: Error: findlib/attribs.cc:381 File size of restored file /tmp/bareos-restores/var/backup/old/2024-02-12_23:30/backup_www_1100CC.tar.gz not correct. Original 856739265, restored 783548416.
 2024-02-26 10:44:17 bareos-fd JobId 95784: Error: findlib/attribs.cc:381 File size of restored file /tmp/bareos-restores/var/backup/old/2024-02-14_23:30/backup_www_1100CC.tar.gz not correct. Original 856739984, restored 31064064.

For now I'll try disabling autoxflate and see if that solves the issue.

Best regards,
Samuel

Sebastian Sura

unread,
Feb 26, 2024, 5:51:59 AM2/26/24
to bareos...@googlegroups.com

Hi Samuel

i just wanted to make sure i completely understand the situation.  You have an always incremental job with compression
turned on (which algorithm do you use ?) and then you have a copy job that copies the consolidated full job (on the same sd)
to tape with a device that auto inflates on write.

Do i have that right ?

Does the device where the job gets consolidated onto allow concurrent writes from multiple jobs ?

What selection type has your copy job ?

Kind Regards
Sebastian Sura

Am 26.02.24 um 11:32 schrieb 'Samuel' via bareos-users:

Samuel

unread,
Feb 26, 2024, 6:19:50 AM2/26/24
to bareos-users
Yes, that's right. In this case though it's a consolidated incremental which was copied. However, the warnings also occur during migration jobs of consolidated fulls. The warning has never occurred while copying/migrating a non-consolidated incremental.

This is all working just fine except for a couple of jobs which all have the following in common: filedaemon is on another host than director, they use LZ4 compression and autoxflate inflates on replication (though it's also enabled for all the other jobs but doesn't have any effect since they don't use compression).

I use the SQL Query selection type:
Selection Type = SQL Query
Selection Pattern = "
  SELECT DISTINCT Job.JobId, Job.StartTime
  FROM Job, Pool
  WHERE Pool.Name IN ('AI-Consolidated', 'AI-Incremental') AND
        Pool.PoolId = Job.PoolId AND
        Job.Type = 'B' AND
        Job.JobStatus IN ('T', 'W') AND
        Job.Level IN ('I') AND
        Job.JobBytes > 0 AND
        Job.JobId NOT IN (
            SELECT PriorJobId
            FROM Job AS copy
            WHERE copy.Type IN ('B', 'C', 'A') AND
                  copy.JobStatus IN ('T', 'W') AND
                  copy.PriorJobId != 0
        )
  ORDER by Job.StartTime;"

No, there's no concurrent writing going on currently. Before updating to Bareos 23 I did have devices with concurrent writing enabled, but that was before these warnings happened.

Best regards,
Samuel

Samuel

unread,
Mar 4, 2024, 5:43:45 AM3/4/24
to bareos-users
I finally had the chance to restart the storage daemon this morning in order to disable autoxflate.
Today's copy jobs that ran afterwards finished without any warning, so I think it's probably an issue with autoxflate (on replication?). I can also successfully restore those new copies.

Best regards,
Samuel

Sebastian Sura

unread,
Mar 25, 2024, 10:46:42 AM3/25/24
to bareos...@googlegroups.com

Hi Samuel,

i think i managed to reproduce your Problem.  I created a fix here if you want to check it out: https://github.com/bareos/bareos/pull/1745

Kind Regards

Sebastian Sura

Am 04.03.24 um 11:43 schrieb 'Samuel' via bareos-users:
Reply all
Reply to author
Forward
Message has been deleted
0 new messages