svnadmin: E16004: Invalid r4422 footer. How to investigate deeper?


Dmitry Minsky

Jun 18, 2022, 1:18:04 PM
to us...@subversion.apache.org
I have a pretty old repository and am now going to move it to another machine. When I started the dump process, I stumbled upon this error in one of the old revisions:

svnadmin: E16004: Invalid r4422 footer

For now I work around it by skipping the corrupted revision and continuing with an --incremental dump. But every 10-50 revisions or so I still get this error, I suppose because there are files that depend on something from this old corrupted revision. The question is: how and where should I look in this revision so that I could manually fix the error by changing some files, a checksum, or anything else?

Daniel Shahaf

Jun 19, 2022, 5:19:36 AM
to Dmitry Minsky, us...@subversion.apache.org
Dmitry Minsky wrote on Sat, 18 Jun 2022 17:16 +00:00:
> I have a pretty old repository and am now going to move it to another machine.
> When I started the dump process, I stumbled upon this error in one of the old
> revisions:
>
> svnadmin: E16004: Invalid r4422 footer
>

It's actually E160004. (Just saying this so search engines will find this thread.)

> For now I work around it by skipping the corrupted revision and continuing
> with an --incremental dump. But every 10-50 revisions or so I still get this
> error, I suppose because there are files that depend on something from this
> old corrupted revision. The question is: how and where should I look in this
> revision so that I could manually fix the error by changing some files, a
> checksum, or anything else?

At the very end of the file, like the following but on db/revs/4/4422:

% strings db/revs/0/1 | tail -1
417 b4657c89ff644471b6760fd6389d253c 445 ea755737e485eeb03c0012e5d6bc1b49I
% < r/db/revs/0/1 xxd -s 417 -l 9
000001a1: 4c32 502d 494e 4445 58                   L2P-INDEX
% < r/db/revs/0/1 xxd -s 445 -l 9
000001bd: 5032 4c2d 494e 4445 58                   P2L-INDEX
%

(The first command might pick up some of the P2L/L2P data too? I don't
remember whether it's guaranteed that there'll be a non-printable
character between that and the last line.)
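For readers who would rather not rely on strings(1), here is a minimal Python sketch of the same check. It assumes the log-addressed revision file layout in which the very last byte of the file holds the length of the footer that precedes it; the outputs in this thread are consistent with that (the 73-character footer above ends in "I", which is byte 73). The function name `parse_footer` is made up for this example, and the sketch reads the whole file into memory, which is wasteful for large rev files.

```python
import re

# Assumed footer layout: "<l2p offset> <l2p md5> <p2l offset> <p2l md5>"
FOOTER_RE = re.compile(rb"(\d+) ([0-9a-f]{32}) (\d+) ([0-9a-f]{32})")


def parse_footer(data: bytes):
    """Return (l2p_offset, l2p_md5, p2l_offset, p2l_md5), or None if no valid footer."""
    if not data:
        return None
    footer_len = data[-1]  # assumption: the final byte is the footer length
    footer = data[-1 - footer_len:-1]
    match = FOOTER_RE.fullmatch(footer)
    if match is None:
        return None
    l2p, p2l = int(match.group(1)), int(match.group(3))
    # Same check as the xxd commands: each offset must point at its marker.
    if data[l2p:l2p + 9] != b"L2P-INDEX" or data[p2l:p2l + 9] != b"P2L-INDEX":
        return None
    return l2p, match.group(2).decode(), p2l, match.group(4).decode()
```

Something like `parse_footer(open("db/revs/4/4422", "rb").read())` returning None would point at a missing or damaged footer.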

Dmitry Minsky

Jun 19, 2022, 8:00:05 AM
to Daniel Shahaf, us...@subversion.apache.org
There is a footer line in the revisions before and after the corrupted one:

% strings repo/db/revs/7/7448 | tail -1
2244140591 fa0c1a8229575b0ce27ef0c5a8b898b4 2244140730 7d861109493094a15013c7ea105e33a1W

% strings repo/db/revs/7/7450 | tail -1
51995736 59c66b6d95365e6bdb4be4ec3b2d3a34 51995799 72059006b7c456b03efb7f07e0557795S

But on the corrupted one (the actual number of the corrupted revision is 7449) it doesn't give anything meaningful:

% strings repo/db/revs/7/7449 | tail -1
Y$Q8

% < repo/db/revs/7/7449 xxd -s Y$Q8 -l 9
00000000: 4445 4c54 410a 5356 4e                   DELTA.SVN

Daniel Shahaf

Jun 21, 2022, 3:55:14 AM
to Dmitry Minsky, us...@subversion.apache.org
Dmitry Minsky wrote on Sun, 19 Jun 2022 11:53 +00:00:
> There is a footer line in the revisions before and after the corrupted one:
>
> % strings repo/db/revs/7/7448 | tail -1
> 2244140591 fa0c1a8229575b0ce27ef0c5a8b898b4 2244140730 7d861109493094a15013c7ea105e33a1W
>
> % strings repo/db/revs/7/7450 | tail -1
> 51995736 59c66b6d95365e6bdb4be4ec3b2d3a34 51995799 72059006b7c456b03efb7f07e0557795S
>
> But on the corrupted one (the actual number of the corrupted revision is
> 7449) it doesn't give anything meaningful:
>
> % strings repo/db/revs/7/7449 | tail -1
> Y$Q8
>

OK, so the last line isn't there at all.

What you now need to do is figure out what happened to the revision
file. For starters, check whether the change-list portion (a list of
lines with "add-file" "modify-dir" and such on them, immediately
before «L2P-INDEX») is still there.

- If that's the case, the file was likely truncated, but the lost
portion of the file can likely be reconstructed manually, though
perhaps with some effort (what's the size of db/revs/7/7449 in bytes
and the number of lines `svn log -qvr 7449`'s output should have?).

- If it got truncated earlier, then it's likely that actual deltas
(PLAIN..ENDREP or DELTA..ENDREP) got truncated, i.e., data loss has
occurred, so you'd have to reconstruct the revision manually or
proceed without the lost data.

- And if it's not merely truncated but otherwise corrupted, well,
it depends.
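As a rough aid for telling these cases apart, the following Python sketch looks at the tail of a rev file and reports which landmarks are present. The changed-path pattern is modelled on lines such as "_7.0.t7449-67p add-dir false false false /trunk/..." as they appear in intact rev files; the set of action names, the function name `inspect_tail`, and the 1 MiB window are assumptions made for this example.

```python
import re

# Assumed shape of a changed-path line: "<id> <action>-<kind> <bool> <bool> <bool> /path"
CHANGE_LINE_RE = re.compile(
    rb"^\S+ (?:add|modify|delete|replace)-(?:file|dir) "
    rb"(?:true|false) (?:true|false) (?:true|false) /",
    re.MULTILINE,
)


def inspect_tail(data: bytes, window: int = 1 << 20) -> dict:
    """Report which landmarks occur in the last `window` bytes of a rev file."""
    tail = data[-window:]
    return {
        "changed_path_lines": len(CHANGE_LINE_RE.findall(tail)),
        "has_l2p_marker": b"L2P-INDEX\n" in tail,
        "has_p2l_marker": b"P2L-INDEX\n" in tail,
        "has_endrep": b"ENDREP\n" in tail,
    }
```

A result with changed-path lines but no index markers would fit the first case; no changed-path lines at all would suggest the truncation happened earlier. Since rev files are binary, chance matches are possible, so treat the counts as hints only.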

Do you have any way to reconstruct the revision or its contents? Older
backups, mirrors, commit mails, a working copy that committed this
revision or updated to it and never got 'svn cleanup' run on it?
(The thinking being to exploit issue #4071; cf.
<https://subversion.apache.org/docs/release-notes/1.7#wc-pristines>.)

Reminder that revision files contain binary data and should not be
opened by editors that auto-fix whitespace and so on. And that
repositories should be backed up before being manually operated on.

> % < repo/db/revs/7/7449 xxd -s Y$Q8 -l 9
> 00000000: 4445 4c54 410a 5356 4e                   DELTA.SVN
>

That's likely because xxd(1) calls atoi(3) on its argv[2], which
(assuming the shell parameter «Q8» is unset) has the value «Y», so the
function call returns 0.

The output appears sane, though. These bytes are exactly what the
start of a revision file might look like.

Cheers,

Daniel

P.S. For any future FSFS hackers out there, note that it's likely r7449
added a file or a directory. (Why? Because gung bhgchg fubjf
n /frys-pbzcerffrq/ qrygn.)

Dmitry Minsky

Jun 28, 2022, 7:08:54 AM
to Daniel Shahaf, us...@subversion.apache.org
OK. I’m pretty sure that db/revs/7/7449 is just truncated, since there aren’t any signs of readable text at the bottom of the file, and the top of the file looks similar to 7448, 7450, and any other revision.

So, let’s say I’m 85.23% sure about the content of this particular revision. How can I recreate the revision from a folder with files? This rev contains only add-dir and add-file changes.

--
Dmitry Minsky

Daniel Shahaf

Jun 28, 2022, 9:15:09 AM
to Dmitry Minsky, us...@subversion.apache.org
Dmitry Minsky wrote on Tue, 28 Jun 2022 11:01 +00:00:
> OK. I’m pretty sure that db/revs/7/7449 is just truncated, since there
> aren’t any signs of readable text at the bottom of the file, and the
> top of the file looks similar to 7448, 7450, and any other revision.
>
> So, let’s say I’m 85.23% sure about the content of this particular
> revision. How can I recreate the revision from a folder with files?
> This rev contains only add-dir and add-file changes.

What does the "folder with files" contain?

Is it a working copy? A repository? An export? None of the above?

Does it contain exactly the files and directories added in r7449 *as
they were in that revision*, and nothing else?

Dmitry Minsky

Jun 28, 2022, 9:43:53 AM
to Daniel Shahaf, us...@subversion.apache.org
> What does the "folder with files" contain?

Just some random files on my computer ;) They’re not from a working copy, a repository, or anything else meaningful. Let’s assume it’s just a bunch of random files that I want to put in the middle of the repo, hoping that it won’t blow up ;) Is that possible?

--
Dmitry Minsky

Daniel Shahaf

Jun 28, 2022, 9:51:46 AM
to Dmitry Minsky, us...@subversion.apache.org
Dmitry Minsky wrote on Tue, 28 Jun 2022 13:18 +00:00:
>> What does the "folder with files" contain?
>
> Just some random files on my computer ;) They’re not from a working
> copy, a repository, or anything else meaningful. Let’s assume it’s just
> a bunch of random files that I want to put in the middle of the repo,
> hoping that it won’t blow up ;) Is that possible?

With enough effort, yes.

Devs: In attempting to recreate db/revs/7/7449, what needs to be
matched? Off the top of my head, it's rep-cache.db references, actual
rep-sharing references in future rev files, and possibly node-rev id's.
Anything else?

What's the output of «sqlite3 rep-cache.db '.header on' 'SELECT * FROM
rep_cache WHERE revision = 7449'»?
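The same query can be run from Python's standard sqlite3 module, which may be handy when the result has to be fed into further scripting. This is only a sketch: the column names are taken from the `.header on` output that appears in this thread, the function name `reps_in_revision` is invented for the example, and it is safer to point it at a copy of rep-cache.db than at the live repository.

```python
import sqlite3


def reps_in_revision(rep_cache_path: str, revision: int) -> list:
    """Return the rep_cache rows recorded for `revision` as a list of dicts."""
    conn = sqlite3.connect(rep_cache_path)
    try:
        cur = conn.execute(
            'SELECT hash, revision, "offset", size, expanded_size '
            "FROM rep_cache WHERE revision = ?",
            (revision,),
        )
        columns = [desc[0] for desc in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        conn.close()
```

One row per representation first stored in that revision is what to expect, so the number of rows hints at how many new file contents the revision introduced.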

Does any rev file after 7449 contain " 7449 " on a "text:" or
"props:" line?

Does any rev file after 7449 contain ".7449/"?
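To answer the last two questions for every later revision at once, a small scanner along these lines might help. It is a sketch under assumptions: rev files live in an unpacked `revs/<shard>/<rev>` layout, representation lines start with "text: <rev> " or "props: <rev> ", delta bases appear as "DELTA <rev> ...", and predecessor ids end in ".r<rev>/<n>" (the optional "r" in the pattern covers both spellings). The function name `find_references` is made up, and each file is read fully into memory.

```python
import re
from pathlib import Path


def find_references(revs_dir, revision):
    """Return sorted (rev file number, matching line) pairs from later rev files."""
    rev = str(revision).encode()
    pattern = re.compile(
        rb"^(?:(?:text|props): " + rev + rb" "   # rep stored in that revision
        rb"|DELTA " + rev + rb" "                 # delta whose base is in that revision
        rb"|pred: \S*\.r?" + rev + rb"/"          # predecessor node-rev id
        rb")[^\n]*",
        re.MULTILINE,
    )
    hits = []
    for path in Path(revs_dir).glob("*/*"):
        # Skip packed shards and anything at or before the damaged revision.
        if not path.name.isdigit() or int(path.name) <= revision:
            continue
        for match in pattern.finditer(path.read_bytes()):
            hits.append((int(path.name), match.group(0).decode("ascii", "replace")))
    return sorted(hits)
```

Calling `find_references("db/revs", 7449)` would list the lines that any regenerated r7449 has to stay compatible with.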

Daniel

Dmitry Minsky

Jun 28, 2022, 10:50:06 AM
to Daniel Shahaf, us...@subversion.apache.org
So, here is the output of the SQL query:

% sqlite3 rep-cache.db '.header on' 'SELECT * FROM rep_cache WHERE revision = 7449'
hash|revision|offset|size|expanded_size
a684c1201230ed000e8baf11fcd890efebb059db|7449|3|106064003|111204465


And here is the size of the 7449 file:

% ls -l revs/7/7449
-r--r--r--. 1 apache apache 106067461 Feb 4 2021 revs/7/7449


Now, what about "links" to 7449 in revisions after 7449? There is something in 7450:

% strings revs/7/7450 | tail -7
copyroot: 0 /
minfo-cnt: 61
_7.0.t7449-67p add-dir false false false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
_9.0.t7449-67p add-file true true false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp
L2P-INDEX
P2L-INDEX
51995736 59c66b6d95365e6bdb4be4ec3b2d3a34 51995799 72059006b7c456b03efb7f07e0557795S


% strings revs/7/7450 | grep 7449
text: 7450 3 51992815 54791207 b326aa3b7fd0ea02b8e75ac8a8dcc656 1430895ca8250cfb117997d6ee543e7e2c06c265 7449-67p/_b
props: 2 757 65 53 113136892f2137aa0116093a524ade0b - 7449-67p/_d
DELTA 7449 11 138
pred: 4-7052.0.r7449/12
pred: 3-6161.0.r7449/14
DELTA 7449 15 24
pred: 2-6160.0.r7449/16
DELTA 7449 17 20
pred: 1-6132.0.r7449/18
DELTA 7449 19 25
pred: 3-232.0.r7449/20
DELTA 7449 21 25
pred: 0.0.r7449/2
_7.0.t7449-67p add-dir false false false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
_9.0.t7449-67p add-file true true false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp


And there are no "links" to 7449 in revision 7451 or later, BUT I still can't dump these revisions. Maybe because of a "chain" of "links", like 7449 <- 7450 <- 7451 etc.?

% svnadmin dump /var/repo_serpico -r7450 > ~/sdb/test.dump
svnadmin: E160004: Corrupt representation '7449 21 25 159 24ad3bd9d7945c1c7ca3f5e714ea868e - -'
svnadmin: E160004: Invalid r7449 footer

% svnadmin dump /var/repo_serpico -r7451 > ~/sdb/test.dump
svnadmin: E160004: Corrupt representation '7449 21 25 159 8f3d18747d3388ff2b35096cafbd57ab - -'
svnadmin: E160004: Invalid r7449 footer


--
Dmitry Minsky

Daniel Shahaf

Jun 28, 2022, 1:59:51 PM
to Dmitry Minsky, us...@subversion.apache.org
Dmitry Minsky wrote on Tue, 28 Jun 2022 14:44 +00:00:
> So, here is the output of the SQL query:
>
> % sqlite3 rep-cache.db '.header on' 'SELECT * FROM rep_cache WHERE
> revision = 7449'
> hash|revision|offset|size|expanded_size
> a684c1201230ed000e8baf11fcd890efebb059db|7449|3|106064003|111204465
>

OK, so it would seem r7449 added one file and no directories. That, or
every other added file/directory was a copy.

>
> And here is the size of the 7449 file:
>
> % ls -l revs/7/7449
> -r--r--r--. 1 apache apache 106067461 Feb 4 2021 revs/7/7449
>

So the rev file size is the sqlite3 SIZE plus 3458 bytes. I guess those
could be the dir rep, node-rev header, and so on.

Also:

>>> '%x' % 106067461
'6527605'
>>> '%x' % 106064003
'6526883'

No [a-f] in either. I guess that's just a coincidence. The probability
of that (disregarding the high three hex digits, which are the same in
both numbers) is (10/16)**8 ≈ 2.3% ≈ 1/43.
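For anyone who wants to re-run the numbers, here they are spelled out in Python. The two sizes come from the `ls -l` and rep-cache.db outputs in this thread; the variable and function names are only illustrative.

```python
rev_file_size = 106067461   # from `ls -l revs/7/7449`
rep_size = 106064003        # SIZE column of the rep_cache row

# Bytes in the rev file that are not part of the cached representation.
overhead = rev_file_size - rep_size

hex_file = format(rev_file_size, "x")
hex_rep = format(rep_size, "x")


def no_letter_probability(n_digits: int) -> float:
    """Chance that n independent, uniform hex digits are all in 0-9."""
    return (10 / 16) ** n_digits


# The leading "652" is shared, leaving 4 + 4 digits that differ.
p = no_letter_probability(8)
print(overhead, hex_file, hex_rep, round(p, 4))
```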

>
> Now, what about "links" to 7449 in revisions after 7449? There is
> something in 7450:
>
> % strings revs/7/7450 | tail -7
> copyroot: 0 /
> minfo-cnt: 61
> _7.0.t7449-67p add-dir false false false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
> _9.0.t7449-67p add-file true true false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp

Those are node-rev id's from r7450's transaction. It's using in-transaction
id's as opposed to final in-revision id's, but I get that on a new test
repo too, which suggests this is an unrelated issue and that nothing
depends on these two values. Which is to say, "move along, nothing to
see here".

> L2P-INDEX
> P2L-INDEX
> 51995736 59c66b6d95365e6bdb4be4ec3b2d3a34 51995799
> 72059006b7c456b03efb7f07e0557795S
>
>
> % strings revs/7/7450 | grep 7449
> text: 7450 3 51992815 54791207 b326aa3b7fd0ea02b8e75ac8a8dcc656 1430895ca8250cfb117997d6ee543e7e2c06c265 7449-67p/_b
> props: 2 757 65 53 113136892f2137aa0116093a524ade0b - 7449-67p/_d

That's what the structure file terms "uniquifier". I don't recall its
semantics off the top of my head.

> DELTA 7449 11 138
> pred: 4-7052.0.r7449/12
> pred: 3-6161.0.r7449/14

Yeah, these matter. The former is a non-self DELTA rep, i.e., a file
stored as a delta against another file; the latter indicates that a node-revision
("a revision of something in the repository") is a newer revision of an
existing "something in the repository" (as opposed to a historyless
add). When regenerating r7449's rev file you'll want to make sure both
of these pointers remain valid.

The pred: links are easier since you can probably just recommit r7449 to
a copy-up-to-r7448 of the repository and then change them. Make sure
not to break offsets later in the file.

The delta bases will require more work; see below.

«svnfsfs load-index» might be helpful in regenerating the rev file.
I haven't tried it. (Or you could use linear addressing for the
restore, if regenerating a linear-addressing file is easier.)

> DELTA 7449 15 24
> pred: 2-6160.0.r7449/16
> DELTA 7449 17 20
> pred: 1-6132.0.r7449/18
> DELTA 7449 19 25
> pred: 3-232.0.r7449/20
> DELTA 7449 21 25
> pred: 0.0.r7449/2

Ditto.

> _7.0.t7449-67p add-dir false false false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
> _9.0.t7449-67p add-file true true false /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp
>
>
> And there are no "links" to 7449 in revision 7451 or later, BUT I
> still can't dump these revisions. Maybe because of a "chain" of "links",
> like 7449 <- 7450 <- 7451 etc.?
>

That plus the fact that you didn't pass --deltas or --incremental so it
tried to dump the entire contents of ^/@r7450 (what «svn co ^/@7450»
would get, as opposed to «svn diff -c 7450»).

> % svnadmin dump /var/repo_serpico -r7450 > ~/sdb/test.dump
> svnadmin: E160004: Corrupt representation '7449 21 25 159 24ad3bd9d7945c1c7ca3f5e714ea868e - -'
> svnadmin: E160004: Invalid r7449 footer
>

OK, so the next step is to reconstruct bases of the five non-self
DELTAs in r7450.

First, look in the truncated r7449 rev file. There might be intact reps
in it. A rep always ends with "ENDREP\n". (Nothing prevents "ENDREP\n"
from occurring inside the rep itself; parsing a rep requires knowing its
length in advance.)
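A minimal Python sketch of pulling one rep out of a (possibly truncated) rev file, given where it starts and how long it is. It assumes a rep consists of a header line ("PLAIN" or "DELTA ..."), then exactly `size` bytes of payload, then "ENDREP\n", and that `size` is the on-disk size recorded in rep-cache.db or on a "text:" line. The function name `extract_rep` is invented. Whether offset 0 together with the rep-cache size is the right pair to try first on r7449 is a guess based on the file starting with "DELTA\nSVN".

```python
def extract_rep(data: bytes, start: int, size: int):
    """Return (header_line, payload) for the rep beginning at byte `start`.

    Raises ValueError if the header is unexpected, the payload is cut short,
    or the payload is not followed by the ENDREP trailer.
    """
    header_end = data.index(b"\n", start) + 1
    header = data[start:header_end]
    if header != b"PLAIN\n" and not header.startswith(b"DELTA"):
        raise ValueError("no rep header at offset %d" % start)
    payload = data[header_end:header_end + size]
    if len(payload) < size:
        raise ValueError("rep payload is truncated")
    trailer_start = header_end + size
    # The length, not a search for ENDREP, decides where the rep ends.
    if data[trailer_start:trailer_start + 7] != b"ENDREP\n":
        raise ValueError("payload is not followed by ENDREP")
    return header, payload
```

A clean return is a good sign that this particular rep survived the truncation; a ValueError says it did not, or that the offset or size was wrong.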

Second, try the "random files" you mentioned upthread.

Once you have all these candidate files — the reps extracted from the
truncated rev file and the "random files" — try applying each of the
deltas in r7450 to each of the candidate files, and figure out which
combinations produce the md5/sha1 checksums recorded in r7450.
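The search itself is a plain nested loop. The sketch below assumes you already have some function that applies a delta to a base and returns the resulting bytes; it is passed in as `apply_delta` and is hypothetical here, as are the function name `matching_pairs` and the dictionary shapes. Only MD5 is compared, to keep the example short.

```python
import hashlib


def matching_pairs(candidates, deltas, apply_delta):
    """Return [(candidate_name, delta_name)] whose result has the recorded MD5.

    candidates:  {name: base bytes}
    deltas:      {name: (delta bytes, expected md5 hex digest)}
    apply_delta: caller-supplied function (base, delta) -> bytes
    """
    matches = []
    for delta_name, (delta, expected_md5) in deltas.items():
        for cand_name, base in candidates.items():
            try:
                result = apply_delta(base, delta)
            except Exception:
                continue  # this delta does not apply cleanly to this base
            if hashlib.md5(result).hexdigest() == expected_md5:
                matches.append((cand_name, delta_name))
    return matches
```

With five deltas and a handful of candidates the loop is cheap; the cost is dominated by applying deltas to a base of roughly 100 MB.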

Presumably directory deltification is enabled, meaning those five deltas
comprise one file content delta (based on rep-cache.db) and four
directory deltas — one for each directory level between the modified
file and the repository root — which can be regenerated by hand.
(This is delicate in case the svndiff — meaning the contents of the
DELTA — has "copy" instructions that refer to the node-rev id inside
the serialized directory node-rev, but possible.)

Devs: does anyone see a simpler solution? If you've thought about
this and _don't_ see a simpler solution, please say so.

Assuming I haven't missed any simpler solution, you'll want:

1. To extract from the r7449 rev file what can be extracted from it.
The code for that exists in libsvn_fs_fs, but you'll need to jump
through hoops to arrange for it to be called even though r7449 is
truncated. Basically, you need to either skip (in the debugger or with
a custom patch) or fabricate (by editing rev files manually) everything
that happens before libsvn_fs_fs seek()s to a particular offset in the
revision file.

2. A script that takes as input a file and a delta, applies the latter
to the former, and outputs the result. We don't seem to have one of
those already. If you write one, do consider contributing it for our
tools/ directory.

3. (possibly, depending on step #1) To regenerate the new dir reps of
the truncated r7449 based on r7450 and following revisions.
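As a starting point for item 2, here is a sketch of a delta applier for the simplest case only. It assumes uncompressed svndiff version 0 as I understand it from Subversion's notes/svndiff: a 4-byte "SVN\0" header followed by windows, each with five variable-length integers (source view offset, source view length, target view length, instructions length, new-data length), then the instructions, then the new data. Versions 1 and 2 compress those sections and are not handled, and the "DELTA ..." header line of the rep must already be stripped. The function names are invented.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a big-endian base-128 integer; return (value, next position)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:      # high bit clear marks the last byte
            return value, pos


def apply_svndiff0(source: bytes, delta: bytes) -> bytes:
    """Apply an uncompressed (version 0) svndiff stream to `source`."""
    if delta[:4] != b"SVN\x00":
        raise ValueError("not an svndiff version 0 stream")
    pos, out = 4, bytearray()
    while pos < len(delta):
        sview_off, pos = read_varint(delta, pos)
        sview_len, pos = read_varint(delta, pos)
        tview_len, pos = read_varint(delta, pos)
        ins_len, pos = read_varint(delta, pos)
        new_len, pos = read_varint(delta, pos)
        ins = delta[pos:pos + ins_len]
        pos += ins_len
        new = delta[pos:pos + new_len]
        pos += new_len
        sview = source[sview_off:sview_off + sview_len]
        target, ipos, npos = bytearray(), 0, 0
        while ipos < len(ins):
            op, length = ins[ipos] >> 6, ins[ipos] & 0x3F
            ipos += 1
            if length == 0:      # length did not fit in six bits
                length, ipos = read_varint(ins, ipos)
            if op == 0:          # copy from the source view
                off, ipos = read_varint(ins, ipos)
                target += sview[off:off + length]
            elif op == 1:        # copy from the target built so far; may overlap
                off, ipos = read_varint(ins, ipos)
                for i in range(length):
                    target.append(target[off + i])
            elif op == 2:        # copy from the new-data section
                target += new[npos:npos + length]
                npos += length
            else:
                raise ValueError("unknown instruction")
        if len(target) != tview_len:
            raise ValueError("window produced the wrong length")
        out += target
    return bytes(out)
```

For a self-delta (a bare "DELTA" header with no base), the source would be empty bytes. Real rev files are likely to use a compressed svndiff version, so treat this as a reading aid for the format rather than a finished tool.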

Daniel

Daniel Shahaf

Jun 28, 2022, 2:03:04 PM
to Dmitry Minsky, d...@subversion.apache.org, us...@subversion.apache.org
Good morning dev@,

Does anyone have a script that takes as input a file and an svndiff and
emits to stdout the result of applying the latter to the former? This
came up on users@ in the context of reconstructing a truncated rev file.

I've checked tools/.

Cheers,

Daniel


Daniel Shahaf wrote on Tue, 28 Jun 2022 17:58 +00:00:
> Assuming I haven't missed any simpler solution, you'll want:

Daniel Shahaf

Jun 28, 2022, 2:11:55 PM
to Dmitry Minsky, us...@subversion.apache.org
Dmitry Minsky wrote on Tue, 28 Jun 2022 13:18 +00:00:
>> What does the "folder with files" contain?
>
> Just some random files on my computer ;) They’re not from a working
> copy, a repository, or anything else meaningful. Let’s assume it’s just
> a bunch of random files that I want to put in the middle of the repo,
> hoping that it won’t blow up ;) Is that possible?

And to answer my own question about a simpler solution:

1. Create a copy of the repository up to r7448.

2. Create an empty r7449.

3. Copy over the remaining revisions, discarding (if dump/load: with
svndumpfilter; if svnsync: by using authz to make certain paths
unreadable) all changes to the files that r7449 touched.

This is lossy, but a lot easier.

Actually, you could just do this experiment and in step 2 manually
commit one of the candidate files to the path it was originally
committed in r7449 to, to see whether that candidate is the right one;
but then you'd have to load/sync all revisions between r7450 and the
next revision that touches the file-modified-in-r7449 in order to see
that the candidate isn't the right one.

The important thing is that once you've fabricated an r7449 you can dump
(--deltas --incremental) or svnsync r7450:HEAD on top of it… assuming
all _future_ revisions are fine.

You might want to check that no other revisions are truncated, and which
revisions after r7449 modified the file touched in r7449. ("file",
singular, per the rep-cache output.) If no other revision touched that
file, you can just do the "copy up to r7448, commit one of the
candidates, copy r7450:HEAD" thing with whichever candidate you prefer.
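Checking for other truncated revisions can be scripted by testing every rev file for a well-formed footer. The sketch below assumes unpacked shards (`revs/<shard>/<rev>`) and the footer layout seen in this thread, with the final byte giving the footer length; packed shards are skipped, and revision files written without log addressing would not carry this footer and would be flagged too. The function name `revs_without_footer` is made up.

```python
import re
from pathlib import Path

# Assumed footer layout: "<l2p offset> <l2p md5> <p2l offset> <p2l md5>"
FOOTER_RE = re.compile(rb"\d+ [0-9a-f]{32} \d+ [0-9a-f]{32}")


def revs_without_footer(revs_dir):
    """Return the sorted revision numbers whose file lacks a well-formed footer."""
    bad = []
    for path in Path(revs_dir).glob("*/*"):
        if not path.name.isdigit():
            continue  # e.g. the "pack" file inside a packed shard
        with path.open("rb") as fh:
            fh.seek(0, 2)                      # jump to the end to learn the size
            size = fh.tell()
            fh.seek(max(0, size - 256))        # footer plus length byte fit in 256 bytes
            tail = fh.read()
        footer_len = tail[-1] if tail else 0
        footer = tail[-1 - footer_len:-1] if tail else b""
        if FOOTER_RE.fullmatch(footer) is None:
            bad.append(int(path.name))
    return sorted(bad)
```

Running `revs_without_footer("db/revs")` on a copy of the repository should, if the assumptions hold, list 7449 and any other revision damaged the same way.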

Daniel