Jean-Baptiste DUBOIS wrote on Fri, Mar 05, 2021 at 11:04:01 +0000:
> Inside the 'db' folder we have 'rev', 'revprops' 'transactions' folder contains files.
revs/ is where file content lives. See subversion/libsvn_fs_fs/structure.
(You can read it in trunk too: it covers all FSFS formats, not only the
newest one.)
> Some files are missing but the company told us that all files
> recovered are fully recovered (ie: file integrity is OK).
That's good. It's also a very different question than the one Philippe
asked.
>
> I know that we could not restore the entire database, but can we extract some 'SVN plain data' from file in 'revs' folder
>
> Hereunder a view of thoses files (ordered by Size) ...
>
> [revs]
For future reference, text is preferable to images. Copying the output
of `dir` would have been easier for us to consume.
> Is it possible to detect plain data with no dependency on previous revs inside these 'revs' files and extract them ?
Yes. In your case it'll actually be a four-liner loop in the scripting
language of your choice, but I'll give the full answer.
Let's take an example format-2 revision file:
[[[
% rm -rf r
% svnadmin create r --compatible-version=1.4
% cat r/db/format
2
% for i in {1..$(wc -l < ${fn::=~/src/svn/trunk/README})} ; do svnmucc -mm -U file://$(pwd)/r put =(head -$i $fn) iota$((i % 2)) ; done > /dev/null
% cat_the_youngest_revision_file() { < r/db/revs/*(.om[1]) LC_ALL=C sed -e 's/[^ -~]/X/g' } # translate any octets other than printable ASCII to "X"
% cat_the_youngest_revision_file | nl -ba
1 DELTA 82 0 331
2 SVNXXXXX%XX(XXXXXXX&X& Finally, be sure to see Appendix B in the Subversion Book. It
3 contains a very quick overview of the major differences between
4 CVS and Subversion.
5
6 ENDREP
7 id: 2.0.r86/210
8 type: file
9 pred: 2.0.r84/183
10 count: 42
11 text: 86 0 188 2341 c09759b8da81bf3e23647f4c517abbe0
12 cpath: /iota0
13 copyroot: 0 /
14
15 PLAIN
16 K 5
17 iota0
18 V 16
19 file 2.0.r86/210
20 K 5
21 iota1
22 V 16
23 file 1.0.r85/210
24 END
25 ENDREP
26 id: 0.0.r86/428
27 type: dir
28 pred: 0.0.r85/428
29 count: 86
30 text: 86 347 68 68 d3f8fb2002e019614ec0e47c79e2ac4c
31 cpath: /
32 copyroot: 0 /
33
34 2.0.t85-1 modify true false /iota0
35
36
37 428 558
%
]]]
(Aside: Had this been an f8 repository, the svndiff delta would contain
only the last line of README, rather than the last three lines as in
this example.)
The parts you're interested in are:
- "DELTA %ld %ld %ld" lines (e.g., the «82» on line 1)
- "text:" and "props:" lines (e.g., the «86» on line 11)
- "DELTA\n" lines (without numbers)
- "PLAIN\n" lines
In the first two cases, the first number on the line identifies the
revision number in which depended-on data is found. See the
aforementioned «structure» file for details. The last two cases
identify data that's present inline.
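A rough Python sketch of that scan, for illustration only. The function
name and the return shape are mine, not anything from Subversion. It
also assumes that no file content or svndiff data happens to contain
a line that looks like one of the four patterns, which a real tool
would have to rule out by following offsets instead of scanning lines.

```python
import re

# "DELTA <rev> <offset> <length>": delta against a rep in revision <rev>.
DELTA_RE = re.compile(rb"^DELTA (\d+) (\d+) (\d+)$")
# "text: <rev> ..." / "props: <rev> ...": rep stored in revision <rev>.
REP_RE = re.compile(rb"^(text|props): (\d+) ")


def scan_rev_file(data: bytes):
    """Return (revisions referenced, counts of inline DELTA/PLAIN reps)."""
    depends_on = set()
    inline = {"DELTA": 0, "PLAIN": 0}
    for line in data.split(b"\n"):
        m = DELTA_RE.match(line)
        if m:
            depends_on.add(int(m.group(1)))
            continue
        m = REP_RE.match(line)
        if m:
            depends_on.add(int(m.group(2)))
            continue
        if line in (b"DELTA", b"PLAIN"):
            inline[line.decode()] += 1
    return depends_on, inline
```

Run over the r86 file above, this should report r82 (from the DELTA
line) and r86 itself (from the "text:" lines) as the revisions referred
to.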
Using this information, you could build a DAG of reachable reps (a "rep"
is the thing between the "DELTA" or "PLAIN" line and the "ENDREP" line)
and extract them. However, since you're on format 2, there's an easier
way.
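For completeness, here is a coarse sketch of the reachability idea in
Python. It works per revision rather than per rep, so it is stricter
than a real rep-level DAG: a revision it rejects may still contain
individual reps that are extractable. The function name and the input
shape (a dict from revision number to the set of revisions its file
refers to) are assumptions of the sketch.

```python
def recoverable_revisions(deps, available):
    """Return the revisions whose whole dependency closure survived.

    deps:      {rev: set of revs that rev's file refers to}
    available: iterable of revs whose rev files were recovered
    """
    good = set(available) & set(deps)
    changed = True
    while changed:
        changed = False
        # sorted() copies the set, so removing while looping is safe.
        for rev in sorted(good):
            if deps[rev] - good:
                # Refers to a revision that is lost or itself unusable.
                good.discard(rev)
                changed = True
    return good
```

A revision referring to itself (as every "text:" line in format 2 does)
is harmless here, since it is in the candidate set to begin with.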
Format 2 doesn't support rep-sharing and doesn't deltify directory reps,
so simply running `svnlook changed -r 86` and then `svnlook cat -r 86`
against each file printed thereby should extract everything extractable.
Any given `svnlook cat` invocation might fail if a DELTA line refers to
a revision whose rev file has been lost. ("text:" and "props:" lines
will always point into the rev file they themselves are in.)
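The loop could look roughly like this in Python. The helper names and
the output layout (one directory per revision under `dest`) are made up
for the example; the only svnlook invocations used are `svnlook changed
-r REV REPOS` and `svnlook cat -r REV REPOS PATH`. I haven't tried it
against a damaged repository, so treat it as a starting point.

```python
import subprocess
from pathlib import Path


def parse_changed(output: str):
    """Yield file paths from `svnlook changed` output.

    Each line is a three-character status, a space, then the path.
    Deletions and directories (trailing "/") have nothing to cat.
    """
    for line in output.splitlines():
        status, path = line[:3], line[4:]
        if not path or status.startswith("D") or path.endswith("/"):
            continue
        yield path


def extract_revision(repo: str, rev: int, dest: str):
    """Save every file changed in `rev`; return the paths that failed."""
    changed = subprocess.run(
        ["svnlook", "changed", "-r", str(rev), repo],
        capture_output=True, text=True, check=True,
    ).stdout
    failures = []
    for path in parse_changed(changed):
        result = subprocess.run(
            ["svnlook", "cat", "-r", str(rev), repo, path],
            capture_output=True,
        )
        if result.returncode != 0:
            # Most likely a delta base in a lost rev file.
            failures.append(path)
            continue
        target = Path(dest) / f"r{rev}" / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(result.stdout)
    return failures
```

Calling `extract_revision("r", 86, "out")` for each surviving revision
number, and collecting the returned failures, gives a list of what
could not be read.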
Use `svnlook propget` in addition to `svnlook cat` to extract versioned
properties. svn_hash_read2() will parse the on-disk hash format (the
"K"/"V"/"END" lines seen in the PLAIN rep above). (It's a public
API and likely available via the bindings as well, but if needed,
I happen to have posted a pure Python implementation of that last week:
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202102.mbox/%3C20210226173525.GA24828%40tarpaulin.shahaf.local2%3E>.)
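To give an idea of how little there is to that format, here is
a minimal Python reader. It is neither svn_hash_read2() nor the
implementation linked above, just a sketch with names of my own
choosing, and it handles only "K"/"V" records terminated by "END"
(no "D" deletion records as found in incremental hashes).

```python
import io


def _read_counted(stream, length):
    """Read exactly `length` bytes followed by a newline."""
    data = stream.read(length)
    if len(data) != length or stream.read(1) != b"\n":
        raise ValueError("truncated or malformed record")
    return data


def read_hash(stream):
    """Parse 'K <len>', key, 'V <len>', value, ..., 'END' into a dict."""
    result = {}
    while True:
        header = stream.readline().rstrip(b"\n")
        if header == b"END":
            return result
        if not header.startswith(b"K "):
            raise ValueError(f"expected 'K <len>' or 'END', got {header!r}")
        key = _read_counted(stream, int(header[2:]))
        header = stream.readline().rstrip(b"\n")
        if not header.startswith(b"V "):
            raise ValueError(f"expected 'V <len>', got {header!r}")
        result[key] = _read_counted(stream, int(header[2:]))
```

Lengths are byte counts, so the stream has to be opened in binary mode,
e.g. `read_hash(io.BytesIO(rep_bytes))`. Keys and values come back as
bytes.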
Note: that's `svnlook cat -r`, not `svn cat -r`. The difference
matters: -r to svnlook denotes a peg revision, not an operational
revision. (Also, using svnlook(1) bypasses several layers of API.)
In newer formats, where directory reps may be deltified, it's possible
to get a case such as
.
r10: mkdir /A
r20: add /A/foo
r30: add /A/bar
.
with r20 lost. In this case, if the rep of /A in r30 happened to depend
on the rep of /A in r20, `svn ls ^/A@30` and `svn cat` of files
thereunder would both fail. However, if one figured out the location of
/A/bar's node-rev header or rep, one could still read those directly,
using the appropriate internal APIs.
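To illustrate what "reading directly" involves, here is a Python sketch
that parses a node-rev header by hand rather than through the internal
APIs. It assumes a format where the number after the "/" in a node-rev
id is a byte offset into the rev file, as in the format-2 example above
(where 210 is the offset of "id: 2.0.r86/210" in r86), and that the
first five fields of a "text:" line are revision, offset, on-disk
length, expanded size and checksum, per the «structure» file. The
function names are mine.

```python
def read_noderev(data: bytes, offset: int):
    """Parse 'key: value' header lines starting at `offset`.

    Stops at the blank line that ends the node-rev.  Raises ValueError
    if the data ends before that blank line.
    """
    headers = {}
    pos = offset
    while True:
        nl = data.index(b"\n", pos)
        line = data[pos:nl]
        pos = nl + 1
        if not line:
            return headers
        key, _, value = line.partition(b": ")
        headers[key.decode("ascii")] = value.decode("utf-8", "replace")


def parse_rep_pointer(value: str):
    """Split a 'text:' or 'props:' value into its first five fields."""
    rev, offset, length, size, digest = value.split()[:5]
    return {
        "rev": int(rev),
        "offset": int(offset),
        "length": int(length),
        "size": int(size),
        "digest": digest,
    }
```

With the header in hand, the rep itself starts at `offset` in the rev
file of `rev`, beginning with its "DELTA" or "PLAIN" line.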
Cheers,
Daniel