Why initializing an HFS+ volume destroys filenames and folder structure
The reason lies in a fundamental architectural difference between HFS+ and NTFS-type filesystems, and it is why initializing an HFS+ volume is among the most destructive scenarios you can face in a recovery lab.
THE FILENAME EXISTS IN ONLY ONE PLACE, AND THAT PLACE GETS OVERWRITTEN
In HFS+, all names and the entire folder hierarchy live exclusively in the Catalog File, a B-tree keyed by (parent CNID + node name). There is no copy of the name anywhere else: the file data itself contains zero information about what it is called or where it sits in the tree. The raw content of a JPEG knows it is a JPEG (it has its FFD8 signature), but it does not know it is called vacation_2019.jpg, nor that it lives in /Photos/Summer/. That association is written only in the catalog.
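To make the "keyed by (parent CNID + node name)" point concrete, here is a minimal sketch of how a catalog key is laid out on disk, assuming the documented HFS+ layout (big-endian keyLength, parentID, then the name as a length-prefixed UTF-16 string). The function names are mine, for illustration only, and the sketch ignores the Unicode decomposition HFS+ applies to non-ASCII names.

```python
import struct

def build_catalog_key(parent_cnid: int, name: str) -> bytes:
    """Encode a catalog key: keyLength, parentID, then the name as UTF-16BE."""
    encoded = name.encode("utf-16-be")
    # parentID (UInt32) + name length in UTF-16 code units (UInt16) + the name itself
    body = struct.pack(">IH", parent_cnid, len(encoded) // 2) + encoded
    # keyLength (UInt16) counts the bytes that follow it, not itself
    return struct.pack(">H", len(body)) + body

def parse_catalog_key(raw: bytes):
    """Decode a catalog key back into (parent CNID, node name)."""
    _key_length, parent_cnid, name_units = struct.unpack_from(">HIH", raw, 0)
    name = raw[8:8 + 2 * name_units].decode("utf-16-be")
    return parent_cnid, name

key = build_catalog_key(1547, "report.pdf")
print(parse_catalog_key(key))
```

Note that nothing in this key, or anywhere in the file's data blocks, repeats the name: lose the node that holds the key and the name is gone.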
When you initialize, Disk Utility writes a fresh Volume Header (at byte offset 1024) and its backup (the Alternate Volume Header, which starts 1024 bytes before the end of the volume), both pointing to an empty Catalog File, plus an empty Extents Overflow File and an Allocation File (the bitmap) that marks everything except the new metadata as free. The critical point is that HFS+ allocates its metadata files deterministically, in the first blocks of the volume: the new empty catalog therefore tends to land physically on top of the header node, the index nodes, and the first leaf nodes of the old catalog, which are exactly the nodes that hold the root of the tree. This is not a soft delete with a flag flipped (the way it works when you delete a single file): it is a rewrite of the central structure.
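A quick way to see that both header copies were rewritten is to read them side by side. The sketch below assumes the documented Volume Header layout (2-byte signature at offset 0, fileCount and folderCount as big-endian UInt32 at offsets 32 and 36) and runs on a synthetic buffer, not a real image; the helper names are hypothetical.

```python
import io
import struct

VOLUME_HEADER_OFFSET = 1024            # primary header, counted from the volume start
HFS_PLUS_SIGNATURES = {b"H+", b"HX"}   # HFS+ and HFSX

def header_offsets(volume_size: int):
    """Return (primary, alternate) Volume Header offsets in bytes."""
    return VOLUME_HEADER_OFFSET, volume_size - 1024

def read_header_summary(image, offset: int):
    """Return (signature, fileCount, folderCount), or None if no HFS+ header is there."""
    image.seek(offset)
    raw = image.read(512)
    signature = raw[0:2]
    if signature not in HFS_PLUS_SIGNATURES:
        return None
    file_count, folder_count = struct.unpack_from(">II", raw, 32)
    return signature, file_count, folder_count

# Demo on a synthetic 8 KiB "volume" with two identical fake headers.
size = 8192
fake = bytearray(size)
header = bytearray(512)
header[0:2] = b"H+"
struct.pack_into(">II", header, 32, 3, 1)   # made-up counts: 3 files, 1 folder
for off in header_offsets(size):
    fake[off:off + 512] = header
image = io.BytesIO(bytes(fake))

for off in header_offsets(size):
    print(off, read_header_summary(image, off))
```

On a real initialized volume, if both copies report the same near-empty counts, neither of them predates the initialization, so the backup header is no help.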
WHY SURVIVING FRAGMENTS ARE NOT ENOUGH
Even in the best case, where some leaf nodes of the old catalog survive in unallocated space and UFS Explorer or DataExtractor manages to find them with a low-level scan (HFS+ node descriptors have a recognizable structure: fLink, bLink, kind, height, numRecords), three insurmountable problems remain.
First, the catalog record only gives you name + parent CNID + the first 8 extents of the data fork. Without the index nodes and without the chain of folder records climbing back to the root folder (CNID 2), you have a file report.pdf with parent CNID 1547, but if the record for folder 1547 has been lost you do not know what that folder is called nor where to place it. It becomes an orphan. The hierarchy can only be rebuilt if the parent chain is intact, and initialization breaks precisely the top of it.
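The orphan problem is easy to reproduce with a toy model. This is only a sketch of the idea, not how any specific tool does it: `records` is a hypothetical dictionary of whatever catalog records survived, mapping CNID to (parent CNID, name), and the CNIDs other than 2 and 1547 are made up.

```python
ROOT_CNID = 2  # CNID of the root folder in HFS+

def rebuild_path(cnid, records):
    """Walk the parent chain up to the root.

    Returns (True, full_path) if every ancestor record survived,
    otherwise (False, partial_path) starting at the first lost folder.
    """
    parts = []
    seen = set()
    while cnid != ROOT_CNID:
        if cnid not in records or cnid in seen:
            # The folder record is gone (or the chain loops): the file is an orphan.
            parts.append(f"<lost folder CNID {cnid}>")
            return False, "/".join(reversed(parts))
        seen.add(cnid)
        parent, name = records[cnid]
        parts.append(name)
        cnid = parent
    return True, "/" + "/".join(reversed(parts))

records = {
    20: (2, "Photos"),
    21: (20, "Summer"),
    22: (21, "vacation_2019.jpg"),
    3000: (1547, "report.pdf"),   # the record for folder 1547 did not survive
}
print(rebuild_path(22, records))
print(rebuild_path(3000, records))
```

The first file gets its full path back; the second keeps its name but hangs off a folder nobody can name or place.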
Second, the Extents Overflow B-tree is also zeroed, so files fragmented across more than 8 extents lose their tail. You recover the beginning at best.
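As rough arithmetic for that second point, the sketch below estimates how much of a file the 8 inline extents can still describe. The function name, the 4096-byte block size, and the example figures are assumptions for illustration.

```python
def recoverable_bytes(logical_size, inline_extents, block_size):
    """Bytes of the data fork still described by the catalog record alone.

    inline_extents: up to 8 (startBlock, blockCount) pairs from the catalog record.
    Anything beyond them was mapped only in the Extents Overflow B-tree.
    """
    covered = sum(count for _start, count in inline_extents[:8]) * block_size
    return min(logical_size, covered)

block_size = 4096
size = 10 * 1024 * 1024                        # a 10 MiB file

# Heavily fragmented: 8 inline extents of 100 blocks each, the rest in overflow.
fragmented = [(1000 + 500 * i, 100) for i in range(8)]
print(recoverable_bytes(size, fragmented, block_size))   # 3276800

# Contiguous: a single extent covers the whole file.
contiguous = [(1000, 2560)]
print(recoverable_bytes(size, contiguous, block_size))   # 10485760
```

In the fragmented case only the first 3,276,800 bytes can be located; the remaining 7,208,960 bytes are still on disk somewhere, but nothing says where.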
Third, the reset bitmap means the tool can no longer distinguish allocated from free, and as soon as the new volume is used even slightly, deterministic allocation keeps overwriting exactly the region where the old catalog lived, and then the data itself.
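For reference, the low-level scan for surviving leaf nodes mentioned above boils down to something like the following. It is a heuristic sketch, not the actual logic of UFS Explorer or DataExtractor: it assumes the documented 14-byte node descriptor (fLink, bLink, kind, height, numRecords, plus a reserved field), a leaf kind of -1 with height 1, and an 8192-byte node size, which is a common default but not guaranteed.

```python
import struct

# fLink, bLink (UInt32), kind (SInt8), height (UInt8), numRecords, reserved (UInt16)
NODE_DESCRIPTOR = struct.Struct(">IIbBHH")
KIND_LEAF = -1

def looks_like_leaf_node(node: bytes, node_size: int) -> bool:
    """Heuristic check that a buffer starts with a plausible B-tree leaf node."""
    if len(node) < node_size:
        return False
    _f_link, _b_link, kind, height, num_records, reserved = NODE_DESCRIPTOR.unpack_from(node, 0)
    if kind != KIND_LEAF or height != 1 or reserved != 0 or num_records == 0:
        return False
    # Record offsets sit at the end of the node, first record last;
    # the first record must start right after the descriptor.
    first_offset = struct.unpack_from(">H", node, node_size - 2)[0]
    return first_offset == NODE_DESCRIPTOR.size

def find_leaf_candidates(data: bytes, node_size: int = 8192, step: int = 512):
    """Return every sector-aligned offset where a leaf node might begin."""
    return [
        pos
        for pos in range(0, len(data) - node_size + 1, step)
        if looks_like_leaf_node(data[pos:pos + node_size], node_size)
    ]

# Demo: one fake leaf node planted at offset 1024 in an otherwise empty buffer.
fake_node = bytearray(8192)
NODE_DESCRIPTOR.pack_into(fake_node, 0, 0, 0, KIND_LEAF, 1, 2, 0)
struct.pack_into(">H", fake_node, 8192 - 2, NODE_DESCRIPTOR.size)
disk = bytearray(16384)
disk[1024:1024 + 8192] = fake_node
print(find_leaf_candidates(bytes(disk)))
```

Finding such nodes is the easy part. What the scan returns is a pile of isolated leaves, and the three problems above apply to every one of them.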
THE FALLBACK TO CARVING
When the catalog cannot be rebuilt, the only thing left to the tool is signature-based recovery (IntelliRAW in UFS Explorer, raw recovery in DataExtractor): it scans the blocks looking for known signatures and reconstructs files from their content. By construction this method cannot return names (they were never written together with the data), cannot return folders (they existed only in the B-tree), and is unreliable on fragmented files or files without a recognizable signature. That is why you end up with FILE0001.JPG, FILE0002.PDF sorted by type.
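Stripped to its core, signature-based recovery looks like the sketch below. It is a deliberately naive illustration, not the IntelliRAW or DataExtractor algorithm: it only checks two signatures at block-aligned offsets (4096 bytes assumed) and does not try to find where a file ends.

```python
# Known file headers and the extension assigned to each hit.
SIGNATURES = {
    b"\xff\xd8\xff": "JPG",   # JPEG start-of-image marker
    b"%PDF-": "PDF",
}

def carve_names(data: bytes, block_size: int = 4096):
    """Scan block-aligned offsets for known headers.

    Returns [(offset, generated_name)]. The names are sequence numbers
    because the data blocks carry no filename at all.
    """
    hits = []
    for offset in range(0, len(data), block_size):
        for magic, ext in SIGNATURES.items():
            if data.startswith(magic, offset):
                hits.append((offset, f"FILE{len(hits) + 1:04d}.{ext}"))
                break
    return hits

# Demo: a JPEG header in block 1 and a PDF header in block 3.
disk = bytearray(4 * 4096)
disk[4096:4099] = b"\xff\xd8\xff"
disk[3 * 4096:3 * 4096 + 5] = b"%PDF-"
print(carve_names(bytes(disk)))
```

The only inputs are the raw bytes and a table of signatures, so the only outputs it can produce are a type and a counter.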
THE CONTRAST WITH NTFS
This explains why with NTFS you often get away with it after a quick format. NTFS writes the name and the reference to the parent directory inside each file's own MFT record (the $FILE_NAME attribute), co-located, self-describing, and fixed-size. The old MFT (large and typically in the middle of the volume) is usually not overwritten by the tiny new MFT, so the records survive and tools rebuild the full names and tree from the parent references. In HFS+ there is no co-located per-file record: once the central catalog is gone, the name information is simply no longer present anywhere on the disk. It is not that the tool cannot read it: there is nothing left to read.
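For comparison, this is roughly what a tool gets out of a single surviving NTFS record. The sketch parses only the content of a $FILE_NAME attribute, assuming the commonly documented layout (8-byte parent reference at offset 0, name length at 0x40, UTF-16LE name at 0x42); locating the attribute inside the MFT record and applying fixups is left out, and the sample bytes are synthetic.

```python
import struct

def parse_file_name_attribute(body: bytes):
    """Return (parent MFT record number, parent sequence number, file name)."""
    parent_ref = struct.unpack_from("<Q", body, 0)[0]
    parent_record = parent_ref & 0xFFFFFFFFFFFF    # low 48 bits
    parent_sequence = parent_ref >> 48             # high 16 bits
    name_length = body[0x40]                       # in UTF-16 code units
    name = body[0x42:0x42 + 2 * name_length].decode("utf-16-le")
    return parent_record, parent_sequence, name

# Synthetic attribute body: parent record 1547 (sequence 5), name "report.pdf".
name = "report.pdf"
body = (
    struct.pack("<Q", (5 << 48) | 1547)
    + bytes(0x40 - 8)                  # timestamps, sizes and flags, zeroed here
    + bytes([len(name), 1])            # name length, namespace
    + name.encode("utf-16-le")
)
print(parse_file_name_attribute(body))
```

One record, taken in isolation, already yields both the name and the pointer to the parent, which is exactly what a surviving HFS+ data block can never give you.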
Sorry
Roberto
CTO @ RecuperoDati299