Updates to DFXML Python tools and model

dwhite21787

Jun 16, 2017, 5:02:50 PM
to Digital Curation, alexande...@nist.gov
Posted on behalf of Alexand...@nist.gov and forwarded from df...@nist.gov:

Hello all,

There have been some recent updates in the DFXML Python code base and tool family.  This message describes two scripts discussed at this year's BitCurator Users' Forum (BUF), and some proposed updates to the DFXML language and schema that were prompted in part by the second script.

First, the Python code base has received some stability updates, bug fixes, and another DFXML generator.  This new generator, `walk_to_dfxml.py`, takes a logical (/mounted) file system, walks (/traverses) it, and produces a DFXML file summarizing the directory tree starting at the directory where the script was called.  This behavior is similar to `dfxml_tool.py`, but the "walk" script can also take a parallel-processing flag.

As a stress test, I've used the "walk" script to report every file on an offline Linux disk's root partition, including the device files under `/dev`.  From discussions at BUF, it seemed this would help some folks.
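
For readers who want a feel for what a walk-based generator does, here is a minimal sketch.  It is not the code in `walk_to_dfxml.py`: the function name `walk_to_elements` is made up for illustration, the output omits the XML namespace and metadata a real DFXML file carries, and only a name and a size are recorded for each entry.

```python
import os
import xml.etree.ElementTree as ET


def walk_to_elements(top):
    """Walk `top` and return a <dfxml> element with one <fileobject> per entry.

    Illustrative only: the function name is hypothetical and the output
    is a simplified stand-in for real DFXML.
    """
    root = ET.Element("dfxml")
    for dirpath, dirnames, filenames in os.walk(top):
        # Sort in place so the traversal order (and the output) is deterministic.
        dirnames.sort()
        for name in sorted(dirnames + filenames):
            full_path = os.path.join(dirpath, name)
            # lstat() describes a symlink itself instead of following it.
            st = os.lstat(full_path)
            fobj = ET.SubElement(root, "fileobject")
            # Record paths relative to the starting directory.
            ET.SubElement(fobj, "filename").text = os.path.relpath(full_path, top)
            ET.SubElement(fobj, "filesize").text = str(st.st_size)
    return root
```

Calling `ET.tostring(walk_to_elements("."), encoding="unicode")` would serialize the tree for the current directory.  Because `os.walk` lists any non-directory entry among the file names, device files such as those under `/dev` are reported like any other entry.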

Also at BUF, I discussed a translator I had been working on, converting the output of the Disktype [1] disk analyzer into DFXML.  This translator is now available here:

    https://github.com/ajnelson-nist/disktype_to_dfxml

From the discussions, it seemed this would be helpful for some of the projects that conference attendees were working on.

Testing this translator surfaced some historical uses of the layers in the on-disk storage stack, which suggest some upgrades to the DFXML language and object nesting model.  For instance, the simple view I had of what could be on a disk was previously close to this:

    Disk                   <--- "Source(s)" in DFXML
    / Partitioning system  <--- Not in DFXML
      / Partition          <--- Not in DFXML
        / File system      <--- "Volumes" in DFXML
          / Files

I was also aware of some polyglot / hybrid storage models, such as an ISO 9660 file system encompassing the disk at the same time as a partitioning system that also encompassed the disk (sometimes used for PC/Mac game publication); and the MBR/EFI hybrid partitioning scheme.  Here's what I didn't know could happen, found from some cases in the National Software Reference Library's software installation media set:

    Disk
    / Partitioning system
      / Partition
        / Partitioning system  <--- Can occur with SPARC [2]
          / Partition
            / (etc.)

    Disk
    / ISO 9660 file system
      / Disk  <--- El Torito virtual boot disk (often a <3MB "floppy")
        / Partitioning system
          / (etc.)

    Disk
    / ...
      / HFS file system  <--- Wrapper for embedded HFS+ file system
        / HFS+ file system

These go a bit past what DFXML supports:

    Input stream
    / File systems (volumes)
      / Files
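
As an illustration only, the nestings above could be modeled as a tree in which any storage layer may contain any other layer, rather than as a fixed three-level hierarchy.  The sketch below is an assumption about one possible shape for such a model, not a proposal from the DFXML schema; the `Layer` class and its `kind` labels are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """One storage layer; `kind` is a free-form label, not DFXML vocabulary."""
    kind: str
    children: List["Layer"] = field(default_factory=list)

    def add(self, child):
        """Nest `child` under this layer and return it, so calls can chain."""
        self.children.append(child)
        return child

    def render(self, depth=0):
        """Return the tree as indented lines in the style of the diagrams above."""
        prefix = "" if depth == 0 else "  " * (depth - 1) + "/ "
        lines = [prefix + self.kind]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# The El Torito case: a disk inside a file system inside a disk.
disk = Layer("Disk")
iso = disk.add(Layer("ISO 9660 file system"))
boot_disk = iso.add(Layer("Disk"))
boot_disk.add(Layer("Partitioning system"))
print("\n".join(disk.render()))
```

Since a layer places no restriction on what its children are, the same class also covers a partitioning system nested inside a partition and an HFS+ file system nested inside an HFS wrapper.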

I will be filing GitHub issues on the DFXML Schema repository to discuss these extensions.

Feedback is welcome on any of these efforts.

--Alex


[1] Disktype's project page is here:
    http://disktype.sourceforge.net/
[2] One example of a disklabel/partition/disklabel nesting is here:
    https://github.com/ajnelson-nist/disktype_to_dfxml/blob/unstable/tests/ubuntu16.04/nsrl-16618-1.txt