BitCurator VM vs ISO


jfar...@law.harvard.edu

Apr 12, 2017, 12:43:53 PM
to Digital Curation
Hi DC list,

We've been running BitCurator in the production environment on a dedicated machine for a few years now, but because I'm just not at all comfortable in Linux environments and have limited time to work on born-digital processing, I've mostly been defaulting to FTK Imager and other programs on my Windows machine to get things done in a timely manner. We don't have a nailed-down process yet and I'm still experimenting with lots of different tools. So despite my current laziness, I do want to stick with having BitCurator around, especially given all of the recent development that I haven't yet gotten a chance to play with.

But as our born-digital processes do start to look more like a program, we need more and more workstations to support the work. Right now we have a whole machine that's only running BC and barely being used. Our IT dept is resistant to helping us create a dual-boot environment where we can access either BC or Windows on this machine. It's also a ton of time and expertise to keep the dedicated machine up to date and we just don't have that available to us.

I thought I'd decided to switch to the VM, but right off, the quickstart guide recommends using the production environment.

If anyone has thoughts or experiences to share about these two environments and why the production environment is ideal, I'd love to hear (or point me to another thread, since I imagine this has been discussed many times before). For us, I'm trying to decide whether I should really push IT to help us with a dual-boot system, or start using the VM instead, or use a different model that has worked for other members of this community. The VM would be installed on a FRED-L which has either identical or very similar specs to DI's current offering: https://www.digitalintelligence.com/products/fredl/

Thanks for any help you can provide!

Jess

Curator of Digital Collections, Harvard Law School Library




John Scancella

Apr 12, 2017, 1:37:31 PM
to Digital Curation
I would venture a guess that a VM will have unacceptable performance issues (as does any high performance software vs native). The thing to be aware of is your limiting factor. If you are imaging a USB-connected hard drive, you are most likely I/O-bound, and a VM may be acceptable. If, on the other hand, you are doing a bunch of calculations, you are CPU-bound and the VM will really become an issue.
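A quick way to see which bound applies on a given machine is to compare the raw read speed of the source drive against pure hashing speed (the device path below is hypothetical; substitute your own source drive, and double-check it before running anything as root):

```shell
SRC=/dev/sdb                      # hypothetical source drive; edit to match
# 1) Raw sequential read speed of the source (only if the device exists)
if [ -b "$SRC" ]; then
  sudo dd if="$SRC" of=/dev/null bs=1M count=1024
fi
# 2) Pure hashing speed, with no disk in the loop
dd if=/dev/zero bs=1M count=1024 2>/dev/null | sha256sum
# If (1) reports far fewer MB/s than (2), imaging is I/O-bound and the
# hypervisor likely won't be your bottleneck; if they're close, it's CPU-bound.
```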

Michael Kjörling

Apr 12, 2017, 1:46:59 PM
to digital-...@googlegroups.com
On 12 Apr 2017 10:37 -0700, from blacksmi...@gmail.com (John Scancella):
> I would venture a guess that a VM will have unacceptable performance issues
> (as does any high performance software vs native).

That depends on the platform, and on how you set it up. Modern
virtualization solutions running on semi-modern hardware _can_ offer
near-native performance if configured properly.


> On Wednesday, April 12, 2017 at 12:43:53 PM UTC-4, jfar...@law.harvard.edu
> wrote:
>>
>> But as our born-digital processes do start to look more like a program, we
>> need more and more workstations to support the work. Right now we have a
>> whole machine that's only running BC and barely being used. Our IT dept is
>> resistant to helping us create a dual-boot environment where we can access
>> either BC or Windows on this machine. It's also a ton of time and expertise
>> to keep the dedicated machine up to date and we just don't have that
>> available to us.
>>
>> I thought I'd decided to switch to the VM but right off in the quickstart
>> guide it recommends using a production environment.

Just a fleeting thought: What is the actual underlying problem that
you are trying to solve here?

A VM is not really any different in terms of needing to be kept up to
date than a physical host, so the upkeep probably won't be much
different. If anything, with a VM or a dual-boot environment, you now
have _two_ environments that you need to concern yourself with and
keep up to date. Are you trying to find a way to put the computer to
use a larger percentage of the time by allowing it to also run other
software? (Giving computer hardware something to do is a laudable
goal, but if we are to suggest actual solutions, it helps to know for
sure what exact problem you are looking to solve.)

--
Michael Kjörling • https://michael.kjorling.se • mic...@kjorling.se
“People who think they know everything really annoy
those of us who know we don’t.” (Bjarne Stroustrup)

Matthew Farrell

Apr 12, 2017, 4:05:18 PM
to digital-...@googlegroups.com
I doubt there's one answer. For me, it's a matter of trade-offs. 

Regarding VM configuration, if you or your IT department can help configure a virtualization setup as Michael suggests, you may get the desired performance levels. If you're using the available BitCurator VirtualBox configuration without any tweaks, you may experience performance issues unless your host machine has a lot of available memory and you've assigned it to the VM in the VirtualBox settings.

The benefits to running the available VirtualBox BitCurator machine include the ability to start from scratch whenever you want to (e.g., whenever a new version of BitCurator is released), and you can use a shared folder with your host machine to avoid configuring the BitCurator environment with your institution's networked storage. The downsides include a) you'll have to recustomize the environment each time you upgrade, and b) VirtualBox is not a very lightweight virtualization solution (it is free, and relatively user friendly, however).
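For what it's worth, both the memory assignment and the shared folder can be set from the host's command line rather than the GUI; the VM name and host path here are just examples, so adjust them to your setup:

```shell
# Give the VM more memory and CPUs while it is powered off
# ("BitCurator" is whatever name the VM has in your VirtualBox library)
VBoxManage modifyvm "BitCurator" --memory 8192 --cpus 4

# Share a host folder into the guest; with Guest Additions installed it
# auto-mounts inside the guest (under /media/sf_transfer on Linux guests)
VBoxManage sharedfolder add "BitCurator" --name transfer \
  --hostpath /path/to/transfer --automount
```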

Some of the forensics applications (bulk_extractor, fiwalk, the BitCurator Reporting Tool) are fairly resource-intensive, so running BitCurator as a VM in VirtualBox with the bare minimum memory assigned to it will be very slow (or, for disk images of even modest size, processes may time out entirely). But as Michael suggested, if you have IT support and a greater range of virtualization options available to you, you may be able to work around this.

I run a dual-boot Windows 7/BitCurator setup. It has its headaches, and I'm on my own to keep the Linux side updated. That said, the machine had (until a recent memory upgrade) only 4 GB of RAM, so virtualization in VirtualBox was not possible if I wanted to run forensic reports on disk images of even relatively small thumb drives.

best,
farrell




Chris Adams

Apr 12, 2017, 5:15:49 PM
to digital-...@googlegroups.com
On Wed, Apr 12, 2017 at 1:46 PM, Michael Kjörling <mic...@kjorling.se> wrote:
> On 12 Apr 2017 10:37 -0700, from blacksmi...@gmail.com (John Scancella):
>> I would venture a guess that a VM will have unacceptable performance issues
>> (as does any high performance software vs native).
>
> That depends on the platform, and on how you set it up. Modern
> virtualization solutions running on semi-modern hardware _can_ offer
> near-native performance if configured properly.

Agreed – for at least a decade it has been rare for the hypervisor to be the bottleneck before I/O, at least for VMware or Hyper-V (VirtualBox has historically lagged behind both of those, unsurprising given the differences in development resources). Assuming the host isn't starved for RAM – a cheap problem to fix – you generally only have to worry about hypervisor calls (i.e. simulated hardware) being slower. Pure CPU-bound tasks such as hashing should be very close to native, and the combination of optimized guest drivers (e.g. for networking) and hardware support for virtualization has reduced most of the I/O concerns unless you're trying to sustain transfer rates well into the gigabit/s range.
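An easy sanity check for the I/O path: run the same sequential-write test on the host and inside the guest and compare the MB/s figures dd reports (the file name is arbitrary, and `conv=fsync` forces the data to actually hit the disk before dd reports its rate):

```shell
# Write 512 MiB and flush to disk; dd prints throughput on stderr.
# Run once on the host and once in the guest; a large gap points at
# the virtualized I/O path rather than the CPU.
dd if=/dev/zero of=vmtest.bin bs=1M count=512 conv=fsync
rm vmtest.bin
```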

The major area where you might run into problems would be something very timing-sensitive, like an external USB device with inadequate buffering that loses data if you can't respond in time – which is not the case for a normal external drive. I last encountered that with data-acquisition hardware that was old at the time, and I'd be more than a little surprised to encounter it now.

Chris 

Jess Whyte

Apr 13, 2017, 10:51:11 AM
to Digital Curation
We use a VM with a shared folder. Works fine.