Dump Utility cache efficiency analysis

Nirmal Thacker

Jun 23, 2009, 4:25:45 PM

to freebsd...@freebsd.org

Hello

This is regarding the dump utility cache efficiency analysis post made on
February '07 by Peter Jeremy [
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-February/019666.html]
and if this project is still open. I would be interested to begin exploring
FreeBSD (and contributing) by starting this project.

I do have some basic understanding of the problem at hand - to determine if
a unified cache would appeal as a more efficient/elegant solution compared
to the per-process-cache in the Dump utility implementation. I admit I am
new to this list and FreeBSD so I wouldn't be able to determine what the
current implementation is, until I get started.

I would first like to understand the opinions of anyone who has looked at
this problem or thinks this would be a worthwhile project to start off with.

I would also appreciate if I could get simple tips and pointers of setting
up my machine for the project. I understand this would be on the lines of:

1. Installing a stable FreeBSD build
2. Check out a version of the Build suitable for the project
3. Pointers to begin studying the current implementation in the code-tree
structure (would I expect it to lie in the fs/ directory?). I tried to find
it in the FreeBSD cross reference (http://fxr.watson.org/)
4. Read some important sections of the developer handbook (some suggestions
would be great)

Lastly, does this project require know-how of device drivers? If so, I
would have to work harder.

Thanks a lot!

- nirmal
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hacke...@freebsd.org"

Matthew Dillon

Jun 23, 2009, 9:08:22 PM

to Nirmal Thacker, freebsd...@freebsd.org

:Hello

:
:This is regarding the dump utility cache efficiency analysis post made on
:February '07 by Peter Jeremy [
:http://lists.freebsd.org/pipermail/freebsd-hackers/2007-February/019666.html]
:and if this project is still open. I would be interested to begin exploring
:FreeBSD (and contributing) by starting this project.
:
:I do have some basic understanding of the problem at hand - to determine if
:a unified cache would appeal as a more efficient/elegant solution compared
:to the per-process-cache in the Dump utility implementation. I admit I am
:new to this list and FreeBSD so I wouldn't be able to determine what the
:current implementation is, until I get started.

:...

I think the cache in the dump utility is still the one I worked up
a long time ago. It was a quick and dirty job at the time, and it
was never really designed for parallel operation which is probably
why it doesn't work so well in that regard.

In my opinion, a unified cache would be an excellent improvement.
Ultimately dump is an I/O bound process so I don't think we would
really need to worry about the minor increases in cpu overhead
from the additional locking needed.

There are a few issues you will have to consider:

* Dump uses a fork model for its children rather than pthreads. You
would either have to use the F_*LK fcntl() operations or use a
simpler flock() scheme to lock across the children. Alternatively
you could change dump over to a pthreads model and use pthreads
mutexes, but that would entail a lot more work. Dump was never
designed to be threaded.

* The general issue with any caching scheme for dump is how much to
actually cache per I/O vs the size of the cache. Caching larger
amounts of data hits diminishing returns as it also increases seek
times and waste (cached data never used). Caching smaller amounts
of data hits diminishing returns as it causes the disk to seek more.
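
As a rough illustration of the first point above (not code taken from dump), forked children can serialise access to shared state with fcntl() record locks. The sketch assumes a hypothetical temporary lock file created with mkstemp(), and a single shared counter mapped from that file stands in for the cache. The names lock_op and shared_counter_demo are invented for this example.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Apply F_WRLCK or F_UNLCK to the whole lock file, blocking until granted. */
static int
lock_op(int fd, short type)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;	/* l_start = 0 and l_len = 0: whole file */
	return (fcntl(fd, F_SETLKW, &fl));
}

/*
 * Hypothetical demo: fork nchildren workers that each increment a shared
 * counter iters times while holding the lock.  Returns the final count,
 * or -1 on error.
 */
long
shared_counter_demo(int nchildren, int iters)
{
	char path[] = "/tmp/dumplock.XXXXXX";
	long *counter, total;
	int fd, i, j;

	if ((fd = mkstemp(path)) < 0)
		return (-1);
	unlink(path);		/* the open descriptor keeps the file alive */
	if (ftruncate(fd, sizeof(*counter)) != 0) {
		close(fd);
		return (-1);
	}
	/* Map the file MAP_SHARED so every child sees the same counter. */
	counter = mmap(NULL, sizeof(*counter), PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	if (counter == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	*counter = 0;

	for (i = 0; i < nchildren; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			for (j = 0; j < iters; j++) {
				if (lock_op(fd, F_WRLCK) != 0)
					_exit(1);
				(*counter)++;	/* critical section */
				if (lock_op(fd, F_UNLCK) != 0)
					_exit(1);
			}
			_exit(0);
		}
	}
	while (wait(NULL) > 0)	/* reap every child that was started */
		;
	total = (i == nchildren) ? *counter : -1;
	munmap(counter, sizeof(*counter));
	close(fd);
	return (total);
}
```

Calling shared_counter_demo(4, 1000) forks four children that each take the lock 1000 times. fcntl() locks are owned by the process, so children that inherit the descriptor still exclude one another. flock() locks belong to the open file description, which fork() shares, so with flock() each child would have to open the lock file itself.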

Disk drives generally do have a track cache, but they also only typically
have 8-16M of cache ram (32M in newer drives, particularly the higher
capacity ones). A track is typically about 1-2M (maybe higher now) so
it doesn't take much seeking for the drive to blow out its internal
track cache. Caching that much data in a single read would probably
be detrimental anyway.

This also means you do not necessarily want to cache too much
linearly-read data, as the disk drive is already doing it for you.
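
To put rough numbers on the track-cache figures quoted above, the helper below divides the drive's cache size by the track size. The function name and the MIB constant are made up for this illustration, and real drives manage their cache in ways this ignores.

```c
#include <stddef.h>

#define MIB ((size_t)1024 * 1024)

/*
 * Number of whole tracks that fit in a drive's on-board cache.
 * Purely illustrative arithmetic; returns 0 for a zero track size.
 */
size_t
tracks_in_drive_cache(size_t cache_bytes, size_t track_bytes)
{
	return (track_bytes == 0 ? 0 : cache_bytes / track_bytes);
}
```

With the figures above, an 8M cache and 2M tracks gives 4 tracks, and a 16M cache with 1M tracks gives 16, so a dump that hops between more than a handful of regions can push earlier tracks out of the drive's cache.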

Because of all of this it is going to be tough to find cache parameters
that work well generally, and the parameters are going to change
drastically based on the amount of cache you specify on the command
line and the size of the partition being dumped.

-Matt

Danny Braniss

Jun 24, 2009, 3:52:46 AM

to Nirmal Thacker, freebsd...@freebsd.org

> Hello
>
> This is regarding the dump utility cache efficiency analysis post made on
> February '07 by Peter Jeremy [
> http://lists.freebsd.org/pipermail/freebsd-hackers/2007-February/019666.html]
> and if this project is still open. I would be interested to begin exploring
> FreeBSD (and contributing) by starting this project.
>
> I do have some basic understanding of the problem at hand - to determine if
> a unified cache would appeal as a more efficient/elegant solution compared
> to the per-process-cache in the Dump utility implementation. I admit I am
> new to this list and FreeBSD so I wouldn't be able to determine what the
> current implementation is, until I get started.
>
> I would first like to understand the opinions of anyone who has looked at
> this problem or thinks this would be a worthwhile project to start off with.
>
> I would also appreciate if I could get simple tips and pointers of setting
> up my machine for the project. I understand this would be on the lines of:
>
> 1. Installing a stable FreeBSD build
> 2. Check out a version of the Build suitable for the project
> 3. Pointers to begin studying the current implementation in the code-tree
> structure (would I expect it to lie in the fs/ directory?). I tried to find
> it in the FreeBSD cross reference (http://fxr.watson.org/)
> 4. Read some important sections of the developer handbook (some suggestions
> would be great)
>
> Lastly, does this project require know-how of device drivers? If so, I
> would have to work harder.
>

short answer:
you don't need driver knowledge, but fs is a must.
long answer:
In the days long gone, the cpu/disk were slower than the tape,
which could 'stream', and unless you could provide data fast enough, the tape
would stop, rewind some, then pick up speed, and write.
Nowadays, tapes are slower, but some/most of us dump to file, or
pipe to restore (dump -f - ... | restore rf -), so that the tape speed is
irrelevant. On the other hand, computers have much more memory, so buffering
can be done by the OS.
What I'm trying to say, without wanting to take any wind out of
your sails, is that dump should be re-evaluated, and maybe OpenBSD/KIS
is the best.

danny

Peter Jeremy

Jun 24, 2009, 3:59:05 AM

to Nirmal Thacker, freebsd...@freebsd.org

On 2009-Jun-23 15:52:04 -0400, Nirmal Thacker <thacker...@gmail.com> wrote:
>I would first like to understand the opinions of anyone who has looked at
>this problem or thinks this would be a worthwhile project to start off with.

I'm aware of the following references:
http://www.mavetju.org/mail/view_message.php?list=freebsd-hackers&id=375676
http://www.mavetju.org/mail/view_thread.php?list=freebsd-stable&id=1335519&thread=yes

>1. Installing a stable FreeBSD build
>2. Check out a version of the Build suitable for the project

Any changes will need to apply to FreeBSD -current, though they may be
back-ported once tested. This means that you will need a -current
system at some point. 8-current is reasonably stable at this point and
would be my suggestion.

>3. Pointers to begin studying the current implementation in the code-tree
>structure (would I expect it to lie in the fs/ directory?). I tried to find
>it in the FreeBSD cross reference (http://fxr.watson.org/)

The code is in src/sbin/dump. It references various system header
files in order to understand the UFS on-disk format.

>Lastly, does this project require know-how of device drivers? If so, I
>would have to work harder.

No. Dump is completely userland.

--
Peter Jeremy

Alexander Leidinger

Jun 24, 2009, 9:13:52 AM

to Nirmal Thacker, freebsd...@freebsd.org

On Tue, 23 Jun 2009 15:52:04 -0400 Nirmal Thacker
<thacker...@gmail.com> wrote:

> I would also appreciate if I could get simple tips and pointers of
> setting up my machine for the project. I understand this would be on
> the lines of:
>
> 1. Installing a stable FreeBSD build
> 2. Check out a version of the Build suitable for the project

All development takes place in -CURRENT, so you would have to check
that out, or install it right away. It's not declared stable yet, but
as we have started the release management process for 8.0, it's not
that unstable either... :)

> 3. Pointers to begin studying the current implementation in the
> code-tree structure (would I expect it to lie in the fs/ directory?).
> I tried to find it in the FreeBSD cross reference
> (http://fxr.watson.org/)
> 4. Read some important sections of the developer handbook (some
> suggestions would be great)

Dump is a complete userland implementation. All you need to know is the
userland programming stuff, especially for what you want to do. You can
find the code online at
http://svnweb.freebsd.org/viewvc/base/head/sbin/dump/

Bye,
Alexander.

Nirmal Thacker

Jun 24, 2009, 2:00:05 PM

to Peter Jeremy, freebsd...@freebsd.org

Thanks for all the replies and suggestions.
I'll begin by running, benchmarking, and understanding dump for myself, and
then take up Matt's suggestions above to understand the unified caching
implementation in more detail.
-n

On Wed, Jun 24, 2009 at 3:58 AM, Peter Jeremy
<peter...@optushome.com.au> wrote: