Discscrub Enhancement Proposals

47 views
Skip to first unread message

Peter Hyman

unread,
Aug 6, 2014, 11:13:20 AM8/6/14
to diskscru...@googlegroups.com
In order to improve usability of discscrub, I am beginning work on some enhancements and wish to get feedback on suitability.
  1. Allow use of Environment Variables for setting persistent options (such as Pattern, Scrub Signature, Remove File, Rename Directory Entry, Link Target, etc.
  2. Allow use of a conf file in /etc or user home to store persistent options. User conf will override all, /etc/conf will override ENV, ENV will set program options, command line overrides all.
  3. Allow removal of multiple regular files on the command line.
  4. Allow use of Wildcards to select multiple files
  5. When Multiple Files selected, force a one time interactive mode requiring a positive YES response
  6. Consider adding an option to override interactive prompt (i.e. accept all mode)

Architecturally, this won't change the program much. 
  1. Conf files or ENV files would be loaded prior to reading the command line. Command line will override all.
  2. After all options are read from argv, all additional argv items would be considered files
    1. If a special file (i.e. a device), stop there and scrub the device.
    2. For each additional argv, scrub files one at a time
      1. If argv is a regular file, then process
      2. If argv is a wildcartd, then glob it and process file list one at a time
      3. dirent function will need some review since multiple files will cause a name collision
Any feedback is appreciated.

Peter

Jim Garlick

unread,
Aug 6, 2014, 11:57:06 AM8/6/14
to diskscru...@googlegroups.com
Hi Peter,

Sounds pretty good to me. Couple comments inline below.

Jim

On Wed, Aug 06, 2014 at 08:13:20AM -0700, Peter Hyman wrote:
> In order to improve usability of discscrub, I am beginning work on some
> enhancements and wish to get feedback on suitability.
>
> 1. Allow use of Environment Variables for setting persistent options
> (such as Pattern, Scrub Signature, Remove File, Rename Directory Entry,
> Link Target, etc.
> 2. Allow use of a conf file in /etc or user home to store persistent
> options. User conf will override all, /etc/conf will override ENV, ENV will
> set program options, command line overrides all.

I would probably set the precedence like this:
1. (lowest) compiled-in defaults
2. /etc/scrub.conf
3. ~/.scrub.conf
3. environment
4. (highest) command line

and be clear about that in the man page.

One thing about config files: writing your own parser is rarely a good idea,
but people often grab the scrub tarball in a hurry, and having to deal with
a an external package dependency (especially on non-linux platforms)
could be a problem. So...do we really want a config file?

> 3. Allow removal of multiple regular files on the command line.
> 4. Allow use of Wildcards to select multiple files

The shell would handle expanding a command line with wildcards in it,
or did you have something else in mind here? Maybe a --recursive
option would be useful...

> 5. When Multiple Files selected, force a one time interactive mode
> requiring a positive YES response
> 6. Consider adding an option to override interactive prompt (i.e. accept
> all mode)

I'm not a fan of confirmation, could we do without that one?

> Architecturally, this won't change the program much.
>
> 1. Conf files or ENV files would be loaded prior to reading the command
> line. Command line will override all.
> 2. After all options are read from argv, all additional argv items would
> be considered files
> 1. If a special file (i.e. a device), stop there and scrub the device.
> 2. For each additional argv, scrub files one at a time
> 1. If argv is a regular file, then process
> 2. If argv is a wildcartd, then glob it and process file list one
> at a time
> 3. dirent function will need some review since multiple files will
> cause a name collision
>
> Any feedback is appreciated.

I'd validate the whole command line before scrubbing anything.

Suggestion: add unit tests to exercise new functionality and keep commits
tight (one feature each).

> Peter

Thanks!

Regards,

Jim

Peter Hyman

unread,
Aug 6, 2014, 12:58:11 PM8/6/14
to diskscru...@googlegroups.com
Good Suggestions.

I respect that you are against interactive. That's why I suggested a quiet setting. Do you really want a scrub * to run unimpeded?
Recursion is a good idea, but really dangerous!!!!!
CONF files are easy to parse and could be tailored at the user level. I've done this before. It would have to be user created, otherwise other settings would remain. Any system /etc/conf file would be commented out.
I was nervous about shell expansion since it would grab everything potentially, directory names, sockets, Sure, we can evaluate each one using internal functions. It would save me time for sure ! I am not familiar with ALL shells though so this would require careful testing.

Unit tests would isolate each new feature and validate it.

I'll start the framework soon and test snippets as we go.

Do I need to be a member of the project to have git access or even a server clone?

Jim Garlick

unread,
Aug 6, 2014, 1:24:36 PM8/6/14
to diskscru...@googlegroups.com
Hi Peter,

You shouldn't need special repo access. You can either create your
own clone of the project on google code:
Follow the tabs: "source" -> "clones" -> "create a clone".

then post here when you're ready for review. Or you can just 'git clone'
the repo privately and send patches to the list for review. Whatever
works best for you.

I'd suggest doing the multiple file change first, get that reviewed, then
the config stuff. I'm still on the fence on the config file so saving
that towards the end might be a good idea.

Yeah, I'm good with scrub * running unimpeded, like rm * :-)
I agree a recursive option is probably too scary - and people can always

find . -type f|xargs scrub [options]

I'd avoid internal globbing - it's the shell's job.
One thing you could do to keep life simple and avoid changing scrub's
arguments too much is insist that all files on the command line be
the same type. So you could scrub a glob of block devices or
a glob of regular files, or a glob of directories, but not a
directory full of different types of objects.

Jim
> --
> You received this message because you are subscribed to the Google Groups "diskscrub-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to diskscrub-disc...@googlegroups.com.
> To post to this group, send email to diskscru...@googlegroups.com.
> Visit this group at http://groups.google.com/group/diskscrub-discuss.
> For more options, visit https://groups.google.com/d/optout.

Peter Hyman

unread,
Aug 6, 2014, 2:02:46 PM8/6/14
to diskscru...@googlegroups.com
Sounds like a plan. The intent was NOT to change any arguments to existing code, but to add a loop surrounding file/device processing and feed one at a time. We can defer the ENV/conf stuff and agree to disagree :-). But this is your project!

I am FOR not having recursion. That is too uncontrolled.
I am really AGAINST the idea of globbing NON regular files. scrub /dev/sda should be fine. The idea of scrub /dev/sd? could spell disaster, albeit slowly :-).

peter@tommyiv:/tmp$ ./test /dev/sd??
 argc = 10, ./test
10, /dev/sda1
10, /dev/sda2
10, /dev/sda3
10, /dev/sda4
10, /dev/sda5
10, /dev/sda6
10, /dev/sda7
10, /dev/sda8
10, /dev/sda9

Jim Garlick

unread,
Aug 6, 2014, 2:16:38 PM8/6/14
to diskscru...@googlegroups.com
As there's no way to prevent the shell from expanding globs before
scrub sees them, I'm not sure there's much to be done about it if
you are going to accept more than one target on the command line.

It seems to me the best you can do is look at the list of files you
are given very carefully and reject any that are scary...

Jim
> > an email to diskscrub-disc...@googlegroups.com <javascript:>.
> > > To post to this group, send email to diskscru...@googlegroups.com
> > <javascript:>.

Peter Hyman

unread,
Aug 7, 2014, 4:30:44 AM8/7/14
to diskscru...@googlegroups.com
I've made good progress, however, some decisions have to be made.

In scrub.c, while in file handling, there are many 
exit(1)

bailouts for errors. In a loop of multiple files, this can't happen. I work around this using a variable that is set on error
retcode=1

and just continue the while loop
while( filecounter< argc ) {
   
do stuff
   
if any error {
      retcode
=1
     
goto nextfile
   
}

   scrub file
   scrub directories
/* if requested */
   unlink file      
/* if requested */

nextfile
:
   filecounter
++     /* bump counter, next file */
} /* end while */

done:
if retcode print some warning about prior errors

I believe scrub.c is the only one with hardcoded exit(1). This should be avoided. Errors will be classified into Fatal and Non-Fatal categories. A Fatal error would be any type of file error, i/o, etc. A Non-Fatal one would be incompatible options and file type. In out case, with multiple files, 
neither should end the program, but should clean up any memory or variable settings, bail out of any while, switch, if structure, and continue.

One thing that may present a problem is lack of an error log file. Since we will process multiple files, and errors print to stderr along with all the other file handling and scrubbing messages going to stdout, specific errors will be lost due to scrolling. But an error log file itself might be scrubbed, so I am not sure what to do about this.

Comments?

Peter Hyman

unread,
Aug 7, 2014, 8:59:00 AM8/7/14
to diskscru...@googlegroups.com
I correct myself. the scrub() function is called from many places and has some hard exits in it. Other than ENOSPC, returning a FALSE due to any other error would force a lot of rewrite to calling functions. And some, like malloc errors really should be unrecoverable. Here's a stripped down version of the exits in scrub() (with duplicates omitted). I think these fails have to fail. I'm getting ready to start testing.....

static bool
scrub(char *path, off_t size, const sequence_t *seq, int bufsize, 
      bool Sopt, bool sparse, bool enospc)
{

    if (!(buf = alloc_buffer(bufsize))) {
        fprintf(stderr, "%s: out of memory\n", prog);
        exit(1);
    }

    if (initrand() < 0) {
        fprintf (stderr, "%s: initrand: %s\n", prog, strerror(errno));
        exit(1);
    }

                if (churnrand() < 0) {
                    fprintf(stderr, "%s: churnrand: %s\n", prog, strerror(errno));
                    exit(1);
                }
                written = fillfile(path, size, buf, bufsize, 
                                   (progress_t)progress_update, p, 
                                   (refill_t)genrand, sparse, enospc);
                if (written == (off_t)-1) {
                    fprintf(stderr, "%s: %s: %s\n", prog, path, strerror(errno));
                    exit(1);
                }

                checked = checkfile(path, written, buf, bufsize, 
                                    (progress_t)progress_update, p, sparse);
                if (checked == (off_t)-1) {
                    fprintf(stderr, "%s: %s: %s\n", prog, path, strerror(errno));
                    exit(1);
                }
                if (checked < written) {
                    fprintf(stderr, "%s: %s: verification error\n", prog, path);
                    exit(1);
                }

        if (written < size) {
            assert(i == 0);
            assert(enospc == true);
            isfull = true; 
            size = written;
            if (size == 0) {
                fprintf(stderr, "%s: file system is full (0 bytes written)\n", prog);
                break;
            }
        }
    }
    if (!Sopt) {
        if (writesig(path) < 0) {
            fprintf(stderr, "%s: writing signature to %s: %s\n", prog, path, strerror (errno));
            exit (1);
        }
    }


.

Jim Garlick

unread,
Aug 7, 2014, 11:48:41 AM8/7/14
to diskscru...@googlegroups.com
Peter,

You might check out
http://code.google.com/p/diskscrub/source/browse/libscrub/libscrub.c
http://code.google.com/p/diskscrub/issues/detail?id=17

We were transitioning functions into a library which should be exit-free.
I believe the state is it builds conditionally, and has a toy driver.
It is currently parallel to the code used by the scrub mainline.
The plan was to transition the main over to use the library but we stalled.
See issue 17 for more detail.

Jim

Jim Garlick

unread,
Aug 7, 2014, 11:51:42 AM8/7/14
to diskscru...@googlegroups.com
Also: there's a couple commits to the library that neeed to be merged in
this clone:

http://code.google.com/r/venpatnala-diskscrub/

I can merge them later but you might want to pull them into your
development branch.

Jim

Peter Hyman

unread,
Aug 7, 2014, 1:00:29 PM8/7/14
to diskscru...@googlegroups.com
I think, for now, I will stay with the monolithic program. Fewer moving parts. I don't want to have to manage two versions - one monolithic, and one with external library. Let's get one thing working. Because, if code is accepted, it may impact library.

Peter Hyman

unread,
Aug 11, 2014, 10:41:24 PM8/11/14
to diskscru...@googlegroups.com
See Issue 30 for complete patchset

Peter Hyman

unread,
Aug 13, 2014, 7:36:51 AM8/13/14
to diskscru...@googlegroups.com
I have written a short file, CAVEATS, that explains how multi-file processing occurs and which errors are fatal and not fatal (the program will continue). It shows examples of how to preserve program output by using 2>error.log and 1>scrub.log.


On Wednesday, August 6, 2014 11:13:20 AM UTC-4, Peter Hyman wrote:
CAVEATS.gz

Jim Garlick

unread,
Aug 13, 2014, 11:06:18 AM8/13/14
to diskscru...@googlegroups.com
Hi Peter,
I just wanted to touch base and let you know that I haven't had a
chance to review your changes but they are much appreciated. I hope
to get to them towards the end of the week.
Regards,
JIm

Peter Hyman

unread,
Aug 13, 2014, 4:50:21 PM8/13/14
to diskscru...@googlegroups.com
I understand. I just wanted to get it out of the way and test. It's pretty cool watching files get scrubbed one after the other and seeing errors when they occur. Almost all the changes are in the stanza at the end of the main function.

Good luck in the review.
Reply all
Reply to author
Forward
0 new messages