
Removing extraneous files from a directory mirror


Don Kuenz

Sep 7, 2012, 11:25:19 AM
My FreeBSD host contains two directories /u and /disk2/u. The latter
directory, /disk2/u ought to mirror the former directory, /u, but
extraneous files now exist on /disk2/u. This script shows what needs
to be done:

#!/bin/sh
#
# make /disk2/u a mirror image of /u by removing extraneous files.
#
cd /disk2/u
for file in `find . -type f`
do
if [ ! -f "/u/$file" ] ; then
rm /disk2/u/$file
fi
done

My gut tells me that something faster is needed and that $file is
likely to "blow up" from receiving too many names during the find
phase.

Can anybody think of a better way of doing this? By using find's
exec argument, for instance, to fully process (remove) a given
file name before moving on to the next file name?

TIA.

--
Don Kuenz

Aragorn

Sep 7, 2012, 11:30:51 AM
On Friday 07 September 2012 17:25, Don Kuenz conveyed the following to
comp.unix.shell...
What about rsync? It is perfectly capable of doing the mirroring for
you.

--
= Aragorn =
(registered GNU/Linux user #223157)

Icarus Sparry

Sep 7, 2012, 11:45:16 AM
Whilst rsync is a very good solution and is probably what the OP should
be using, it doesn't answer the question as asked.

#!/bin/sh
cd /disk2/u
find . -type f | while IFS= read -r file
do
if [ ! -f "/u/$file" ] ; then
rm "/disk2/u/$file"
fi
done

This assumes that no filename has newline characters in it.

There are ways of dealing with filenames that contain newlines, but you
are much better off if you avoid them in the first place! Something like

find . -type f -exec sh -c 'for f ; do [ -f "/u/$f" ] || \
rm "/disk2/u/$f" ; done ' dummy {} +

should do it.
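
For readers who want to check what would be removed before anything is deleted, here is a minimal sketch of the same find ... -exec sh -c ... {} + technique wrapped in a function. The function name prune_extraneous and its optional dry-run argument are assumptions made for illustration; they are not part of the one-liner above.

```sh
#!/bin/sh
# prune_extraneous SRC MIRROR [DRY_RUN]
# Hypothetical helper: removes regular files under MIRROR that have no
# regular-file counterpart under SRC. With DRY_RUN=1 it only prints them.
# SRC must be an absolute path, because the work happens inside MIRROR.
prune_extraneous() {
    src=$1 mirror=$2 dry_run=${3:-0}
    (
        cd "$mirror" || exit 1
        # "{} +" batches names, so one sh handles many files. Names are
        # passed as arguments and never parsed, so newlines are safe.
        find . -type f -exec sh -c '
            src=$1 dry_run=$2
            shift 2
            for f do
                [ -f "$src/$f" ] && continue
                if [ "$dry_run" = 1 ]; then
                    printf "would remove %s\n" "$f"
                else
                    rm -- "$f"
                fi
            done
        ' sh "$src" "$dry_run" {} +
    )
}
```

With the directories from this thread, a cautious run would be prune_extraneous /u /disk2/u 1 first, then the same command without the trailing 1 once the list looks right.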

Don Kuenz

Sep 7, 2012, 12:15:23 PM
Icarus Sparry <i.spa...@gmail.com> wrote:
> On Fri, 07 Sep 2012 17:30:51 +0200, Aragorn wrote:

>> What about rsync? It is perfectly capable of doing the mirroring for
>> you.
>
> Whilst rsync is a very good solution and is probably what the OP should
> be using,

The OP *is* using rsync, and has used rsync for years. But something
happened, the mirror got 107% full, and "rsync --delete-before" stopped
working. :(

> There are ways of dealing with filenames that contain newlines, but you
> are much better off if you avoid them in the first place! Something like
>
> find . -type f -exec sh -c 'for f ; do [ -f "/u/$f" ] || \
> rm "/disk2/u/$f" ; done ' dummy {} +
>
> should do it.

Thanks, I'll give it a try.
--
Don Kuenz

Thomas 'PointedEars' Lahn

Sep 7, 2012, 1:55:39 PM
Don Kuenz wrote:

> My FreeBSD host contains two directories /u and /disk2/u. The latter
> directory, /disk2/u ought to mirror the former directory, /u, but
> extraneous files now exist on /disk2/u. This script shows what needs
> to be done:
>
> #!/bin/sh
> #
> # make /disk2/u a mirror image of /u by removing extraneous files.
> #
> cd /disk2/u
> for file in `find . -type f`
> do
> if [ ! -f "/u/$file" ] ; then
> rm /disk2/u/$file
> fi
> done
>
> My gut tells me that something faster is needed and that $file is
> likely to "blow up" from receiving too many names during the find
> phase.
>
> Can anybody think of a better way of doing this?

rsync --delete $more_options $source $destination

F'up2 comp.os.unix.shell¹

______
¹ I was wrong. There is de.comp.os.unix.apps.misc, but no
comp.os.unix.apps.misc. Short of comp.unix.shell, where
would you discuss GNU or GPL-based programs such as rsync(1)?
--
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.

Dave Gibson

Sep 7, 2012, 3:50:54 PM
Don Kuenz <gar...@crcomp.net> wrote:
> My FreeBSD host contains two directories /u and /disk2/u. The latter
> directory, /disk2/u ought to mirror the former directory, /u, but
> extraneous files now exist on /disk2/u. This script shows what needs
> to be done:
>
> #!/bin/sh
> #
> # make /disk2/u a mirror image of /u by removing extraneous files.
> #

> Can anybody think of a better way of doing this?

mtree -p /u -c | mtree -p /disk2/u -r

[ Followup-To: set to cubfm ]

Stuart Barkley

Sep 8, 2012, 2:16:23 PM
On Fri, 7 Sep 2012 at 11:25 -0000, Don Kuenz wrote:

> My FreeBSD host contains two directories /u and /disk2/u. The latter
> directory, /disk2/u ought to mirror the former directory, /u, but
> extraneous files now exist on /disk2/u.

> Can anybody think of a better way of doing this?

A couple others suggested rsync. Here is a sample command that I use
to mirror my local media collection to a second drive.

rsync -n --stats -vi -aHh -P --delete /data/media/. /media/4gh-media/media/

The '-n' option enables dry-run mode where needed changes are only
reported, not performed. This allows me to double check large updates
before being applied.

I also occasionally add the '-c' option which checksums the files to
better detect any bit rot in the mirror.

If mirroring onto a FAT/NTFS drive (so the drive can be read by my TV,
etc) you may also need options "--no-g --no-p --modify-window=2".
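
The preview-then-apply habit described above can be sketched as a small wrapper. The name mirror_sync and the literal word "apply" are assumptions made for this illustration; the rsync options are a subset of the ones in the command above.

```sh
#!/bin/sh
# mirror_sync SRC DST [apply]
# Hypothetical wrapper: previews by default (-n), and only changes DST
# when the third argument is the literal word "apply".
mirror_sync() {
    src=$1 dst=$2 mode=${3:-preview}
    # archive mode, preserve hard links, itemize changes, prune extras
    set -- -aH -i --delete
    # dry run unless explicitly told to apply
    [ "$mode" = apply ] || set -- -n "$@"
    rsync "$@" "$src/" "$dst/"
}
```

Making the dry run the default means a mistyped path costs only a misleading report, not a deleted tree.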

Stuart
--
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone

Don Kuenz

Sep 9, 2012, 4:52:46 PM
Dave Gibson <dave.gma...@googlemail.com.invalid> wrote:

> mtree -p /u -c | mtree -p /disk2/u -r

This looked promising, but it blew up, probably because of too many
file names.

Decades ago a boss taught me that when running out of space, one gets
the most mileage by looking for the largest files. Recursive manual
invocations of

du -s *

found a rather large file by the third recursion. Removing that
file brought the mirror down to 98% capacity. I followed that with a

fsck -y

to ensure integrity. Yet another invocation of rsync with its
--progress argument yielded the true problem:

michael# /usr/local/bin/rsync -aHvx --delete-before --progress --stats /u/ /disk2/u
building file list ...
ERROR: out of memory in flist_expand
rsync error: error allocating core memory buffers (code 22) at util.c(120) [sender=2.6.8]
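
As an aside on the hunt for large files described earlier in this post, the repeated du -s * can be made a little less manual by sorting the output. This is only a sketch; the function name largest_entries and its default count of 10 are assumptions for illustration.

```sh
#!/bin/sh
# largest_entries DIR [COUNT]
# Hypothetical helper: prints the COUNT largest immediate children of DIR
# (sizes in KiB), biggest first. Dotfiles are skipped by the * glob.
largest_entries() {
    dir=$1 count=${2:-10}
    (
        cd "$dir" || exit 1
        du -sk -- * 2>/dev/null | sort -rn | head -n "$count"
    )
}
```

Running it on the biggest directory it reports, and repeating, follows the same recursion by hand but shows the likely culprit at the top of each listing.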

My "plan B" is to now manually, recursively invoke rsync, working
upwards in the file hierarchy.
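
One way to automate that kind of piecewise run is to invoke rsync once per immediate subdirectory, so that each run builds a file list for a single subtree only. The sketch below assumes that approach; the function name sync_per_subdir is hypothetical, and it deliberately ignores files that sit directly in the source directory, hidden directories, and subdirectories that exist only on the mirror.

```sh
#!/bin/sh
# sync_per_subdir SRC DST [extra rsync options...]
# Hypothetical helper: runs one rsync per immediate subdirectory of SRC,
# so each run builds a file list for only that subtree. DST must exist.
sync_per_subdir() {
    src=$1 dst=$2
    shift 2
    for d in "$src"/*/; do
        [ -d "$d" ] || continue      # the glob matched nothing
        name=$(basename "$d")
        # "$d" ends in a slash, so its contents land in "$dst/$name/"
        rsync -aH --delete "$@" "$d" "$dst/$name/" || return 1
    done
}
```

If one subtree is still too large for a single run, the same function can be pointed at that subtree to split it one level further down.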

--
Don Kuenz

Chick Tower

Sep 10, 2012, 11:27:23 AM
On 2012-09-09, Don Kuenz <gar...@crcomp.net> wrote:
> My "plan B" is to now manually, recursively invoke rsync, working
> upwards in the file hierarchy.

Is there any reason you can't just delete /disk2/u/ and recreate it to
have a duplicate of /u/?
--
Chick Tower

For e-mail: cubfm DOT sent DOT towerboy AT xoxy DOT net

Don Kuenz

Sep 10, 2012, 1:57:25 PM
Chick Tower <c.t...@deadspam.com> wrote:

> Is there any reason you can't just delete /disk2/u/ and recreate it to
> have a duplicate of /u/?

Recreating /u is not a trivial task on my production server. rsync
originally got deployed for nightly backups years ago after

/sbin/dump 0f - /u | /sbin/restore -ruf -

started taking too long (all night and well into the start of business
each morning).

Anyhow, my problem seems localized to a usenet archive. rsync now
runs just fine with that usenet archive excluded.

cd /u/archive
find . -depth -print | cpio -pdm /disk2/u/archive

ought to take care of nightly backups of the usenet archive, since
nothing ever gets deleted under it.

--
Don Kuenz