Bash Scripting Help


Jonathan

Sep 6, 2022, 12:12:40 PM
to Unknown, TangerineSDR Listserv
Hi All,

I need some assistance writing a bash script that will run as a cron job to archive VLF data. The data is recorded as hour-long files. I already have a cleardown script that deletes the oldest files to maintain a rotating buffer, so as the newest file is being recorded, the oldest is deleted. What I need now is a way to locate the oldest files, encode each one to FLAC using the vlfrx-tools vtflac utility, and then delete the original. Here is an example:

First, find locates the oldest files. There is often more than one, and the naming convention is YYMMDD-HHMMSS. Running the following:

find /data/vlf_32k -type f -mtime +50
 
prints:

/data/vlf_32k/220904-144416
/data/vlf_32k/220904-150000
...
/data/vlf_32k/220904-230000

Next, we need to encode those files to flac to the archive directory:

vtflac -e /data/vlf_32k/220904-144416 > /mnt/archive/vlf_32k_flac/220904-144416.fx
vtflac -e /data/vlf_32k/220904-150000 > /mnt/archive/vlf_32k_flac/220904-150000.fx
...
vtflac -e /data/vlf_32k/220904-230000 > /mnt/archive/vlf_32k_flac/220904-230000.fx

Finally, delete the original files:

rm /data/vlf_32k/220904-144416
rm /data/vlf_32k/220904-150000
...
rm /data/vlf_32k/220904-230000

I haven't done a lot of bash scripting, so I'm asking if anyone here has experience with this. I need a bash script that executes the above commands no matter how many files the find utility locates. Sometimes find will locate no files, and in that case the script should just exit. It'll be executed every hour by cron. Any help or assistance will be appreciated!
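
For what it's worth, the steps described above can be sketched as a single loop (this is only a sketch, written as a function so the directories are explicit; the paths, the -mtime +50 threshold, and the vtflac -e usage are taken from the commands shown earlier, and the function name is made up for illustration):

```shell
#!/bin/bash
# Sketch of the hourly job.  The real script, run from cron, would call
# archive_vlf /data/vlf_32k /mnt/archive/vlf_32k_flac with no further args.
archive_vlf() {
    src_dir=$1 dest_dir=$2 age=${3:-+50}
    # NUL delimiters keep unusual filenames from breaking the loop.
    while IFS= read -r -d '' f; do
        # Encode first; delete the original only if encoding succeeded.
        vtflac -e "$f" > "$dest_dir/$(basename -- "$f").fx" &&
            rm -- "$f"
    done < <(find "$src_dir" -type f -mtime "$age" -print0)
    # If find matches nothing, the loop body never runs and the script
    # effectively just exits, as required.
}
```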

Thank you!

Jonathan
KC3EEY 

Rob Wiesler

Sep 6, 2022, 12:55:50 PM
to ham...@googlegroups.com
Hi, Jonathan.
If you're going to be doing this kind of thing a lot, I recommend
spending some time with GNU parallel. That said, I've attached a script
that does what you asked for. After making my script executable, your
find(1) invocation should look like:

find /data/vlf_32k -type f -mtime +50 -exec /path/to/transcode-and-archive /mnt/archive/vlf_32k_flac/ {} +

Notes:
- This will handle any number of input files with no issues. If there
are too many input files for your system's argument limit, find(1)
will execute my script multiple times as appropriate.
- I don't have vtflac installed, so I don't know if it will choke on
filenames starting with '-', or if it supports the -- convention. You
probably won't care, but if you do, you may need to make some minor
modifications to how vtflac is invoked in my script.
- My script is interruption-proof. If the process or system dies for
any reason, it *may* leave a partial or corrupted .fx.part file in the
archive directory, but it will never delete the source file until the
corresponding .fx file is written to disk. If there are .fx.part
files in the archive directory after an interrupted run, the next run
(unless interrupted too early) will fix that. Additionally, contrary
to your exact script outline above, it removes each source file
individually as soon as its destination file is written, so even if
interruptions are common the script will still make progress on
whatever backlog it has.

-- Rob Wiesler (AC8YV)
transcode-and-archive
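
(The attachment itself isn't reproduced here. A hypothetical sketch of the behavior described in the notes above might look like the following; the function name and structure are guesses for illustration, not Rob's actual code. It assumes vtflac -e writes the encoded stream to stdout, as in Jonathan's examples.)

```shell
#!/bin/sh
# Hypothetical reconstruction, NOT the actual attachment.  The first
# argument is the archive directory; the remaining arguments (supplied
# by find -exec ... {} +) are the source files.  Written as a function
# here so it is easy to exercise.
transcode_and_archive() {
    dest=$1
    shift
    for src in "$@"; do
        base=$(basename -- "$src")
        # Encode into a .part file so an interruption can never leave a
        # truncated file that looks like a finished .fx archive.
        vtflac -e "$src" > "$dest/$base.fx.part" &&
            mv -- "$dest/$base.fx.part" "$dest/$base.fx" &&
            rm -- "$src"   # remove each source only after its .fx lands
    done
}
```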

Robert McGwier

Sep 10, 2022, 11:50:43 AM
to ham...@googlegroups.com, TangerineSDR Listserv
#!/bin/bash
dirs=($(find /tmp/test -type d))
for dir in "${dirs[@]}"; do
  cd "$dir"
  ls -pt | grep -v / | tail -n +4 | xargs rm -f
done
replace /tmp/test with your directory name.


--
Dr. Robert W McGwier, Ph.D.
Adjunct Faculty, Virginia Tech
Affiliated Faculty, University of Scranton
ARDC Member of Board
N4HY: ARRL, TAPR, AMSAT, EARC, CSVHFS
Sky: AAVSO, Sky360, Auburn A.S., Skyscrapers

Rob Wiesler

Sep 10, 2022, 1:44:42 PM
to ham...@googlegroups.com, TangerineSDR Listserv
> #!/bin/bash
> dirs=($(find /tmp/test -type d))
> for dir in "${dirs[@]}"; do
>   cd "$dir"
>   ls -pt | grep -v / | tail -n +4 | xargs rm -f
> done
> replace /tmp/test with your directory name.

Okay, no, this will break.

First, $dirs will break if any directory in /tmp/test has whitespace in
its name. This is one of the reasons I didn't answer with a standalone
script - bash and POSIX shell scripts have difficulty iterating over
filenames safely. This *particular* issue comes from your find(1)
invocation, but once you fix that you'll find yourself trying to make
sure the read builtin works properly with null delimiters, and it's just
cleaner not to do that.
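
For reference, the null-delimiter pattern being alluded to looks something like this self-contained sketch (the demo tree and counter are purely illustrative):

```shell
#!/bin/bash
# Demo of NUL-safe iteration over paths that contain whitespace.
# NUL is the one byte that cannot appear in a pathname, so it is the
# only delimiter that is always safe.
demo=$(mktemp -d)
mkdir -p "$demo/dir with spaces" "$demo/plain"

count=0
while IFS= read -r -d '' dir; do
    count=$((count + 1))
done < <(find "$demo" -type d -print0)

echo "$count"   # the demo root plus its two subdirectories: 3
rm -rf "$demo"
```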

Second, don't just blindly cd. cd has options, and since your dirs var
may contain fractional directory names due to the aforementioned bug,
you may end up in the home directory just before that rm -f . A
directory named "/tmp/test/ -L" will cause exactly that to happen. Use
the -- convention to force all subsequent arguments to be treated as
filenames. Normally this wouldn't be a problem because find(1) will
ensure that all the files it prints will start with a known prefix, but
you should get into the habit anyway.

Third, *any* use of ls ... | [ tail ... | ] xargs ... is a bug for lots
of reasons. If you really want to delete a bunch of files in a
directory, use find(1) again. That means something like:

find -maxdepth 1 -type f -exec rm -f {} +

... and I continue to assert that deleting input files in batches is
wrong. Delete them incrementally when you finish processing them (which
is skipped in this script anyway).

Fourth, you could have replaced *all* of this script with this
"one"-liner, which is both safe and clean (and at least as correct as
before):

#!/bin/sh
find /tmp/test -type f -exec rm -f {} +

Fifth, it's not clear to me at all that it's correct to recurse into
subdirectories in the first place.


The most important thing, though, is whether or not Jonathan has gotten
something to work, which he has so far declined to mention.

--
73,
Rob Wiesler (AC8YV)

emuman100

Nov 6, 2022, 7:42:06 PM
to HamSCI
Thanks All! I really appreciate everyone's help on this.

Rob,

I was able to test your script. It works as intended! The file name will always be some "date-time" or "date" and will begin with a number. vtflac will accept any file name, so there should not be any issue.

I run "find /data/vlf_96k -type f -mtime +45 -exec /path/to/transcode-and-archive /mnt/data/vlf/vlf_96k_flac {} +" to compress raw data and clear down the oldest files. Using vtflac, the 96k raw data that consumes 1.3GB/hr compresses down to 297MB/hr.

I really appreciate the help on this! Thank you for your contribution!

Jonathan
KC3EEY