running sed in shell script and crontab


shay walters

Nov 27, 2022, 5:19:22 PM
to uc...@googlegroups.com
I have a command that works fine in an interactive shell to clean up
text from a data collection controller that has a telnet interface.
The raw data looks like this:

^M-------------------------------------------------------------------------------^@
^M ---P1--- <RoomTemp> (Room Temp)^@
^M ---P2--- <BasementTemp> (Basement Temp)^@
^M ---P3--- <OutdoorTemp> (Outdoor Temp)^@
^M^M

As you can see, it includes some CR as well as null characters. There
are also some BEL characters and some random garbage characters when
first getting connected.

I have a sed command that does a nice job cleaning up this text to
only printable characters - this is the command:

sed $'s/[^[:print:]\t]//g' /home/shay/rawdata.txt > /home/shay/output.txt

This command works fine if I type it into a bash shell. But it won't
work in a shell script file (xxx.sh) and it won't work in a crontab
entry. The sed command runs (it generates output), but the output
has not had the non-printable characters stripped from it.

To further complicate the situation, on a different computer, the
command works every time, whether at a bash prompt, in a shell script,
or in a crontab. I have "solved" the problem by doing the data
collection on the computer where things work, but I'd like to figure
out why it's failing when it does fail.

Does anyone have a suggestion on what might be going wrong?

Thanks,
-Shay

George Law

Nov 27, 2022, 5:24:12 PM
to UCLUG
When you call sed in your script, make sure you're calling it with its full path.

cron runs with a more restricted shell environment, so what is in your user's $PATH may not be seen from cron.
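
One way to see which `sed` a given `$PATH` would pick is to resolve the name under that `$PATH` explicitly. This is only a sketch: the helper name `resolve_in_path` and the two `$PATH` values are illustrative assumptions, not something taken from the machines in this thread.

```shell
# resolve_in_path PATHVALUE NAME: print which NAME a given PATH would pick.
# Runs in a subshell so the caller's own PATH is left untouched.
resolve_in_path() {
    ( PATH=$1; command -v "$2" )
}

# An interactive-style PATH versus a short one of the kind cron often provides
# (both values are illustrative assumptions).
resolve_in_path "/usr/local/bin:/usr/bin:/bin" sed || echo "sed not found"
resolve_in_path "/usr/bin:/bin" sed || echo "sed not found"
```

To see what cron really provides, a temporary crontab line such as `* * * * * env > /tmp/cron-env.txt` (the output path is just an example) dumps its environment for comparison with the interactive one.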



~George Law


Robert Meier

Nov 28, 2022, 4:03:01 PM
to 'Darrell Lee' via Upstate Carolina Linux Users Group
To expand on what George said, POSIX is an IEEE/ISO standard (1003.2).
POSIX compliance (non-exclusively) requires an (exclusively) conforming shell addressable as /bin/sh.
A /bin/sh script should therefore run WITHOUT ALTERATION on any POSIX-compliant system, including Linux, macOS, HP-UX, Solaris, BSD, z/OS, ...

/bin/sh is intolerant of any imprecision or ambiguity, so it is often used for secure code, and is the default shell for cron(8) on most systems.

POSIX is defined at the character level, independent of byte encoding.
^M and ^@ are ASCII byte codes that won't work on z/OS with EBCDIC.
^letter is an ASCII convention that ANDs the letter's byte code with 0x1f.
POSIX uses the "portable character set":
^@ -> \0      ^M -> \r
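
The caret arithmetic can be checked from the shell. A small sketch, valid for ASCII only; the helper name `ctrl_code` is made up for illustration:

```shell
# ctrl_code LETTER: print the decimal byte value of ^LETTER (ASCII only).
ctrl_code() {
    code=$(printf '%d' "'$1")   # numeric value of the character, e.g. M -> 77
    echo $(( code & 31 ))       # 31 is 0x1f: keep only the low five bits
}

ctrl_code M   # 13, carriage return (\r)
ctrl_code @   # 0, NUL (\0)
```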

Try the following on most linux machines [1]
  > echo "a^Mb" | sed 's:^M:caret-m:g;'
  # acaret-mb
  # non-posix tolerance, might give warning

  > echo "a^Mb" | sed --posix 's:^M:caret-m:g;'
  # b
  # posix intolerance of code that won't work on non-ascii systems
  
  > echo "a^Mb" | sed --posix 's:\r:caret-m:g;'
  # acaret-mb
  # posix portable character set

Often, /bin/sh is a soft link to bash or other shell, in which case, posix compliance requires the shell implicitly assume --posix.
Some distros (e.g. openSuSE) do the same if sh is called.

This century, I usually use perl rather than sed, awk, lex, yacc.
perl is only slightly less secure, its overhead is usually tolerable, and it is installed on most internet connected machines.

Having frequently written secure code for multiple systems, I've been bitten often.

[1] Most shells, sed, and entire linux distros can be configured for default posix conformance, but you wouldn't ask the question if your machine were so configured.



Jas Eckard

Nov 30, 2022, 1:48:37 AM
to uc...@googlegroups.com
>>> sed $'s/[^[:print:]\t]//g' /home/shay/rawdata.txt > /home/shay/output.txt

I'm curious why you have the `$` before the script? It doesn't seem
to matter on my system whether it's there or not, but I'm just curious
why it's there.
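
On the `$`: in bash, `$'...'` is ANSI-C quoting, so `\t` becomes a real tab before `sed` ever sees the script. A `/bin/sh` that lacks this quoting form (dash is a common example; which shell sits behind `/bin/sh` on the failing machine is an assumption here) would leave the `$` as a literal character, and `sed` would then read `$s/.../` as "substitute on the last line only". That would fit the symptom of output that is produced but mostly not cleaned. A minimal sketch of the difference, building the tab portably with `printf`; the function names are hypothetical:

```shell
# A literal tab, built without relying on $'...' or on sed's own \t handling.
tab=$(printf '\t')

# What was intended: delete every non-printable byte except tab, on all lines.
strip_all() {
    LC_ALL=C sed "s/[^[:print:]$tab]//g"
}

# What sed is handed if the shell leaves the "$" in place:
# the "$" becomes an address meaning "last line only".
strip_last_only() {
    LC_ALL=C sed "\$s/[^[:print:]$tab]//g"
}

# Two sample lines containing CR and BEL; od -c makes the bytes visible.
printf 'a\r\007b\nc\r\007d\n' | strip_all | od -c
printf 'a\r\007b\nc\r\007d\n' | strip_last_only | od -c
```

Dropping the `$` and keeping plain single quotes avoids the shell difference, but it then depends on `sed` itself accepting `\t` inside a bracket expression, which GNU `sed` does.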

>> When you call sed in your script make sure you're calling sed with it's full path.
>>
>> cron has a more restricted shell so what may be in your users $PATH may not be seen from cron

I agree with George, it may be that you have multiple `sed`s on the
first system, and cron's `$PATH` is probably preferring `/bin/sed`,
which may be an older, or at least non-GNU, `sed` that is interpreting
your "character class" `[:print:]` as those individual characters (or
may be stopping the "bracket expression" at the `]`). And the `$PATH`
in your interactive `bash` prefers a different `sed`, say,
`/usr/bin/sed`, which understands "character classes". When you use
the full path, like George said, you know which `sed` you're using,
because it's not determined by the order of `$PATH`.

And what Bob said made me wonder if using GNU `sed`'s
`--regexp-extended` (or `-E`) for "extended regular expression"
support may help? This breaks the POSIX portability that Bob
recommended, but it is closer to your original command.

Bob also makes a great suggestion that perhaps using `perl` instead
would be better (don't have to worry about POSIX/older/non-GNU
versions). But that also made me go in the opposite direction to a
simpler filter: `tr`. This does what you want with the exception that
it also joins the lines (which I assume is not desired):

tr --complement --delete '[:print:]\t' <shayrawdata.txt

...and follows the mantra "Don't use `awk` when `sed` will do. Don't
use `sed` when `tr` will do." ... etc
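
The line joining happens because newline is not in `[:print:]`, so `tr` deletes it along with everything else. A sketch that adds `\n` to the kept set and uses the short options `-c` and `-d`, which are the POSIX spellings of the GNU long options above; the function name is hypothetical:

```shell
# Keep printable characters, tabs, and newlines; delete everything else.
strip_nonprintable() {
    LC_ALL=C tr -cd '[:print:]\t\n'
}

# One sample line in the style of the raw data: leading CR, trailing NUL.
printf '\r ---P1--- <RoomTemp> (Room Temp)\000\n' | strip_nonprintable
```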

I personally love regular expressions (I don't use "character classes"
that much), but the problem is that there are 4 different types, and
which one is preferred depends on the program you're using. When you
use Perl-style regular expressions (as in PCRE, Perl-Compatible Regular
Expressions) in languages like Perl, Python, and JavaScript, you mostly
don't have to guess which one it is. But different versions of `tr`,
`sed`, `awk`, `vi`, etc. may have had PCRE-style support added on much
later, so one of the other REs is preferred.

--Jas

shay walters

Nov 30, 2022, 11:40:34 AM
to uc...@googlegroups.com
Thanks everyone... Here's the TL;DR

As it turns out, my last iteration where I removed the dollar-sign and
specified /bin/sed got things working in the crontab, which was my
original goal.

-Shay




Here's my original wordy response... :-)

All of that was helpful, thanks. I ran a search for sed and I have
some "snap" things show up, as follows:

/usr/bin/sed
/snap/core20/1634/usr/bin/sed
/snap/core20/1695/usr/bin/sed
/snap/core18/2632/bin/sed
/snap/core18/2620/bin/sed

Oddly enough, the /bin/sed didn't show up in my "find" command (sudo
find / -name sed), even though the file is there.

I do have /bin/sed as well as /usr/bin/sed, but they are the same date
and file size. /usr/bin shows up ahead of /bin in my $PATH, so that
could explain it, except that they appear to be the same file (same
size and file date, and same md5sum also).
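
One possible explanation, assuming this is a distro with a merged `/usr` (an assumption, not something confirmed in this thread): `/bin` is then a symlink to `usr/bin`, so `/bin/sed` and `/usr/bin/sed` are one file reached by two paths, and `find` would not list `/bin/sed` because it does not follow the symlink by default. A quick check, with a hypothetical helper name:

```shell
# same_file A B: succeed when both paths resolve to the same file.
same_file() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

if same_file /bin/sed /usr/bin/sed; then
    echo "same file"
else
    echo "different files"
fi

# Shows whether /bin itself is a symlink.
ls -ld /bin
```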

But it appears that there's something going on with "snap" that I
wasn't aware of, so that's something I'll have to investigate. The
sed's that show up there are older and different sizes. But I don't
see anything in the $PATH that references those snap paths. I didn't
intentionally turn snap on, but I do remember installing something
that only had snap as a method of installation, so that may have
turned it on without me realizing it.

I'm not sure what path cron is using, and I've been bitten by that
before. I was carelessly thinking that if I included /bin/sh in my
shell script, that would use /bin/sed in the shell script, but, of
course, that's not right. I need to specify /bin/sed in both the
shell script and the crontab entry. (That's two different ways of
testing the command, I'm not currently running the shell script from
the crontab - although I have tried that in the past, too.)

As for the dollar-sign, I got the command from a web search for
removing non-printable characters and it had the dollar-sign in it, so
I copied it as it was.

Thanks again for all the suggestions.

-Shay