
How to re-write this bash script in Python?

sutan...@gmail.com

Jul 30, 2015, 2:32:02 PM
#!/bin/bash

_maillist='pa...@email.com'
_hname=`hostname`
_logdir=/hadoop/logs
_dirlog=${_logdir}/directory_check.log

_year=$(date -d "-5 hour" +%Y)
_month=$(date -d "-5 hour" +%m)
_day=$(date -d "-5 hour" +%d)
_hour=$(date -d "-5 hour" +%H)

_hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`

echo "Checking for HDFS directories:" > ${_dirlog}
echo >> ${_dirlog}

for _currdir in $_hdfsdir
do
    hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
done

if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
then
    echo "Verify Flume is working for all servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
fi

Mark Lawrence

Jul 30, 2015, 3:36:53 PM
to pytho...@python.org
Read the documentation here https://docs.python.org/3/ and then run up
your favourite editor and start typing. When and if you hit problems
come back with a snippet of code that shows the problem, what you
expected to happen, what actually happened, and the full traceback if
there is one. Please use cut and paste to ensure that you get the data
correct.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

rand...@fastmail.us

Jul 30, 2015, 4:17:33 PM
to pytho...@python.org
On Thu, Jul 30, 2015, at 14:31, sutan...@gmail.com wrote:
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)

What is the purpose of the -5 hour offset? Is it an attempt to
compensate for timezones?

Chris Angelico

Jul 31, 2015, 3:47:27 AM
to pytho...@python.org
There are two basic approaches to this kind of job.

1) Go through every line of bash code and translate it into equivalent
Python code. You should then have a Python script which blindly and
naively accomplishes the same goal by the same method.

2) Start by describing what you want to accomplish, and then implement
that in Python, using algorithmic notes from the bash code.

The second option seems like a lot more work, but long-term it often
isn't, because you end up with better code. For example, bash lacks
decent timezone support, so I can well believe random832's guess that
your five-hour offset is a simulation of that; but Python can do much
better work with timezones, so you can get that actually correct.
Also, file handling, searching, and text manipulation and so on can
usually be done more efficiently and readably in Python directly than
by piping things through grep and awk.

ChrisA
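
To make the point about grep and awk concrete, the two text-handling steps of the original script could look roughly like this in Python. This is only a sketch: the function names and the sample data are invented for illustration, and it assumes (as the bash script does) that the path is the eighth whitespace-separated field of each `hdfs dfs -ls` line.

```python
def paths_from_listing(listing):
    """Return the eighth whitespace-separated field of each listing line.

    Mirrors awk '{print $8}', except that shorter lines (such as a
    "Found N items" header) are skipped instead of yielding empty strings.
    """
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            paths.append(fields[7])
    return paths


def missing_directory_lines(log_text):
    """Return the log lines that mention a missing path (like grep -i)."""
    needle = "no such file or directory"
    return [line for line in log_text.splitlines() if needle in line.lower()]


# Made-up sample data, shaped like the output the bash script expects.
sample_listing = (
    "drwxr-xr-x   - flume hadoop          0 2015-07-01 00:05 /hadoop/flume_ingest_a/2015/07\n"
    "drwxr-xr-x   - flume hadoop          0 2015-07-01 00:06 /hadoop/flume_ingest_b/2015/07\n"
)
sample_log = (
    "drwxr-xr-x   - flume hadoop          0 2015-07-30 09:00 /hadoop/flume_ingest_a/2015/07/30/09\n"
    "ls: `/hadoop/flume_ingest_b/2015/07/30/09': No such file or directory\n"
)

print(paths_from_listing(sample_listing))
print(len(missing_directory_lines(sample_log)))
```

Keeping both steps as plain functions on strings means they can be checked without a Hadoop cluster; the text itself would still have to come from running `hdfs dfs -ls -d`, for example through the `subprocess` module.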

Cameron Simpson

Jul 31, 2015, 4:16:22 AM
to pytho...@python.org
On 31Jul2015 17:47, Chris Angelico <ros...@gmail.com> wrote:
>On Fri, Jul 31, 2015 at 4:31 AM, <sutan...@gmail.com> wrote:
>> #!/bin/bash
[...]
>> _year=$(date -d "-5 hour" +%Y)
>> _month=$(date -d "-5 hour" +%m)
[...]
>For example, bash lacks
>decent timezone support, so I can well believe random832's guess that
>your five-hour offset is a simulation of that; but Python can do much
>better work with timezones, so you can get that actually correct.

Actually, bash has no timezone support but the date command _does_, and
probably neither better nor worse than Python. All one has to do is set the TZ
environment variable, eg (untested):

_year_gmt=$( TZ=GMT date +%Y )

>Also, file handling, searching, and text manipulation and so on can
>usually be done more efficiently and readably in Python directly than
>by piping things through grep and awk.

Again, depends a bit on the data. But in the general case probably true.

Cheers,
Cameron Simpson <c...@zip.com.au>

Chris Angelico

Jul 31, 2015, 4:26:56 AM
to pytho...@python.org
On Fri, Jul 31, 2015 at 6:15 PM, Cameron Simpson <c...@zip.com.au> wrote:
>> For example, bash lacks
>> decent timezone support, so I can well believe random832's guess that
>> your five-hour offset is a simulation of that; but Python can do much
>> better work with timezones, so you can get that actually correct.
>
>
> Actually, bash has no timezone support but the date command _does_, and
> probably neither better nor worse than Python. All one has to do is set the
> TZ environment variable, eg (untested):
>
> _year_gmt=$( TZ=GMT date +%Y )

That's assuming that it's converting against the current system
timezone. I don't know how you'd use `date` to convert between two
arbitrary timezones. But anyway, still justification to rewrite from
original spec rather than reimplementing the five-hour hack.

ChrisA
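
Whatever the five-hour offset turns out to mean, the four separate `date` calls collapse into a single clock reading in Python. The sketch below is a literal translation of the hack, not a fix for whatever it was meant to solve; the function name and the `hours_back` parameter are invented here, and it works on naive local time, which is an assumption about what the original intended.

```python
from datetime import datetime, timedelta


def check_time_fields(now=None, hours_back=5):
    """Return (year, month, day, hour) strings for `hours_back` hours ago.

    The clock is read once, so all four fields describe the same instant;
    the bash version runs date four times and could straddle an hour change.
    """
    if now is None:
        now = datetime.now()  # naive local time, like plain `date`
    then = now - timedelta(hours=hours_back)
    return (then.strftime("%Y"), then.strftime("%m"),
            then.strftime("%d"), then.strftime("%H"))


# A fixed instant makes the behaviour easy to check by hand:
# 03:15 on 1 Aug 2015 minus five hours is 22:15 on 31 Jul 2015.
print(check_time_fields(datetime(2015, 8, 1, 3, 15)))
# prints ('2015', '07', '31', '22')
```

If the offset really is a stand-in for a timezone, the `timedelta` would be replaced by a proper timezone conversion once the intended zone is known.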

Thomas 'PointedEars' Lahn

Jul 31, 2015, 6:39:07 AM
[X-Post & F'up2 comp.unix.shell]

Chris Angelico wrote:

> On Fri, Jul 31, 2015 at 6:15 PM, Cameron Simpson <c...@zip.com.au> wrote:
>> Actually, bash has no timezone support but the date command _does_, and
>> probably neither better nor worse than Python. All one has to do is set
>> the TZ environment variable, eg (untested):
>>
>> _year_gmt=$( TZ=GMT date +%Y )
>
> That's assuming that it's converting against the current system
> timezone. I don't know how you'd use `date` to convert between two
> arbitrary timezones. […]

With POSIX date(1), it seems to me that all you could do is set the system
time and then, in an additional invocation, set the TZ variable accordingly
for the output.

<http://pubs.opengroup.org/onlinepubs/9699919799/utilities/date.html>

With GNU date(1):

$ (tz_source="Asia/Dubai"
   time_source="$(LC_TIME=C TZ=$tz_source date -d "today 00:00 UTC+4" -Im)"
   tz_target="America/Chicago"
   echo "When it was $time_source in $tz_source, it was $(LC_TIME=C TZ=$tz_target date -d "$time_source") in $tz_target.")
When it was 2015-07-31T00:00+0400 in Asia/Dubai, it was Thu Jul 30 15:00:00 CDT 2015 in America/Chicago.

$ date --version
date (GNU coreutils) 8.23
[…]

:)

--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Grant Edwards

Jul 31, 2015, 10:26:47 AM
On 2015-07-31, Chris Angelico <ros...@gmail.com> wrote:

> There are two basic approaches to this kind of job.
>
> 1) Go through every line of bash code and translate it into
> equivalent Python code. You should then have a Python script which
> blindly and naively accomplishes the same goal by the same method.

In my experience, that works OK for C (with a little post-translation
tweaking and re-factoring). But, it's a pretty lousy method for bash
scripts. There are a lot of things that are trivial in Python and
complex/hard in bash (and a few vice versa), so a direct translation
usually turns out to be a mess. You end up with a lot of Python code
where only a couple lines are really needed. You also end up doing
things in a bizarre manner in Python because the simple, easy, right
way wasn't supported by bash.

> 2) Start by describing what you want to accomplish, and then
> implement that in Python, using algorithmic notes from the bash code.
>
> The second option seems like a lot more work, but long-term it often
> isn't, because you end up with better code.

And the code works. :)

For bash, I really recommend 2)

--
Grant Edwards (grant.b.edwards at gmail.com)
Yow! GOOD-NIGHT, everybody ... Now I have to go administer FIRST-AID to my
pet LEISURE SUIT!!

Chris Angelico

Jul 31, 2015, 10:53:17 AM
to pytho...@python.org
On Sat, Aug 1, 2015 at 12:26 AM, Grant Edwards <inv...@invalid.invalid> wrote:
> On 2015-07-31, Chris Angelico <ros...@gmail.com> wrote:
>
>> There are two basic approaches to this kind of job.
>>
>> 1) Go through every line of bash code and translate it into
>> equivalent Python code. You should then have a Python script which
>> blindly and naively accomplishes the same goal by the same method.
>
> In my experience, that works OK for C (with a little post-translation
> tweaking and re-factoring). But, it's a pretty lousy method for bash
> scripts. There are a lot of things that are trivial in Python and
> complex/hard in bash (and a few vice versa), so a direct translation
> usually turns out to be a mess. You end up with a lot of Python code
> where only a couple lines are really needed. You also end up doing
> things in a bizarre manner in Python because the simple, easy, right
> way wasn't supported by bash.

Right. The two techniques I suggested can be generalized to any
language pair, but some work better this way than others do. Shell
scripts are something of a special case, because they're massively
optimized toward running other programs and piping output into input,
which applications languages like Python are not as good at; so the
naive transformation leads to code that goes to ridiculous lengths to
invoke five subprocesses and move data between them, where a more
intelligent approach might invoke one process, and then do the rest in
Python code. The trouble is, you really need to know what your code is
doing, because the non-naive transformation generally has a different
set of assumptions. For instance, the OP's shell script calls on the
'mailx' command. What's it do? Presumably it sends an email... well,
Python can do that. But what if the mailx command on this host has
been carefully configured to pass mail along via a specific relay
host, and that direct access on port 25 has been blocked? How would
you know? So it's not just a matter of translating the script, you
have to know its execution environment as well.

>> 2) Start by describing what you want to accomplish, and then
>> implement that in Python, using algorithmic notes from the bash code.
>>
>> The second option seems like a lot more work, but long-term it often
>> isn't, because you end up with better code.
>
> And the code works. :)
>
> For bash, I really recommend 2)

Yeah. You remove the ability for environmental changes to unexpectedly
affect the script, which is often a feature and not a bug.

ChrisA
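
One way to respect the execution-environment concern above is to keep handing the message to the host's own `mailx` rather than speaking SMTP from Python, so that any relay configuration still applies. The following is a minimal sketch under that assumption; the function names, the host name and the recipient address are placeholders, and the `-s` and `-a` options are simply copied from the original bash script (their meaning can differ between mailx implementations).

```python
import subprocess


def mailx_command(subject, attachment, recipient):
    """Build the argument list for the same mailx call the bash script makes."""
    return ["mailx", "-s", subject, "-a", attachment, recipient]


def send_alert(body, subject, attachment, recipient):
    """Pipe `body` to the host's mailx so its relay configuration still applies."""
    subprocess.run(
        mailx_command(subject, attachment, recipient),
        input=body + "\n",
        text=True,    # pass str rather than bytes on stdin (Python 3.7+)
        check=True,   # raise CalledProcessError if mailx exits non-zero
    )


# Only the command is built here; nothing is sent.
print(mailx_command("HDFS Hadoop Failure on Flume: somehost",
                    "/hadoop/logs/directory_check.log",
                    "ops@example.com"))
```

Passing the arguments as a list avoids shell quoting problems with the subject line. Switching to `smtplib` later would be a separate decision that depends on how mail is routed on the host.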

Akira Li

Jul 31, 2015, 8:40:56 PM
to pytho...@python.org
Here's the corresponding Python code. I haven't seen the beginning of the
discussion. I apologize if it has already been posted:

#!/usr/bin/env python
from datetime import datetime
import pytz # $ pip install pytz

source_tz, target_tz = map(pytz.timezone, ['Asia/Dubai', 'America/Chicago'])
d = datetime.now(source_tz) # the current time in source_tz timezone
midnight = source_tz.localize(datetime(d.year, d.month, d.day), is_dst=None)
fmt = "%Y-%m-%dT%H:%M:%S%z"
print("When it was {:{fmt}} in {}, it was {:{fmt}} in {}".format(
    midnight, source_tz.zone, target_tz.normalize(midnight),
    target_tz.zone, fmt=fmt))

Output:

When it was 2015-08-01T00:00:00+0400 in Asia/Dubai, it was 2015-07-31T15:00:00-0500 in America/Chicago
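
For comparison, roughly the same conversion can be written with the standard library's `zoneinfo` module, which arrived in Python 3.9, well after this thread. This is a sketch rather than a replacement for the pytz code above: the function name is made up, the date is fixed so the result is reproducible, and it assumes the system (or the `tzdata` package) provides the timezone database.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def midnight_in_both(year, month, day,
                     source="Asia/Dubai", target="America/Chicago"):
    """Return midnight in `source` and the same instant expressed in `target`."""
    midnight = datetime(year, month, day, tzinfo=ZoneInfo(source))
    return midnight, midnight.astimezone(ZoneInfo(target))


fmt = "%Y-%m-%dT%H:%M:%S%z"
src, dst = midnight_in_both(2015, 8, 1)
print("When it was {} in Asia/Dubai, it was {} in America/Chicago".format(
    src.strftime(fmt), dst.strftime(fmt)))
```

With `zoneinfo` the timezone can be attached directly through `tzinfo=`, and `astimezone` does the conversion, so the `localize` and `normalize` calls that pytz requires are not needed.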

Cameron Simpson

Jul 31, 2015, 11:31:45 PM
to pytho...@python.org
On 31Jul2015 18:26, Chris Angelico <ros...@gmail.com> wrote:
>On Fri, Jul 31, 2015 at 6:15 PM, Cameron Simpson <c...@zip.com.au> wrote:
>>> For example, bash lacks
>>> decent timezone support, so I can well believe random832's guess that
>>> your five-hour offset is a simulation of that; but Python can do much
>>> better work with timezones, so you can get that actually correct.
>>
>>
>> Actually, bash has no timezone support but the date command _does_, and
>> probably neither better nor worse than Python. All one has to do is set the
>> TZ environment variable, eg (untested):
>>
>> _year_gmt=$( TZ=GMT date +%Y )
>
>That's assuming that it's converting against the current system
>timezone. I don't know how you'd use `date` to convert between two
>arbitrary timezones.

If date supports it I'd think one could use its -d option:

utc=$( TZ=src_zone date -d source-time-spec -u +%Y%m%dT%H%M%SZ )

then:

tz2=$( TZ=dst_zone date -d "$utc" )

Untested, but seems tractable.

>But anyway, still justification to rewrite from
>original spec rather than reimplementing the five-hour hack.

Yes indeed.

Cheers,
Cameron Simpson <c...@zip.com.au>