I am one of those who prefer bash over anything else for system scripting. Bash is minimal and I love this about it. In bash, one deals only with text, although some primitive data structures such as arrays are also available. Bash is one of those things which prove that much can be accomplished by sticking to a few tricks. But scripting is much more than just managing and administering systems.
Python scripts are easy to write and maintain. They are easy to write because Python comes with a large standard library. It also has great support for reading and parsing text files, easy-to-use data structures, numerical recipes, and bindings to legacy C programs. Moreover, a huge user community has created a great deal of work which can easily be plugged in.
This much verbiage was to motivate. Let's see some code, so that we can do at least two or three things after reading it. We'll do some basic things in Python. First things first.
Also read a bit about the preferred coding style in the language of your choice (for Python, that is PEP 8).
Readability of your code matters a lot, whether its readers are your teammates or poor TAs.
Scripts are no exception, even if you are the only one using them. Writing comments in code is something everyone preaches and few practise well.
One can write a single script containing everything or break it into many scripts. The latter is recommended if your script is longer than a few hundred lines. But if you want to write a portable script which you can pass around, writing a single script is not such a bad idea. Who cares if a script has a few thousand lines, as long as it works correctly and you are the only one responsible for maintaining it? Functions, however, should not be large: if a function is more than 50 lines long, try to break it into two.
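Notice the first line (the shebang) in these two scripts. Prefer the first one on Linux or Mac: `/usr/bin/env` looks up `python` on your `PATH`, so the script also works when Python is not installed in `/usr/bin` (for example inside a virtualenv). On systems where the interpreter is only installed as `python3`, use `#!/usr/bin/env python3` instead.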
```python
#!/usr/bin/env python
'''
This script can turn water into milk.
'''
def turnWaterIntoMilk():
    print("I don't know how to do it, yet!")
    return None

turnWaterIntoMilk()
```
and
```python
#!/usr/bin/python
'''
This script can turn water into milk.
'''
def turnWaterIntoMilk():
    print("I don't know how to do it, yet!")
    return None

turnWaterIntoMilk()
```
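Do not forget to add an `if __name__ == "__main__":` guard to your file. The code under it runs when the file is executed as a script, but not when the file is imported as a module.
This is what a minimal script looks like.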
```python
#!/usr/bin/env python
'''
This script can turn water into milk.
'''
def turnWaterIntoMilk():
    print("I don't know how to do it, yet!")
    return None

if __name__ == "__main__":
    turnWaterIntoMilk()
```
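A common way to grow this skeleton is to put the work in a `main()` function and parse command-line options with the standard `argparse` module. The sketch below assumes a made-up `--litres` option and uses PEP 8 style `snake_case` names; both are illustrative, not part of the original script.

```python
#!/usr/bin/env python3
"""
This script can turn water into milk (hypothetical example).
"""
import argparse


def turn_water_into_milk(litres):
    """Return a message; the actual conversion is left as an exercise."""
    return "I don't know how to turn {} litre(s) of water into milk, yet!".format(litres)


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; passing a list is handy for testing.
    parser = argparse.ArgumentParser(description="Turn water into milk.")
    parser.add_argument("--litres", type=float, default=1.0,
                        help="how much water to convert (default: 1.0)")
    args = parser.parse_args(argv)
    print(turn_water_into_milk(args.litres))
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

Because `main` accepts an argument list, you can call it from another script or an interactive session, e.g. `main(["--litres", "3"])`, without touching `sys.argv`.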
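try/except
Python favours the "Easier to Ask Forgiveness than Permission" (EAFP) style: just try the operation and handle the exception if it fails. Compare this with "Look Before You Leap" (LBYL), where you check every precondition before acting.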
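A minimal sketch of the two styles, using a hypothetical pair of functions that return a file's contents, or `None` when the file does not exist:

```python
import os


def read_lbyl(path):
    """Look Before You Leap: check that the file exists, then open it."""
    if os.path.isfile(path):
        # The file may still vanish (or be unreadable) between the check
        # and the open() call, so the check does not make this fully safe.
        with open(path) as f:
            return f.read()
    return None


def read_eafp(path):
    """Easier to Ask Forgiveness than Permission: try it, handle the failure."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None
```

The EAFP version has no window between checking and acting, and it reads more directly: the normal path comes first and the failure case is handled separately.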
To read (write) a file, prefer
```python
with open("filename.txt", "r") as f:
    txt = f.read()
doSomething(txt)  # doSomething stands for your own processing
```
over the following,
```python
txt = open("filename.txt", "r").read()  # nothing closes the file explicitly
doSomething(txt)
```
The `with` statement closes the file as soon as its block ends, once everything has been read or written, and it does so even if an exception is raised inside the block. If you want to read a file line by line, iterating over the file object is handy.
```python
with open("filename.txt", "r") as f:
    for line in f:
        doSomethingWithLine(line)
```
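To make the behaviour of `with` concrete, here is a minimal sketch. `read_text` spells out, using `try`/`finally`, roughly what `with` does for you, and the hypothetical `copy_upper` opens two files in a single `with` statement and processes the input line by line.

```python
def read_text(path):
    # Roughly what `with open(path) as f: return f.read()` does for you.
    f = open(path, "r")
    try:
        return f.read()
    finally:
        f.close()  # runs whether or not read() raised


def copy_upper(src, dst):
    """Copy src to dst line by line, upper-casing each line.

    Returns the number of lines copied. Both files are closed when the
    `with` block ends, even if an exception is raised inside it.
    """
    count = 0
    with open(src, "r") as fin, open(dst, "w") as fout:
        for line in fin:              # one line at a time, not the whole file
            fout.write(line.upper())  # `line` keeps its trailing newline
            count += 1
    return count
```

Iterating over the file object keeps memory use small even for large files, because only the current line is held in memory.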
Once a text file is read, one processes its contents. Some of the more basic string operations are listed below.
```python
line = "Happiness can't buy me money"
tokens = line.split()
print(tokens)
```
This prints `['Happiness', "can't", 'buy', 'me', 'money']`. If `split` is given a substring, the string is broken at every occurrence of that substring.
```python
tokens = line.split("ne")
```
This gives `['Happi', "ss can't buy me mo", 'y']`. Notice that the substring itself is removed. If the substring is not found, `split` returns a list whose only element is the whole string. Try it. More complex string manipulation can be done using regular expressions. They are powerful tools, and it is very easy to write very inefficient regular expressions. It would be a good project to write a function which minimizes a given regular expression (state minimization of the corresponding state machine).
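A few more `split` behaviours worth knowing, as a quick sketch (Python 3; the comments show what each call prints):

```python
line = "Happiness can't buy me money"

# No argument: split on runs of whitespace and drop empty strings.
print("a  b\tc\n".split())              # ['a', 'b', 'c']

# With a separator: split on exactly that string, which is removed;
# consecutive separators produce empty strings.
print("a,,b".split(","))                # ['a', '', 'b']

# Separator not found: a one-element list holding the whole string.
print(line.split("xyz"))                # ["Happiness can't buy me money"]

# maxsplit limits the number of splits (here: only at the first space).
print(line.split(" ", 1))               # ['Happiness', "can't buy me money"]

# join() undoes split() for a given separator.
print(" ".join(line.split()) == line)   # True
```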
Say you want to replace all 'ness' with 'less' in a string.
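```python
newLine = line.replace('ness', 'less')
```
`newLine` is now `"Happiless can't buy me money"`. One can even chain these operations.
```python
newLine = line.replace('p', 't').replace('e', 'o')
```
`newLine` is now `"Hattinoss can't buy mo monoy"`. For character-by-character substitutions like this one, `str.translate` is actually a much better way to do it: it applies all the substitutions in a single pass.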
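Here is a sketch of the `str.translate` route in Python 3, where `str.maketrans` builds the translation table. It also shows where chained `replace` calls can go wrong: each call works on the output of the previous one. Note that `translate` maps single characters only, so a multi-character substitution such as `'ness'` to `'less'` still needs `replace`.

```python
line = "Happiness can't buy me money"

# Map each character of the first argument to the character at the
# same position in the second argument.
table = str.maketrans("pe", "to")
print(line.translate(table))   # Hattinoss can't buy mo monoy

# Chained replace() calls run one after another, so a later call can undo
# an earlier one; translate() maps every character in a single pass.
print("abba".replace("a", "b").replace("b", "a"))   # aaaa
print("abba".translate(str.maketrans("ab", "ba")))  # baab

# A third argument lists characters to delete.
print(line.translate(str.maketrans("", "", "aeiou")))  # Hppnss cn't by m mny
```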
subprocess
Let's take an example: use pandoc to convert a markdown file to HTML. The shell command is the following.
```sh
pandoc -f markdown -t html file.markdown > file.html
```
The equivalent in Python using `subprocess` is the following. One can also use shorter versions of this command (for example `subprocess.check_output`), but I prefer this one because it gives me much finer control over the system command.
This is a rather involved example. Mostly one runs a command, captures its output and decides what to do next depending on that output.
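```python
import subprocess

with open("file.markdown", "r") as f:
    markdown = f.read()

cmd = ["pandoc", "-f", "markdown", "-t", "html"]
p = subprocess.Popen(cmd,
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     universal_newlines=True,  # exchange str instead of bytes
                     )
# Write the markdown content into pandoc's stdin and receive the HTML from
# its stdout. communicate() also closes stdin and waits for pandoc to exit,
# avoiding the deadlock that writing to p.stdin directly can cause.
html = p.communicate(markdown)[0]
```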
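For the common case, a shorter route is `subprocess.run`. The sketch below assumes Python 3.7 or later (for `capture_output` and `text`); `run_filter` and `command_succeeds` are hypothetical helper names, not part of the standard library. With `check=True`, `run` raises `subprocess.CalledProcessError` if the command exits with a non-zero status.

```python
import subprocess


def run_filter(cmd, text):
    """Send `text` to `cmd` on stdin and return what it writes to stdout.

    Raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(cmd, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


def command_succeeds(cmd):
    """Run `cmd`, discard its output and report whether it exited with 0."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:  # the program is not installed or not on PATH
        return False
    return result.returncode == 0
```

For the pandoc case this becomes `html = run_filter(["pandoc", "-f", "markdown", "-t", "html"], markdown)`, and `command_succeeds(["pandoc", "--version"])` is one way to check that pandoc is available before relying on it.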
Regular expressions are not an easy topic to introduce in a short post. I don't know why I am even touching them.
Regular expressions are like a language grammar, though less powerful: they accept or reject a given string. This is very useful for making sense of user input or commands, and they can also be used for simple parsing. You can read about them here. Understanding regexes needs some reading, and getting them to work in any language is not always easy. If one can debug a wrong regex, one can debug almost anything.
I want to show them in action. If it does not make sense at all, then do read about them first.
Suppose a user gives me a line which is supposed to be a valid voltage-source line from a SPICE netlist. The grammar is roughly the following (case-insensitive).
The line starts with `v` followed by the name of the device, e.g. `V1`, `va`, `vn1`. This is followed by the names of its terminals and then its type (`ac` or `dc`). Then comes its value as a decimal number (ignore the `1e-5` syntax for now). Immediately after the value there is a character: `k` for kilo, `m` for milli, `M` for mega, etc. (Real SPICE is case-insensitive and writes mega as `meg`, but let's keep it simple.) For example:
```python
line = "v1 in1 out1 dc 0.1k"
```
The following regular expression accepts a valid line and rejects most bad ones.
```python
import re

pat = re.compile(r'v(?P<name>\w+)\s+(?P<in>\w+)\s+(?P<out>\w+)\s+(?P<type>ac|dc)\s+(?P<value>\d+(\.\d+)?)\w', re.I)
m = pat.match(line)
if m:
    print("This matches. Matched values are")
    print(m.groupdict())
else:
    print("Given line is not a valid SPICE line for a voltage source")
    raise ValueError("Bad input")
```
With Python 3 this prints
```
This matches. Matched values are
{'name': '1', 'in': 'in1', 'out': 'out1', 'type': 'dc', 'value': '0.1'}
```
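The pattern above has two loose ends. The trailing `\w` also matches a digit, so for `"v1 in1 out1 dc 100"` it reports the value `'10'` and treats the last `0` as the unit; and `match` only anchors at the start, so `"v1 in1 out1 dc 0.1kxyz"` is accepted too. A stricter sketch follows. It assumes a small subset of SPICE scale factors (the `SCALE` table, with mega spelled `meg` as SPICE does), makes the suffix optional, and uses the hypothetical helper name `parse_vsource`.

```python
import re

# Assumed subset of SPICE scale factors; "" means no suffix, i.e. no scaling.
SCALE = {"": 1.0, "meg": 1e6, "k": 1e3, "m": 1e-3,
         "u": 1e-6, "n": 1e-9, "p": 1e-12}

VSOURCE = re.compile(
    r"v(?P<name>\w+)\s+(?P<in>\w+)\s+(?P<out>\w+)\s+"
    r"(?P<type>ac|dc)\s+"
    r"(?P<value>\d+(?:\.\d+)?)"     # 1, 0.1, 22.50, ...
    r"(?P<suffix>meg|[kmunp])?",    # optional letter suffix, never a digit
    re.IGNORECASE,
)


def parse_vsource(line):
    """Return the fields of a voltage-source line, or raise ValueError."""
    m = VSOURCE.fullmatch(line.strip())   # fullmatch: nothing may follow
    if m is None:
        raise ValueError("not a valid voltage-source line: {!r}".format(line))
    fields = m.groupdict()
    suffix = (fields["suffix"] or "").lower()
    fields["volts"] = float(fields["value"]) * SCALE[suffix]
    return fields
```

For example, `parse_vsource("v1 in1 out1 dc 0.1k")` returns the same fields as before plus `'suffix': 'k'` and a `volts` entry of (about) `100.0`.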
Well, since we are here, let's make some sense of the regex. First remove every `?P<some_name>` from it; that syntax only names a group so that we can access it later. The regex is now `v(\w+)\s+(\w+)\s+(\w+)\s+(ac|dc)\s+(\d+(\.\d+)?)\w`.
The symbol `\w` means "an alphanumeric character or the underscore", and `+` means "one or more times", i.e. `\w+` means "one or more alphanumeric characters or underscores". Therefore `v\w+` means the character `v` followed by one or more alphanumeric characters or underscores. Next we have `\s+`, which is one or more whitespace characters, and `(ac|dc)` means either the string `ac` or the string `dc`. `\d` means a digit. `(\.\d+)` means a literal `.` (the backslash is needed because a bare `.` matches any character) followed by one or more digits, e.g. `.99` or `.001`. This group is followed by `?`, which makes it optional. In a nutshell, `\d+(\.\d+)?` matches `1.22` as well as `1` (the `.22` is optional). Finally, the trailing `\w` matches the single unit character, such as the `k` in `0.1k`. Phew!
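One way to convince yourself of each piece is to try it on its own against a few strings. This small sketch uses `re.fullmatch`, which succeeds only if the pattern matches the whole string; the test strings are arbitrary examples.

```python
import re

# Each piece of the voltage-source pattern, tried in isolation.
examples = [
    (r"v\w+", ["v1", "vn1", "v", "v 1"]),
    (r"\s+", [" ", "\t  ", ""]),
    (r"(ac|dc)", ["ac", "dc", "ad"]),
    (r"\d+(\.\d+)?", ["1", "1.22", ".99", "1."]),
]

results = {}
for pattern, strings in examples:
    for s in strings:
        matched = re.fullmatch(pattern, s) is not None
        results[(pattern, s)] = matched
        print("{:14} {:8} {}".format(pattern, repr(s), matched))
```

Note that `.99` on its own does not match `\d+(\.\d+)?`: the fractional part is optional, but at least one digit must come before it. Likewise `1.` fails because the `.` must be followed by at least one digit.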
PS: If you are still here, check out this summer school on "Computational approach to memory and plasticity": http://www.ncbs.res.in/camp/. Some of you might find it worth your while.
- Dilawar