Python for scripting

Dilawar Singh

Mar 22, 2014, 6:12:59 PM

to wncc...@googlegroups.com

I am one of those who prefer bash over anything else for system scripting. Bash is minimal and I love this about it. In bash, one deals only with text, although some primitive data structures such as arrays are also available. Bash is one of those things which prove that much can be accomplished by sticking to a few tricks. But scripting is much more than just managing and administering systems.

Python scripts are easy to write and maintain. They are easy to write because Python comes with a large standard library. It also has great support for reading and parsing text files, easy-to-use data structures, numerical recipes, and bindings to legacy C programs. Moreover, a huge user community has created a great deal of work which can be easily plugged in.

This much verbiage was to motivate. Let's see some code so we can do at least two or three things after reading it. We'll do some basic things in Python. First things first.

Configure editor

Set your editor to insert spaces instead of tab characters: 4 spaces (the PEP 8 convention) or 3.
If you can break a line at column 80, do it. Don't ask why! Never make a line longer than 100 characters.
Check if your editor is suitable for programming. If you are yet to come to grips with vim, emacs etc., then check out kate, gedit, spyder, eclipse etc.

Also read a bit about the preferred coding style of the language of your choice. Readability of your code matters a lot, whether the reader is a teammate or a poor TA. Scripts are no exception, even if you are the only one using them. Writing comments in code is something everyone preaches and few do well in practice.

Writing a standalone script

One can write a single script containing everything or break it into many scripts. The latter is recommended if your script is larger than a few hundred lines. But if you want to write a portable script which you can pass around, writing a single script is not such a bad idea. Who cares if a script has a few thousand lines, as long as it works correctly and you are the only one responsible for maintaining it? Functions should not be large, though. If a function is more than 50 lines long, try to break it into two.

Notice the first line in these two scripts. Prefer the first one on Linux or Mac: it looks up python in your PATH instead of assuming that it is installed at /usr/bin/python.

#!/usr/bin/env python
'''
This script can turn water into milk.
'''
def turnWaterIntoMilk():
    print("I don't know how to do it, yet!")
    return None

turnWaterIntoMilk()

and


#!/usr/bin/python
'''
This script can turn water into milk.
'''

def turnWaterIntoMilk():
    print("I don't know how to do it, yet!")
    return None

turnWaterIntoMilk()

Do not forget to add if __name__ == "__main__" to your file. Code under it runs only when the file is executed as a script, not when it is imported as a module.

This is what a minimal script looks like.

#!/usr/bin/env python
'''
This script can turn water into milk.
'''
def turnWaterIntoMilk():
    print("I don't know how to do it, yet!")
    return None

if __name__ == "__main__":
    turnWaterIntoMilk()

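Prefer `try/except`

"It is easier to ask forgiveness than permission" (EAFP): just try the operation and handle the exception if it fails. Compare this with "look before you leap" (LBYL), where you check every precondition first.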
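
A minimal sketch of the two styles, using reading a file as the example. The function names `read_lbyl` and `read_eafp` are made up for illustration and are not from any library.

```python
import os

def read_lbyl(path):
    # Look before you leap: test first, then open.
    # The file can still vanish between the check and the open.
    if os.path.exists(path):
        with open(path, "r") as f:
            return f.read()
    return None

def read_eafp(path):
    # Ask forgiveness: just try, and handle the failure if it happens.
    try:
        with open(path, "r") as f:
            return f.read()
    except IOError:
        return None

# Hypothetical path that is assumed not to exist.
print(read_lbyl("/no/such/dir/missing.txt"))
print(read_eafp("/no/such/dir/missing.txt"))
```

Both calls return None here, but only the second version also survives a file that disappears or becomes unreadable after the check.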

Working with text files

To read (write) a file, prefer

with open("filename.txt", "r") as f:
    txt = f.read()
doSomething(txt)

over the following,

txt = open("filename.txt", "r").read()
doSomething(txt)

The keyword with takes care of closing the file when the block ends, even if an exception is raised inside it. If you want to read a file line by line, this is handy.

with open("filename.txt", "r") as f:
    for line in f:
        doSomethingWithLine(line)
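
Writing works the same way. Here is a small sketch, assuming a hypothetical file name `output.txt` placed in the system's temporary directory so that nothing important is overwritten.

```python
import os
import tempfile

lines = ["first line", "second line"]
# Hypothetical file name, for illustration only.
path = os.path.join(tempfile.gettempdir(), "output.txt")

# Mode "w" creates the file, or truncates it if it already exists.
with open(path, "w") as f:
    for line in lines:
        f.write(line + "\n")   # write() does not add a newline itself

# The file is closed by now, so it is safe to read it back.
with open(path, "r") as f:
    stripped = [line.rstrip("\n") for line in f]

print(stripped)
```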

Manipulating strings

Once a text file is read, one processes its contents. Some of the more basic operations on strings are listed below.

Break a string at a substring

line = "Happiness can't buy me money"
tokens = line.split()
print(tokens)

This will print ['Happiness', "can't", 'buy', 'me', 'money']. If split is given a substring, the string is broken at that substring.

token = line.split("ne")

This gives ['Happi', "ss can't buy me mo", 'y']. Notice that the substring is removed. If the substring is not found, split returns a list containing the whole string. Try it. More complex string manipulation can be done using regular expressions. They are powerful tools, but it is very easy to write very inefficient regular expressions. It would be a good project to write a function which minimizes a given regular expression (state minimization in a state machine).
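
A few more cases of split, as a sketch. The second argument of split (maxsplit) and `re.split` are not mentioned above; they are standard-library features shown here only for comparison.

```python
import re

line = "Happiness can't buy me money"

# Substring not found: the whole string comes back as a single item.
not_found = line.split("xyz")

# Limit the number of splits with the second argument (maxsplit).
first_two = line.split(" ", 1)

# Split on a regular expression: here, on "ne" or on whitespace.
pieces = re.split(r"ne|\s+", line)

print(not_found)
print(first_two)
print(pieces)
```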

Replacing substring

Say you want to replace all 'ness' with 'less' in a string.

newLine = line.replace('ness', 'less')

newLine is now "Happiless can't buy me money". One can even chain these operations.

newLine = line.replace('p', 't').replace('e', 'o')

newLine is "Hattinoss can't buy mo monoy". Actually, str.translate is a much better way to do this.
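
A sketch of the str.translate version, assuming Python 3, where the mapping table is built with `str.maketrans`:

```python
line = "Happiness can't buy me money"

# Build a table mapping 'p' -> 't' and 'e' -> 'o'.
table = str.maketrans("pe", "to")
newLine = line.translate(table)
print(newLine)

# Unlike chained replace(), every character is mapped once, in one pass,
# so swapping two characters works as expected.
swapped = "abba".translate(str.maketrans("ab", "ba"))
chained = "abba".replace("a", "b").replace("b", "a")
print(swapped, chained)
```

The chained replace turns "abba" into "aaaa" because the second replace also rewrites the characters produced by the first one; translate gives "baab".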

Call system command using `subprocess`

Let's take an example: use pandoc to convert a markdown file to html. The shell command is the following.

pandoc -f markdown -t html file.markdown > file.html

The equivalent in Python using subprocess follows. One can also use a shorter version of this command, but I prefer this one because I get much finer control over the system command.

This is a rather involved example. Mostly one runs a command, captures its output, and decides what to do next depending on the output.

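import subprocess

with open("file.markdown", "r") as f:
    markdown = f.read()

cmd = ["pandoc", "-f", "markdown", "-t", "html"]
p = subprocess.Popen(cmd
        , stdin = subprocess.PIPE
        , stdout = subprocess.PIPE
        , universal_newlines = True  # exchange text (str), not bytes
        )
# Send the markdown content to stdin and receive the html from stdout.
# communicate() also closes stdin and waits for pandoc to exit, which
# avoids a deadlock when the input is large.
html = p.communicate(markdown)[0]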
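
The shorter version could look like the sketch below, assuming Python 3.7 or newer (for the `capture_output` and `text` arguments of `subprocess.run`). The helper name `run_filter` is made up for this example, and it is demonstrated with the Python interpreter itself so that it runs without pandoc installed.

```python
import subprocess
import sys

def run_filter(cmd, text):
    """Feed text to cmd on stdin and return what it prints on stdout."""
    result = subprocess.run(
        cmd,
        input=text,           # sent to the command's stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout

# Stand-in for pandoc: a tiny command that upper-cases its input.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_filter(upper, "hello"))

# With pandoc installed, the conversion would be:
# html = run_filter(["pandoc", "-f", "markdown", "-t", "html"], markdown)
```

With `check=True` a failing command raises an exception, which fits the try/except style: run the command and handle the failure, rather than inspecting the return code every time.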

Regular expressions

It's not an easy topic to introduce in a small post. I don't know why I am even touching it.

Regular expressions are like a language grammar (perhaps less powerful): they accept or reject a given string. This is very useful for making sense of user input or commands. They can also be used for simple parsing. You can read about it here. Understanding regex needs some reading, and getting it to work in any language is not always easy. If one can debug a wrong regex, one can debug almost anything.

I want to show it in action. If it does not make sense at all, then do read about it.

Suppose a user gives me a line that is supposed to be a valid voltage source line from a spice script. The grammar (roughly) is the following (case insensitive).

The line starts with v followed by the name of the device, e.g. V1, va, vn1 etc. This is followed by the names of its terminals and then its type (ac or dc). This is followed by its value in decimal (ignore the 1e-5 syntax for now). Immediately after the value there is a character: k for kilo, m for milli, M for mega etc. For example,

line = "v1 in1 out1 dc 0.1k"

The following is a regular expression which will accept a valid string and reject a bad one.

import re
pat = re.compile(r'v(?P<name>\w+)\s+(?P<in>\w+)\s+(?P<out>\w+)\s+(?P<type>ac|dc)\s+(?P<value>\d+(\.\d+)?)\w', re.I)
m = pat.match(line)
if m:
    print("This matches. Matched values are")
    print(m.groupdict())
else:
    print("Given line is not a valid spice line for voltage source")
    raise UserWarning("Bad input")

This prints

This matches. Matched values are
{'name': '1', 'in': 'in1', 'out': 'out1', 'type': 'dc', 'value': '0.1'}

Well, since we are here, let's make some sense of the regex. Let's remove '?P<some_name>' from the regex; it is used to name a subsection of the regex so we can access it later. The regex is now v(\w+)\s+(\w+)\s+(\w+)\s+(ac|dc)\s+(\d+(\.\d+)?)\w. The symbol \w means 'an alphanumeric character or the underscore' and + means 'one or more times', i.e. \w+ means 'an alphanumeric character or the underscore, one or more times'. Therefore v\w+ means the character v followed by one or more alphanumeric characters or underscores. Next we have \s+, which is one or more whitespace characters. (ac|dc) means either the string ac or the string dc. \d means a digit. (\.\d+) means a literal . followed by one or more digits, e.g. .99 or .001, but it is followed by ?, which makes it optional. In a nutshell, it can match 1.22 as well as 1 (.22 is optional). Phew!

PS: If you are here, check out this summer school on "Computational approach to memory and plasticity" http://www.ncbs.res.in/camp/. Some of you might find it worth your while.
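
To check the explanation above, here is a sketch that tries the pieces of the regex one at a time. It uses `re.fullmatch` (Python 3.4+), which is not used above, so that each piece must match the whole test string. The helper name `accepts` is made up for this example.

```python
import re

def accepts(pattern, text):
    # True if the pattern matches the whole of text, ignoring case.
    return re.fullmatch(pattern, text, re.I) is not None

print(accepts(r'v\w+', 'vn1'))          # v followed by a name
print(accepts(r'v\w+', 'v'))            # the name is missing
print(accepts(r'(ac|dc)', 'DC'))        # case insensitive
print(accepts(r'\d+(\.\d+)?', '1.22'))  # with a fractional part
print(accepts(r'\d+(\.\d+)?', '1'))     # the fractional part is optional
print(accepts(r'\d+(\.\d+)?', '.22'))   # \d+ needs a leading digit

# A caveat of the full pattern: it ends in a bare \w and match() does not
# anchor at the end of the line, so a value without a suffix is accepted
# and its last digit is consumed as the suffix.
pat = re.compile(r'v(?P<name>\w+)\s+(?P<in>\w+)\s+(?P<out>\w+)\s+(?P<type>ac|dc)\s+(?P<value>\d+(\.\d+)?)\w', re.I)
m = pat.match("v1 in1 out1 dc 10")
print(m.group('value'))
```

The last line prints 1, not 10. Replacing the trailing \w with an explicit set of suffix characters would be one way to tighten this, at the cost of a longer pattern.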