result.append("%s[%s]%s" % (lastprefix, tails, lastsuffix))
not
result.append("%s[%s]%s"%(lastprefix,tails,lastsuffix))
Similarly for many of the assignments and expressions.
I see that you initialize count3 at line 39 but don't use it until
line 123. I'd suggest that you move the initialization closer to its
use.
I think that you should move the call to open closer to where you're
going to write to the file. In fact, you can do the thing a bit
neater with a context manager like this:
#------Print the dial-peers to file----
with open(x, 'w') as o:
for line in catlist2:
figureDpn = count3 + 1000
dpn = str(figureDpn)
label = "dial-peer voice " + dpn
o.write(label)
o.write('\n')
...
Note that if you use this approach you don't need a separate call to close().
Also, you can do all the writing in one call to write(), something like this:
o.write(
label + '\n' +
"description *** local outbound dialpeer ***" + '\n' +
destpatt + '\n' +
"port " + p + '\n'
"forward-digits 7" if line[0:3] == y and q == "y" else ""
'\n'
)
Which keeps the eye focused on the data being written (at least for me).
--
Gerald Britton
> y = str(y)
> z = str(z)
> p = str(p)
> pagedef = ("http://www.localcallingguide.com/xmllocalprefix.php?npa="
+ y + "&nxx=" + z)
> print "Querying", pagedef
>
> #------Get info from NANPA.com ----------
> urllib2.install_opener(opener)
> page = urllib2.urlopen(pagedef)
> soup = BeautifulSoup(page)
> soup = str(soup)
>
> #------Parse Gathered Data----------
> for line in npaReg.findall(soup):
> npalist.insert(count, line)
> count = count + 1
>
> for line2 in nxxReg.findall(soup):
> nxxlist.insert(count2, line2)
> count2 = count2 + 1
>
enumerate will help you here:
for count, line in enumerate(npaReg.findall(soup)):
npalist.insert(count, line)
for count2, line2 in enumerate(nxxReg.findall(soup)):
nxxlist.insert(count2, line2)
> #-----Sort, remove duplicates, concatenate the last digits for
similiar NPA/NXX ------
> for makenewlist in range(0, count):
> sortlist.append(npalist.pop(0) + nxxlist.pop(0))
>
> sortlist.sort()
>
> for sortednumber in sortlist:
> if sortednumber not in sortedlist:
> sortedlist.append(sortednumber)
>
If you're going to sort them anyway (so you don't need to preserve the
existing order), the easiest way to remove duplicates is to use a set:
sortedlist = sorted(set(sortlist))
[snip]
>> for line2 in nxxReg.findall(soup):
>> nxxlist.insert(count2, line2)
>> count2 = count2 + 1
>>
> enumerate will help you here:
> for count2, line2 in enumerate(nxxReg.findall(soup)):
> nxxlist.insert(count2, line2)
An insert() at the end of a list is usually spelt append() in Python ;)
If you are looking for something less baroque
nxxlist = nxxReg.findall(soup)
will do, too.
This can alternatively be done with map():
sortlist = map(lambda x,y: x+y, npalist, nxxlist)
However, I'm not sure what your code is meant to do if the two lists
have differing numbers of elements (if the two regexps turn up
differing numbers of results). If you can assume that this won't
happen, the simple map call will do the job.
(It would have been a lot cleaner if Python exposed its operators as
functions. In Pike, that lambda would simply be `+ (backtick-plus).)
Chris Angelico
from operator import add
Ah yes, I forgot that module. Yep, that cleans up the code a bit.
ChrisA
> I've been working on this for 3 weeks as a project while I'm learning
> python. It's ugly, only function in there is from a fellow lister, but
> it works perfectly and does what it is intended to do.
>
> I'd love to hear comments on how I could improve this code, what would
> be good to turn into a function/class etc.
>
> # The function of this script is to gather user data from the user to
> # build appropriate local dial-peers in a cisco voice gateway.
> # Data gathered from the user:
> # name of file to write dial-peers to
> # NPA
> # NXX
> # outbound port to send calls
> # question if home npa local calls are 7 digit
> # script navigates to nanpa.com, gathers the local calling area
> # and returns the data
> # the data is converted to beautiful soup (I had a problem with the xml
> format)
> # the data is parsed for the proper NPA/NXXs
> # list is sorted, duplicates removed
> # data from list is formatted and printed to the file specified
>
>
> #--------Import dependencies-----------
> import urllib2
> from BeautifulSoup import BeautifulSoup
> import re
>
> #--------Define variables------------
> count = 0
> count2 = 0
> count3 = 0
> npalist = []
> nxxlist = []
> sortlist = []
> sortedlist = []
> npaReg = re.compile('(?<=<npa>)(...)(?=</npa>)')
> nxxReg = re.compile('(?<=<nxx>)(...)(?=</nxx>)')
> proxy = urllib2.ProxyHandler({'http': 'proxy.com:8080'})
> # --actual proxy was removed for confidentiality--
> opener = urllib2.build_opener(proxy)
>
> #----- function to concatenate numbers - thanks to Chris Angelico -----
> def combine(list_of_numbers, position):
> lastprefix=tails=lastsuffix=None
> result=[]
> for cur in list_of_numbers:
> prefix=cur[:position]; tail=cur[position];
> suffix=cur[position+1:]
> if prefix!=lastprefix or suffix!=lastsuffix:
> if lastprefix!=None:
> if len(tails)>1:
> result.append("%s[%s]%s"%(lastprefix,tails,lastsuffix))
> else: result.append(lastprefix+tails+lastsuffix)
> lastprefix,tails,lastsuffix=prefix,"",suffix
> tails+=tail
> if lastprefix!=None:
> if len(tails)>1:
> result.append("%s[%s]%s"%(lastprefix,tails,lastsuffix))
> else: result.append(lastprefix+tails+lastsuffix)
> return result
>
>
> #-------Gather info from user--------
> x = raw_input("please enter a filename: ")
> y = raw_input("please enter the npa: ")
> z = raw_input("please enter the nxx: ")
> p = raw_input("please enter the port: ")
> q = raw_input("Is home npa local dialing 7 digit?(y/n) ")
> #print x
>
> #-------Modify user data-------------
> o = open(x, 'w')
> y = str(y)
> z = str(z)
> p = str(p)
> pagedef = ("http://www.localcallingguide.com/xmllocalprefix.php?npa=" +
> y + "&nxx=" + z)
> print "Querying", pagedef
>
> #------Get info from NANPA.com ----------
> urllib2.install_opener(opener)
> page = urllib2.urlopen(pagedef)
> soup = BeautifulSoup(page)
> soup = str(soup)
>
> #------Parse Gathered Data----------
> for line in npaReg.findall(soup):
> npalist.insert(count,line)
> count = count + 1
>
> for line2 in nxxReg.findall(soup):
> nxxlist.insert(count2,line2)
> count2 = count2 + 1
>
> #-----Sort, remove duplicates, concatenate the last digits for similiar
> NPA/NXX ------
> for makenewlist in range(0,count):
> sortlist.append(npalist.pop(0) + nxxlist.pop(0))
>
> sortlist.sort()
>
> for sortednumber in sortlist:
> if sortednumber not in sortedlist:
> sortedlist.append(sortednumber)
>
> catlist = combine(sortedlist,5)
> catlist2 = combine(catlist,4)
>
> #------Print the dial-peers to file----
> for line in catlist2:
> figureDpn = count3 + 1000
> dpn = str(figureDpn)
> label = "dial-peer voice " + dpn
> o.write(label)
> o.write('\n')
> o.write("description *** local outbound dialpeer ***")
> o.write('\n')
> destpatt = "destination-pattern %s...." % line.rstrip()
> o.write(destpatt)
> o.write('\n')
> port = "port " + p
> o.write(port)
> o.write('\n')
> if line[0:3] == y and q == "y":
> o.write("forward-digits 7")
> o.write('\n')
> o.write('\n')
> count3 = count3 + 1
>
> o.close()
- Most important: read or reread the tutorial. It should give you an idea
about the basics like adding items to a list.
- Use self-evident variable names
- Avoid global variables
- Prefer docstrings over comments
- Split your code into small chunks that do one thing well, i. e. functions.
At the top-level your script could look like this
def main():
args = read_arguments()
url = make_url(args.npa, args.nxx)
page = download_page(url)
npa_nxx_pairs = scrape_page(page)
clusters = combine_pairs(npa_nxx_pairs)
with open(args.destination_file) as out:
write_data(out, clusters, args.npa, args.port, args.seven_digit)
if __name__ == "__main__"
main()
This makes dependencies clear, avoids globals, and reduces the need for
comments. You can test separate functionality separately. NB: As I have no
expertise in the problem domain the variable names I used above are a bit
too abstract.
- Convert flags to boolean early. Bad:
seven_digit = raw_input(...)
...
if seven_digit == "y": ...
Better:
seven_digit = raw_input(...) == "y"
...
if seven_digit: ...
- Relentlessly eliminate dead code. Example: page = str(BeautifulSoup(page))
is almost a no-op. You don't need BeautifulSoup at all!
- Don't comment out code, remove it. Rely on version control if you think
that you may need it again later.
- The page you are scraping is xml. Regular expressions are not the best
tool to deal with that. Here's a sample implementation of scrape_page()
using lxml and xpath:
from lxml import etree
def scrape_page(page):
root = etree.fromstring(page)
npas = root.xpath("//prefix/npa/text()")
nxxs = root.xpath("//prefix/nxx/text()")
pairs = (npa + nxx for npa, nxx in zip(npas, nxxs))
return sorted(set(pairs))
An xpath expert might be able to simplify that.
- I prefer to pass script arguments on the commandline rather than answering
a sequence of questions interactively. With that approach get_arguments()
becomes something like
import argparse
def get_arguments():
parser = argparse.ArgumentParser()
parser.add_argument("npa")
parser.add_argument("nxx")
parser.add_argument("destfile", nargs="?")
parser.add_argument("--port", default="1234")
parser.add_argument("-7", "--seven-digit", action="store_true")
return parser.parse_args()
Sadly the instructive help="..." parameters are all missing...