babynames exercise

Yu Li

Nov 2, 2011, 10:32:10 AM
to python-g...@googlegroups.com
Hi,

I am learning Python through Google's Python course and working through all the exercises. For the babynames exercise, part A works well. However, when I try to run part B (generating the .summary file), I always get an error saying: invalid mode ('rU') or filename: 'baby*.html'.
I tried the program from the solution and the same thing happened.

The operating system is Windows 7 and I am running it from the MS command prompt. I use Notepad++ for editing. Does anyone know what is going on here? Many thanks.

Robert Mandič

Nov 8, 2011, 2:58:19 PM
to python-g...@googlegroups.com
Hello,

Sorry for the late reply. I had to get my hands on a Windows machine with Python because I wasn't certain about my answer, but now I am :D
The Windows command prompt doesn't expand the wildcard character "*", so the script tries to open a file literally named baby*.html.

If you have a Linux/BSD/Solaris OS available, you can try it there and you'll see it works, because the shell there expands the wildcard before Python ever sees it.
--
Lp, Robert

Yu Li

Nov 8, 2011, 3:13:58 PM
to python-g...@googlegroups.com
Thanks. I posted to another forum and someone suggested using glob.glob. It works. Thanks for your reply.

Robert Mandič

Nov 8, 2011, 3:27:16 PM
to python-g...@googlegroups.com
I see.
In case someone browses this forum in the future:

  import glob
  for arg in args:
    for filename in glob.glob(arg):
      filedata = extract_names(filename)
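
For anyone who wants to try the snippet above in isolation, here is a minimal self-contained sketch. The helper name `expand_args` is made up for illustration and is not part of the exercise code. It also keeps an argument unchanged when the pattern matches nothing; that is an assumption about the desired behaviour, so that a mistyped filename still produces an error from open() instead of being skipped silently.

```python
import glob


def expand_args(args):
    """Expand wildcard patterns in the script, since the Windows command
    prompt passes them through unchanged.

    expand_args is a hypothetical helper, not part of the exercise code.
    """
    filenames = []
    for arg in args:
        # Sort so the order does not depend on the filesystem.
        matches = sorted(glob.glob(arg))
        # glob.glob() returns an empty list when nothing matches; keep the
        # original argument in that case so open() can report the bad name.
        filenames.extend(matches if matches else [arg])
    return filenames
```

In main() this could be applied once, right after the flag handling, for example `args = expand_args(args)`, so the rest of the code keeps looping over plain filenames.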

--
Lp, Robert

J T Gillich

May 23, 2014, 11:47:33 PM
to python-g...@googlegroups.com
Hi, I followed your glob.glob suggestion and I can now run the command in Windows, but it only creates a summary file for the last file in the directory. Any idea why?

Robert Mandić

May 24, 2014, 3:24:45 AM
to python-g...@googlegroups.com
Actually, it is creating a summary file for every file, but it overwrites it on every iteration. That's why you only see the summary for the last file.

--
Lp, Robert

J T

May 24, 2014, 9:28:49 AM
to python-g...@googlegroups.com
Thanks for the reply. I am not sure how to write it so that it handles every file. I am very new to programming.

Robert Mandić

May 25, 2014, 2:32:40 AM
to python-g...@googlegroups.com
Can you provide us with your code?

J T

May 25, 2014, 9:25:52 AM
to python-g...@googlegroups.com
Here is my code.

#!/usr/bin/python
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0

# Google's Python Class

import sys
import re
   
def extract_names(filename):
    # +++your code here+++
    # LAB(begin solution)
    # The list [year, name_and_rank, name_and_rank, ...] we'll eventually return.
    names = []

    # Open and read the file.
    f = open(filename, 'rU')
    text = f.read()
    # Could process the file line-by-line, but regex on the whole text
    # at once is even easier.

    # Get the year.
    year_match = re.search(r'Popularity\sin\s(\d\d\d\d)', text)
    if not year_match:
        # We didn't find a year, so we'll exit with an error message.
        sys.stderr.write('Couldn\'t find the year!\n')
        sys.exit(1)
    year = year_match.group(1)
    names.append(year)

    # Extract all the data tuples with a findall()
    # each tuple is: (rank, boy-name, girl-name)
    tuples = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
    # print tuples

    # Store data into a dict using each name as a key and that
    # name's rank number as the value.
    # (if the name is already in there, don't add it, since
    # this new rank will be bigger than the previous rank).
    names_to_rank = {}
    for rank_tuple in tuples:
        (rank, boyname, girlname) = rank_tuple  # unpack the tuple into 3 vars
        if boyname not in names_to_rank:
            names_to_rank[boyname] = rank
        if girlname not in names_to_rank:
            names_to_rank[girlname] = rank
    sorted_names = sorted(names_to_rank.keys())

    for name in sorted_names:
        names.append(name + " " + names_to_rank[name])

    return names

    # LAB(replace solution)
    # return
    # LAB(end solution)
    
    
def main():
    # This command-line parsing code is provided.
    # Make a list of command line arguments, omitting the [0] element
    # which is the script itself.
    args = sys.argv[1:]
    
    if not args:
        print 'usage: [--summaryfile] file [file ...]'
        sys.exit(1)

   # Notice the summary flag and remove it from args if it is present.
    summary = False
    if args[0] == '--summaryfile':
        summary = True
        del args[0]

    # +++your code here+++
    # For each filename, get the names, then either print the text output
    # or write it to a summary file
    # LAB(begin solution)
  
    #for filename in args:
        #names = extract_names(filename)
        #text = '\n'.join(names)
    import glob
    for arg in args:
        for filename in glob.glob(arg):
            #print 'filename is', filename
            filedata = extract_names(filename)
            
            
            text = '\n'.join(filedata) + '\n'
            #print 'this is the text', text 
        
  # Make text out of the whole list
        
        if summary:
            outf = open(filename + '.summary', 'w')
            outf.write(text + '\n')
            outf.close()
        else:
            print text
  # LAB(end solution)

if __name__ == '__main__':
    main()

Robert Mandić

May 25, 2014, 11:07:25 AM
to python-g...@googlegroups.com
This is working as intended:

Running:
$ python tst.py --summaryfile baby1990.html baby2002.html

Created:
$ ls -1 *.summary
baby1990.html.summary
baby2002.html.summary
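
One thing worth checking in the code posted above: the `if summary:` block is indented at the level of `for filename in glob.glob(arg):`, not inside it. With explicit filenames, as in this test run, each argument matches exactly one file, so every file gets its summary. With a single wildcard argument such as baby*.html in the Windows command prompt, the inner loop first runs over all matches and the write then happens once, using the last filename. Below is a minimal sketch of the loop with the write moved inside the inner loop. The names `write_summaries` and `extract` are hypothetical and only for illustration; in the exercise, `extract` would be `extract_names`.

```python
import glob


def write_summaries(patterns, extract):
    """Write one '<filename>.summary' file per matched file.

    write_summaries and extract are hypothetical names; extract stands in
    for extract_names() from the exercise and must return a list of strings.
    """
    written = []
    for pattern in patterns:
        for filename in sorted(glob.glob(pattern)):
            text = '\n'.join(extract(filename)) + '\n'
            # The write sits inside the inner loop, so it runs once per
            # matched file rather than once per command-line argument.
            with open(filename + '.summary', 'w') as outf:
                outf.write(text)
            written.append(filename + '.summary')
    return written
```

In the posted script the same effect should come from indenting the whole `if summary: ... else: ...` block one level deeper, so that it belongs to the `for filename` loop.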