Problem loading a file of words
teoryn  
Newsgroups: comp.lang.python
From: "teoryn" <teo...@gmail.com>
Date: 24 Jul 2005 20:44:08 -0700
Local: Sun, Jul 24 2005 11:44 pm
Subject: Problem loading a file of words
I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
        '''Returns word in lowercase sorted alphabetically'''
        word = str.lower(word)
        word_list = []
        for char in word:
                word_list.append(char)
        word_list.sort()
        sorted_word = ''
        for char in word_list:
                sorted_word += char
        return sorted_word

print 'Building dictionary...',

dictionary = { }

# Notice that you need to have a file named 'dictionary.txt'
# in the same directory as this file. The format is to have
# one word per line, such as the following (of course without
# the # marks):

#test
#hello
#quit
#night
#pear
#pare

f = file('dictionary.txt')

# This loop builds the dictionary, where the key is
# the string after calling sort_string(), and the value
# is the list of all 'regular' words (from the dictionary,
# not sorted) that passing to sort_string() returns the key

while True:
        line = f.readline()
        if len(line) == 0:
                break
        line = str.lower(line[:-1]) # convert to lowercase just in case
and
                                    # remove the return at the end of
the line
        sline = sort_string(line)
        if sline in dictionary:     # this key already exist, add to
existing list
                dictionary[sline].append(line)
                print 'Added %s to key %s' % (line,sline) #for testing
        else:                       # create new key and list
                dictionary[sline] = [line]
                print 'Created key %s for %s' % (sline,line) #for
testing
f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
        lookup = raw_input('Enter a scrambled word : ')

        results = dictionary[sort_string(lookup)]

        for x in results:
                print x,

        print

        if lookup == 'quit':
                break
*--end of file--*

If you create dictionary.txt as suggested in the comments, it should
work fine (assuming you pass a word that creates a valid key, I'll
have to add exceptions later). The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)

*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
  File "unscram.py", line 62, in ?
    results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*

If you'd like a copy of the dictionary I'm using email me at teoryn at
gmail dot com or leave your email here and I'll send it to you (It's
702.2 KB compressed)

Thanks,
Kevin


 
Devan L
Newsgroups: comp.lang.python
From: "Devan L" <dev...@gmail.com>
Date: 24 Jul 2005 21:14:33 -0700
Local: Mon, Jul 25 2005 12:14 am
Subject: Re: Problem loading a file of words

Heh, it reminds me of the code I used to write.

def sort_string(word):
    return ''.join(sorted(list(word.lower())))
f = open('dictionary.txt','r')
lines = [line.rstrip('\n') for line in f.readlines()]
f.close()
dictionary = dict((sort_string(line),line) for line in lines)
lookup = ''
while lookup != 'quit':
    lookup = raw_input('Enter a scrambled word:')
    if dictionary.has_key(sort_string(lookup)):
        word = dictionary[sort_string(lookup)]
    else:
        word = 'Not found.'
    print word

You need python 2.4 to use this example.


 
Robert Kern
Newsgroups: comp.lang.python
From: Robert Kern <rk...@ucsd.edu>
Date: Sun, 24 Jul 2005 21:19:34 -0700
Local: Mon, Jul 25 2005 12:19 am
Subject: Re: Problem loading a file of words

Devan L wrote:
> Heh, it reminds me of the code I used to write.

> def sort_string(word):
>     return ''.join(sorted(list(word.lower())))
> f = open('dictionary.txt','r')
> lines = [line.rstrip('\n') for line in f.readlines()]
> f.close()
> dictionary = dict((sort_string(line),line) for line in lines)

That's definitely not the kind of dictionary that he wants.
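
To illustrate the point, here is a minimal sketch (Python 3 syntax, not code from the thread; the three sample words are made up). Building the dict straight from (key, word) pairs keeps only the last word for each sorted key, whereas the original program collects every anagram in a list:

```python
def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return ''.join(sorted(word.lower()))

words = ['pear', 'pare', 'night']  # hypothetical sample words

# One value per key: later words silently overwrite earlier anagrams.
flat = dict((sort_string(w), w) for w in words)

# One list per key: every anagram of the same letters is kept.
grouped = {}
for w in words:
    grouped.setdefault(sort_string(w), []).append(w)

print(flat['aepr'])     # only one of the two anagrams survives
print(grouped['aepr'])  # both 'pear' and 'pare' are present
```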

--
Robert Kern
rk...@ucsd.edu

"In the fields of hell where the grass grows high
  Are the graves of dreams allowed to die."
   -- Richard Harter


 
Terrance N. Phillip
Newsgroups: comp.lang.python
From: "Terrance N. Phillip" <mediocre_per...@hotmail.com>
Date: Sun, 24 Jul 2005 23:19:48 -0500
Local: Mon, Jul 25 2005 12:19 am
Subject: Re: Problem loading a file of words
Kevin,
        I'm pretty new to Python too. I'm not sure why you're seeing this
problem... is it possible that this is an "out-by-one" error? Is
zymotechnics the *last* word in dictionary.txt? Try this slightly
simplified version of your program and see if you have the same problem....

def sort_string(word):
     '''Returns word in lowercase sorted alphabetically'''
     return "".join(sorted(list(word.lower())))

dictionary = {}
f = open('/usr/bin/words') # or whatever file you like
for line in f:
         sline = sort_string(line[:-1])
         if sline in dictionary:
                 dictionary[sline].append(line)
         else:
                 dictionary[sline] = [line]
f.close()

lookup = raw_input('Enter a scrambled word : ')
while lookup:
         try:
             results = dictionary[sort_string(lookup)]
             for x in results:
                 print x,
             print
         except:
             print "?????"
         lookup = raw_input('Enter a scrambled word : ')

Good luck,

Nick.


 
Robert Kern
Newsgroups: comp.lang.python
From: Robert Kern <rk...@ucsd.edu>
Date: Sun, 24 Jul 2005 21:18:02 -0700
Local: Mon, Jul 25 2005 12:18 am
Subject: Re: Problem loading a file of words

An idiomatic Python 2.4 version of this function would be:

def sort_string(word):
     word = word.lower()
     sorted_list = sorted(word)
     sorted_word = ''.join(sorted_list)
     return sorted_word

# this really should all be within a function, but let's just carry on
dictionary = {}
f = open('dictionary.txt')
try:
     # enclose this in a try: finally: block in case something goes wrong
     for line in f:
         line = line.strip().lower()
         sline = sort_string(line)
         val = dictionary.setdefault(sline, [])
         val.append(line)
         print "Added %s to key %s" % (line, sline)
finally:
     f.close()

Well, my version works (using /usr/share/dict/words from Debian as
dictionary.txt). Yours does, too. Are you sure that you are using the
right dictionary.txt?

--
Robert Kern
rk...@ucsd.edu

"In the fields of hell where the grass grows high
  Are the graves of dreams allowed to die."
   -- Richard Harter


 
Devan L
Newsgroups: comp.lang.python
From: "Devan L" <dev...@gmail.com>
Date: 24 Jul 2005 21:46:57 -0700
Local: Mon, Jul 25 2005 12:46 am
Subject: Re: Problem loading a file of words

Robert Kern wrote:
> That's definitely not the kind of dictionary that he wants.

> --
> Robert Kern
> rk...@ucsd.edu

> "In the fields of hell where the grass grows high
>   Are the graves of dreams allowed to die."
>    -- Richard Harter

Oh, I missed the part where he put values in a list.

 
Peter Otten
Newsgroups: comp.lang.python
From: Peter Otten <__pete...@web.de>
Date: Mon, 25 Jul 2005 08:30:29 +0200
Local: Mon, Jul 25 2005 2:30 am
Subject: Re: Problem loading a file of words

If 'zymotechnics' is the last line and that line is missing a trailing
newline

line[:-1]

mutilates 'zymotechnics' to 'zymotechnic'. In that case the dictionary would
contain the key 'ccehimnotyz'. Another potential problem could be
leading/trailing whitespace. Both problems can be fixed by using
line.strip() instead of line[:-1] as in Robert Kern's code.
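
A minimal sketch of the first failure mode, in Python 3 syntax and with made-up two-line file contents (not the poster's actual dictionary):

```python
def sort_string(word):
    return ''.join(sorted(word.lower()))

# Hypothetical file contents: the last line has no trailing newline.
lines = ['hello\n', 'zymotechnics']

sliced = [line[:-1] for line in lines]       # drops the last character, whatever it is
stripped = [line.strip() for line in lines]  # drops only surrounding whitespace

print(sliced)                    # ['hello', 'zymotechnic']
print(stripped)                  # ['hello', 'zymotechnics']
print(sort_string(sliced[1]))    # ccehimnotyz
print(sort_string(stripped[1]))  # ccehimnostyz
```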

Peter


 
Steven D'Aprano
Newsgroups: comp.lang.python
From: Steven D'Aprano <st...@REMOVETHIScyber.com.au>
Date: Mon, 25 Jul 2005 22:01:22 +1000
Local: Mon, Jul 25 2005 8:01 am
Subject: Re: Problem loading a file of words

On Sun, 24 Jul 2005 20:44:08 -0700, teoryn wrote:
> I've been spending today learning python and as an exercise I've ported
> a program I wrote in java that unscrambles a word. Before describing
> the problem, here's the code:

> *--beginning of file--*
> #!/usr/bin/python
> # Filename: unscram.py

> def sort_string(word):
>         '''Returns word in lowercase sorted alphabetically'''
>         word = str.lower(word)

It is generally considered better form to write that line as:

    word = word.lower()

>         word_list = []
>         for char in word:
>                 word_list.append(char)

If you want a list of characters, the best way of doing that is just:

    word_list = list(word)

>         word_list.sort()
>         sorted_word = ''
>         for char in word_list:
>                 sorted_word += char
>         return sorted_word

And the last four lines above are best written as:

    return ''.join(word_list)

Your while-loop seems to have been mangled a little thanks to word-wrap.
In particular, I can't work out what that "and" is doing in the middle of
it.

Unless you are expecting really HUGE dictionary files (hundreds of
millions of lines) perhaps a better way of writing the above while-loop
would be:

print 'Building dictionary...',
dictionary = { }
f = file('dictionary.txt', 'r')
for line in f.readlines():
    line = line.strip()  # remove whitespace at both ends
    if line:  # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            print 'Created key %s for %s' % (sline,line)
f.close()

> print 'Ready!'

> # This loop lets the user input a scrambled word, look for it in
> # dictionary, and print all matching unscrambled words.
> # If the user types 'quit' then the program ends.
> while True:
>         lookup = raw_input('Enter a scrambled word : ')

>         results = dictionary[sort_string(lookup)]

This will fail if the scrambled word you enter is not in the dictionary.

>         for x in results:
>                 print x,

>         print

>         if lookup == 'quit':
>                 break

You probably want the test for quit to happen before printing the
"unscrambled" words.
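
Both remarks can be sketched as follows (Python 3 syntax; the helper name unscramble_all and the sample data are hypothetical, not part of the original program). The idea is to test for 'quit' before any lookup, and to use dict.get() so that an unknown key yields an empty list instead of a KeyError:

```python
def sort_string(word):
    return ''.join(sorted(word.lower()))

def unscramble_all(dictionary, entries):
    """Look up each entry until 'quit' is seen; return a list of result lists."""
    answers = []
    for lookup in entries:
        if lookup == 'quit':      # test for quit before doing any lookup
            break
        # get() returns the default [] when the key is missing
        answers.append(dictionary.get(sort_string(lookup), []))
    return answers

# Hypothetical pre-built dictionary: sorted letters -> list of words
dictionary = {'aepr': ['pear', 'pare'], 'ghint': ['night']}
results = unscramble_all(dictionary, ['reap', 'xyz', 'quit', 'thing'])
print(results)
```

In the interactive program the entries would come from raw_input() (input() in Python 3) rather than from a list.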

If this error is always happening for the LAST line in the text file, I'm
guessing there is no newline after the word. So when you read the text
file and build the dictionary, you inadvertently remove the "s" from the
word before storing it in the dictionary.

--
Steven.


 
teoryn
Newsgroups: comp.lang.python
From: "teoryn" <teo...@gmail.com>
Date: 25 Jul 2005 06:27:33 -0700
Local: Mon, Jul 25 2005 9:27 am
Subject: Re: Problem loading a file of words
Thanks to everyone for all the help!

Here's the (at least for now) final script, although note I'm using
2.3.5, not 2.4, so I can't use some of the tips that were given.

#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
        '''Returns word in lowercase sorted alphabetically'''
        word_list = list(word.lower())
        word_list.sort()
        return ''.join(word_list)

print 'Building dictionary...',

dictionary = { }

f = file('/usr/share/dict/words', 'r')

for line in f.readlines():
     line = line.strip()  # remove whitespace at both ends
     if line:  # line is not the empty string
          line = line.lower()
          sline = sort_string(line)
          if sline in dictionary:
               dictionary[sline].append(line)
               #print 'Added %s to key %s' % (line,sline)
          else:
               dictionary[sline] = [line]
               #print 'Created key %s for %s' % (sline,line)
f.close()

print 'Ready!'

lookup = raw_input('Enter a scrambled word : ')
while lookup:
     try:
          results = dictionary[sort_string(lookup)]
          for x in results:
               print x,
          print
     except:
          print "?????"
     lookup = raw_input('Enter a scrambled word : ')

As for the end of the file idea, that word wasn't at the end of the
file, and there was a blank line, so that's out of the question. The
word list I was using was 272,520 words long, and I got it a while back
when doing this same thing in java, but as you can see now I'm just
using /usr/share/dict/words which I found after not finding it in the
place listed in Nick's comment.

I'm still lost as to why my old code would only work for the small
file, and another interesting note is that with the larger file, it
would only write "zzz for zzz" (or whatever each word was) instead of
"Created key zzz for zzz". However, it works now, so I'm happy.

Thanks for all the help,
Kevin


 
Peter Otten
Newsgroups: comp.lang.python
From: Peter Otten <__pete...@web.de>
Date: Mon, 25 Jul 2005 15:55:10 +0200
Local: Mon, Jul 25 2005 9:55 am
Subject: Re: Problem loading a file of words

teoryn wrote:
> I'm still lost as to why my old code would only work for the small
> file, and another interesting note is that with the larger file, it
> would only write "zzz for zzz" (or whatever each word was) instead of
> "Created key zzz for zzz". However, it works now, so I'm happy.

Happy as long as you don't know what happened? How can that be?
Another guess then -- there may be inconsistent newlines, some "\n" and some
"\r\n":

>>> garbled = "garbled\r\n"[:-1]
>>> print "created key %s for %s" % ("".join(sorted(garbled)), garbled)

abdeglr for garbled
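
The same effect can be inspected without relying on the terminal by looking at repr(). This is only a sketch, in Python 3 syntax with a made-up sample line. With a "\r\n" line ending, line[:-1] removes only the "\n", so the stray "\r" stays in the word and in its sorted key, and a key built from what the user types never matches. When printed, the "\r" sends the cursor back to the start of the line, which would explain why "Created key" seemed to be missing:

```python
def sort_string(word):
    return ''.join(sorted(word.lower()))

line = 'garbled\r\n'         # hypothetical line with a CRLF ending

sliced = line[:-1]           # the carriage return survives the slice
key = sort_string(sliced)    # '\r' sorts before the letters

print(repr(sliced))
print(repr(key))
print(key == sort_string('garbled'))                        # lookup key differs
print(sort_string(line.strip()) == sort_string('garbled'))  # strip() fixes it
```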

Peter


 
teoryn
Newsgroups: comp.lang.python
From: "teoryn" <teo...@gmail.com>
Date: 25 Jul 2005 07:30:13 -0700
Local: Mon, Jul 25 2005 10:30 am
Subject: Re: Problem loading a file of words
I was just happy that it worked, but was still curious as to why it
didn't before. Thanks for the idea, I'll look into it and see if this
is the case.

Thanks,
Kevin


 
teoryn
Newsgroups: comp.lang.python
From: "teoryn" <teo...@gmail.com>
Date: 25 Jul 2005 07:39:28 -0700
Local: Mon, Jul 25 2005 10:39 am
Subject: Re: Problem loading a file of words
I changed to using line = line.strip() instead of line = line [:-1] in
the original and it worked.

Thanks!


 
Peter Hansen
Newsgroups: comp.lang.python
From: Peter Hansen <pe...@engcorp.com>
Date: Mon, 25 Jul 2005 11:03:45 -0400
Local: Mon, Jul 25 2005 11:03 am
Subject: Re: Problem loading a file of words

teoryn wrote:
> I changed to using line = line.strip() instead of line = line [:-1] in
> the original and it worked.

Just to be clear, these don't do nearly the same thing in general,
though in your specific case they might appear similar.

The line[:-1] idiom says 'return a string which is a copy of the
original but with the last character, if any, removed, regardless of
what character it is'.

The line.strip() idiom says 'return a string with all whitespace
characters removed from the end *and* start of the string'.

In certain cases, you might reasonably prefer .rstrip() (which removes
only from the right-hand side, or end), or even something like
.rstrip('\n') which would remove only newlines from the end.
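
A short side-by-side comparison of the four idioms (a sketch in Python 3 syntax; the input string is made up for illustration):

```python
line = '  two words \r\n'   # hypothetical line with leading spaces and a CRLF ending

a = line[:-1]          # removes exactly one character: the '\n'
b = line.strip()       # removes whitespace from both ends
c = line.rstrip()      # removes whitespace from the right-hand end only
d = line.rstrip('\n')  # removes only trailing newlines

print(repr(a))  # '  two words \r'
print(repr(b))  # 'two words'
print(repr(c))  # '  two words'
print(repr(d))  # '  two words \r'
```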

-Peter


 