Google Groups Home
Help | Sign in
Comparing Similar Strings
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  6 messages - Collapse all
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
Adam  
View profile
 More options May 9, 9:28 pm
Newsgroups: comp.lang.smalltalk
From: Adam <a...@crumptons.org>
Date: Fri, 9 May 2008 18:28:04 -0700 (PDT)
Local: Fri, May 9 2008 9:28 pm
Subject: Comparing Similar Strings
Ideally, I am looking for an elegant way of comparing two non-
identical strings. I want to find the longest substring of stringOne
that is the #sameAs any substring of stringTwo, what that substring
is, and have some indicator of how similar the two strings are to each
other (like 68% similar) . My first few attempts at writing this
method have sprawled out well beyond anything I could call worthy of
ST style and grace. Besides its inherent ugliness, I haven't even got
it fully functional yet.
Basically I am just taking the substrings of stringOne, from largest
to smallest, and comparing them to the substrings of stringTwo,
largest to smallest. I am making the substrings by removing characters
#at: index whatever . It is very tedious and ugly. There must be a
better way.

thanks, Adam


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Adam  
View profile
 More options May 9, 9:30 pm
Newsgroups: comp.lang.smalltalk
From: Adam <a...@crumptons.org>
Date: Fri, 9 May 2008 18:30:18 -0700 (PDT)
Local: Fri, May 9 2008 9:30 pm
Subject: Re: Comparing Similar Strings
I left out that I am using VW 7.5.

Adam


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
andrewgi...@aol.com  
View profile
 More options May 9, 10:04 pm
Newsgroups: comp.lang.smalltalk
From: andrewgi...@aol.com
Date: Fri, 9 May 2008 19:04:58 -0700 (PDT)
Local: Fri, May 9 2008 10:04 pm
Subject: Re: Comparing Similar Strings

> I want to find the longest substring of stringOne
> that is the #sameAs any substring of stringTwo, what that substring
> is, and have some indicator of how similar the two strings are to each
> other (like 68% similar) .

Hi Adam,
I have not implemented this, but might suggest checking 'The Algorithm
Design Manual', S. Skiena,  ISBN 0-387-94860-0. Among the algorithms
is pg 422-424. It discusses and input of a set, S, of strings S1...Sn.
To answer, what the longest string S' such that for each Si, 1<=i<=n,
the characters of S appear as a subsequence of Si ? There is an
accompanying CD with source and links to implementations. There are
also related problems discussed as approximate string matching, and
shortest commmon superstring.
Hope this point you in the rigth direction.
ACG

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Adam  
View profile
 More options May 10, 12:39 pm
Newsgroups: comp.lang.smalltalk
From: Adam <a...@crumptons.org>
Date: Sat, 10 May 2008 09:39:39 -0700 (PDT)
Local: Sat, May 10 2008 12:39 pm
Subject: Re: Comparing Similar Strings
On May 9, 9:04 pm, andrewgi...@aol.com wrote:

thanks. I found a copy online. It was (and still is) a little over my
head. I waded through it last night until about midnight, went to
sleep, woke up, and rewrote my method like this:
findLongestSubstringBetween: S and: T
        | substring l t i n m test |
        l := S size.
        t := 0.
        i := 1.
        [l > 0] whileTrue:
                        [[t < i] whileTrue:
                                        [n := t + 1.
                                        m := t + l.
                                        substring := S copyFrom: n to: m.
                                        test := T findString: substring startingAt: 1.
                                        test = 0 ifFalse: [^substring].
                                        t := t + 1].
                        l := l - 1.
                        i := i + 1.
                        t := 0]
        ^nil

Besides being relatively short, it actually finds the longest
substring. Right now this method resides in a class I made, but I
wonder if it go somewhere else. Any thoughts?

Adam


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Stefan Schmiedl  
View profile
 More options May 10, 1:15 pm
Newsgroups: comp.lang.smalltalk
From: Stefan Schmiedl <s...@xss.de>
Date: Sat, 10 May 2008 19:15:53 +0200
Local: Sat, May 10 2008 1:15 pm
Subject: Re: Comparing Similar Strings
On Sat, 10 May 2008 09:39:39 -0700 (PDT)

I'd either use S as receiver and define
        String>>findLongestSubstringIn: T
or use T as receiver and define
        String>>findLongestSubstringOf: S

"Between ... and ..." sounds very symmetrical in its arguments to me,
while your code searches for the longest substring of S in T. IMO, it's
a better match to your implementation.

s.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
daredevil  
View profile
 More options May 12, 8:56 am
Newsgroups: comp.lang.smalltalk
From: daredevil <hithac...@gmail.com>
Date: Mon, 12 May 2008 05:56:52 -0700 (PDT)
Local: Mon, May 12 2008 8:56 am
Subject: Re: Comparing Similar Strings
On May 10, 9:39 pm, Adam <a...@crumptons.org> wrote:

Elegant solution.

Below code is same as yours. I just made it more Smalltalkish(I hope).

findLongestSubstringBetween: S and: T
        | length subString |
        length := S size.
        length to: 1 by: -1 do: [:l |
                0 to: length - l do: [:t |
                        subString := S copyFrom: t + 1 to: t + l.
                        (T findString: subString startingAt: 1) ~= 0 ifTrue:
[^subString] ]].
        ^nil

---
Hiren


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2008 Google