Algorithm Used in Similar() function.
Newsgroups: sybase.public.sqlanywhere.general
From: "Rob" <rgiord...@myrealbox.com>
Date: 23 Apr 2007 09:27:16 -0700
Local: Mon, Apr 23 2007 12:27 pm
Subject: Algorithm Used in Similar() function.
Does anyone know the algorithm used to compare strings in the Similar()
function?  I have to create a similar function, excuse the pun, in SQL
Server.

TIA

P.S  I hate SQL Server!


 
Newsgroups: sybase.public.sqlanywhere.general
From: John Smirnios <smirnios_at_sybase.com>
Date: 23 Apr 2007 10:23:28 -0700
Local: Mon, Apr 23 2007 1:23 pm
Subject: Re: Algorithm Used in Similar() function.
Yes. I'll send you a sample via email.

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer


 
Newsgroups: sybase.public.sqlanywhere.general
From: "Volker Barth" <No_VBarth@Spam_GLOBAL-FINANZ.de>
Date: 24 Apr 2007 05:49:06 -0700
Local: Tues, Apr 24 2007 8:49 am
Subject: Re: Algorithm Used in Similar() function.
John,

would it be possible to send the sample to me, too?
I have used SIMILAR for about 10 years to do "automated customer data
clearing" stuff, and it works quite well - in conjunction with LIKE and the
like...
I once had done tests with user-defined and external functions (e.g. with
the "Levenshtein" algorithm) but they were far too slow in contrast to
SIMILAR.
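
For readers unfamiliar with the "Levenshtein" algorithm mentioned above, here is a minimal Python sketch of it. This is not the SIMILAR algorithm, whose source is not shown in this thread. The helper `similarity_percent` and its 0 to 100 scaling are assumptions made purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> int:
    """Hypothetical helper: scale the edit distance to a 0-100 score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))
```

The nested loops cost time proportional to the product of the two string lengths for every pair compared, which may be part of why a user-defined version feels slow on large tables.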

Fortunately, the algorithm seems to have been the same since V5.5...so I
would like to have a chance to look at it more closely.

TIA
Volker
(Feeling fine not to have to port that to MS SQL / ASE)

"John Smirnios" <smirnios_at_sybase.com> wrote in
news:462ceb90$1@forums-1-dub...


 
Newsgroups: sybase.public.sqlanywhere.general
From: John Smirnios <smirnios_at_sybase.com>
Date: 24 Apr 2007 07:58:50 -0700
Local: Tues, Apr 24 2007 10:58 am
Subject: Re: Algorithm Used in Similar() function.
You may need to send me an email first or give me another email address.
My email to you bounced with the following message: "Mail rejected for
policy reasons."

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer


 
Newsgroups: sybase.public.sqlanywhere.general
From: "Steven J. Serenska" <sjs@RemoveTheseWordsAndTheUnderscore_trainMinder.com>
Date: 26 Apr 2007 09:33:03 -0700
Local: Thurs, Apr 26 2007 12:33 pm
Subject: Re: Algorithm Used in Similar() function.

John Smirnios wrote:
> Yes. I'll send you a sample via email.

I would like a sample as well.

Would it be possible to post this on a website somewhere (or just
include it in a response to this post)?  I'm sure others might be
interested too.

Thanks.

SJS


 
Newsgroups: sybase.public.sqlanywhere.general
From: "Volker Barth" <No_VBarth@Spam_GLOBAL-FINANZ.de>
Date: 27 Apr 2007 01:55:48 -0700
Local: Fri, Apr 27 2007 4:55 am
Subject: Re: Algorithm Used in Similar() function.
John,

thanks for sending the source code! - I still have to study it in detail...

I have just read about UCA and collation tailoring in the 10.0.1 docs.

As this seems to be one of your favourite topics:

In the application I talked of, we typically have to compare German names
(of persons and places).
As you may know, these may contain "umlauts" like 'ä' or special characters
like 'ß' (the "sharp s").
However, in older, restricted charsets (or in internationalized uses like
mail addresses), these umlauts have often been expanded to two characters,
e.g. 'ä' to 'ae' or 'ß' to 'ss'.
So one task we face is to have 'ä' and 'ae' compare as equal.

AFAIK, single-byte collations can only compare characters one by one and
therefore cannot treat 'ä' and 'ae' as equal.
Is this the same for Unicode collations, or could I establish some rule to
make 'ä' and 'ae' the same?

(So far, we have solved this problem by storing both the original names and
a "normalized" form, where umlauts are expanded, everything is uppercased,
and some phonetic simplifications are applied (e.g. 'ph' sounds like 'f' and is
therefore normalized to 'f'). The normalized form is stored as a computed
field and is automatically calculated by a user-defined function.
Comparisons are then done on the normalized forms.
This works well with the typical German "1252LATIN1" single-byte collation.)
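
The normalization described above could be sketched in Python roughly as follows. The function name `normalize_name` and the expansion table are assumptions for illustration: the post names only 'ä' to 'ae', 'ß' to 'ss' and 'ph' to 'f', so the remaining umlaut entries are filled in by analogy, and the actual user-defined function presumably applies more rules than these.

```python
# Assumed expansion table. Only 'ä' -> 'ae' and 'ß' -> 'ss' come from the
# post; the other umlauts are added by analogy.
UMLAUT_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ß": "ss",
}


def normalize_name(name: str) -> str:
    """Hypothetical normalizer: expand umlauts, uppercase, simplify 'ph'."""
    expanded = "".join(UMLAUT_EXPANSIONS.get(ch, ch) for ch in name)
    upper = expanded.upper()
    # Phonetic simplification named in the post: 'ph' sounds like 'f'.
    return upper.replace("PH", "F")
```

With this sketch, `normalize_name("Jäger")` and `normalize_name("Jaeger")` both give the same string, so a plain equality test on the stored normalized column treats them as the same name.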

Any hint if UCA may give better facilities is highly appreciated...

Volker

"John Smirnios" <smirnios_at_sybase.com> schrieb im Newsbeitrag
news:462e1b2a$1@forums-1-dub...


 
Newsgroups: sybase.public.sqlanywhere.general
From: John Smirnios <smirnios_at_sybase.com>
Date: 27 Apr 2007 08:39:05 -0700
Local: Fri, Apr 27 2007 11:39 am
Subject: Re: Algorithm Used in Similar() function.
In queries, UCA collations definitely use UCA to perform the comparison
in a linguistically correct way so that 'SS' = 'ß' (not sure off hand if
you need to specify the right locale/tailoring for that though).

However, the "similar" function in SQL Anywhere still works by
character-by-character matching, as seen in the code I sent you. When
scanning two strings such as 'SS' and 'ß', it will try to match the
first 'S' with 'ß' and not find a match (since 'S' != 'ß').

If it's any consolation, the UPPER function will convert 'ß' to 'SS' if
you are using an ICU collation (again, the correct locale/tailoring may
be needed).
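
The mismatch described above can be illustrated with a small Python sketch. It shows Python's own string behaviour, not SQL Anywhere's code (which is not reproduced in this thread), and the helper name `positions_match` is made up for the example.

```python
def positions_match(a: str, b: str) -> list:
    """Compare two strings one character at a time, position by position."""
    return [ca == cb for ca, cb in zip(a, b)]


# The first 'S' is paired with 'ß' and the two are different characters,
# so a character-by-character scan finds no match here.
first_pair = positions_match("SS", "ß")

# Python's str.upper() uses the Unicode full case mapping, which maps the
# single character 'ß' to the two characters 'SS'.
upper_eszett = "ß".upper()

# After uppercasing both sides, the scan lines up character for character.
after_upper = positions_match("SS".upper(), "ß".upper())
```

Uppercasing both operands before comparing is the same idea as storing a normalized form: the expansion happens first, so the one-character-at-a-time comparison no longer sees 'ß' at all.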

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer


 
Newsgroups: sybase.public.sqlanywhere.general
From: John Smirnios <smirnios_at_sybase.com>
Date: 27 Apr 2007 08:42:29 -0700
Local: Fri, Apr 27 2007 11:42 am
Subject: Re: Algorithm Used in Similar() function.
I would love to; however, when I asked for permission a long time ago to
send out the source for "similar" I was told to include a statement to
the effect of "you are free to use it and modify it however you like but
you cannot redistribute the source without Sybase's permission". That
pretty much precludes posting the source. C'est la vie. I don't mind
emailing it to whomever asks for it.

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer


 
Newsgroups: sybase.public.sqlanywhere.general
From: "Volker Barth" <No_VBarth@Spam_GLOBAL-FINANZ.de>
Date: 27 Apr 2007 08:48:51 -0700
Local: Fri, Apr 27 2007 11:48 am
Subject: Re: Algorithm Used in Similar() function.
John,

thanks for the explanation.

So I guess I'm going to do some tests with UCA in the (not so near)
future...
...though the particular solution we are using now may still be more
appropriate to treat names like 'Stefan' and 'Stephan' (which are
mis-spelled or mixed up quite often) as equal.

Thanks again!

Volker

"John Smirnios" <smirnios_at_sybase.com> schrieb im Newsbeitrag
news:46321919$1@forums-1-dub...


 
Newsgroups: sybase.public.sqlanywhere.general
From: stormsena...@gmail.com
Date: Fri, 28 Sep 2012 04:54:12 -0700 (PDT)
Local: Fri, Sep 28 2012 7:54 am
Subject: Re: Algorithm Used in Similar() function.
On Friday, 27 April 2007, 17:42:29 UTC+2, John Smirnios wrote:
> I would love to; however, when I asked for permission a long time ago to
> send out the source for "similar" I was told to include a statement to
> the effect of "you are free to use it and modify it however you like but
> you cannot redistribute the source without Sybase's permission". That
> pretty much precludes posting the source. C'est la vie. I don't mind
> emailing it to whomever asks for it.

> -john.

> --
> John Smirnios
> Senior Software Developer
> iAnywhere Solutions Engineering

Hi John,
I would like a sample as well.

Can you please send it to me?

Thank you in advance.


 
Newsgroups: sybase.public.sqlanywhere.general
From: shobhan.baner...@gmail.com
Date: Wed, 16 Jan 2013 14:02:20 -0800 (PST)
Local: Wed, Jan 16 2013 5:02 pm
Subject: Re: Algorithm Used in Similar() function.

John,

I need to implement the Sybase Similar() function on Teradata. It'll be great if you can share the algorithm with me.

Thanks in advance for your help!


 