
Algorithm Used in Similar() function.


Rob

Apr 23, 2007, 12:27:16 PM
Does anyone know the algorithm used to compare strings in the Similar()
function? I have to create a similar function, excuse the pun, in SQL
Server.

TIA

P.S. I hate SQL Server!


John Smirnios

Apr 23, 2007, 1:23:28 PM
Yes. I'll send you a sample via email.

-john.

--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering

Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer

Volker Barth

Apr 24, 2007, 8:49:06 AM
John,

would it be possible to send the sample to me, too?
I have used SIMILAR for about 10 years to do "automated customer data
clearing" stuff, and it works quite well - in conjunction with LIKE and the
like...
I once ran tests with user-defined and external functions (e.g. with
the "Levenshtein" algorithm), but they were far too slow compared to
SIMILAR.
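
For readers unfamiliar with the "Levenshtein" algorithm mentioned here, a minimal Python sketch of the classic edit-distance computation follows. This is the alternative Volker says he tested, not the algorithm behind SIMILAR, which is not disclosed in this thread. The function name `levenshtein` is chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("Stefan", "Stephan"))  # prints 2
```

The nested loop does work proportional to the product of the two string lengths for every pair compared, which may be one reason such a function is slow when run as a user-defined function over many rows.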

Fortunately, the algorithm seems to have been the same since V5.5...so I
would like to have a chance to look at it more closely.

TIA
Volker
(Feeling fine not to have to port that to MS SQL / ASE)


"John Smirnios" <smirnios_at_sybase.com> wrote in
news:462ceb90$1@forums-1-dub...

John Smirnios

Apr 24, 2007, 10:58:50 AM
You may need to send me an email first or give me another email address.
My email to you bounced with the following message: "Mail rejected for
policy reasons."

-john.


Steven J. Serenska

Apr 26, 2007, 12:33:03 PM
John Smirnios wrote:

> Yes. I'll send you a sample via email.
>

I would like a sample as well.

Would it be possible to post this on a website somewhere (or just
include it in a response to this post)? I'm sure others might be
interested too.

Thanks.

SJS

Volker Barth

Apr 27, 2007, 4:55:48 AM
John,

thanks for sending the source code! - I still have to study it in detail...

I have just read about UCA and collation tailoring in the 10.0.1 docs.

As this seems to be one of your favourite topics:

In the application I talked of, we typically have to compare German names
(of persons and places).
As you may know, these may contain "umlauts" like 'ä' or special characters
like 'ß' (the "sharp s").
However, in older, restricted charsets (or in internationalized uses like
mail addresses), these umlauts have often been expanded to two characters,
e.g. 'ä' to 'ae' or 'ß' to 'ss'.
So one task we face is to make 'ä' and 'ae' compare as equal.

AFAIK, single-byte collations can only compare characters one by one and
therefore cannot treat 'ä' and 'ae' as equal.
Is this the same for Unicode collations, or could I establish some rule to
make 'ä' and 'ae' the same?

(So far, we have solved this problem by storing both the original names and
a "normalized" form, where umlauts are expanded, everything is uppercased,
and some phonetic simplifications are applied (e.g. 'ph' sounds like 'f' and is
therefore normalized to 'f'). The normalized form is stored as a computed
field and is automatically calculated by a user-defined function.
Comparisons are then done on the normalized forms.
This works well with the typical German '1252LATIN1' single-byte collation.)
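
The normalization described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual user-defined function: only the 'ä' to 'ae', 'ß' to 'ss' and 'ph' to 'f' rules come from this post; the 'ö' and 'ü' expansions follow the usual German convention, and the names `UMLAUT_EXPANSIONS`, `PHONETIC_RULES` and `normalize_name` are hypothetical.

```python
# Hypothetical rule tables; the real rule set is not given in the thread.
UMLAUT_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ß": "ss",
}
PHONETIC_RULES = [("PH", "F")]  # applied after uppercasing


def normalize_name(name: str) -> str:
    """Expand umlauts, uppercase, then apply phonetic simplifications."""
    expanded = "".join(UMLAUT_EXPANSIONS.get(ch, ch) for ch in name)
    result = expanded.upper()
    for pattern, replacement in PHONETIC_RULES:
        result = result.replace(pattern, replacement)
    return result


# Names that differ only by spelling convention share one normalized form.
print(normalize_name("Jäger") == normalize_name("Jaeger"))    # prints True
print(normalize_name("Stephan") == normalize_name("Stefan"))  # prints True
```

Storing the result in a separate column, as described, means the comparison itself stays a plain equality or SIMILAR call on already-normalized values.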

Any hint as to whether UCA offers better facilities would be highly appreciated...

Volker


"John Smirnios" <smirnios_at_sybase.com> wrote in message
news:462e1b2a$1@forums-1-dub...

John Smirnios

Apr 27, 2007, 11:39:05 AM
In queries, UCA collations definitely use UCA to perform the comparison
in a linguistically correct way so that 'SS' = 'ß' (not sure offhand if
you need to specify the right locale/tailoring for that though).

However, the "similar" function in SQL Anywhere still performs a
character-by-character match, as seen in the code I sent you. When
scanning two strings such as 'SS' and 'ß', it will try to match the
first 'S' with 'ß' and not find a match (since 'S' != 'ß').

If it's any consolation, the UPPER function will convert 'ß' to 'SS' if
you are using an ICU collation (again, the correct locale/tailoring may
be needed).
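
The difference between a character-by-character match and a comparison after case mapping can be illustrated with a small Python sketch. It is an analogy only: Python's `str.upper()` applies the Unicode full case mapping of 'ß' to 'SS', which resembles what is described above for UPPER under an ICU collation. `charwise_equal` is a hypothetical helper and is not the SIMILAR code.

```python
def charwise_equal(a: str, b: str) -> bool:
    """Compare two strings one character at a time, with no expansions."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


# One character cannot match two, so the raw strings differ.
print(charwise_equal("SS", "ß"))                  # prints False

# After uppercasing, 'ß' has become 'SS' and the per-character match succeeds.
print(charwise_equal("SS".upper(), "ß".upper()))  # prints True
```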

John Smirnios

Apr 27, 2007, 11:42:29 AM
I would love to; however, when I asked for permission a long time ago to
send out the source for "similar" I was told to include a statement to
the effect of "you are free to use it and modify it however you like but
you cannot redistribute the source without Sybase's permission". That
pretty much precludes posting the source. C'est la vie. I don't mind
emailing it to whomever asks for it.

-john.


Volker Barth

Apr 27, 2007, 11:48:51 AM
John,

thanks for the explanation.

So I guess I'm going to do some tests with UCA in the (not so near)
future...
...though the particular solution we are using now may still be more
appropriate to treat names like 'Stefan' and 'Stephan' (which are
mis-spelled or mixed up quite often) as equal.

Thanks again!

Volker


"John Smirnios" <smirnios_at_sybase.com> wrote in message
news:46321919$1@forums-1-dub...

storms...@gmail.com

Sep 28, 2012, 7:54:12 AM
On Friday, April 27, 2007 at 17:42:29 UTC+2, John Smirnios wrote:
> I would love to; however, when I asked for permission a long time ago to
> send out the source for "similar" I was told to include a statement to
> the effect of "you are free to use it and modify it however you like but
> you cannot redistribute the source without Sybase's permission". That
> pretty much precludes posting the source. C'est la vie. I don't mind
> emailing it to whomever asks for it.
Hi John,
I would like a sample as well.

Can you please send it to me?

Thank you in advance.

shobhan....@gmail.com

Jan 16, 2013, 5:02:20 PM
John,

I need to implement the Sybase Similar() function on Teradata. It would be great if you could share the algorithm with me.

Thanks in advance for your help!

subi....@gmail.com

Jun 22, 2016, 6:46:58 AM
Hi Volker,

Could you please send the sample code to my email address, subi.kumar_at_gmail.com?


Thanks & Regards
Subi