TIA
P.S. I hate SQL Server!
-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering
Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer
Would it be possible to send the sample to me, too?
I have been using SIMILAR for about 10 years for "automated customer data
cleansing" tasks, and it works quite well in conjunction with LIKE and the
like...
I once did some tests with user-defined and external functions (e.g.
implementing the Levenshtein algorithm), but they were far too slow
compared to SIMILAR.
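For readers unfamiliar with it, the Levenshtein (edit) distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal Python sketch of the classic dynamic-programming formulation. It is not the algorithm behind SQL Anywhere's SIMILAR, and the function name `levenshtein` is just illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b using two DP rows."""
    # Make b the shorter string so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("Stefan", "Stephan"))  # 2 (f->p, insert h)
```

Each comparison costs O(len(a) * len(b)) steps, which may help explain why evaluating such a function per row in a UDF can be slow.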
Fortunately, the algorithm seems to have been the same since V5.5, so I
would like a chance to look at it more closely.
TIA
Volker
(Glad not to have to port that to MS SQL / ASE)
"John Smirnios" <smirnios_at_sybase.com> wrote in
news:462ceb90$1@forums-1-dub...
> Yes. I'll send you a sample via email.
>
I would like a sample as well.
Would it be possible to post this on a website somewhere (or just
include it in a response to this post)? I'm sure others might be
interested too.
Thanks.
SJS
Thanks for sending the source code! I still have to study it in detail...
I have just read about UCA and collation tailoring in the 10.0.1 docs.
As this seems to be one of your favourite topics:
In the application I talked of, we typically have to compare German names
(of persons and places).
As you may know, these may contain "umlauts" like 'ä' or special characters
like 'ß' (the "sharp s").
However, in older, restricted character sets (or in internationalized uses
like mail addresses), these characters have often been expanded to two
characters, e.g. 'ä' to 'ae' or 'ß' to 'ss'.
So one task we face is making 'ä' and 'ae' compare as equal.
AFAIK, single-byte collations can only compare characters one by one and
therefore cannot treat 'ä' and 'ae' as equal.
Is this also true for Unicode collations, or could I define some rule to
make 'ä' and 'ae' equal?
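The limitation described above can be pictured with a toy model of collation keys. In this hypothetical Python sketch, a single-byte collation is modelled as a key with exactly one weight per character, so 'ä' (one weight) can never equal 'ae' (two weights). A collation that lets one character expand into several weights can make them equal. The expansion table and both function names are assumptions for illustration, not how SQL Anywhere or ICU actually build collation keys.

```python
# Toy model only: neither SQL Anywhere nor ICU builds keys this way.

def one_to_one_key(s: str) -> list[int]:
    """Single-byte-style key: exactly one weight per input character."""
    return [ord(c) for c in s.lower()]


# Hypothetical expansion table in the spirit of a German tailoring.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def expanding_key(s: str) -> list[int]:
    """Key in which one character may contribute several weights."""
    key: list[int] = []
    for c in s.lower():
        for part in GERMAN_EXPANSIONS.get(c, c):
            key.append(ord(part))
    return key


if __name__ == "__main__":
    print(one_to_one_key("ä") == one_to_one_key("ae"))  # False: 1 weight vs 2
    print(expanding_key("Ä") == expanding_key("ae"))    # True
```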
(So far, we have solved this problem by storing both the original names and
a "normalized" form, in which umlauts are expanded, everything is uppercase,
and some phonetic simplifications are applied (e.g. 'ph' sounds like 'f' and
is therefore normalized to 'f'). The normalized form is stored as a computed
column and is calculated automatically by a user-defined function.
Comparisons are then done on the normalized forms.
This works well with the typical German '1252LATIN1' single-byte collation.)
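The normalization scheme described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual user-defined function from the thread: the name `normalize_name` and the substitution tables (umlaut and sharp-s expansions plus a single phonetic rule, 'PH' to 'F') are assumptions chosen only to mirror the examples given.

```python
# Hypothetical tables. Replacements are applied in order, so expansions
# run before uppercasing and phonetic rules run on the uppercased text.
EXPANSIONS = [
    ("Ä", "AE"), ("Ö", "OE"), ("Ü", "UE"),
    ("ä", "AE"), ("ö", "OE"), ("ü", "UE"),
    ("ß", "SS"),
]
PHONETIC = [
    ("PH", "F"),  # so 'Stephan' and 'Stefan' get the same key
]


def normalize_name(name: str) -> str:
    """Return a comparison key: umlauts expanded, uppercased, simplified."""
    for src, dst in EXPANSIONS:
        name = name.replace(src, dst)
    name = name.upper()
    for src, dst in PHONETIC:
        name = name.replace(src, dst)
    return name


if __name__ == "__main__":
    print(normalize_name("Müller") == normalize_name("Mueller"))  # True
    print(normalize_name("Stephan") == normalize_name("Stefan"))  # True
    print(normalize_name("Straße"))                               # STRASSE
```

Comparing the keys rather than the raw names is what lets both 'ä'/'ae' and 'ph'/'f' variants match, independent of the collation in use.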
Any hints on whether UCA offers better facilities would be highly appreciated...
Volker
"John Smirnios" <smirnios_at_sybase.com> wrote in message
news:462e1b2a$1@forums-1-dub...
However, the "similar" function in SQL Anywhere still performs a
character-by-character match, as seen in the code I sent you. When scanning
two strings such as 'SS' and 'ß', it will try to match the first 'S' with
'ß' and not find a match (since 'S' != 'ß').
If it's any consolation, the UPPER function will convert 'ß' to 'SS' if
you are using an ICU collation (again, the correct locale/tailoring may
be needed).
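To illustrate why a character-by-character matcher cannot pair 'SS' with 'ß' while a full-string case mapping can, here is a small Python sketch. The helper `char_by_char_equal` is hypothetical and only mimics the position-wise comparison described above. Python's built-in `str.upper()` applies the Unicode full case mapping, under which 'ß' becomes 'SS'; this is analogous to, but not the same code as, UPPER under an ICU collation.

```python
def char_by_char_equal(a: str, b: str) -> bool:
    """Hypothetical position-wise comparison of two strings."""
    for x, y in zip(a, b):
        if x != y:
            return False  # e.g. the first 'S' compared against 'ß'
    return len(a) == len(b)


if __name__ == "__main__":
    print(char_by_char_equal("SS", "ß"))          # False
    print("ß".upper())                            # SS
    print(char_by_char_equal("SS", "ß".upper()))  # True
    # Case mapping alone does not expand umlauts: 'ä' becomes 'Ä', not 'AE'.
    print("ä".upper() == "AE")                    # False
```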
-john.
Thanks for the explanation.
So I guess I'm going to do some tests with UCA in the (not so near)
future...
...though our current solution may still be better suited to treating names
like 'Stefan' and 'Stephan' (which are misspelled or mixed up quite often)
as equal.
Thanks again!
Volker
"John Smirnios" <smirnios_at_sybase.com> wrote in message
news:46321919$1@forums-1-dub...