Unexpected string sorting anomaly [since forever]


Richard Sargent

May 7, 2020, 5:31:03 PM
to VA Smalltalk
I have learned (the hard way) that VA Smalltalk has an unusual sorting characteristic for Strings, and has had it for a very long time. If you sort a collection of strings that contains strings differing from each other only in case, the sort is not stable: sometimes one will sort before the other, and sometimes they will sort the other way around.

'false'  <  'FALSE'    false
'FALSE'  <  'false'    false <<<

'false'  =  'FALSE'    false
'FALSE'  =  'false'    false

'false'  >  'FALSE'    false <<<
'FALSE'  >  'false'    false

'false'  ~=  'FALSE'    true
'FALSE'  ~=  'false'    true

'false'  <=  'FALSE'    true <<<
'FALSE'  <=  'false'    true

'false'  >=  'FALSE'    true
'FALSE'  >=  'false'    true <<<


$h  <  $H    false
$H  <  $h    false <<<

$h  =  $H    false
$H  =  $h    false

$h  >  $H    false <<<
$H  >  $h    false

$h  ~=  $H    true
$H  ~=  $h    true

$h  <=  $H    true <<<
$H  <=  $h    true

$h  >=  $H    true
$H  >=  $h    true <<<

Hans-Martin Mosner

May 11, 2020, 5:29:34 AM
to VA Smalltalk
I wouldn't call that wrong; it's just a characteristic of a partial ordering, which can be overcome by more complex locale-specific rules.
In Smalltalk, we've historically had a complete ordering (originally ASCII, with the exception of the special up-arrow return and left-arrow assignment glyphs, which were mapped to ASCII code points 16r5E and 16r5F) which sorted uppercase before lowercase, so both 'TRUE' < 'false' and 'FALSE' < 'true' hold.
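
To make the code-point ordering above concrete, here is a minimal workspace sketch. The block name codePointBefore is made up for illustration, and it assumes Character>>value answers the character's code point in your image; it is not how VAST implements < itself.

```smalltalk
| codePointBefore |

"Hypothetical helper: answers true if a sorts before (or equal to) b
 when compared character by character on raw code points."
codePointBefore := [:a :b |
    | n i |
    n := a size min: b size.
    "Index of the first differing character, or 0 if one string is a prefix of the other."
    i := (1 to: n) detect: [:k | (a at: k) ~= (b at: k)] ifNone: [0].
    i = 0
        ifTrue: [a size <= b size]
        ifFalse: [(a at: i) value < (b at: i) value]].

codePointBefore value: 'TRUE' value: 'false'.   "true: $T (84) precedes $f (102)"
codePointBefore value: 'FALSE' value: 'true'.   "true: $F (70) precedes $t (116)"
```

With this ordering every uppercase letter precedes every lowercase letter, which is why both comparisons answer true regardless of the alphabetical position of the words.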
With locale collation, things get a lot more complicated (and admittedly VAST doesn't do everything right here). Case differences play almost no role when you want to achieve dictionary order.
In some natural languages there are different rules which were developed in different contexts, and you can't really say that one is wrong while the other is right. See https://german.stackexchange.com/questions/52765/ordering-german-special-characters-and-those-from-other-languages-when-sorting for a discussion of German sort order, for example.
There are cases when single characters should be treated like character groups (for example, ß should be sorted like ss), and for case differences you might want to have a precedence rule such that case differences only matter if the words are completely equal when compared as lowercase. This would lead to an order 'THRU' < 'thru' < 'TRUE' < 'true' which probably feels most natural to most people.

What you show in your example is basically a partial case-insensitive comparison, so the "<" and ">" messages return false when receiver and argument differ only in letter case, because then neither is considered less than or greater than the other. For a practical application, you should define a comparison method which orders strings the way you want them ordered :-)
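
One possible shape for such a comparison, following the precedence rule described above (case only matters when the strings are equal as lowercase), is sketched below. The names firstDiff and dictionaryOrder are hypothetical, and the sketch assumes that Character>>value answers the code point and that asLowercase preserves string length. It does not attempt locale rules such as sorting ß like ss.

```smalltalk
| firstDiff dictionaryOrder sorted |

"Hypothetical helper: index of the first position where x and y differ,
 or 0 if they agree over the length of the shorter string."
firstDiff := [:x :y |
    (1 to: (x size min: y size))
        detect: [:k | (x at: k) ~= (y at: k)]
        ifNone: [0]].

"Two-level sort block: compare as lowercase first; only when that
 ties, let uppercase sort before lowercase."
dictionaryOrder := [:a :b |
    | la lb i |
    la := a asLowercase.
    lb := b asLowercase.
    i := firstDiff value: la value: lb.
    i > 0
        ifTrue: [(la at: i) value < (lb at: i) value]
        ifFalse: [
            a size ~= b size
                ifTrue: [a size < b size]
                ifFalse: [
                    "Same letters ignoring case: break the tie on the original characters."
                    i := firstDiff value: a value: b.
                    i = 0 or: [(a at: i) value < (b at: i) value]]]].

sorted := (#('true' 'TRUE' 'thru' 'THRU') asSortedCollection: dictionaryOrder) asArray.
"Intended result: #('THRU' 'thru' 'TRUE' 'true')"
```

Because the block never answers false in both directions for two distinct strings, strings that differ only in case always land in the same relative position, which removes the instability described in the first message.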