I'll paste in a table, which came from VDE on 80-column screen and looks
fine in Courier in my email window, but may not survive transmission.
The differences in the sort output are significant.
John Woodruff
2010-03-10 10:03
TEST OF SORTING A FILE OF ASCII CHARACTERS 32 THROUGH 126
command syntax: { SORTXL | RPSORT | SORT } <source >outfile
--------------------------------------------------------------------------
SOURCE | SORTXL | RPSORT | WIN XP SORT / WIN 2K SORT | DOS 5 SORT
(each column lists character, hex code, decimal code)
--------------------------------------------------------------------------
SP 20 32 SP 20 32 SP 20 32 ' 27 39 SP 20 32
! 21 33 ! 21 33 ! 21 33 - 2D 45 ! 21 33
" 22 34 " 22 34 " 22 34 SP 20 32 " 22 34
# 23 35 # 23 35 # 23 35 ! 21 33 # 23 35
$ 24 36 $ 24 36 $ 24 36 " 22 34 $ 24 36
% 25 37 % 25 37 % 25 37 # 23 35 % 25 37
& 26 38 & 26 38 & 26 38 $ 24 36 & 26 38
' 27 39 ' 27 39 ' 27 39 % 25 37 ' 27 39
( 28 40 ( 28 40 ( 28 40 & 26 38 ( 28 40
) 29 41 ) 29 41 ) 29 41 ( 28 40 ) 29 41
* 2A 42 * 2A 42 * 2A 42 ) 29 41 * 2A 42
+ 2B 43 + 2B 43 + 2B 43 * 2A 42 + 2B 43
, 2C 44 , 2C 44 , 2C 44 , 2C 44 , 2C 44
- 2D 45 - 2D 45 - 2D 45 . 2E 46 - 2D 45
. 2E 46 . 2E 46 . 2E 46 / 2F 47 . 2E 46
/ 2F 47 / 2F 47 / 2F 47 : 3A 58 / 2F 47
0 30 48 0 30 48 0 30 48 ; 3B 59 0 30 48
1 31 49 1 31 49 1 31 49 ? 3F 63 1 31 49
2 32 50 2 32 50 2 32 50 @ 40 64 2 32 50
3 33 51 3 33 51 3 33 51 [ 5B 91 3 33 51
4 34 52 4 34 52 4 34 52 \ 5C 92 4 34 52
5 35 53 5 35 53 5 35 53 ] 5D 93 5 35 53
6 36 54 6 36 54 6 36 54 ^ 5E 94 6 36 54
7 37 55 7 37 55 7 37 55 _ 5F 95 7 37 55
8 38 56 8 38 56 8 38 56 ` 60 96 8 38 56
9 39 57 9 39 57 9 39 57 { 7B 123 9 39 57
: 3A 58 : 3A 58 : 3A 58 | 7C 124 : 3A 58
; 3B 59 ; 3B 59 ; 3B 59 } 7D 125 ; 3B 59
< 3C 60 < 3C 60 < 3C 60 ~ 7E 126 < 3C 60
= 3D 61 = 3D 61 = 3D 61 + 2B 43 = 3D 61
> 3E 62 > 3E 62 > 3E 62 < 3C 60 > 3E 62
? 3F 63 ? 3F 63 ? 3F 63 = 3D 61 ? 3F 63
@ 40 64 @ 40 64 @ 40 64 > 3E 62 @ 40 64
A 41 65 A 41 65 A 41 65 0 30 48 A 41 65
B 42 66 a 61 97 a 61 97 1 31 49 a 61 97
C 43 67 B 42 66 B 42 66 2 32 50 B 42 66
D 44 68 b 62 98 b 62 98 3 33 51 b 62 98
E 45 69 C 43 67 C 43 67 4 34 52 C 43 67
F 46 70 c 63 99 c 63 99 5 35 53 c 63 99
G 47 71 D 44 68 D 44 68 6 36 54 D 44 68
H 48 72 d 64 100 d 64 100 7 37 55 d 64 100
I 49 73 E 45 69 E 45 69 8 38 56 E 45 69
J 4A 74 e 65 101 e 65 101 9 39 57 e 65 101
K 4B 75 F 46 70 F 46 70 A 41 65 F 46 70
L 4C 76 f 66 102 f 66 102 a 61 97 f 66 102
M 4D 77 G 47 71 G 47 71 B 42 66 G 47 71
N 4E 78 g 67 103 g 67 103 b 62 98 g 67 103
O 4F 79 H 48 72 H 48 72 C 43 67 H 48 72
P 50 80 h 68 104 h 68 104 c 63 99 h 68 104
Q 51 81 I 49 73 I 49 73 D 44 68 I 49 73
R 52 82 i 69 105 i 69 105 d 64 100 i 69 105
S 53 83 J 4A 74 J 4A 74 E 45 69 J 4A 74
T 54 84 j 6A 106 j 6A 106 e 65 101 j 6A 106
U 55 85 K 4B 75 K 4B 75 F 46 70 K 4B 75
V 56 86 k 6B 107 k 6B 107 f 66 102 k 6B 107
W 57 87 L 4C 76 L 4C 76 G 47 71 L 4C 76
X 58 88 l 6C 108 l 6C 108 g 67 103 l 6C 108
Y 59 89 M 4D 77 M 4D 77 H 48 72 M 4D 77
Z 5A 90 m 6D 109 m 6D 109 h 68 104 m 6D 109
[ 5B 91 N 4E 78 N 4E 78 I 49 73 N 4E 78
\ 5C 92 n 6E 110 n 6E 110 i 69 105 n 6E 110
] 5D 93 O 4F 79 O 4F 79 J 4A 74 O 4F 79
^ 5E 94 o 6F 111 o 6F 111 j 6A 106 o 6F 111
_ 5F 95 P 50 80 P 50 80 K 4B 75 P 50 80
` 60 96 p 70 112 p 70 112 k 6B 107 p 70 112
a 61 97 Q 51 81 Q 51 81 L 4C 76 Q 51 81
b 62 98 q 71 113 q 71 113 l 6C 108 q 71 113
c 63 99 R 52 82 R 52 82 M 4D 77 R 52 82
d 64 100 r 72 114 r 72 114 m 6D 109 r 72 114
e 65 101 S 53 83 S 53 83 N 4E 78 S 53 83
f 66 102 s 73 115 s 73 115 n 6E 110 s 73 115
g 67 103 T 54 84 T 54 84 O 4F 79 T 54 84
h 68 104 t 74 116 t 74 116 o 6F 111 t 74 116
i 69 105 U 55 85 U 55 85 P 50 80 U 55 85
j 6A 106 u 75 117 u 75 117 p 70 112 u 75 117
k 6B 107 V 56 86 V 56 86 Q 51 81 V 56 86
l 6C 108 v 76 118 v 76 118 q 71 113 v 76 118
m 6D 109 W 57 87 W 57 87 R 52 82 W 57 87
n 6E 110 w 77 119 w 77 119 r 72 114 w 77 119
o 6F 111 X 58 88 X 58 88 S 53 83 X 58 88
p 70 112 x 78 120 x 78 120 s 73 115 x 78 120
q 71 113 Y 59 89 Y 59 89 T 54 84 Y 59 89
r 72 114 y 79 121 y 79 121 t 74 116 y 79 121
s 73 115 Z 5A 90 Z 5A 90 U 55 85 Z 5A 90
t 74 116 z 7A 122 z 7A 122 u 75 117 z 7A 122
u 75 117 [ 5B 91 [ 5B 91 V 56 86 [ 5B 91
v 76 118 \ 5C 92 \ 5C 92 v 76 118 \ 5C 92
w 77 119 ] 5D 93 ] 5D 93 W 57 87 ] 5D 93
x 78 120 ^ 5E 94 ^ 5E 94 w 77 119 ^ 5E 94
y 79 121 _ 5F 95 _ 5F 95 X 58 88 _ 5F 95
z 7A 122 ` 60 96 ` 60 96 x 78 120 ` 60 96
{ 7B 123 { 7B 123 { 7B 123 Y 59 89 { 7B 123
| 7C 124 | 7C 124 | 7C 124 y 79 121 | 7C 124
} 7D 125 } 7D 125 } 7D 125 Z 5A 90 } 7D 125
~ 7E 126 ~ 7E 126 ~ 7E 126 z 7A 122 ~ 7E 126
> I'll paste in a table, which came from VDE on 80-column
> screen and looks fine in Courier in my email window, but may
> not survive transmission.
It appears to have survived as it looks fine here in:
CHARSET_COLLECTIONS ASCII ISO8859-1 ADOBE-STANDARD
FULL_NAME Courier Bold
-- Gary
I use RPSORT here, or a Cygwin port of the *nix sort command, so I
never noticed.
> I'll paste in a table, which came from VDE on 80-column screen and looks
> fine in Courier in my email window, but may not survive transmission. The
> differences in the sort output are significant.
It looks funny in GMail, which uses a proportional font, but shows as
intended when copied and pasted into a text editor (Notepad++).
> John Woodruff
______
Dennis
I'm an old person, so I happen to think that ALL of your output is
wrong: I prefer to have my computers use ASCII collating sequence, so
all the lowercase letters should come after all the uppercase letters.
Unfortunately for me, many current Unix/Linux releases have decided in
part to use the natural-language sorting you are getting, even with
directory listings, so upper- and lowercase filenames are intermingled
-- but not on all systems or under all circumstances. And it makes no
sense to me to put some punctuation or special characters before the
alphabet and others after, once you've decided not to sort them in
ASCII order; some of the sorted output you show seems to imply that
there are "less used" and "more used" special characters, so (for
example) the [ and ] are sorted to the end, but the ( and ) are sorted
before the letters. The Win2K/XP sort order at least puts all of these
characters together.
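
For comparison, here is a minimal Python sketch of the two orderings under discussion: plain ASCII order, and a case-folded order that happens to match the SORTXL, RPSORT, and DOS 5 columns in John's table. The key function is only an assumption about how such an order could be produced; it says nothing about how those utilities work internally.

```python
# Printable ASCII characters 32 through 126, as in the test file.
chars = [chr(code) for code in range(32, 127)]

# Plain ASCII collation: sort by code point, so A-Z all precede a-z.
ascii_order = sorted(chars)

# Case-folded collation: compare letters by their uppercase form and
# break ties by code point, which interleaves the cases (A a B b ...).
folded_order = sorted(chars, key=lambda c: (c.upper(), c))

print("".join(ascii_order))
print("".join(folded_order))
```

In the folded order the characters [ \ ] ^ _ ` still follow Z and z, because they have no uppercase form and keep their own code points.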
Usually the collating sequence is set according to a locale, specified
in either an environment variable or a file. There are many standard
collating sequences; MS-Windows 2K seems to use this one:
http://www.collation-charts.org/win2k/win2k.0409.CP1252.English_United_States.html
which seems on quick glance to be the same as these:
http://www.collation-charts.org/winxp/winxp.0409.CP1252.English_United_States.html
http://www.collation-charts.org/vista/vista.0409.CP1252.English_United_States.html
What's really weird is how newer versions of MS-Windows treat file and
directory names that contain numbers: they treat the numeric portion
as a number, not as text, so where one might expect (from the
collating sequence)
Ie4_01
Ie4_128
Ie401sp2
Ie5
Ie501sp2
Ie6
one now gets
Ie4_01
Ie4_128
Ie5
Ie6
Ie401sp2
Ie501sp2
on the theory that four-hundred-one is a larger value than six, so it
sorts later.
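
A rough Python sketch of that idea follows. It is not Microsoft's algorithm, just one common way to get a "numbers as numbers" ordering: split each name into runs of digits and non-digits and compare the digit runs by value. The function name natural_key is made up for the illustration.

```python
import re

def natural_key(name):
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd positions are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ["Ie4_01", "Ie4_128", "Ie401sp2", "Ie5", "Ie501sp2", "Ie6"]
print(sorted(names, key=natural_key))
# ['Ie4_01', 'Ie4_128', 'Ie5', 'Ie6', 'Ie401sp2', 'Ie501sp2']
```

Here 401 and 501 are compared as whole numbers, so they land after 5 and 6.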
I imagine that what we really want is a sorting program for which it
is easy to specify the desired collating sequence, either in a text
file or through some simple utility. It should use an efficient
sorting algorithm and be available as source code so we can use it on
any operating system.
Eric?
-- Mark F.
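
As a minimal sketch of what such a program might look like (in Python rather than portable C, and with no claim to efficiency), the collating sequence below is just a string that could equally be read from a text file. The names make_key and sequence are hypothetical, and the sample sequence is only one possible choice.

```python
def make_key(sequence):
    # Rank each character by its position in the user-supplied sequence;
    # characters not listed sort after all listed ones, by code point.
    rank = {ch: i for i, ch in enumerate(sequence)}
    fallback = len(sequence)

    def key(line):
        return [(rank.get(ch, fallback), ch) for ch in line]

    return key

# Hypothetical sequence: digits first, then letters with cases interleaved.
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
sequence = "0123456789" + "".join(u + u.lower() for u in letters)

lines = ["beta", "Alpha", "42", "alpha", "Beta", "~end"]
print(sorted(lines, key=make_key(sequence)))
# ['42', 'Alpha', 'alpha', 'Beta', 'beta', '~end']
```

Changing the order is then a matter of editing the sequence string, not the program.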
On Thu, Mar 11, 2010 at 2:34 PM, John Woodruff <desu...@gmail.com> wrote:
> I was surprised to find that SORT as provided in Win 2000 and XP doesn't
> sort ASCII characters 32 through 126 the same as DOS 5, SORTXL, and RPSORT,
> which provide identical output.
>
> I'll paste in a table, which came from VDE on 80-column screen and looks
> fine in Courier in my email window, but may not survive transmission. The
> differences in the sort output are significant.
>
> John Woodruff
>
>
> 2010-03-10 10:03
>
> TEST OF SORTING A FILE OF ASCII CHARACTERS 32 THROUGH 126
>
> command syntax: { SORTXL | RPSORT | SORT } <source >outfile
--
"If you have any trouble sounding condescending, find a Unix user to
show you how it's done."
-- Scott Adams
Your SORT discussion got me going, so I'm just going to ask your
indulgence in this little puzzle I found in Explorer in Windows XP. Do
the following:
In a test directory, create folders with the following names (you can
even add a few of your own, for more better effect). Those are zeroes
in the folder names, not letter "O". The idea is to make folder names
that have combinations of numerals and alphabetic characters.
00123
00ABC
0123
0ABC
123
12300
123000
1AEF
ABC00
ABC000
See how these names sort in DIR listing in a CMD window. For example,
using "DIR /ON" gave the order above.
Now see (below) how these names "sort" when shown in an Explorer window
after clicking the "Name" column to sort on the folder names. Can
someone explain to me how they are being sorted?
00ABC
0ABC
1AEF
00123
0123
12300
ABC000
ABC00
-moy
When I'm using VDE to process lists, calling an external SORT command
to process marked blocks, I sometimes set up the list with permanent
first and last lines, each consisting of one of the non-alphanumeric
characters on the keyboard. Now that the SORT provided by MS no longer
collates any of those characters to the end, I just won't use it.
John Woodruff
And I've forever been annoyed by the way Windows puts "marmot closeup.jpg"
BEFORE "marmot.jpg"... never could even figure out a rationale for that. (Not
excluding the filetype from the sort?)
-- Eric Meyer.
http://simtel.img.digitalriver.com/product/view/id/51623
--
Jim Oliver
How much flexibility in collating sequence is desired?
RPSort allows you to specify ASCII collation, and an assortment of
other things. The main issue I see offhand is the assumption of DOS
CRLF line endings. It would be nice to specify *nix LF or MacOS CR
endings instead.
ASM source is available, but not exactly portable.
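
For what it's worth, here is a small Python sketch of line-ending-agnostic sorting. It has nothing to do with RPSort's internals; it only shows the behavior being wished for. The function name sort_lines and its newline parameter are invented for the example.

```python
def sort_lines(text, newline="\n"):
    # str.splitlines() recognizes CRLF, LF, and CR (among other
    # separators), so mixed input is handled in one pass.
    lines = text.splitlines()
    return newline.join(sorted(lines)) + newline

# Input mixing DOS, old MacOS, and *nix endings; output uses LF only.
print(repr(sort_lines("pear\r\napple\rfig\n")))
# 'apple\nfig\npear\n'
```

Passing newline="\r\n" would write DOS endings instead.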
> -- Mark F.
_____
Dennis
> Can someone explain to me how they are being sorted?
>
> 00ABC
> 0ABC
> 1AEF
> 00123
> 0123
> 12300
> ABC000
> ABC00
It is "numerical" sorting, which, judging from the 4DOS Help, appears
to be an MS-DOS default:
> DIR /O:a
>
> Sort names and extensions in standard ASCII order,
> rather than sorting numerically when digits are included
> in the name or extension.
> C:\MOYTEST>dir /b /o:a
> 00123
> 00ABC
> 0123
> 0ABC
> 123
> 12300
> 123000
> 1AEF
> ABC00
> ABC000
SORT.EXE and the DESQview/X File Manager use ASCII order,
which makes one wonder about the origins of numerical sorting, or
why they chose to retain it in Explorer but not CMD.
-- Gary
Gary, I hear what you're saying about "numeric" sorting, but I just
could not make it fit the sequence I was seeing. Besides, it appears to
be something not quite numeric.
IIRC, MS-DOS 6.22 does have a default "sort" order, but it does not have
a "dir /o:a" switch. "Default" literally is the order of the file name
entries in the DOS directory structure.
I guess the real problem we're looking at is Explorer not clearly
defining the data type being sorted. After all, file and directory
names really are just character strings and not values. Let's say that,
in the example, one argues that all the names (especially "1AEF") are
hex sequences--Then "ABC000" really could sort *before* "ABC00".
So I'd argue for always using ASCII collating sequence for file
names--exactly what MS-DOS "dir /on" provides, rather than the
mix-and-match mess that Windows Explorer makes.
-m
]Moy asks:
> IIRC, MS-DOS 6.22 does have a default "sort" order, but it does not have
> a "dir /o:a" switch. "Default" literally is the order of the file name
> entries in the DOS directory structure.
And there were utilities back then that you could use to actually sort
the directory entries on disk so the order *was* alphabetic.
> I guess the real problem we're looking at is Explorer not clearly
> defining the data type being sorted. After all, file and directory
> names really are just character strings and not values. Let's say that,
> in the example, one argues that all the names (especially "1AEF") are
> hex sequences--Then "ABC000" really could sort *before* "ABC00".
>
> So I'd argue for always using ASCII collating sequence for file
> names--exactly what MS-DOS "dir /on" provides, rather than the
> mix-and-match mess that Windows Explorer makes.
I don't especially care what order Windows Explorer uses. For the
things I do in it, I'm either dealing with individual files or with
file groups, and the order I will need for groups will be either a
Name sort or a Date sort with most recently modified first. The sorts
of file names I have aren't the ones that will generate the confusion
John's example engenders.
If I need to generate a file list I'm going to process with something
else, I'll go to a command line and probably pass the list through an
external sort utility.
> -m
_____
Dennis
At a basic level, everything is just a byte stream (which is why Unix
works the way it does), and the application gets to decide how to
interpret it. MS-Windows Explorer, like any shell program, is an
application.
In MS-Windows XP, Microsoft chose to interpret file names and folder
names as mixed data types, able to contain both text and numeric
values. Digits -- 0-9 -- in a file name or folder name are interpreted
as numbers, while alphabetic characters are interpreted as text. So,
where 1AEF comes AFTER 00123 in an ASCII collating sequence, a numeric
value of one (1) comes BEFORE a numeric value of
one-hundred-twenty-three (00123); and a textual comparison is done to
sort 00123 before 0123, which have the same numeric value.
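
The rule as described can be sketched in Python. This is a guess at the behavior, not Explorer's actual code: digit runs compare by value and sort ahead of text, and the raw name breaks ties between names that compare equal. It does not reproduce Moy's listing in every detail, since that listing shows ABC000 before ABC00 and this sketch gives the reverse. The function names chunks and explorer_like_key are hypothetical.

```python
from itertools import groupby

def chunks(name):
    # Split into runs of digits and non-digits. Digit runs become
    # (0, value, "") and text runs (1, 0, text), so numbers sort first.
    out = []
    for is_num, run in groupby(name, key=str.isdigit):
        text = "".join(run)
        out.append((0, int(text), "") if is_num else (1, 0, text.lower()))
    return out

def explorer_like_key(name):
    # Textual tie-break: 00123 and 0123 have equal chunks, so the raw
    # name decides between them.
    return (chunks(name), name)

folders = ["00123", "00ABC", "0123", "0ABC", "123",
           "12300", "123000", "1AEF", "ABC00", "ABC000"]
print(sorted(folders, key=explorer_like_key))
# ['00ABC', '0ABC', '1AEF', '00123', '0123', '123',
#  '12300', '123000', 'ABC00', 'ABC000']
```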
> Let's say that,
> in the example, one argues that all the names (especially "1AEF") are
> hex sequences--Then "ABC000" really could sort *before* "ABC00".
>
> So I'd argue for always using ASCII collating sequence for file
> names--exactly what MS-DOS "dir /on" provides, rather than the
> mix-and-match mess that Windows Explorer makes.
I agree with you, and Microsoft doesn't. -- Mark F.
--
> IIRC, MS-DOS 6.22 does have a default "sort" order, but
> it does not have a "dir /o:a" switch. "Default"
> literally is the order of the file name entries in the
> DOS directory structure.
I long ago replaced COMMAND.COM with 4DOS.COM. As the DOS
SHELL it needs to be compatible, so I tend to assume its
defaults are those of COMMAND.COM. Apparently
COMMAND.COM's "unsorted" is of so little use that 4DOS.COM
makes it optional.
So your inexplicable sort would also be 4DOS's default
numerical sort.
-- Gary
To add to the melee, I realized that I actually *inverted* the meaning
of one of my statements! About the file/directory names that might be
interpreted as hex values, I incorrectly "sorted" "ABC000" before
"ABC00".
I actually would like Explorer to be able to give me at least an
intuitive sort order when I request a sort by name. After all, what
good is a name if it becomes onerous to find a file by, uh, name? (Oh,
and try sorting by file "type"--gack).
While trivial in DOS (I can do "DIR ACT*.TXT" to limit my search, or
even "DIR ACTUAL.TXT" to find an exact file, or "DIR ACT*.TXT /S" for a
little more reach), Explorer makes me pore over a sometimes
indecipherable list of names.
But then, this *is* a discussion list about VDE and I have digressed a
bit :^)
-moy
]Moy is correct:
--
Oh, yes, absolutely. Merely finding a file with DIR would be
straightforward, assuming the current directory in CMD is the one you
want to be in.
In the WinXP default filesystem, you'd get your data directories
branching off "C:\Documents and Settings\Moy\My Documents\whatever,"
which makes for a lot of keystrokes inside CMD in order to *do* anything
with the file you want. Oh, and don't forget the quotation marks
surrounding file/folder names that have embedded spaces.
Yes, I could open an application, then use its own file browser to do a
masked listing to narrow the list of files to look at, but that's more
clicks plus keystrokes as opposed to one click (for sort order) then a
drag to scroll quickly.
On the other hand, in the GUI, you'd have to navigate folders in the GUI
before opening one (uh, the need for a realistic name sort--again).
Finding a file in the GUI is often followed by the gesture I'd have to
make in order to *do* something with it. Often, this will be your
pedestrian actions: "open," "copy," "delete," or "move," with "open"
probably being the most often used command. So *opening* a file often
requires that I *find* it first.
In other words, I want to be able to double-click on something
relatively soon after opening the folder where I know it should reside.
This becomes non-trivial when the "sort" is broken.
BTW, I do keep some 8.3-compliant work folders on C:\, just to keep my
typing fingers and my favorite DOS apps happy.
But again, we digress. Funny how DOS v Windows (or VDE v other editors)
debates often distill down to their *interfaces*?
-moy
]
]You do realize, I hope, that you can still do the same things in a
I don't think it's so funny at all, when the differences are just a lack of
respect for established standards. There already was a generally recognized
way of sorting filenames under DOS, which everyone with a PC was used to. Why
doesn't Windows observe it? Or even give the *option* of observing it?
(There is some minor sorting option in Explorer setup but it makes no
difference to any of this.)
-- Eric Meyer.
> Oh, yes, absolutely. Merely finding a file with DIR would be
> straightforward, assuming the current directory in CMD is the one you
> want to be in.
> In the WinXP default filesystem, you'd get your data directories
> branching off "C:\Documents and Settings\Moy\My Documents\whatever,"
> which makes for a lot of keystrokes inside CMD in order to *do* anything
> with the file you want. Oh, and don't forget the quotation marks
> surrounding file/folder names that have embedded spaces.
By default, from Start/Run, I believe CMD opens in <Windows
drive>\Documents and Settings\<username> (My Windows XP boot drive is
H:.) But you can put a shortcut to it on the desktop and set the
properties to make it start up in whatever directory you like.
And you can use tab completion of the command line to make life easier.
> Yes, I could open an application, then use its own file browser to do a
> masked listing to narrow the list of files to look at, but that's more
> clicks plus keystrokes as opposed to one click (for sort order) then a
> drag to scroll quickly.
>
> On the other hand, in the GUI, you'd have to navigate folders in the GUI
> before opening one (uh, the need for a realistic name sort--again).
And you can put shortcuts to directories on the desktop, too. I have
an assortment of them.
> Finding a file in the GUI is often followed by the gesture I'd have to
> make in order to *do* something with it. Often, this will be your
> pedestrian actions: "open," "copy," "delete," or "move," with "open"
> probably being the most often used command. So *opening* a file often
> requires that I *find* it first.
>
> In other words, I want to be able to double-click on something
> relatively soon after opening the folder where I know it should reside.
> This becomes non-trivial when the "sort" is broken.
Finding things isn't that hard. I have a default set of standards
about where I keep things. And I generally don't have the sort of
file names that would confuse things the way the posted examples do.
Windows' idea of a name sort, while arguably incorrect, is adequate
for my purposes.
Of course, I do have half a dozen physical hard drives in my main
machine. So one tool I use is locate32, a Win32 port of the *nix
"locate" utility. Locate builds an index of my file system, and lets
me search for files using a variety of criteria. I can open or
otherwise manipulate the file directly from the list locate returns.
For more complicated cases, like "What file was the following in", I
use Google Desktop, which will let me search for content *within*
files.
In addition, there are several third-party utilities that install as
shell extensions, and let you open a command window in a directory
selected from Windows Explorer.
> BTW, I do keep some 8.3-compliant work folders on C:\, just to keep my
> typing fingers and my favorite DOS apps happy.
>
> But again, we digress. Funny how DOS v Windows (or VDE v other editors)
> debates often distill down to their *interfaces*?
Essentially, that's what it comes down to. One of the main efforts of
development over the life of the personal computer was making it
easier for users to perform tasks, and reducing what they had to know
to use the machine at all. GUIs and point and click are a logical
progression in that effort.
> -moy
_____
Dennis
The installation is in the form of a tiny (1 KB) "inf" file. Just
download it from the URL below and unzip it, then right-click on it (in
Windows Explorer) and choose INSTALL. That's it!
I use it all the time. It works in Win XP.
http://www.petri.co.il/software/doshere.zip
Cheers - Jim.
--
Jim Oliver
> You do realize, I hope, that you can still do the same
> things in a command window?
From my experience with J.P. Software's
<http://www.jpsoft.com> COMMAND.com replacement/alternative
4DOS.com and the DESQview/X File Manager, I'd expect their $100
Take Command would allow one to do more at a command prompt
and rely less on Explorer.
-- Gary
Considerably more, as it comes in a GUI version as well as a console
application.
They offer a freeware "Limited Edition" version with a healthy subset
of the full feature set that is well worth installing. (Stuff missing
is mostly related to a networked environment used by a corporate
desktop.)
Go here: http://www.jpsoft.com/tccledes.htm
> -- Gary
_____
Dennis
> Considerably more, as it comes in a GUI version as well as a
> console application.
A boffin at MIT's Lincoln Labs, who advised me when I first
"went DOS" to "get 4DOS", once complained on the JP Software
forum that he found the GUI to be useless and he wouldn't buy
it if it weren't bundled with the console command processor.
Perhaps he's become more GUI dependent since, but I suspect
it's for users who expect a GUI.
-- Gary