The Unix 'sort' utility...

Kenny McCormack

Feb 2, 2015, 11:59:42 AM

I've never liked it. It seems to me a classic case of "trying to make it
possible to do anything, they made it very difficult to do the normal, easy
things". Every time I've had to try to decipher the man page, I've ended
up saying "There's gotta be a better way".

Specific question: how to sort a file based on the columns X to the end,
regardless of any field delimiters (spaces, tabs, whatever) ?

The problem is that it seems to be fields oriented; there's not a direct
way to make it line oriented. There is an option (-t) to change the field
separator character; you could try to set that to something "not found" in
your data, but we all know what a rabbit hole that can be.

Anyway, after trying to figure this out, I realized that it would take more
mental energy to decipher it (and test it) than it would take to write an
equivalent program in AWK (using the built-in sorting capability of AWK),
which is what I did.

Anyway, is there any way to do this in 'sort'? FWIW, this is on OSX, but
it seems the 'sort' is GNU:

$ sort --version
sort (GNU coreutils) 5.93
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.
$


--
In the corner of the room on the ceiling is a large vampire bat who
is obviously deranged and holding his nose.

Barry Margolin

Feb 2, 2015, 12:06:33 PM

In article <maoadr$edq$1...@news.xmission.com>,
gaz...@shell.xmission.com (Kenny McCormack) wrote:

> I've never liked it. It seems to me a classic case of "trying to make it
> possible to do anything, they made it very difficult to do the normal, easy
> things". Every time I've had to try to decipher the man page, I've ended
> up saying "There's gotta be a better way".
>
> Specific question: how to sort a file based on the columns X to the end,
> regardless of any field delimiters (spaces, tabs, whatever) ?

-k 1.c

where c is the column number. This treats the entire line as part of
field 1.
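
A minimal sketch of how this plays out, using three invented sample lines (the data and the variable names `to_eol` and `field_only` are made up for illustration). It sorts from character column 3 and contrasts a key with no end position, which runs to the end of the line, against a key explicitly ended at field 1. LC_ALL=C is assumed here only to keep the comparison a plain byte comparison.

```shell
# Invented sample data: the interesting part starts at character column 3.
data=$(printf '%s\n' 'xxa b' 'yya a' 'zz0 c')

# Key starts at character 3 of field 1 and, with no end given,
# runs to the end of the line: keys are "a b", "a a", "0 c".
to_eol=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1.3)

# Same start, but the key is ended at field 1: keys are "a", "a", "0".
# The tie between the first two lines falls back to a whole-line comparison.
field_only=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1.3,1)

printf 'to end of line:\n%s\n' "$to_eol"
printf 'field 1 only:\n%s\n' "$field_only"
```

With the open-ended key, the text after the space takes part in the comparison, so 'yya a' sorts ahead of 'xxa b'; with the key ended at field 1 the order of those two lines flips.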

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

A. Mehoela

Feb 2, 2015, 12:27:55 PM

"-k X" should do it: you don't specify the ending key, so sort should use all those that follow too.

Lew Pitcher

Feb 2, 2015, 12:32:36 PM

On Monday February 2 2015 12:06, in comp.unix.shell, "Barry Margolin"
<bar...@alum.mit.edu> wrote:

> In article <maoadr$edq$1...@news.xmission.com>,
> gaz...@shell.xmission.com (Kenny McCormack) wrote:
>
>> I've never liked it. It seems to me a classic case of "trying to make it
>> possible to do anything, they made it very difficult to do the normal,
>> easy
>> things". Every time I've had to try to decipher the man page, I've ended
>> up saying "There's gotta be a better way".
>>
>> Specific question: how to sort a file based on the columns X to the end,
>> regardless of any field delimiters (spaces, tabs, whatever) ?
>
> -k 1.c
>
> where c is the column number. This treats the entire line as part of
> field 1.

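Be careful with the -b option: with it, character positions in a -k key are
counted from the first nonblank character of the field; without it, they are
counted from the start of the field, leading blanks included.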
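
A quick way to check how -b changes character counting, sketched with two invented lines (the data and the variable names `without_b` and `with_b` are only for illustration; LC_ALL=C is assumed to get a plain byte comparison):

```shell
# Invented sample data: the first line has two leading spaces.
data=$(printf '%s\n' '  cab' 'xxbzz')

# Without -b, character 3 is counted from the start of the line,
# so the keys are "cab" and "bzz".
without_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1.3)

# With -b, leading blanks are skipped before counting,
# so the keys are "b" (third character of "cab") and "bzz".
with_b=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k 1.3)

printf 'without -b:\n%s\n' "$without_b"
printf 'with -b:\n%s\n' "$with_b"
```

For sorting on an absolute character column, as asked in the original post, the variant without -b is the one that counts from the start of the line.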

--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request

Sivaram Neelakantan

Feb 2, 2015, 12:43:59 PM

On Mon, Feb 02 2015,Barry Margolin wrote:

> In article <maoadr$edq$1...@news.xmission.com>,
> gaz...@shell.xmission.com (Kenny McCormack) wrote:
>

[snipped 5 lines]

>> Specific question: how to sort a file based on the columns X to the end,
>> regardless of any field delimiters (spaces, tabs, whatever) ?
>
> -k 1.c
>
> where c is the column number. This treats the entire line as part of
> field 1.

So, -k 1.27 means that cols 27 through EOL are the sorting columns of
interest, and the first 26 cols are not considered part of the
sort sequence?

sivaram
--

Barry Margolin

Feb 2, 2015, 12:44:57 PM

In article <rsOzw.546848$HR5.5...@fx27.am4>,
The question wasn't how to go to the end, the question was how to start
at the Xth character column rather than the Xth field.

Kenny McCormack

Feb 2, 2015, 1:05:31 PM

In article <barmar-6E55A4....@88-209-239-213.giganet.hu>,
Barry Margolin <bar...@alum.mit.edu> wrote:
...
>The question wasn't how to go to the end, the question was how to start
>at the Xth character column rather than the Xth field.

Actually, it was.

The documentation may actually contain that particle of info - that if you
specify column X (in field 1), it means X through end-of-line [*] - but it is
far from clear.

Hence the reason not to use this utility.

[*] As opposed to "through end of field 1", which is a lot more
common-sense.

--
Is God willing to prevent evil, but not able? Then he is not omnipotent.
Is he able, but not willing? Then he is malevolent.
Is he both able and willing? Then whence cometh evil?
Is he neither able nor willing? Then why call him God?
~ Epicurus

Barry Margolin

Feb 2, 2015, 2:30:39 PM

In article <maoe98$edq$2...@news.xmission.com>,
gaz...@shell.xmission.com (Kenny McCormack) wrote:

> In article <barmar-6E55A4....@88-209-239-213.giganet.hu>,
> Barry Margolin <bar...@alum.mit.edu> wrote:
> ...
> >The question wasn't how to go to the end, the question was how to start
> >at the Xth character column rather than the Xth field.
>
> Actually, it was.

Go back and read the OP. The main issue he was struggling with was how
to specify columns without respect to fields.

A. Mehoela

Feb 2, 2015, 3:56:19 PM

Barry Margolin wrote:
> In article <maoe98$edq$2...@news.xmission.com>,
> gaz...@shell.xmission.com (Kenny McCormack) wrote:
>
>> In article <barmar-6E55A4....@88-209-239-213.giganet.hu>,
>> Barry Margolin <bar...@alum.mit.edu> wrote:
>> ...
>>> The question wasn't how to go to the end, the question was how to start
>>> at the Xth character column rather than the Xth field.
>>
>> Actually, it was.
>
> Go back and read the OP. The main issue he was struggling with was how
> to specify columns without respect to fields.
>

I mistook columns for fields, and the author politely corrected me on that.

He'll probably quite soon correct himself, thinking he can write an "equivalent program in awk", once he gets some serious sorting
to do.

Christian Weisgerber

Feb 2, 2015, 8:15:14 PM

On 2015-02-02, A. Mehoela <a.me...@hoetmeel.com> wrote:

> I mistook columns for fields, and the author politely corrected me on that.
>
> He'll probably quite soon correct himself, thinking he can write an "equivalent program in awk", once he gets some serious sorting
> to do.

The concept of "column" becomes very interesting with multi-byte
and double-width characters. Or even just tabs.

--
Christian "naddy" Weisgerber na...@mips.inka.de

Janis Papanagnou

Feb 3, 2015, 6:44:51 AM

On 03.02.2015 00:35, Christian Weisgerber wrote:
> On 2015-02-02, A. Mehoela <a.me...@hoetmeel.com> wrote:
>
>> I mistook columns for fields, and the author politely corrected me on that.
>>
>> He'll probably quite soon correct himself, thinking he can write an "equivalent program in awk", once he gets some serious sorting
>> to do.
>
> The concept of "column" becomes very interesting with multi-byte
> and double-width characters. Or even just tabs.

My expectation is that the tools would behave correctly with multi-byte
characters and an appropriate locale setting. WRT tabs I don't even
see any problem; a TAB is one physical [whitespace] character. That
output devices will interpret it as a control character to position
an output "cursor" at some other place should not influence meaning
of a column in the data.

I agree that it's interesting whether tools handle "logical" characters
or "physical" low-level octets (bytes). That should be well defined.

For sort(1) I'd expect it to support a locale specific alphabetical
sorting.

It's somewhat surprising that low-level tools like cut(1) seem to
distinguish characters and bytes (options -c and -b resp.) but don't
operate correctly - on my system -c and -b produce the same trashy
result with a UTF-8 locale setting.
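
To make the byte/character distinction concrete, here is a small sketch (the sample string and the variable names `bytes` and `chars` are invented for illustration). It counts the same string once in bytes and once in characters; the character count depends on the locale in effect and on how well the local tools support it, so no particular value is promised for it.

```shell
# "é" written as its two UTF-8 bytes, followed by "x"
# (octal escapes keep the script itself ASCII-only).
s=$(printf '\303\251x')

# Byte count does not depend on the locale: 2 bytes for "é" plus 1 for "x".
bytes=$(printf '%s' "$s" | LC_ALL=C wc -c | tr -d ' ')

# Character count uses the current locale; a UTF-8 locale is needed
# for "é" to be counted as a single character.
chars=$(printf '%s' "$s" | wc -m | tr -d ' ')

printf 'bytes=%s chars=%s\n' "$bytes" "$chars"
```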

Janis
