Sort the data according to the date including the abbreviation of English months.

hongy...@gmail.com

Apr 19, 2022, 10:50:07 PM
My data includes the following data formats in each row:

$ cat 1111
Jul 23 2021
Apr 30 2019
Jan 2 2022
Oct 24 2004

I want to sort them according to the date, and tried with sort as follows:

$ sort -k1 < 1111
Apr 30 2019
Jan 2 2022
Jul 23 2021
Oct 24 2004

Obviously, this doesn't work as expected. Any hints for dealing with this problem?

Regards,
HZ

Percival John Hackworth

Apr 19, 2022, 11:29:08 PM
On 19-Apr-2022 at 7:50:04PM PDT, "hongy...@gmail.com" <hongy...@gmail.com> wrote:
Sort does have a flag to sort on month abbreviations, but you really want to
sort on the entire date which is all three fields. You'll have to add a field
that's the concatenation of all three fields in the form YYYY-MM-DD where you
convert the string month into a 01-12 number, then sort on that. I don't see a
way to do this with just the shell built-ins or standard utilities. You'll
have to write a perl or python script to process this data line by line. Or
pull all the data into Excel and use its sort columns feature. That's
probably the easiest way you'll be able to deal with this given your questions
on this forum.
--
DeeDee, don't press that button! DeeDee! NO! Dee...
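
The approach described above (prefix each line with a YYYY-MM-DD key, sort on that key, then drop it again) can be sketched with standard utilities alone. This is a minimal sketch, assuming POSIX awk, sort, and cut; the variable names `data` and `sorted` and the awk arrays `name` and `month` are illustrative and do not come from the thread.

```shell
# Sample data from the original question.
data='Jul 23 2021
Apr 30 2019
Jan 2 2022
Oct 24 2004'

sorted=$(printf '%s\n' "$data" |
  awk 'BEGIN {
         # Map each month abbreviation to its number, 1 to 12.
         n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
         for (i = 1; i <= n; i++) month[name[i]] = i
       }
       # Prefix every line with a zero-padded YYYY-MM-DD key.
       { printf "%04d-%02d-%02d %s\n", $3, month[$1], $2, $0 }' |
  LC_ALL=C sort |       # plain string order is date order for this key
  cut -d ' ' -f2-)      # drop the key, keep the original line

printf '%s\n' "$sorted"
```

Because the key is zero-padded and ordered year, month, day, an ordinary string sort is enough and no month-aware sort option is needed.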

Janis Papanagnou

Apr 20, 2022, 12:36:21 AM
Generally I'd try to avoid pathological date formats in the first place
and create and use ISO-dates or convert the dates to ISO format.

Otherwise we probably need more or less clumsy workarounds, e.g. like

while read usdate
do date -d "${usdate}" "+%Y-%m-%d ${usdate}"
done <your_file | sort | cut -d\  -f2-


Janis


Janis Papanagnou

Apr 20, 2022, 12:48:30 AM
On 20.04.2022 05:29, Percival John Hackworth wrote:
> On 19-Apr-2022 at 7:50:04PM PDT, "hongy...@gmail.com"
> <hongy...@gmail.com> wrote:
>
>> My data includes the following data formats in each row:
>>
>> $ cat 1111
>> Jul 23 2021
>> Apr 30 2019
>> Jan 2 2022
>> Oct 24 2004
>>
>> I want to sort them according to the date, and tried with sort as follows:
>>
>> $ sort -k1 < 1111
>> Apr 30 2019
>> Jan 2 2022
>> Jul 23 2021
>> Oct 24 2004
>>
>> Obviously, this doesn't work as expected. Any hints for dealing with this
>> problem?
>>
>> Regards,
>> HZ
>
> Sort does have a flag to sort on month abbreviations, but you really want to
> sort on the entire date which is all three fields. You'll have to add a field
> that's the concatenation of all three fields in the form YYYY-MM-DD where you
> convert the string month into a 01-12 number, then sort on that. I don't see a
> way to do this with just the shell built-ins or standard utilities.

There are many ways to do that in shell with the Unix utilities;
I posted one in this thread.

> You'll
> have to write a perl or python script to process this data line by line.

Unnecessary.

> Or pull all the data into Excel and use its sort columns feature. That's
> probably the easiest way you'll be able to deal with this given your questions
> on this forum.

This is about the most stupid suggestion I can think of (yet more
so in a Unix newsgroup). - Don't do any manual steps if you can
automate the task. Don't transfer data from one system to another
if it's unnecessary. Don't switch from a powerful Unix OS to a
Windows OS (or equivalent). - YMMV.

Janis

hongy...@gmail.com

Apr 20, 2022, 3:06:06 AM
$ cat usdate
Jul 23 2021
Apr 30 2019
Jan 2 2022
Oct 24 2004

$ while read usdate
do date -d "${usdate}" "+%Y-%m-%d ${usdate}"
done <usdate | sort | cut -d\ -f2-
cut: the delimiter must be a single character
Try 'cut --help' for more information.

marrgol

Apr 20, 2022, 5:56:54 AM
Yeah, the same hint you are always given and never take:
read the man page and use the information from there.

For dates in the format shown above this should work:

$ sort -b -k3n,3 -k1M,1 -k2n,2 < 1111


--
mrg
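
The command above can be tried on the mixed-year data from the original question. This is a small sketch, assuming a sort that supports the `M` (month) ordering option, as GNU sort does; `LC_ALL=C` is set so that the English month abbreviations are recognized. The two lines marked as added are hypothetical and only there to exercise the month and day keys.

```shell
sorted=$(printf '%s\n' \
    'Jul 23 2021' \
    'Apr 30 2019' \
    'Jan 2 2022' \
    'Oct 24 2004' \
    'Jan 10 2022' \
    'Feb 1 2019' |
  LC_ALL=C sort -b -k3n,3 -k1M,1 -k2n,2)
# 'Jan 10 2022' and 'Feb 1 2019' are added test lines, not from the thread.
# Keys, in priority order: year (numeric), month (name), day (numeric).

printf '%s\n' "$sorted"
# Oct 24 2004
# Feb 1 2019
# Apr 30 2019
# Jul 23 2021
# Jan 2 2022
# Jan 10 2022
```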

hongy...@gmail.com

Apr 20, 2022, 6:40:16 AM
Yes. This works as shown below:

$ cat usdate
Jul 23 2021
Apr 30 2021
Jan 2 2021
Oct 24 2021

$ sort -b -k3n,3 -k1M,1 -k2n,2 usdate
Jan 2 2021
Apr 30 2021
Jul 23 2021
Oct 24 2021

$ sort -b -k3n,3 -k1rM,1 -k2n,2 usdate
Oct 24 2021
Jul 23 2021
Apr 30 2021
Jan 2 2021

But what's the meaning of `,3` used in the `-k3n,3` and other two similar options?

Regards,
HZ
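
On the question above: in `-k POS1[,POS2]`, POS1 is the field where the sort key starts and POS2 is the field where it ends. Without `,POS2` the key runs from POS1 to the end of the line, so `-k3n,3` means "field 3 only, compared numerically". The effect is easiest to see on a first key. This is a minimal sketch with made-up two-field data; the variable names are illustrative.

```shell
data='a 10
a 9'

# -k1 has no end field: the key is the whole line, the lines differ as
# strings ("a 10" sorts before "a 9"), and -k2n is never consulted.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k1 -k2n)

# -k1,1 stops the key after field 1: both keys are "a", so the tie is
# broken by the numeric second key and 9 sorts before 10.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2n)

printf 'open-ended:\n%s\nbounded:\n%s\n' "$open_ended" "$bounded"
```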

Kenny McCormack

Apr 20, 2022, 7:23:18 AM
In article <b57b47bd-f1ad-4af7...@googlegroups.com>,
hongy...@gmail.com <hongy...@gmail.com> wrote:
>My data includes the following data formats in each row:
>
>$ cat 1111
>Jul 23 2021
>Apr 30 2019
>Jan 2 2022
>Oct 24 2004
>
>I want to sort them according to the date, and tried with sort as follows:

(untested)

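#!/usr/bin/gawk -f
BEGIN {
    # Map month abbreviations to their numbers, 1 to 12.
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", T)
    for (i in T) months[T[i]] = i
}
{
    # mktime() takes "YYYY MM DD HH MM SS" and returns seconds since the epoch.
    key = mktime($3 " " months[$1] " " $2 " 0 0 0")
    x[key] = (key in x ? x[key] ORS : "") $0    # keep lines that share a date
}
END {
    # Sort the timestamps (the array indices) numerically, then print in order.
    n = asorti(x, keys, "@ind_num_asc")
    for (i = 1; i <= n; i++) print x[keys[i]]
}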

P.S. More to type than the "sort(1)" based idea, but a lot more clear
(once you understand GAWK - and if you don't already, you should make it
your top priority to do so). I totally dislike sort(1) - freaky ugly
syntax!

--
People who want to share their religious views with you
almost never want you to share yours with them. -- Dave Barry

Kees Nuyt

Apr 20, 2022, 7:44:35 AM
On Wed, 20 Apr 2022 00:06:03 -0700 (PDT), "hongy...@gmail.com"
<hongy...@gmail.com> wrote:

> cut -d\ -f2-
> cut: the delimiter must be a single character
> Try 'cut --help' for more information.

Did you try 'man cut' ?
Did you even try to solve this yourself?
What about cut -d ' ' -f2- ?
Sigh.
--
k

hongy...@gmail.com

Apr 20, 2022, 8:07:47 PM
But this code snippet was given by Janis Papanagnou, an expert in this group, so I don't know how he could have written it like this.

Ed Morton

Apr 20, 2022, 10:13:05 PM
They didn't, you copy/pasted it wrong. Take a few seconds to look at the
code Janis provided and think about the way in which the code you
executed is not the same as that and what that means to the shell
interpreting it.

Ed.
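
What the shell makes of the failing command can be checked by counting the words it produces. This is a small sketch using `set --`, which assigns the positional parameters without running `cut`; the variable names are illustrative.

```shell
# A backslash followed by ONE space escapes that space, so "-d" and "-f2-"
# fuse into a single word and cut would receive the delimiter " -f2-".
set -- cut -d\ -f2-
broken_count=$#
broken_arg=$2
printf '%s words, second word: [%s]\n' "$broken_count" "$broken_arg"

# Quoting the space keeps the delimiter and the field list apart.
set -- cut -d ' ' -f2-
quoted_count=$#
quoted_delim=$3
printf '%s words, delimiter: [%s]\n' "$quoted_count" "$quoted_delim"

# The quoted form applied to one decorated line.
stripped=$(printf '%s\n' '2022-01-02 Jan 2 2022' | cut -d ' ' -f2-)
printf '%s\n' "$stripped"
```

With a multi-character delimiter such as " -f2-", GNU cut reports "the delimiter must be a single character", which is the error shown earlier in the thread.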

Ben

Apr 21, 2022, 6:38:58 AM
It's only Hongyi Zhao's fault rather indirectly for using Google Groups.
The -d\  -f2- is correct when I view Janis's post in my newsreader, but
it has already been broken when I view it in GG. And now there's
nothing one can do but spot and correct the fault by hand. Copying the
text just copies the error GG has introduced.

GG has been going downhill for a long time (yes, hard to imagine there
was ever a hill it could go down), but now it seems that 100% correct
advice will be seen by the GG world as wrong for ever!

--
Ben.

Chris Elvidge

Apr 21, 2022, 7:12:37 AM
On 21/04/2022 11:38, Ben wrote:
> It's only Hongyi Zhao's fault rather indirectly for using Google Groups.
> The -d\  -f2- is correct when I view Janis's post in my newsreader, but
> it has already been broken when I view it in GG.

HTML is probably the culprit.
Multiple spaces (and tabs) are 'condensed' into one space.

--
Chris Elvidge
England

Richard Harnden

Apr 21, 2022, 7:13:09 AM
"usdate" is the variable that read assigns, not your test data file.

You want:
$ cat your_file
Apr 30 2019
Jan 2 2022
Jul 23 2021
Oct 24 2004

>
> $ while read usdate
> do date -d "${usdate}" "+%Y-%m-%d ${usdate}"
> done <usdate | sort | cut -d\ -f2-
> cut: the delimiter must be a single character
> Try 'cut --help' for more information.

and:
$ while read usdate
do
date -d "${usdate}" "+%F ${usdate}"
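
The fragment above stops before the loop is closed. A complete version might look like the following sketch. It assumes GNU date (for `-d` and `%F`) and uses a temporary file standing in for `your_file`; the rest of the pipeline is taken from the code posted earlier in the thread, with the `cut` delimiter quoted so that whitespace mangling cannot break it.

```shell
# Stand-in for your_file, filled with the sample data.
your_file=$(mktemp)
printf '%s\n' 'Apr 30 2019' 'Jan 2 2022' 'Jul 23 2021' 'Oct 24 2004' > "$your_file"

sorted=$(while read -r usdate
do
    # %F is shorthand for %Y-%m-%d; the original line is appended after it.
    date -d "$usdate" "+%F $usdate"
done < "$your_file" | LC_ALL=C sort | cut -d ' ' -f2-)

printf '%s\n' "$sorted"
rm -f "$your_file"
```

Here `usdate` is only the loop variable that `read` fills in; the data comes from whatever file is redirected into the loop.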

Spiros Bousbouras

Apr 21, 2022, 9:59:37 AM
On Thu, 21 Apr 2022 11:38:53 +0100
Ben <ben.u...@bsb.me.uk> wrote:
> Ed Morton <morto...@gmail.com> writes:
>
> > On 4/20/2022 7:07 PM, hongy...@gmail.com wrote:
> >> On Wednesday, April 20, 2022 at 7:44:35 PM UTC+8, Kees Nuyt wrote:
> >>> On Wed, 20 Apr 2022 00:06:03 -0700 (PDT), "hongy...@gmail.com"
> >>> <hongy...@gmail.com> wrote:
> >>>
> >>>> cut -d\ -f2-
> >>>> cut: the delimiter must be a single character
> >>>> Try 'cut --help' for more information.
[...]

> It's only Hongyi Zhao's fault rather indirectly for using Google Groups.
> The -d\  -f2- is correct when I view Janis's post in my newsreader, but
> it has already been broken when I view it in GG. And now there's
> nothing one can do but spot and correct the fault by hand. Copying the
> text just copies the error GG has introduced.
>
> GG has been going downhill for a long time (yes, hard to imagine there
> was ever a hill it could go down), but now it seems that 100% correct
> advice will be seen by the GG world as wrong for ever!

Not necessarily for ever. If googlegroups internally stores the messages
as they came through NNTP, then when the HTML rendering bug gets fixed,
the post (and all the others where indentation is screwed up) will display
correctly. On the other hand, messages which respond to a googlegroups
post through googlegroups may end up having the mistake forever.

Ben

Apr 21, 2022, 11:34:45 AM
Ah, good point. In fact it's just a surface rendering issue that could
be fixed with the right CSS style. If I edit the text of the element
using Chrome's dev tool, the two spaces are there. So not only is the
text stored correctly, it's pushed out to HTML correctly.

--
Ben.

Ben

Apr 21, 2022, 11:37:17 AM
Well, it's still GG's fault as that can be fixed using CSS.

--
Ben.

hongy...@gmail.com

Apr 21, 2022, 9:16:33 PM
On Wednesday, April 20, 2022 at 5:56:54 PM UTC+8, marrgol wrote:
sort: option '-b' is ignored

$ cat sort-date
Jul 23 2021
Apr 30 2021
Jan 2 2021
Oct 24 2021

$ sort --debug -b -k3n,3 -k1M,1 -k2n,2 sort-date
sort: using ‘en_CA.UTF-8’ sorting rules
sort: option '-b' is ignored
Jan 2 2021
      ____
___
    _
__________
Apr 30 2021
       ____
___
    __
___________
Jul 23 2021
       ____
___
    __
___________
Oct 24 2021
       ____
___
    __
____________
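
The "option '-b' is ignored" warning appears to be harmless here. As far as I understand GNU sort, a global option such as `-b` is only inherited by keys that carry no ordering options of their own, and every key in this command has one (`n` or `M`); both of those skip leading blanks anyway. A quick check, as a sketch under that assumption, is to compare the output with and without `-b`; the variable names are illustrative.

```shell
data='Jul 23 2021
Apr 30 2021
Jan 2 2021
Oct 24 2021'

with_b=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k3n,3 -k1M,1 -k2n,2)
without_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k3n,3 -k1M,1 -k2n,2)

# Both orderings should be the same, so -b can simply be dropped.
if [ "$with_b" = "$without_b" ]; then
    echo "identical output with and without -b"
fi
printf '%s\n' "$without_b"
```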

Janis Papanagnou

Apr 22, 2022, 5:03:23 AM
Meanwhile others have pointed out to you a couple issues with your
interpretation and newsgroup interface, so I just add two points...

Is there any incentive for using Google Groups instead of a Real
Newsreader? Especially since you post at an extremely high rate,
it would certainly be a gain to switch to something more appropriate.

You seem to "work" heavily based on a copy/paste concept; I suggest
to try to _understand_ the solutions provided. Here you had an issue
with the Google Groups interface, but posters may also decide to post
just hints or untested code (where the idea is valid but typos could
have slipped in). Here the posted solution is just three lines long,
and I'd expect that you can read and understand that code, or can
"analyze" it. If there really is some tricky part, you can always ask
about it. Have a look at the man pages, ideally before you post
questions, to get the usage information for the programs involved as
a base for understanding the intention.

Janis

Chris Elvidge

Apr 22, 2022, 6:55:15 AM
On 22/04/2022 10:03, Janis Papanagnou wrote:
> I'd expect that you can read and understand that code, or can
> "analyze" it

As if he would!

--
Chris Elvidge
England

Janis Papanagnou

Apr 22, 2022, 8:21:37 AM
On 22.04.2022 12:55, Chris Elvidge wrote:
> On 22/04/2022 10:03, Janis Papanagnou wrote:
>> I'd expect that you can read and understand that code, or can
>> "analyze" it
>
> As if he would!

The nice thing about pipelined commands is that they can be (and
often are) incrementally built, and that the same is possible for
functional analysis (or error tracking); with the code

while read usdate
do date -d "${usdate}" "+%Y-%m-%d ${usdate}"
done <your_file | sort | cut -d\  -f2-

if the syntax around 'cut' produced a copy/paste error, he'd just
need to remove that command from the pipeline and make his own
thoughts about how to extract the data he needs from the output
of the preceding commands. Of course he would have to call a man
page to get the details for a solution, and - granted - I have my
doubts about him taking that approach on a regular basis. But at
least that's a possible problem solving step he should be aware of.

It was also a good hint in the replies about Google Groups spoiling
the posting format. That didn't occur to me and I certainly wouldn't
adjust posted solutions to fit for being unharmed by Google Groups'
interface behavior. But a [Google Groups] user may keep that in mind
in cases when some posted code "doesn't [seem to] work".

Janis
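
The incremental approach described above can be sketched by capturing the output of each pipeline stage separately and looking at it before adding the next command. This is a minimal sketch assuming GNU date; the variable names `stage1`, `stage2`, and `stage3` are illustrative.

```shell
data='Jul 23 2021
Apr 30 2019
Jan 2 2022
Oct 24 2004'

# Stage 1: decorate each line with an ISO date. Inspect this first.
stage1=$(printf '%s\n' "$data" | while read -r usdate
do date -d "$usdate" "+%Y-%m-%d $usdate"
done)
printf 'after date:\n%s\n' "$stage1"

# Stage 2: sort on the decoration.
stage2=$(printf '%s\n' "$stage1" | LC_ALL=C sort)
printf 'after sort:\n%s\n' "$stage2"

# Stage 3: strip the decoration. If this step misbehaves, stages 1 and 2
# are already known to be fine, so only the cut invocation needs attention.
stage3=$(printf '%s\n' "$stage2" | cut -d ' ' -f2-)
printf 'after cut:\n%s\n' "$stage3"
```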
