Martin Τrautmann wrote:
> On Mon, 08 May 2023 11:46:35 -0400, Benjamin Esham wrote:
>> Martin Τrautmann wrote:
>>
>>> Hi all,
>>>
>>> how do I sort by multiple columns?
>>>
>>> [snip]
>>>
>>> I want to sort
>>> * first by column 4, numerical,
>>> * second by column 2
>>> * third by column 3
>>>
>>> So the result should be
>>> +++
>>> Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
>>> Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
>>> Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
>>> Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
>>> Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
>>> Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
>>> [snip]
>>> +++
>>
>> Hi Martin,
>>
>> Are you sure that this example result is correct? The second, third, and
>> fourth lines have values of 2, 3, and 4 in column 4, but then the fifth
>> line has a value of 1 in column 4. That doesn't seem to match your
>> description of the sorting logic, unless I'm missing something.
>
> Maybe my description is wrong, determined what is a "sticky" sort from
> other applications.
>
> The first sort is done numerical - resulting in the correct order of 1
> to 4, from column 4.
>
> The next order is by column 2, keeping the numerical sort of column 4
> and grouping thise together.
>
> But then I want to do the final sort on column 3, which does resort by
> those names, but does group "Lindenstätte" together, keeping the sort
> oder of the former D9386 vs. T2960
>
> So in "sort" terms, the expected order is not 4>2>3, but 3>2>4,
> depending on how the sort actually does proceed. That's why I had given
> this example - that's the order I have to apply in spreadsheets or
> relational databases for a stepwise pipe.
Ah, I think I understand. What you are calling the "first" sort is the
"innermost" sort, i.e., the sort that is applied *last*, and only if it is
necessary to break the tie between two rows that have been considered equal
by all of the previous sorting steps.
If I weren't able to get the results I wanted from sort(1), not even with
the GNU extensions, I would probably jump straight to importing the data
into an in-memory SQLite database and expressing the sort as a SQL query. In
your case, you would need to replace the semicolon separators with commas
and add a header line like
col1,col2,col3,col4,col5,col6,col7,col8
at the top of the file. Then, assuming the input is in a file named "input",
you could run
sqlite3 -noheader :memory: '.mode csv' .output \
".import '|cat -' data"
'select * from data order by col3, col2, cast(col4 as integer)' \
< input > output.csv
This produces a CSV file that obeys the informal standard that empty values,
and values with spaces in them, are double-quoted. If you don't want that,
you could replace the semicolons in the input with tabs, change the commas
to tabs in the header line, and run with ".mode tabs" instead of ".mode
csv". This would give you tab-separated output with no extra double quotes.
Of course, you allude to having this data in a relational database already.
I don't know the details of your situation, but in general a database seems
much better suited to this kind of manipulation than command-line tools that
can only deal with text.
Hope this helps!
Benjamin