approaches for reformatting data points into pairs

Bryan

Mar 17, 2023, 10:21:09 AM

I am interested in general ways of organizing a long string of comma-separated numbers ("CSV") into different groupings. For instance, I'd like to get every other pair of numbers (see below for my work so far). This could be useful, and extendable, for basic mathematical analysis or for practical reformatting of program output. For example, SVG files have paths with such features (see the "q" or "c" commands), and one may want to plot different subsets of the data, e.g. every other pair or other combinations. (I note, however, that the gnuplot "every" keyword is also useful for this.)

For example this sequence:

-10.000000,-9.000000,-8.000000,-7.000000,-[...trim...]7.000000,8.000000,9.000000,10.000000

can be put into different groups, for example into these "x,y" data points:

-10.000, -9.000
-8.000, -7.000
-6.000, -5.000
-4.000, -3.000
-2.000, -1.000
0.000, 1.000
2.000, 3.000
4.000, 5.000
6.000, 7.000
8.000, 9.000
10.000,

(Note that the last value has no partner.) This script does that, with extra details printed to help follow the process:

awk_dev_test_seq=$(seq -s',' -f '%f' -10 10)
gawk -F, '
{
    for (i = 1; i <= NF; i++) {
        if (i % 2 == 0)
            printf("i=%s Y:%3.3f%s ", i, $i, "\n")   # even field: Y, end the line
        else
            printf("i=%s X:%3.3f%s ", i, $i, ",")    # odd field: X, stay on the line
    }
}' <<EOF
${awk_dev_test_seq}
EOF

The number in (i % 2 == 0) can be adjusted: for example, changing "i % 2" to "i % 3" puts three consecutive numbers on each line. Result:

i=1 X:-10.000, i=2 X:-9.000, i=3 Y:-8.000
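
The same idea can be parameterised so that the group size is not hard-coded. What follows is only a minimal sketch: the function name "group" and the variable "n" (passed with -v) are hypothetical, and a short literal input stands in for the seq output.

```shell
# Group comma-separated values n per line. "group" and "n" are
# hypothetical names introduced for illustration only.
group() {
    awk -F, -v n="$1" '
    {
        for (i = 1; i <= NF; i++) {
            # End the line after every n-th field or after the last field;
            # otherwise separate the values with a comma.
            sep = (i % n == 0 || i == NF) ? "\n" : ", "
            printf "%.3f%s", $i, sep
        }
    }'
}

pairs=$(printf '%s\n' '-2,-1,0,1,2' | group 2)
triples=$(printf '%s\n' '-2,-1,0,1,2' | group 3)
printf '%s\n---\n%s\n' "$pairs" "$triples"
```

Because the line is also ended after the last field, a trailing value without partners ends up on a short line of its own instead of leaving the output unterminated.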

... and so on. I have been looking at how to do other groupings of the data. For example, getting every other *pair* of numbers would be interesting, as illustrated in this pseudo-output:

keep this line : -10.000, -9.000
skip this line : -8.000, -7.000
keep this line : -6.000, -5.000
skip this line : -4.000, -3.000
keep this line : -2.000, -1.000

I am asking which approaches might be best for doing that in awk: if/else, while, for, or other control-flow statements (I think that is the term for those).

I tried to keep this short, but I'll note some interesting postings on this topic:

"Parsing standard CVS data by gawk"
https://lists.gnu.org/archive/html/bug-gawk/2015-07/msg00002.html
"CSV parsing with awk"
https://backreference.org/2010/04/17/csv-parsing-with-awk/index.html

-Bryan

Janis Papanagnou

Mar 17, 2023, 11:09:30 AM

I'm not sure whether you want some "universal" script or just hints for
coding variants. For the former you should specify the requirements
accurately. For the latter see below...

> awk_dev_test_seq=$(seq -s',' -f '%f' -10 10)
> gawk -F, '
> {
> {
> for (i=1;i<=NF;i++ )
> {
> if ( i % 2 == 0 ) printf("i=%s Y:%3.3f%s ", i, $i, "\n")
> else
> printf("i=%s X:%3.3f%s ", i, $i, ",")
> }
> }
> }' <<EOF
> ${awk_dev_test_seq}
> EOF

Personally I'd take a (slightly) different approach here, such as
handling the irregular (odd) cases explicitly

awk -F, '
NF % 2 == 1 { ...in case of odd number of fields - what to do?... }
NF % 2 == 0 { ...(regular?) case of even number of fields... }
'

(The second condition may be irrelevant if you use the first action
to fix your data, and you can fall through in the regular case.)
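
The two-pattern layout above might be filled in as follows. This is only a sketch under stated assumptions: dropping the unpaired value (rather than padding it or aborting) is one arbitrary choice of "fix", and the names "to_pairs" and "n" are hypothetical.

```shell
# Sketch only: dropping the unpaired value is an assumption made for
# illustration, as are the names to_pairs and n.
to_pairs() {
    awk -F, '
    { n = NF }                      # number of fields to pair up
    NF % 2 == 1 {                   # odd case: report it, then ignore the last field
        printf "line %d: unpaired value %s dropped\n", NR, $NF > "/dev/stderr"
        n = NF - 1
    }
    {                               # regular case, reached by falling through
        for (i = 1; i < n; i += 2)
            printf "%.3f, %.3f\n", $i, $(i+1)
    }'
}

printf '%s\n' '1,2,3,4,5' | to_pairs
```

The count is kept in a separate variable instead of decrementing NF, since not every awk handles assignments to NF the same way.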

For the iteration I'd do

for (i=1; i<=NF; i+=2) # i.e. increment by 2

and print a pair of numbers in one single print statement

printf "X:%3.3f,Y:%3.3f\n", $i, $(i+1)

(adjust the formatting string and arguments as desired).

In case you want to skip a data pair adjust the increment
appropriately, say, by i+=4 (for your example below), or by
i+=3 if you want to skip a data value (say a Z-coordinate).
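
Put together, the increment can be made a parameter. A minimal sketch, assuming a hypothetical wrapper named "stride" whose argument becomes the awk variable "step"; the loop test i+1 <= NF is an added guard, so that a value without a partner is skipped:

```shell
# "stride" and "step" are hypothetical names used for illustration.
stride() {
    awk -F, -v step="$1" '
    {
        # Require i+1 <= NF so a trailing value without a partner is skipped.
        for (i = 1; i + 1 <= NF; i += step)
            printf "X:%3.3f,Y:%3.3f\n", $i, $(i+1)
    }'
}

xy_from_xyz=$(printf '%s\n' '1,2,3,4,5,6' | stride 3)       # drop each Z value
alternate=$(printf '%s\n' '1,2,3,4,5,6,7,8' | stride 4)     # every other pair
printf '%s\n---\n%s\n' "$xy_from_xyz" "$alternate"
```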

Janis

Bryan

Mar 17, 2023, 12:32:05 PM

On Friday, March 17, 2023 at 11:09:30 AM UTC-4, Janis Papanagnou wrote:
> Personally I'd take a (slightly) different approach here, like doing
> a handling of irregular (odd) cases
>
> awk -F, '
> NF % 2 == 1 { ...in case of odd number of fields - what to do?... }
> NF % 2 == 0 { ...(regular?) case of even number of fields... }
> '
>
> (The second condition may be irrelevant if you use the first action
> to fix your data, and you can fall through in the regular case.)

This is interesting, thanks.

> For the iteration I'd do
>
> for (i=1; i<=NF; i+=2) # i.e. increment by 2
>
> and print a pair of numbers in one single print statement
>
> printf "X:%3.3f,Y:%3.3f\n", $i, $(i+1)
>
> (adjust the formatting string and arguments as desired).
>
> In case you want to skip a data pair adjust the increment
> appropriately, say, by i+=4 (for your example below), or by
> i+=3 if you want to skip a data value (say a Z-coordinate).

That idea, used in the following script, appears to be exactly what I was after:

awk_dev_test_seq=$(seq -s',' -f '%f' -10 10)
gawk -F, '
{
    for (i=1; i<=NF; i+=4 )
        printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )
}' <<EOF
${awk_dev_test_seq}
EOF

output:

i=1 -10.000 -9.000
i=5 -6.000 -5.000
i=9 -2.000 -1.000
i=13 2.000 3.000
i=17 6.000 7.000
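
One detail to watch: the test sequence has 21 values, so the loop as written should also reach i=21, where $(i+1) does not exist and is formatted as 0.000. A minimal sketch of a guard follows. It uses a literal copy of the integer sequence in place of the seq output (an assumption, made only to keep the example self-contained), and the variable names are hypothetical.

```shell
data='-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10'   # 21 values

# Unguarded: i=21 passes the test i<=NF, and the missing field 22 is an
# empty string, which %f formats as zero.
unguarded=$(printf '%s\n' "$data" | awk -F, '
{ for (i = 1; i <= NF; i += 4) printf "i=%s %3.3f %3.3f\n", i, $i, $(i+1) }')

# Guarded: only print when the partner field i+1 exists.
guarded=$(printf '%s\n' "$data" | awk -F, '
{ for (i = 1; i + 1 <= NF; i += 4) printf "i=%s %3.3f %3.3f\n", i, $i, $(i+1) }')

printf '%s\n' "$unguarded" | tail -n 1    # the extra line for i=21
printf '%s\n' "$guarded"   | tail -n 1    # the last complete pair
```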

That helped a lot, thank you.

-Bryan

Janis Papanagnou

Mar 17, 2023, 11:14:34 PM

On 17.03.2023 17:32, Bryan wrote:
> printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )

I see you added parentheses. But note that 'printf' - as 'print',
but as opposed to 'sprintf()' - is a statement, not a function.
Just by the way.

Janis

Kenny McCormack

Mar 18, 2023, 12:17:28 AM

In article <tv3aao$2aucf$1...@dont-email.me>,

Janis Papanagnou <janis_pap...@hotmail.com> wrote:
>On 17.03.2023 17:32, Bryan wrote:
>> printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )
>
>I see you added parentheses. But note that 'printf' - as 'print',
>but as opposed to 'sprintf()' - is a statement, not a function.

Although you don't say so explicitly, the implication is that using
parentheses with printf is wrong. This implication is incorrect.

Although the parens are optional in most cases, they are necessary in
certain cases. I always use them (when I use printf in awk), because:

1) It looks better (IMHO, of course). It conforms more to what we
would expect to see in C.
2) It is necessary in certain cases, so might as well use them always.
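
One such case is the ">" operator, which an unparenthesised printf treats as output redirection. A minimal sketch, run in a scratch directory because the first command creates a file; the variable names are hypothetical:

```shell
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Without parentheses, ">" is an output redirection: the text goes to a
# file named "2" and nothing reaches standard output.
bare=$(awk 'BEGIN { printf "%d\n", 3 > 2 }')

# With the argument list parenthesised, ">" is a comparison (3 > 2 is true, i.e. 1).
grouped=$(awk 'BEGIN { printf("%d\n", 3 > 2) }')

printf 'bare=[%s] file=[%s] grouped=[%s]\n' "$bare" "$(cat 2)" "$grouped"
```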

Janis Papanagnou

Mar 18, 2023, 1:20:01 AM

On 18.03.2023 05:17, Kenny McCormack wrote:
> In article <tv3aao$2aucf$1...@dont-email.me>,
> Janis Papanagnou <janis_pap...@hotmail.com> wrote:
>> On 17.03.2023 17:32, Bryan wrote:
>>> printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )
>>
>> I see you added parentheses. But note that 'printf' - as 'print',
>> but as opposed to 'sprintf()' - is a statement, not a function.
>
> Although you don't say so explicitly, the implication is that using
> parentheses with printf is wrong. This implication is incorrect.

There was no implication that they are wrong - actually they work.

But knowing that it is a statement and not a function lets you
understand the mechanics, and derive explanations for the cases
in which parentheses are necessary. In those cases they are needed
to group expressions, not because printf is [wrongly assumed to
be] a function. In other words, knowing the difference lets you
grasp the semantics of this language construct.
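
A small sketch of the difference, using 'print' for brevity (the shell variable names are hypothetical):

```shell
# "print" is a statement, so the parentheses below only group the
# sub-expression 1+2; the whole expression (1+2)*3 is what gets printed.
stmt=$(awk 'BEGIN { print (1+2)*3 }')

# sprintf() is a function: it returns a string that can be assigned.
func=$(awk 'BEGIN { s = sprintf("%3.3f", 2); print "[" s "]" }')

printf '%s %s\n' "$stmt" "$func"
```

If 'print' were a function call in the C sense, the first line would read as "call print with 3, then multiply the result by 3".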

>
> Although the parens are optional in most cases, they are necessary in
> certain cases. I always use them (when I use printf in awk), because:
>
> 1) It looks better (IMHO, of course). It conforms more to what we
> would expect to see in C.
> 2) It is necessary in certain cases, so might as well use them always.

That's worth religious wars. :-) I'll abstain.

Janis