script to format line

roger vaede

May 25, 2012, 3:52:16 PM

I have a file that doesn't have a field delimiter between the columns and I want to add a colon between each column. I can identify the columns by hard coding the range:
such as "col1 to col 10" "col11 to col24" etc...

I am stuck.

Ed Morton

May 25, 2012, 4:13:29 PM

If there's no delimiter, how do you know when one column ends and the next
begins? Post some sample input and expected output.

Ed.

Barry Margolin

May 25, 2012, 4:26:34 PM

In article <jpop5a$vpq$1...@dont-email.me>,

He said he knows them by character position: the first column is
characters 1-10, the next column is 11-24, and so on.

sed 's/^\(.\{10\}\)\(.\{14\}\).../\1:\2:.../'
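
As a concrete illustration, here is a minimal sketch of that command with the elided columns left out, so only the first two fields (characters 1-10 and 11-24) get a trailing colon. The function name add_colons_sed and the sample record are made up for the example. The \{n\} interval syntax is part of POSIX basic regular expressions, though very old sed versions may not accept it.

```shell
# Hypothetical helper: put a colon after character 10 and after character 24.
add_colons_sed() {
    sed 's/^\(.\{10\}\)\(.\{14\}\)/\1:\2:/'
}

# Made-up record: 10 digits, 14 letters, then the rest of the line.
printf '%s\n' '0123456789abcdefghijklmnREST' | add_colons_sed
# prints 0123456789:abcdefghijklmn:REST
```

Lines shorter than 24 characters do not match the pattern and pass through unchanged.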

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

roger vaede

May 25, 2012, 4:45:38 PM

A brilliant way to handle this, thank you

Jon LaBadie

May 25, 2012, 4:52:02 PM

Look at gnu awk's FIELDWIDTHS variable. As an example:

gawk '
BEGIN { FIELDWIDTHS = "2 6 3 7 21" }
{ print $1 ":" $2 ":" $3 ":" $4 ":" $5 }
' datafile

Kenny McCormack

May 25, 2012, 5:08:05 PM

In article <jpordj$fdh$1...@dont-email.me>,

Better:

BEGIN { FIELDWIDTHS = "2 6 3 7 21";OFS=":" }
$1=$1

Note1: Purists will object that there is a (subtle) bug in the above code. I
personally consider it a feature, but you are, of course, free to edit it as
you see fit.
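
The post does not spell out the bug. One likely candidate is that `$1=$1`, used as a pattern, is also the condition that decides whether the record gets printed, so a record whose first field is empty or numerically zero is silently dropped. The sketch below shows that effect with plain awk and whitespace-separated fields (so it does not need gawk); the function names and sample data are made up for the example.

```shell
# Terse form: the assignment doubles as the print condition.
colonize_terse() {
    awk -v OFS=: '$1=$1'
}

# Explicit form: always rebuild the record, always print it.
colonize_robust() {
    awk -v OFS=: '{ $1 = $1; print }'
}

printf '0 a\nx b\n' | colonize_terse    # prints only x:b
printf '0 a\nx b\n' | colonize_robust   # prints 0:a and then x:b
```

Whether dropping such records is a bug or a feature depends on the data, which is presumably the point of the remark.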

Note2: (Obligatory net-copping) Please don't multi-post.


Ed Morton

May 25, 2012, 10:46:55 PM

On 5/25/2012 3:26 PM, Barry Margolin wrote:
> In article<jpop5a$vpq$1...@dont-email.me>,
> Ed Morton<morto...@gmail.com> wrote:
>
>> On 5/25/2012 2:52 PM, roger vaede wrote:
>>> I have a file that doesn't have a field delimiter between the columns and I
>>> want to add a colon between each column. I can identify the columns by
>>> hard coding the range:
>>> such as "col1 to col 10" "col11 to col24" etc...
>>>
>>> I am stuck.
>>
>> If there's no delimiter, how do you know when one column ends and the next
>> begins? Post some sample input and expected output.
>
> He said he knows them by character position: the first column is
> characters 1-10, the next column is 11-24, and so on.

I had no idea that's what he was trying to say. If he'd said "char1 to char 10"
instead of "col1 to col 10" I'd have got it!

> sed 's/^\(.\{10\}\)\(.\{14\}\).../\1:\2:.../'

I'd probably fine gawks FIELDWIDTHS simpler but either way...

Ed.

Barry Margolin

May 26, 2012, 9:10:35 AM

In article <jppg71$va5$1...@dont-email.me>,

Ed Morton <morto...@gmail.com> wrote:

> On 5/25/2012 3:26 PM, Barry Margolin wrote:
> > In article<jpop5a$vpq$1...@dont-email.me>,
> > Ed Morton<morto...@gmail.com> wrote:
> >
> >> On 5/25/2012 2:52 PM, roger vaede wrote:
> >>> I have a file that doesn't have a field delimiter between the columns and
> >>> I
> >>> want to add a colon between each column. I can identify the columns by
> >>> hard coding the range:
> >>> such as "col1 to col 10" "col11 to col24" etc...
> >>>
> >>> I am stuck.
> >>
> >> If there's no delimiter, how do you know when one column ends and the next
> >> begins? Post some sample input and expected output.
> >
> > He said he knows them by character position: the first column is
> > characters 1-10, the next column is 11-24, and so on.
>
> I had no idea that's what he was trying to say. If he'd said "char1 to char
> 10"
> instead of "col1 to col 10" I'd have got it!

I couldn't interpret "I can identify the columns by hard coding the
range" any other way. I realized he confused things by reusing the
word "column" in the description, but I was able to work through
it.

> > sed 's/^\(.\{10\}\)\(.\{14\}\).../\1:\2:.../'
>
> I'd probably fine gawks FIELDWIDTHS simpler but either way...

I'm on a Mac, it doesn't come with gawk, for some reason. Sed is more
universal, although I wonder if all versions recognize \{n\} syntax.
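
For systems without gawk, the same fixed-width split can be sketched in plain POSIX awk using substr(), with no dependence on FIELDWIDTHS. The function name split_fixed and its space-separated widths argument are hypothetical, chosen only for this example.

```shell
# Hypothetical helper: split each input line into fixed-width fields and
# join them with colons. $1 is a space-separated list of widths.
split_fixed() {
    awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
        out = ""; pos = 1
        for (i = 1; i <= n; i++) {
            sep = (i > 1) ? ":" : ""
            out = out sep substr($0, pos, w[i])   # field i starts at pos
            pos += w[i]
        }
        print out   # text beyond the listed widths is dropped
    }'
}

printf '%s\n' '0123456789abcdefghijklmnREST' | split_fixed '10 14 4'
# prints 0123456789:abcdefghijklmn:REST
```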

Kenny McCormack

May 26, 2012, 11:09:17 AM

In article <jppg71$va5$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
...

>> sed 's/^\(.\{10\}\)\(.\{14\}\).../\1:\2:.../'
>
>I'd probably fine gawks FIELDWIDTHS simpler but either way...

But will GAWK be willing to pay the fine?

Anyway, yes, agreed, and sed solutions are always ugly. That's why sed
people like them.

P.S. As noted earlier, the best way to do this is using FIELDWIDTHS and OFS.


Kenny McCormack

May 26, 2012, 11:20:42 AM

In article <barmar-069A1E....@news.eternal-september.org>,

Barry Margolin <bar...@alum.mit.edu> wrote:
...
>I'm on a Mac, it doesn't come with gawk, for some reason. Sed is more
>universal, although I wonder if all versions recognize \{n\} syntax.

The reason is that OSX is based on the BSD line of Unix, not the
Solaris/Linux/GNU line. So, you get what BSD'ers get.

Just out of curiosity, are you looking to get GAWK for your Mac?

I ask because I had to go through a bit of pain to get it for myself, and
having done so, I'm willing to talk about it. I could send you a link to
the binary. Or, you could install Xcode - which does work, and allows you
to compile all the usual stuff - but Apple, in their majesty, makes it a lot
harder to install than it should be. They make you install not only the
sink, but the entire kitchen, when all you want is a faucet and a drain...


Ben Bacarisse

May 26, 2012, 12:27:49 PM

roger vaede <rvae...@gmail.com> writes:

> I have a file that doesn't have a field delimiter between the columns and I want to add a colon between each column. I can identify the columns by hard coding the range:
> such as "col1 to col 10" "col11 to col24" etc...

A bit late because you have a solution, but if you have a suitable
version of cut, you get the simplest solution so far:

cut --output-delimiter=: -b 1-10,11-24

(extend the list of column positions as required).

--
Ben.

Ed Morton

May 26, 2012, 1:19:54 PM

On 5/26/2012 8:10 AM, Barry Margolin wrote:
> In article<jppg71$va5$1...@dont-email.me>,

> Ed Morton<morto...@gmail.com> wrote:
>
>> On 5/25/2012 3:26 PM, Barry Margolin wrote:
>>> In article<jpop5a$vpq$1...@dont-email.me>,
>>> Ed Morton<morto...@gmail.com> wrote:
>>>
>>>> On 5/25/2012 2:52 PM, roger vaede wrote:
>>>>> I have a file that doesn't have a field delimiter between the columns and
>>>>> I
>>>>> want to add a colon between each column. I can identify the columns by
>>>>> hard coding the range:
>>>>> such as "col1 to col 10" "col11 to col24" etc...
>>>>>
>>>>> I am stuck.
>>>>
>>>> If there's no delimiter, how do you know when one column ends and the next
>>>> begins? Post some sample input and expected output.
>>>
>>> He said he knows them by character position: the first column is
>>> characters 1-10, the next column is 11-24, and so on.
>>
>> I had no idea that's what he was trying to say. If he'd said "char1 to char
>> 10"
>> instead of "col1 to col 10" I'd have got it!
>
> I couldn't interpret "I can identify the columns by hard coding the
> range" any other way. I realized he confused things when reusing the
> word "column" in the description, though, but I was able to work through
> it.

I thought at this point:

he was describing his tool's desired output, not the input. I thought he was
saying that the intent of the tool was to let him select ranges of columns using
hard-coded numbers.

Ed.

Barry Margolin

May 26, 2012, 2:48:23 PM

In article <jpr3bs$s00$1...@dont-email.me>,

I thought the intent of the tool was to add colons between the fields.

Ed Morton

May 26, 2012, 8:41:56 PM

On 5/26/2012 1:48 PM, Barry Margolin wrote:
> In article<jpr3bs$s00$1...@dont-email.me>,

Me too but I also thought he was selecting which fields to output.

Ed.

roger vaede

May 27, 2012, 10:05:12 AM

Thanks all for the input. I found out that I need to filter a field, so I cannot use sed. On Tuesday I will try to use awk.

roger vaede

May 29, 2012, 8:40:49 AM

On Friday, May 25, 2012 3:52:16 PM UTC-4, roger vaede wrote:

I would like to filter out fields as I am reading each record. How can I do this using this? I wanted to read two bytes example the 23rd field and then
use FIELDWIDTHS = "2 6 3 7 21" if the value is 15
use FIELDWIDTHS = "5 10 7 9 32" if the value is 16
use FIELDWIDTHS = "21 16 13 17 18" if the value is 17

Barry Margolin

May 29, 2012, 9:37:23 AM

In article <540a216d-2145-4de8...@googlegroups.com>,

roger vaede <rvae...@gmail.com> wrote:

> On Friday, May 25, 2012 3:52:16 PM UTC-4, roger vaede wrote:
> > I have a file that doesn't have a field delimiter between the columns and I
> > want to add a colon between each column. I can identify the columns by
> > hard coding the range:
> > such as "col1 to col 10" "col11 to col24" etc...
> >
> > I am stuck.
>
> I would like to filter out fields as I am reading each record. How can I do
> this using this? I wanted to read two bytes example the 23rd field and then

What does "example the 23rd field" mean? Did you mean "examine the 23rd
field"? But your list of field widths only has 5 fields, how can there
be a 23rd field?

> use FIELDWIDTHS = "2 6 3 7 21" if the value is 15
> use FIELDWIDTHS = "5 10 7 9 32" if the value is 16
> use FIELDWIDTHS = "21 16 13 17 18" if the value is 17
>
> gawk '
> BEGIN { FIELDWIDTHS = "2 6 3 7 21" }
> { print $1 ":" $2 ":" $3 ":" $4 ":" $5 }
> ' datafile

I'm not really sure what you're trying to do (is English your native
language?). "Filter out fields" doesn't seem to mean the same thing as
"change character ranges". If you want to filter out fields, just leave
those out of the print statement:

{ print $1 ":" $3 ":" $7 }

will filter out fields 2, 4, 5, and 6.

If you really mean that the character ranges are changing, you can
change FIELDWIDTHS while processing a record, then use $1 = $1 to force
gawk to reparse the record. Then you can access $1, $2, $3, etc. to get
the fields using the new widths.

roger vaede

May 29, 2012, 9:43:49 AM

On Friday, May 25, 2012 3:52:16 PM UTC-4, roger vaede wrote:

What I need to do is read each record and check the two characters starting at byte 23.
If the two characters are 14, then I need to format the record by adding a colon
after columns 15, 18, 26, 56, 78.
If the two characters are 15, then I need to format the record by adding a colon
after columns 34, 56, 78, 92, etc.

All of these are examples.
Let me know if that's clear.

Barry Margolin

May 29, 2012, 10:07:25 AM

In article <c9730420-2604-419e...@googlegroups.com>,

gawk 'BEGIN { OFS = ":" }
{ key = substr($0, 23, 2);
if (key == 14) { FIELDWIDTHS = "15 3 8 30 20" }
else if (key == 15) { FIELDWIDTHS = "34 22 22 14" }
...
$1 = $1; print $0 }
' filename

roger vaede

May 29, 2012, 10:45:13 AM

On Friday, May 25, 2012 3:52:16 PM UTC-4, roger vaede wrote:

Thanks. It's very close. Here is my script and output. When the script looks at key 12, it's supposed to output this:
shu:tt:ing :

[root@server05 tmp]# ./test2.sh
+ gawk 'BEGIN { OFS = ":" }
{ key = substr($0, 35, 2);
if (key == "14") { FIELDWIDTHS = "5 3 8" }
else if (key == "12") { FIELDWIDTHS = "3 2 4" }
else if (key == "17") { FIELDWIDTHS = "7 6 4" }

$1 = $1; print $0 }

' out3

OUTPUT:

Shutting:hello:ducky:mama:12
zoo:lo:o
Bring:ing: papa
today : koo:l t

INPUT FILE:
Shutting hello ducky mama 12
zooloo cool baby cookie 14
Bringing papa goody shoe 17
today kool tommy who 12

Barry Margolin

May 29, 2012, 12:03:27 PM

In article <c0af6945-6aec-491f...@googlegroups.com>,

roger vaede <rvae...@gmail.com> wrote:

> On Friday, May 25, 2012 3:52:16 PM UTC-4, roger vaede wrote:
> > I have a file that doesn't have a field delimiter between the columns and I
> > want to add a colon between each column. I can identify the columns by
> > hard coding the range:
> > such as "col1 to col 10" "col11 to col24" etc...
> >
> > I am stuck.
>
> Thanks. It's very close. Here is my script and output. When the script looks
> at key 12, it's supposed to output this:

Looks like $1=$1 isn't forcing a reparse of the current line, like I
expected. So the change to FIELDWIDTHS affects the next line instead of
the current line.

> shu:tt:ing :
>
>
> [root@server05 tmp]# ./test2.sh
> + gawk 'BEGIN { OFS = ":" }
> { key = substr($0, 35, 2);
> if (key == "14") { FIELDWIDTHS = "5 3 8" }
> else if (key == "12") { FIELDWIDTHS = "3 2 4" }
> else if (key == "17") { FIELDWIDTHS = "7 6 4" }
> $1 = $1; print $0 }
> ' out3
>
> OUTPUT:
>
> Shutting:hello:ducky:mama:12
> zoo:lo:o
> Bring:ing: papa
> today : koo:l t
>
> INPUT FILE:
> Shutting hello ducky mama 12
> zooloo cool baby cookie 14
> Bringing papa goody shoe 17
> today kool tommy who 12
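
One way to sidestep the question of when a FIELDWIDTHS change takes effect is to pick the widths per record and cut the current line with substr(), which works in any POSIX awk. The sketch below reuses the key position (characters 35-36) and the width lists from the script quoted above; the function name split_by_key, the pass-through for unknown keys, and the sample record are assumptions made for the example.

```shell
# Hypothetical helper: choose field widths from the two-character key at
# position 35, then split the *current* line with substr().
split_by_key() {
    awk '
    function widths_for(key) {
        if (key == "14") return "5 3 8"
        if (key == "12") return "3 2 4"
        if (key == "17") return "7 6 4"
        return ""
    }
    {
        n = split(widths_for(substr($0, 35, 2)), w, " ")
        if (n == 0) { print; next }   # unknown key: pass the line through
        out = ""; pos = 1
        for (i = 1; i <= n; i++) {
            sep = (i > 1) ? ":" : ""
            out = out sep substr($0, pos, w[i])
            pos += w[i]
        }
        print out
    }'
}

# Made-up record: text padded to 34 characters, key "12" at 35-36.
printf '%-34s%s\n' 'shutting-hello' '12' | split_by_key
# prints shu:tt:ing-
```

Because the widths are looked up before the line is split, each record is formatted with its own key rather than the previous record's.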

Harrie

May 29, 2012, 2:27:45 PM

roger vaede said the following on 2012-05-29 16:45 (+0200):

[snip]
> [root@server05 tmp]# ./test2.sh

You're (mis)using the root account for developing and testing shell scripts?

Your data seems just "user data", I don't see a need for root here. If
the data isn't accessible for a user account you use (*if* you use that
and you should), copy it so that user has read access or make other
(sane) arrangements.

Learn to use su/sudo (or equivalent) and only use root privileges when
needed.

--
Regards, Override Internet Solutions
Harrie http://www.override.nl/