Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

scripting help

12 views
Skip to first unread message

rvae...@gmail.com

unread,
May 25, 2012, 4:01:05 PM5/25/12
to
I have a file that doesn't have a field delimiter between the columns and I want to add a colon between each column. I can identify the columns by hard coding the range:
such as "col1 to col 10" "col11 to col24" etc...

I am stuck.

J.O. Aho

unread,
May 25, 2012, 4:32:31 PM5/25/12
to
I'm assuming you have a space between the columns, then you could run
the following:

sed 's/ /:/g' -i filename

before you run that one, you may want to test and see it give the result
you want by not using the -i option:


sed 's/ /:/g' filename


--

//Aho

Aragorn

unread,
May 26, 2012, 4:54:54 AM5/26/12
to
On Friday 25 May 2012 22:32, J.O. Aho conveyed the following to
alt.linux...

> On 25/05/12 22:01, rvae...@gmail.com wrote:
>
>> I have a file that doesn't have a field delimiter between the columns
>> and I want to add a colon between each column. I can identify the
>> columns by hard coding the range: such as "col1 to col 10" "col11 to
>> col24" etc...
>>
>> I am stuck.
>
> I'm assuming you have a space between the columns, [...

A space is a field separator too. From what he says, his file is simply
comprised of strings of characters.

> ...] then you could run the following:
>
> sed 's/ /:/g' -i filename
>
> before you run that one, you may want to test and see it give the
> result you want by not using the -i option:
>
> sed 's/ /:/g' filename

A solution would be to use /bin/cut with the "-c" option, given that he
knows the ranges of characters, and to then create a file with a proper
field separator from that.

The problem is most likely going to be that his fields have variable
lengths, so that he can't just cut up the records - i.e. the newline-
delimited strings - into equally sized slices. That makes it harder to
automate this procedure by way of a script.

The OP would probably get more specialized help in comp.unix.shell -
where all the scripting gurus live :p - but he will either way need to
provide more information, such as the number of fields he wishes to
define and whether these fields are of variable length. I'm no expert
on awk, but I think awk /might/ possibly provide for an easier way to do
this than pure Bourne shell code.

It is also going to be tricky in that, if there is a way to automate
this by way of a loop - i.e. reiteratively insert a field separator
character into each string until the end of the record - then one must
also keep into account that with every insertion, the number of
characters in the record increases, because field separator characters
are characters too. So the positioning of the insertion point will
always be off by 1 against the previous iteration.

As stated higher up, OP needs to provide more information.

--
= Aragorn =
(registered GNU/Linux user #223157)

rvae...@gmail.com

unread,
May 27, 2012, 10:15:18 AM5/27/12
to
The fields are not variable.
Example:
name 1-10
city 11-20
phone 21-30

J G Miller

unread,
May 27, 2012, 11:49:44 AM5/27/12
to
On Friday, May 25th, 2012, at 22:32:31h +0200, J.O. Aho suggested:

> I'm assuming you have a space between the columns, then you could run
> the following:
>
> sed 's/ /:/g' -i filename

I think the problem is that there are multiple spaces between the columns
if on a particular line the value does not fill the field, so that sed
expression would just replace all of the spaces with colons.

Perhaps this would be a good time to use awk if what R V Aedex23
actually wants to do is to replace the multiple spaces with a single colon?

Assuming three columns of data ...

awk ' { printf ("%s:%s:%s\n", $1, $2, $3) } ' data.txt

J.O. Aho

unread,
May 27, 2012, 11:55:05 AM5/27/12
to
sed 's/\(.\{10\}\)\(.\{10\}\)\([0-9+-]*\)/\1;\2;\3/' -i filename

This will not strip white spaces, this would:

sed 's/\(\w\{1,10\}\)\s\{0,9\}\(\w\{1,10\}\)\s\{0,9\}\([0-9+-]*\)/\1;\2;\3/'

In case you would like to make a proper csv:

sed 's/\(\w\{1,10\}\)\s\{0,9\}\(\w\{1,10\}\)\s\{0,9\}\([0-9+-]*\)/"\1",
"\2", "\3"/'

--

//Aho

Tom P

unread,
May 28, 2012, 5:45:32 AM5/28/12
to
Do you really need a script? Do you have Open office or LibreOffice?
Then start the spreadsheet (Calc), Edit ->Paste Special, and use the
"fixed width" option to get the data into columns. Then save the file as
a CSV file and choose ":" as the field separator.

J.O. Aho

unread,
May 28, 2012, 8:39:21 AM5/28/12
to
i can't answer for the OP, but there are some drawbacks using something
like Open/Libre/StarOffice:

- You may have to move the files from a remote server and then back to
the remote server (a proper server don't have a X11 running).
- You may have more than a handful files which has to be processed
- You may need to do this on a regular bases

Having a small script which does this can be a lot more handy than using a
big office application, but if it's a one time thing, then it would be
faster to use the office package.

--
//Aho

Tom P

unread,
May 28, 2012, 8:47:12 AM5/28/12
to
I agree. We don't know if the OP is just doing this as a one-off job or
on a regular basis. I had a similar problem to his just recently, and
did it this way.

rvae...@gmail.com

unread,
May 28, 2012, 9:27:01 PM5/28/12
to
On Friday, May 25, 2012 4:01:05 PM UTC-4, rvae...@gmail.com wrote:
Yes I do need a script, I have to filter each record and then use an if condition.

J G Miller

unread,
May 29, 2012, 7:33:17 AM5/29/12
to
On Monday, May 28th, 2012, at 18:27:01h -0700, R V Aedex23 wrote:

> Yes I do need a script, I have to filter each record and then use an if condition.

So awk or perl would probably be a good choice.

Did you look at the awk I suggested?
0 new messages