matched search string

ee...@hotmail.com

Nov 1, 2005, 9:19:01 PM

I am trying to read in a data file that is comma separated and match 4
chars and keep the subsequent date chars and add that to my output
file. e.g.

My input file format is something like this:
BBBB:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08

I want to read the file in, match BBBB, save the date string that
follows it, and output a new 4 char name like ZZZZ with the saved date
string appended for each match (in this case ":2005/11/01" and
":2005/12/08"), so that my file would then read:

BBBB:2005/11/01,ZZZZ:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08,ZZZZ:2005/12/08

I have scoured the sed & awk and vi books by O'Reilly but cannot see a
similar option other than the hold space in sed. Maybe perl??

Thanks in advance,

Mike D

Janis Papanagnou

Nov 1, 2005, 10:05:59 PM

It's not apparent whether your file consists of multiple lines like in
the example above or just a stream of characters in one line. In case
it's a one line stream the following awk program does what you want...

BEGIN { ORS=RS="," ; OFS=FS=":" }
{ print $1,$2 ; if ($1 == "BBBB") print "ZZZZ",$2 }
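
For reference, a minimal sketch of running the program above on the
one-line sample from the original post. It assumes the stream has no
trailing newline (hence printf '%s'), and the shell variable names are
illustrative only. Because ORS is written after every record, the
output ends with a trailing comma.

```shell
# One-line sample from the original post, deliberately without a trailing newline.
input='BBBB:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08'

# Records are split on commas, fields on colons; every record is echoed,
# and each BBBB record is followed by a ZZZZ copy carrying the same date.
result=$(printf '%s' "$input" | awk '
BEGIN { ORS = RS = ","; OFS = FS = ":" }
{ print $1, $2; if ($1 == "BBBB") print "ZZZZ", $2 }
')

printf '%s\n' "$result"
```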

Janis

ee...@hotmail.com

Nov 2, 2005, 1:15:42 AM

The file consists of several lines, one record per line where one
record can have many dates and publications (BBBB, BBBD etc). I showed
only one line, sorry!

Your script worked on all but one occurrence during my tests.
Additionally, I had to modify the script, as my input file actually
contains quotes and reads:

""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""

etc.

thus

BEGIN { ORS=RS="," ; OFS=FS=":" }
{ print $1,$2 ; if ($1 == "\"\"BBBB") print "\"\"ZZZZ",$2 }

The only instance that fails is if BBBB is the last record on the line
and isn't followed by the RS comma. The script fails to translate this
single record.
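
One way to see what happens at a line end, as a sketch on a made-up
two-line sample (the short values are placeholders, not real data):
with RS set to a comma, awk does not treat the newline as a record
boundary, so the last entry of one line and the first entry of the
next arrive as a single record.

```shell
# Hypothetical two-line sample; the short values stand in for the real dates.
records=$(printf '%s\n' 'BBBA:1,BBBB:2' 'BBBC:3,BBBD:4' |
  awk 'BEGIN { RS = "," } { printf "record %d: [%s]\n", NR, $0 }')

# Each record is shown in brackets; a record that spans two output lines
# contains an embedded newline from the input.
printf '%s\n' "$records"
```

The four entries come out as three records, and the second record holds
both BBBB:2 and BBBC:3 with a newline between them.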

I am almost at the end of my shift. I will continue trying to get this
working or hopefully you may have a solution by tomorrow.

Thanks again for your excellent solution. You turned my 30 line ksh
script that took about 2 hours to run (on my sparc 20) into a
lightspeed one liner!

Mike D

ee...@hotmail.com

Nov 2, 2005, 1:24:12 AM

A quick fix is to append a comma to each line using sed, run the awk
one-liner, and strip off the comma afterwards.

sed 's/"$/",/g' inputfile > appended.file
awk -f scriptfile appended.file > awked.file
sed 's/",$/"/g' awked.file > datemod.file

I am sure there is a much more elegant, efficient way to accomplish
this!

thanks again

Mike D

Ed Morton

Nov 2, 2005, 7:09:13 AM

Please read these before posting again:

http://cfaj.freeshell.org/google
http://en.wikipedia.org/wiki/Top-posting
http://en.wikipedia.org/wiki/Netiquette

Now, wrt your problem, does this do what you want:

$ cat file
""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""
""BBBA:2005/10/31"",""BBBC:2005/11/01"",""BBBB:2005/11/01""
$ awk 'BEGIN { OFS = FS = "," }
{ for (i = 1; i <= NF; i++)
    if ($i ~ /^\"\"BBBB:/) { tmp = $i; sub(/BBBB/, "ZZZZ", tmp); $i = $i OFS tmp }
} 1' file
""BBBB:2005/10/31"",""ZZZZ:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""
""BBBA:2005/10/31"",""BBBC:2005/11/01"",""BBBB:2005/11/01"",""ZZZZ:2005/11/01""

Regards,

Ed.

William James

Nov 2, 2005, 2:49:49 PM

BEGIN { FS = OFS = "\"\",\"\"" }
{
    gsub( /^""|""$/, "" )
    for (i = 1; i <= NF; i++)
        if ( $i ~ /^BBBB/ )
            $i = $i FS "ZZZZ" substr($i, 5)
    print "\"\"" $0 "\"\""
}
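
A sketch of how the script above might be run, assuming the
doubled-quote format shown earlier in the thread. The two sample lines
are the ones from Ed's post, and the variable name is illustrative.

```shell
# Feed two sample lines in the doubled-quote format through the script.
result=$(printf '%s\n' \
  '""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""' \
  '""BBBA:2005/10/31"",""BBBC:2005/11/01"",""BBBB:2005/11/01""' |
awk '
BEGIN { FS = OFS = "\"\",\"\"" }      # fields are separated by "",""
{
    gsub(/^""|""$/, "")               # strip the outer quotes; $0 is re-split
    for (i = 1; i <= NF; i++)
        if ($i ~ /^BBBB/)             # append a ZZZZ copy with the same date
            $i = $i FS "ZZZZ" substr($i, 5)
    print "\"\"" $0 "\"\""            # put the outer quotes back
}')

printf '%s\n' "$result"
```

Stripping the outer quotes first means the "","" separator can do all
the splitting, so an entry at the end of a line needs no special case.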

laura fairhead

Nov 2, 2005, 5:11:59 PM

Hi Mike,

You could just stay with one invocation of 'sed';

sed 's|BBBB:\([^,]*\)\(,*\)|BBBB:\1,ZZZZ:\1\2|g' datemod.file

Provided of course the fields are all in the format given
in the original post ( "BBBB:" can't be embedded in the data).
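
A sketch of the command on the unquoted sample from the original post,
assuming the groups are written as \( \) (POSIX basic regular
expressions). The second command is a hypothetical adaptation for the
doubled-quote format, not part of the suggestion above; it relies on &
standing for the whole match.

```shell
# Unquoted format from the original post: duplicate each BBBB entry as ZZZZ.
plain=$(printf '%s\n' 'BBBB:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08' |
  sed 's|BBBB:\([^,]*\)\(,*\)|BBBB:\1,ZZZZ:\1\2|g')

# Hypothetical variant for the doubled-quote format: keep the whole match (&)
# and append a quoted ZZZZ copy that reuses the captured date.
quoted=$(printf '%s\n' '""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""' |
  sed 's|""BBBB:\([^"]*\)""|&,""ZZZZ:\1""|g')

printf '%s\n' "$plain" "$quoted"
```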

byefornow
laura

>
>thanks again
>
>Mike D
>

--
echo alru_aa...@ittnreen.tocm |sed 's/\(.\)\(.\)/\2\1/g'

Mike Dundas

Nov 4, 2005, 10:39:38 AM

"laura fairhead" <run_signature_sc...@INVALID.com> wrote in
message news:436937d8...@news.btinternet.com...

> You could just stay with one invocation of 'sed';
>
> sed 's|BBBB:\([^,]*\)\(,*\)|BBBB:\1,ZZZZ:\1\2|g' datemod.file
>
> Provided of course the fields are all in the format given
> in the original post ( "BBBB:" can't be embedded in the data).
>
> byefornow
> laura

I will try the two new solutions tonight.

Thanks,

Mike