changing <CR><LF> to <FS><CR>

Harry

Nov 28, 2016, 6:32:45 PM
I have an input file like this, with each record starting with "MSH".

MSH|...<CR>
Line 2|...<CR>
Line 3|...<CR>
Line N|...<CR><LF>
MSH|...<CR>
Line 2|...<CR>
Line 3|...<CR>
Line N|...<CR><LF>

I would like to change it so the output would look like this.

<VT>MSH|...<CR>
Line 2|...<CR>
Line 3|...<CR>
Line N|...<FS><CR>
<VT>MSH|...<CR>
Line 2|...<CR>
Line 3|...<CR>
Line N|...<FS><CR>

Where <VT> = vertical tab character,
<CR> = carriage return character,
<LF> = line feed character,
<FS> = file separator character.

I tried the following ...

sed -e 's#^MSH#\x0BMSH#g' -e 's#\x0D\x0A#\x1C\0D#g' < infile.txt > outfile.txt

But it only worked for the "MSH" to <VT>MSH part.
The conversion from <CR><LF> to <FS><CR> did not work;
it ended up as just a <LF>.

What did I do wrong?

TIA

Ben Bacarisse

Nov 28, 2016, 9:42:50 PM
Harry <harryoo...@hotmail.com> writes:

> I have input file like this, with each record started with "MSH".
>
> MSH|...<CR>
> Line 2|...<CR>
> Line 3|...<CR>
> Line N|...<CR><LF>
> MSH|...<CR>
> Line 2|...<CR>
> Line 3|...<CR>
> Line N|...<CR><LF>

Something like cat -A (or a hex dump) is better as there's less
ambiguity. Here you say "Line 2" and "Line 3" etc. but there seem to be
only two line feeds in the whole extract. I'm going to assume you've
laid it out with "lines" for convenience and there are no linefeed
characters other than the two you wrote out. I.e.:

$ cat -A data
MSH|...^MLine 2|...^MLine N|...^M$
MSH|...^MLine 2|...^MLine N|...^M$

> I would like to change it so the output would look like this.
>
> <VT>MSH|...<CR>
> Line 2|...<CR>
> Line 3|...<CR>
> Line N|...<FS><CR>
> <VT>MSH|...<CR>
> Line 2|...<CR>
> Line 3|...<CR>
> Line N|...<FS><CR>

So that it's all one (unterminated) line, yes? I.e.:

$ cat -A data2
^KMSH|...^MLine 2|...^MLine N|...^\^M^KMSH|...^MLine 2|...^MLine N|...^\^M

> I tried the following ...
>
> sed -e 's#^MSH#\x0BMSH#g' -e 's#\x0D\x0A#\x1C\0D#g' < infile.txt > outfile.txt
>
> But it only work for the "MSH" to <VT>MSH part.

sed is fiddly when you want to remove linefeeds. I'd use

$ sed -e 's/^MSH/\x0bMSH/' -e 's/\r$/\x1c\r/' <data | tr -d '\n' | cat -A
^KMSH|...^MLine 2|...^MLine N|...^\^M^KMSH|...^MLine 2|...^MLine N|...^\^M

(I don't think you need the g modifiers.)

--
Ben.

Harry

Nov 28, 2016, 10:18:09 PM
On Monday, November 28, 2016 at 6:42:50 PM UTC-8, Ben Bacarisse wrote:

> $ sed -e 's/^MSH/\x0bMSH/' -e 's/\r$/\x1c\r/' <data | tr -d '\n' | cat -A
> ^KMSH|...^MLine 2|...^MLine N|...^\^M^KMSH|...^MLine 2|...^MLine N|...^\^M

There were two characters (^K) in front of 1st MSH, when it should be one (<VT>).
There were two characters (^K) in front of 2nd MSH, when there should be three
(<FS><CR><VT>).

Rakesh Sharma

Nov 28, 2016, 10:23:12 PM
First look at how "sed" is splitting up your file into individual records:
sed -n 'l' < infile.txt

MSH|...\rLine 2|...\rLine 3|...\rLine N|...$
MSH|...\rLine 2|...\rLine 3|...\rLine N|...$

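That means the "\r\n" combo at the end of each record has been eaten.
(That is what this sed shows; a sed that reads its input in binary mode,
such as GNU sed on Linux, would keep the "\r" and list it as "\r$".)
Remember that "sed" implicitly appends a "\n" when printing.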
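
Why the original \x0D\x0A pattern could never match can be seen in a small experiment. This is only a sketch: it assumes GNU sed (for the \r and \n escapes) reading its input in binary mode, and the helper name sample is made up for illustration.

```shell
# Hypothetical helper: emit a single CRLF-terminated line.
sample() { printf 'a\r\n'; }

# sed removes the trailing newline before running the script, so a
# pattern containing \r\n finds nothing and the line passes through as is.
sample | sed 's/\r\n/X/' | od -An -c

# Anchoring on end-of-line instead does reach the CR.
sample | sed 's/\r$/X/' | od -An -c
```

The first dump should still show the CR and LF; the second should show the CR replaced, followed by the newline that sed puts back when it prints.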

sed -e '
s/^/\x0B/; # prefix record with a vertical tab
s/$/\x1C\x0D/; # suffix record with FS+CR AND sed will add a \n of its own
' < infile.txt | tr -d '\n' > outfile.txt

Or, equivalently, you can write it as below:

sed -e 's/.*/\x0B&\x1C\x0D/' < infile.txt | tr -d '\n' > outfile.txt

N.B.: outfile.txt will have no newlines.
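
If the sed in use does not strip the trailing CR (GNU sed on Linux, for instance, keeps it), appending FS+CR would leave a stray CR in front of the FS. A variant that first removes a trailing CR when one is present should behave the same either way. This is a sketch rather than a tested-everywhere recipe: it assumes GNU sed for the \r and \xHH escapes, and the function name frame_records is hypothetical.

```shell
# Hypothetical helper: turn CRLF-terminated records into <VT>record<FS><CR>.
# Assumes GNU sed (for the \r and \xHH escapes).
frame_records() {
    sed -e 's/\r$//' \
        -e 's/^MSH/\x0bMSH/' \
        -e 's/$/\x1c\r/' |
    tr -d '\n'    # drop the newline sed adds after each record
}

# Sample input: segments separated by CR, records terminated by CR LF.
printf 'MSH|...\rLine 2|...\rLine 3|...\r\nMSH|...\rLine 2|...\rLine 3|...\r\n' |
    frame_records | od -An -c
```

Real data would be run as frame_records < infile.txt > outfile.txt; the od call is there only to make the control characters visible.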

Also note that there was a typo in your sed code: the \0D in the replacement
portion of the second s/// statement should have been \x0D. It helps to choose
a delimiter other than # for the s/// statement. The global flags on the s///
statements are superfluous, since each substitution happens only once per
record: at the beginning in one s/// and at the end in the other.

> sed -e 's#^MSH#\x0BMSH#g' -e 's#\x0D\x0A#\x1C\0D#g'

You could do it staying within bash also:

# constants defined (bash ANSI-C quoting, octal escapes)
VT=$'\013' # vertical tab
CR=$'\015' # carriage return
FS=$'\034' # file separator

while IFS= read -r record
do
# drop the trailing CR if read left one, then wrap the record
record=$VT${record%"$CR"}$FS$CR
# no newline is printed, so the records are simply concatenated
printf '%s' "$record"
done < infile.txt > outfile.txt

Harry

Nov 28, 2016, 10:42:22 PM
On Monday, November 28, 2016 at 7:23:12 PM UTC-8, Rakesh Sharma wrote:

> sed -e '
> s/^/\x0B/; # prefix record with a vertical tab
> s/$/\x1C\x0D/; # suffix record with FS+CR AND sed will add a \n of its own
> ' < infile.txt | tr -d '\n' > outfile.txt
>
> Or equivalently you can write like as below:
>
> sed -e 's/.*/\x0B&\x1C\x0D/' < infile.txt | tr -d '\n' > outfile.txt

That did it. Thanks.
Good to know that sed adds the '\n', and hence that I need tr to get rid of it.

$ od -bc < infile
0000000 115 123 110 174 056 056 056 015 114 151 156 145 040 062 174 056
          M   S   H   |   .   .   .  \r   L   i   n   e       2   |   .
0000020 056 056 015 114 151 156 145 040 063 174 056 056 056 015 012 115
          .   .  \r   L   i   n   e       3   |   .   .   .  \r  \n   M
0000040 123 110 174 056 056 056 015 114 151 156 145 040 062 174 056 056
          S   H   |   .   .   .  \r   L   i   n   e       2   |   .   .
0000060 056 015 114 151 156 145 040 063 174 056 056 056 015 012
          .  \r   L   i   n   e       3   |   .   .   .  \r  \n
0000076


$ sed -e 's/^/\x0B/;s/$/\x1C\x0D/;' < infile | tr -d '\n' > outfile


$ od -bc < outfile
0000000 013 115 123 110 174 056 056 056 015 114 151 156 145 040 062 174
         \v   M   S   H   |   .   .   .  \r   L   i   n   e       2   |
0000020 056 056 056 015 114 151 156 145 040 063 174 056 056 056 034 015
          .   .   .  \r   L   i   n   e       3   |   .   .   . 034  \r
0000040 013 115 123 110 174 056 056 056 015 114 151 156 145 040 062 174
         \v   M   S   H   |   .   .   .  \r   L   i   n   e       2   |
0000060 056 056 056 015 114 151 156 145 040 063 174 056 056 056 034 015
          .   .   .  \r   L   i   n   e       3   |   .   .   . 034  \r
0000100

Ben Bacarisse

Nov 29, 2016, 5:25:46 AM
No there aren't! Maybe I should have posted a hex dump, but I see you
have a similar solution that you can see is working, so all good in the
end.

--
Ben.
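
For anyone reading cat -A output for the first time: it writes each control character in caret notation, so ^K, ^M and ^\ each stand for a single byte (VT, CR and FS respectively). A minimal sketch of the comparison with a hex dump, assuming GNU cat (for -A) and a POSIX od; the helper name framed is made up for illustration.

```shell
# Hypothetical helper: emit one tiny framed record, <VT>MSH|...<FS><CR>.
framed() { printf '\013MSH|...\034\r'; }

# Caret notation: two printed characters per control byte.
framed | cat -A; echo

# Hex dump of the same bytes: 0b (VT), 1c (FS) and 0d (CR) are one byte each.
framed | od -An -tx1
```

Both commands read the same ten bytes; only the way the three control characters are displayed differs.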