replacing cr/lf with <p>

G. S.

Jan 28, 2003, 7:56:49 PM
Hi--

I'm taking WordPerfect attachments, stripping the text out of them,
dumping them in a directory, and now I am trying to figure out how to
replace cr/lf with <p>cr/lf so that the resulting files can be Blosxom
(weblog) entries.

I have a tcsh script that does everything except the replacement:

#!/bin/tcsh -v
# convert pissy attachments to text

cd /library/webserver/documents/communicator/

foreach f (*)

strings "$f" > /library/webserver/documents/blosxom/"$f".txt

end
rm *.doc

and it struck me that awk would be the simplest way to do this,
perhaps.

I know I've got to do something with gsub:

{ gsub(/USA/, "United States"); print }

but at that point I'm stuck. I'll look for an online book, but if
anyone wants to throw me a bone that'd be cool :)

Peter S Tillier

Jan 29, 2003, 2:54:19 AM
"G. S." <groko...@yahoo.com> wrote in message
news:839599e4.03012...@posting.google.com...

How about:

{ gsub(/USA/, "United States"); print $0 "<p>" }

which ought to do it.
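
A minimal sketch of that idea with the USA substitution left out, since it was only a placeholder in the question. It assumes a POSIX sh (not tcsh), and the function name append_p is made up for illustration. One thing to watch: when awk splits CR/LF input on LF, $0 still ends in CR, so the tag is written after the CR rather than before it.

```sh
# append_p is a hypothetical helper name, not something from the thread.
append_p() {
    # Print every record followed by "<p>"; print adds the newline itself.
    awk '{ print $0 "<p>" }' "$@"
}

# Sample input invented for illustration; od -c makes any CR visible.
printf 'one\r\ntwo\n' | append_p | od -c
```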

HTH
--
Peter S Tillier
"Who needs perl when you can write dc and sokoban in sed?"


Patrick TJ McPhee

Jan 29, 2003, 1:31:01 PM
In article <839599e4.03012...@posting.google.com>,
G. S. <groko...@yahoo.com> wrote:

% I'm taking wordperfect attachments, stripping the text out of them,
% dumping them in a directory and now I am trying to figure out how to
% replace cr/lf with <p>cr/lf so that the resulting files can be Blosxom
% (weblog) entries.

You might be better off starting with, say, wp2latex rather than
using strings to get at the text.

I'm assuming from your use of tcsh that you're on a Unix system. Normally,
awk delimits records using lf under Unix, so cr will be the last character
on each record which has a cr/lf. You could deal with this a few ways. For
instance, you could examine the last character using substr():

{
    # substr($0, length($0)) is the last character of the record
    if (substr($0, length($0)) == "\r") {
        print substr($0, 1, length($0) - 1) "<p>\r"
    }
    else
        print
}

or you could make cr be the field separator, and test to see if the last
field on the line is empty, and if so stick <p> into the preceding field:

BEGIN { FS = OFS = "\r" }
NF > 1 && $NF == "" { $(NF-1) = $(NF-1) "<p>" }
{ print }
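
Not from the original reply, but in the same spirit: sub() with a regex anchored at the end of the record does the test and the edit in one step. This is only a sketch. It assumes a POSIX sh (not tcsh) and an awk that understands \r in regular expressions, as POSIX awk does. The function name crlf_to_p is invented for illustration.

```sh
# crlf_to_p is a hypothetical helper name.
crlf_to_p() {
    # If the record ends in CR, put "<p>" in front of that CR.
    # Records without a trailing CR pass through unchanged.
    awk '{ sub(/\r$/, "<p>\r"); print }' "$@"
}

# Made-up sample: one CR/LF line and one bare-LF line, dumped with od -c.
printf 'one\r\ntwo\n' | crlf_to_p | od -c
```

The awk program between the single quotes is the part that matters; the shell function is only a convenient wrapper for trying it out.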
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

G. S.

Jan 29, 2003, 10:53:34 PM
Thank you, thank you, thank you. I will probably try wp2latex, then
latex-->html. That'd be sweet. I appreciate your help.


pt...@interlog.com (Patrick TJ McPhee) wrote in message news:<F5VZ9.2829$Sq1.1...@news.ca.inter.net>...

Loki Harfagr

Jan 30, 2003, 11:11:03 AM
groko...@yahoo.com (G. S.) wrote in
news:839599e4.03012...@posting.google.com:

> { gsub(/USA/, "United States"); print }

Well, as for the awk side of the question, you may
find salvation in choosing the output separators (ORS, OFS).

As for the geopolitical context, the assertion is of course false :D
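
The output-separator suggestion could look something like the sketch below, which sets ORS (awk's output record separator) so that print itself writes the tag. Assumptions: a POSIX sh rather than tcsh, and a made-up function name ors_p. Unlike the substr() and FS versions earlier in the thread, this tags every line, including lines that had no CR, and writes every line ending as CR/LF.

```sh
# ors_p is a hypothetical helper name.
ors_p() {
    awk '
        BEGIN { ORS = "<p>\r\n" }    # appended by print after every record
        { sub(/\r$/, ""); print }    # drop the input CR so it is not doubled
    ' "$@"
}

# Made-up sample input; od -c shows where the tag and the CR end up.
printf 'one\r\ntwo\r\n' | ors_p | od -c
```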
