In the first pass, install perl. :-)
In the second pass, feed your file to a perl script that says
#!/usr/bin/perl
$/ = "\0"; # line sep is something non-existent
$_ = <>; # whomp in entire file
s/([a-z])\n([a-z])/${1}_$2/g; # newline between two a-z letters becomes "_"
s/([a-z])\n([a-z])/${1}_$2/g; # again, in case of single char identifiers
print; # whomp out entire file
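For anyone wondering why the same substitution appears twice: with /g, each match consumes both letters, so in a run of one-character lines the letter that ended one match cannot also start the next. Here is a minimal sketch of the effect. The join_lines sub and the sample text are made up for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: apply the substitution
# from the script above to a string, $passes times in a row.
sub join_lines {
    my ($text, $passes) = @_;
    for (1 .. $passes) {
        $text =~ s/([a-z])\n([a-z])/${1}_$2/g;
    }
    return $text;
}

my $sample = "a\nb\nc\n";

# One pass: "a\nb" matches and eats the "b", so "b\nc" is skipped.
print join_lines($sample, 1);   # prints "a_b" and "c" on separate lines

# Two passes: the second one picks up the pair the first one skipped.
print join_lines($sample, 2);   # prints "a_b_c"
```

Two passes are enough because the first pass can only skip a pair directly after a pair it did join, so no two skipped pairs ever share a letter.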
Alternately, it's pretty easy to do with sed too. Something like
N
:again
/[a-z]\n[a-z]/{
    s/\([a-z]\)\n\([a-z]\)/\1_\2/g
    N
    b again
}
P
D
In awk, we get something like
{if ($0 ~ /^[a-z]/ && prev ~ /[a-z]$/) ORS="_"
else ORS="\n"
if (prev != "") print prev
prev = $0}
END{ORS="\n"
print prev}
(I'm sure that that could be indented more readably, but I'm scared of
the awk parser.)
Running that through the awk-to-perl translator, we get the following fluff:
#!/usr/bin/perl
eval "exec /usr/local/bin/perl -S $0 $*"
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_]+=)(.*)/ && shift;
# process any FOO=bar switches
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
while (<>) {
    chop; # strip record separator
    if ($_ =~ /^[a-z]/ && $prev =~ /[a-z]$/) {
        $\ = '_';
    }
    else {
        $\ = "\n";
    }
    if ($prev ne '') {
        print $prev;
    }
    $prev = $_;
}
$\ = "\n";
print $prev;
or, more idiomatically
#!/usr/bin/perl
chop($prev = <>);
while (<>) {
    chop; # strip record separator
    $prev .= ($_ =~ /^[a-z]/ && $prev =~ /[a-z]$/) ? '_' : "\n";
    print $prev;
    $prev = $_;
}
print $prev,"\n";
Larry Wall
lw...@jpl-devvax.jpl.nasa.gov
> You know, I've always kind of disliked doing that. Suppose your file
> contains all possible byte values 0..255? Something loses. Maybe doing
> something like ``undef /;'' to make ``$/'' undefined could be used to
> tell perl to just read the whole thing. (Undefined is different from
> the null string, right?)
Right, though the incantation would be "undef $/;".
If you are in that situation, then it's easier just to say
read(STDIN, $_, 1000000000);
No doubt you'll now complain that you have a file larger than a gigabyte... :-)
However, your idea has merit (in particular because the above won't read
from <>). In fact, I just implemented it. Thanks.
Larry
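The undefined-$/ behavior discussed above can be sketched roughly as follows. This assumes a perl recent enough to have that feature, plus two later conveniences that do not appear in the messages above: local, to confine the change to one sub, and an in-memory filehandle, to keep the example self-contained. The slurp sub and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read everything remaining on a filehandle.
sub slurp {
    my ($fh) = @_;
    local $/;              # $/ is undefined until this sub returns
    return scalar <$fh>;   # so a single read returns the whole input
}

# Sample data containing a NUL byte, which would have been treated
# as a record boundary under $/ = "\0".
my $data = "foo\nbar\0baz\nqux\n";

# Assumption: in-memory filehandles are available (perl 5.8 or later).
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";
my $text = slurp($fh);
close($fh);

print(($text eq $data) ? "read it all\n" : "lost something\n");
```

In a real script the same sub should also work on STDIN or on a handle opened on a file; the in-memory handle is only there so the sketch runs without any input.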