Harry <harryoo...@hotmail.com> writes:
> I have an html file where I want to delete an "empty row of 7 null values".
>
> The line numbers below are generated by 'cat -n' and are not part of
> the html file.
>
> I want to delete from lines 694 to 716 inclusive. Of course the
> location of this pattern will not always fall on this line range.
This might do what you want:
sed 'H;1h;$!d;x;s!<tr>\n\(<td>\n \n</td>\n\)*</tr>\n!!g'
This is what happens when IBM JCL goes on a date with Teco and one thing
leads to another. It's the code of nightmares.
But it's so useful a nightmare that I have made a note of this part so I can
reuse it as needed:
sed 'H;1h;$!d;x; <stuff>'
This reads a whole file into sed so that the commands in <stuff> apply
to it all. Whole-file reading is sometimes the easiest way to do
something so this is in my "worth knowing" list.
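To see the whole-file idiom on its own, here is a minimal sketch. The function name slurp_demo and the sample input are mine, purely for illustration. H appends each line to the hold space, 1h starts the hold space cleanly with the first line, $!d discards every line except the last, and x then swaps the accumulated text back into the pattern space:

```shell
# Hypothetical helper: slurp all of stdin, then turn every embedded
# newline into '|' to show that <stuff> sees the file as one string.
slurp_demo() {
    sed 'H;1h;$!d;x;s/\n/|/g'
}

printf 'one\ntwo\nthree\n' | slurp_demo
# prints: one|two|three
```

Bear in mind that the whole input sits in memory, so this suits modest files rather than huge ones.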
The substitute command uses ! delimiters for convenience and is globally
applied (the g at the end) to match all occurrences. The \(...\)
creates a group to be repeated and the \n should match the newline. If
you have some other line ending, take appropriate action. You can limit
the match to only 7 cells by replacing the * with \{7\}.
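As a rough illustration of the \{7\} variant, the sketch below builds a small test table and runs the command over it. The helpers empty_row and strip_rows_of_7 are hypothetical names, and the layout (a lone space on the line between <td> and </td>) is assumed to match the pattern exactly:

```shell
# Hypothetical helper: emit one row of N empty cells in the exact
# layout the pattern expects.
empty_row() {
    echo '<tr>'
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '<td>\n \n</td>\n'
        i=$((i + 1))
    done
    echo '</tr>'
}

# Same command as above, with * replaced by \{7\}.
strip_rows_of_7() {
    sed 'H;1h;$!d;x;s!<tr>\n\(<td>\n \n</td>\n\)\{7\}</tr>\n!!g'
}

# The 7-cell row is removed; the 3-cell row is left alone.
{ echo '<table>'; empty_row 7; empty_row 3; echo '</table>'; } | strip_rows_of_7
```

With the original * the 3-cell row would be deleted as well, since * accepts any number of empty cells (including none).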
Also, note that this is fragile because it does not parse the HTML. It
assumes an exact arrangement of the strings being matched. Even a rogue
extra space in the input will break it. It can be made more robust, but
the effort may not be worth it. As soon as you want even a little more
flexibility, something like xsltproc or Perl with an HTML parser will
be a better bet.
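For what a slightly more tolerant version might look like, here is a sketch that allows arbitrary whitespace around and inside the cells. It is my own variation, not something tested against the original file, and it assumes a sed (GNU sed, for instance) in which [[:space:]] matches an embedded newline. The name strip_rows_loose is made up:

```shell
# Hypothetical variant: accept any run of whitespace (spaces, tabs,
# newlines) between the tags instead of one exact layout.
# Still not an HTML parser: attributes, uppercase tags, or &nbsp;
# entities will all defeat it.
strip_rows_loose() {
    sed 'H;1h;$!d;x;s!<tr>[[:space:]]*\(<td>[[:space:]]*</td>[[:space:]]*\)\{7\}</tr>\n!!g'
}

# Sample input with indented, unevenly spaced cells.
loose_sample() {
    echo '<table>'
    echo '<tr>'
    i=0
    while [ "$i" -lt 7 ]; do
        printf '  <td>\n   \n  </td>\n'
        i=$((i + 1))
    done
    echo '</tr>'
    echo '</table>'
}

loose_sample | strip_rows_loose
```

Any indentation in front of the <tr> itself is left behind, which is one more reason to reach for a real parser once the input gets messy.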
<snip>
--
Ben.