Because regular expressions are greedy by default, as long as
- alternation is available;
- the first, not the longest, match in an alternation wins; and
- the delimiter length is fixed,
the same approach can be used to exclude strings that contain a targeted
substring, without negative lookahead:
/<!\[CDATA\[([^\]]|\][^\]]|\]\][^>])*\]\]>|<!--([^-]|-[^-])*-->/
Capturing parentheses can then be used to tell the matches apart. Because
in this case the difference between no match and a match of zero length may
be difficult to tell, the entire expression for matching the targeted
substring should be enclosed in capturing parentheses (marked below):
/<!\[CDATA\[([^\]]|\][^\]]|\]\][^>])*\]\]>|(<!--([^-]|-[^-])*-->)/
                                           ^                    ^
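For illustration, here is a minimal sketch of that idea in Python (my choice
for the example, not awk; Python's re module also lets the first matching
alternative win). The names strip_comments and sample are made up for this
example. A comment is dropped only if the marked group (group 2) took part in
the match; a CDATA section is put back unchanged.

```python
import re

# Same expression as above. Group 2 is the marked group that wraps the
# whole comment branch; groups 1 and 3 are the inner repetition groups.
PATTERN = re.compile(
    r"<!\[CDATA\[([^\]]|\][^\]]|\]\][^>])*\]\]>|(<!--([^-]|-[^-])*-->)"
)


def strip_comments(text):
    """Remove comments but leave CDATA sections untouched."""

    def replace(m):
        # Group 2 is None when the CDATA branch matched, so "no match"
        # and "zero-length match" cannot be confused here.
        if m.group(2) is not None:
            return ""
        return m.group(0)

    return PATTERN.sub(replace, text)


sample = "a<!-- x -->b<![CDATA[ <!-- keep --> ]]>c"
print(strip_comments(sample))
```

The comment-like text inside the CDATA section survives because the CDATA
branch consumes it before the comment branch gets a chance to match there.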
As neither POSIX awk nor GNU awk apparently supports passing a reference to
a function that is called with each match and whose return value is used as
the replacement, the only possibility that I can see is to use the match()
function in a loop: one would have to apply it to consecutive slices of the
original string while keeping track of the position of each match, to tell
the matches apart.
A user-defined function may be written so that this oft-needed feature does
not have to be implemented from scratch every time, or, as this approach is
rather inefficient, a more powerful programming language may be used.
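The loop described above could look roughly like this. It is a sketch in
Python rather than awk, to show the bookkeeping only: re.search() on a slice
plays the role of awk's match(), and m.span() that of RSTART and RLENGTH.
The names sub_with_callback and drop_comment are hypothetical.

```python
import re


def sub_with_callback(pattern, callback, text):
    """Replace each match with callback(match, position), slice by slice."""
    regex = re.compile(pattern)
    result = []
    rest = text   # the slice that still has to be searched
    offset = 0    # where `rest` starts within the original string
    while rest:
        m = regex.search(rest)
        if m is None:
            break
        start, end = m.span()
        result.append(rest[:start])                  # text before the match
        result.append(callback(m, offset + start))   # replacement
        if end == start:
            # Zero-length match: copy one character so the loop advances.
            result.append(rest[start:start + 1])
            end = start + 1
        offset += end
        rest = rest[end:]
    result.append(rest)                              # unmatched remainder
    return "".join(result)


PATTERN = r"<!\[CDATA\[([^\]]|\][^\]]|\]\][^>])*\]\]>|(<!--([^-]|-[^-])*-->)"
positions = []


def drop_comment(m, pos):
    positions.append(pos)  # position of the match in the original string
    return "" if m.group(2) is not None else m.group(0)


cleaned = sub_with_callback(
    PATTERN, drop_comment, "a<!-- x -->b<![CDATA[ <!-- keep --> ]]>c"
)
print(cleaned, positions)
```

Note that searching slices changes what anchors such as ^ refer to, which is
one more reason to prefer a language with a built-in callback replacement.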
But, in general, one should avoid modifying SGML- and XML-like markup using
regular expressions because those formal languages are just _not_ regular.
We have (or write) markup parsers (based on Nondeterministic Push-Down
Automata) and XSLT for that. For example, BeautifulSoup[1] is a lightweight
markup parser for Python, and lxml.etree.XSLT provides an XSLT processor for
Python; xsltproc(1), a command-line XSLT processor, is included with
libxslt(3), the XSLT C library for GNOME.
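As a rough sketch of the parser-based route, the following removes comments
while keeping CDATA sections. It uses xml.dom.minidom from the Python
standard library instead of BeautifulSoup or lxml, only so that it runs
without extra packages; the function name remove_comments and the sample
document are made up for the example.

```python
from xml.dom import minidom


def remove_comments(node):
    """Recursively remove all comment nodes below `node`."""
    # Iterate over a copy, since the child list is modified while looping.
    for child in list(node.childNodes):
        if child.nodeType == child.COMMENT_NODE:
            node.removeChild(child)
        else:
            remove_comments(child)


doc = minidom.parseString("<r>a<!-- x -->b<![CDATA[ <!-- keep --> ]]>c</r>")
remove_comments(doc)
cleaned_xml = doc.documentElement.toxml()
print(cleaned_xml)
```

The parser already knows where a CDATA section ends, so no care has to be
taken about comment-like text inside it.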
See also <https://stackoverflow.com/a/1732454/855543> ;-)
_______
[1] <https://www.crummy.com/software/BeautifulSoup/>
--
PointedEars