I have a file, in a certain format grammar, which I wish to convert to
another grammar, and I'm having trouble converting expressions that span
multiple lines:
For example, changing:
[html]
<pre>
to simply:
[code]
I can't seem to use regular expressions to find the initial multi-line item.
I'm also confused as to which utility I should be using to accomplish this.
I assumed sed would be the best tool for the task, but I'm not overly
familiar with it, so perhaps I'm wrong?
Can regular expressions even span multiple lines?
I've tried ^\[html\]$^<pre> to match it, but I would imagine having multiple
^'s confuses the parser.
I've also tried using \n, but I don't believe that's recognised as valid
syntax.
Essentially, I'm just wondering if anybody has a solution to this using
basic unix utilities (bash, sed, awk, etc). I'd rather not write a C
program to do something that'd probably be easier done otherwise.
Thanks,
Jeff
jweeks at mailandnews dot com
Tcl is widely available and its regexp spans multiple lines unless you
specifically tell it not to.
set f [open file-path r]
set buffer [read $f]
close $f
regsub -all {\[html\]\n<pre>} $buffer {[code]} buffer
set f [open file-path w]
puts -nonewline $f $buffer
close $f
--
Derk Gwen http://derkgwen.250free.com/html/index.html
So....that would make Bethany part black?
These 11 lines work under ksh88, ksh93 and bash. Something shorter?
last=
while IFS= read -r curr; do
    if [[ $curr == *"<pre>"* && $last == *"[html]"* ]]; then
        printf "%s\n" "[code]"
        last=
    else
        [[ -n "$last" ]] && printf "%s\n" "${last#a}"
        last="a$curr"
    fi
done
[[ -n "$last" ]] && printf "%s\n" "${last#a}"
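On the question of something shorter: one possibility is awk, which the original post also mentions. What follows is only a sketch, not a drop-in equivalent. It matches the two lines exactly, whereas the loop above matches them as substrings, and the names held, have_held and awk_out as well as the sample input are assumptions made for illustration.

```shell
# Hypothetical sample input for the sketch.
data=$(mktemp)
printf '%s\n' '[html]' '<pre>' '[html]' 'irrelevant line' '<pre> irrelevant line' > "$data"

# Hold each line back by one so that an "[html]" line can be dropped
# when the very next line is exactly "<pre>".
awk_out=$(awk '
    have_held && held == "[html]" && $0 == "<pre>" {
        print "[code]"      # emit the replacement for the pair
        have_held = 0       # both lines of the pair are consumed
        next
    }
    have_held { print held }            # flush the line held from before
    { held = $0; have_held = 1 }        # hold the current line
    END { if (have_held) print held }   # flush whatever is left
' "$data")
printf '%s\n' "$awk_out"
rm -f "$data"
```

Holding one line back is the same idea as the "last" variable in the loop; the separate have_held flag plays the role of the "a" prefix, so that an empty held line is not mistaken for no line at all.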
--
Michael Wang * http://www.unixlabplus.com/ * mw...@unixlabplus.com
In message <9McFa.1192$PD3....@nnrp1.uunet.ca> of Tue, 10 Jun 2003
00:03:06 in comp.unix.programmer, Jeff Weeks <jwe...@mailandnews.com>
writes
>Hello,
>
>I have a file, in a certain format grammar, which I wish to convert to
>another grammar, and I'm having trouble converting expressions that span
>multiple lines:
>
>For example, changing:
>[html]
><pre>
>
>to simply:
>[code]
>
>I can't seem to use regular expressions to find the initial multi-line item.
>I'm also confused as to which utility I should be using to accomplish this.
>I assumed sed would be the best tool for the task, but I'm not overly
>familiar with it, so perhaps I'm wrong?
Others have (implicitly) suggested you RTFM. I explicitly say so.
I learnt a lot from the "sed & awk" book by Dale Dougherty and Arnold Robbins (pub. O'Reilly).
Meanwhile, the following shows a solution for the example problem.
C:\WINNT\Temp>bash
bash-2.05b$ sed '/^\[html\]$/{N;/\n<pre>$/s/.*\n.*/[code]/;}' < data
[code]
[html]
irrelevant line
<pre> irrelevant line
bash-2.05b$ cat data
[html]
<pre>
[html]
irrelevant line
<pre> irrelevant line
bash-2.05b$ exit
exit
C:\WINNT\Temp>
sed provides very little to help you debug. Inserting the = command (print
the current line number) and the l (ell) command (print the pattern space
unambiguously) at the points in the script you are interested in can help
a lot.
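As a rough illustration of those two commands, here is a minimal sketch. The temporary file and the variable name debug_out are assumptions made for the example; the input has the same shape as the data file in the session above.

```shell
# Hypothetical sample input, same shape as the "data" file above.
data=$(mktemp)
printf '%s\n' '[html]' '<pre>' '[html]' 'irrelevant line' '<pre> irrelevant line' > "$data"

# -n suppresses the normal output so only the debugging output appears.
# After N has appended the next line, "=" prints the current input line
# number and "l" prints the pattern space unambiguously: the embedded
# newline shows up as \n and the end of the pattern space as $.
debug_out=$(sed -n '/^\[html\]$/{
N
=
l
}' "$data")
printf '%s\n' "$debug_out"
rm -f "$data"
```

For this input the sketch should print 2 and [html]\n<pre>$ for the first pair, then 4 and [html]\nirrelevant line$ for the second, which makes it easy to see what the inner /\n<pre>$/ test is actually being applied to.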
--
Walter Briscoe
Yes, I thought that, so I intended to reply to them all and set
follow-ups to comp.unix.shell by copy-and-paste of "comp.unix.shell"
from the "newsgroups" line ...
> Peter S Tillier <peter_...@despammed.com> wrote a response in
> comp.unix.shell which was invisible to me in comp.unix.programmer.
> The cause of that was the "expletive deleted" OP. My work largely
> duplicates Peter's. It would not have been done if Peter had followed
> the OP's inanity.
>
... but I guess that what I must have done was to paste comp.unix.shell
over the "to" instead. All I can say is that it had been a long night -
sorry.
[...]
--
Peter S Tillier
"Who needs perl when you can write dc, sokoban,
arkanoid and an unlambda interpreter in sed?"
Jeff Weeks wrote:
sed '/^\[html\]/,/^<pre>/c\
[code]' infile
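One thing to be aware of with the range form: it replaces everything from an [html] line up to the next line that starts with <pre>, whether or not the two are adjacent. The sketch below compares it with the N-based command posted earlier in the thread. The sample input and the variable names range_out and pair_out are assumptions for illustration only.

```shell
# Hypothetical sample input: one adjacent pair, one non-adjacent pair.
data=$(mktemp)
printf '%s\n' '[html]' '<pre>' '[html]' 'irrelevant line' '<pre> irrelevant line' > "$data"

# Range form: every line from an [html] line through the next line
# beginning with <pre> is replaced by a single [code] line.
range_out=$(sed '/^\[html\]/,/^<pre>/c\
[code]' "$data")

# N-based form: only an [html] line immediately followed by a line that
# is exactly <pre> is replaced.
pair_out=$(sed '/^\[html\]$/{N;/\n<pre>$/s/.*\n.*/[code]/;}' "$data")

printf 'range form:\n%s\n\nN form:\n%s\n' "$range_out" "$pair_out"
rm -f "$data"
```

With this input the range form yields two [code] lines and drops "irrelevant line", while the N form changes only the first pair. Which behaviour is wanted depends on whether the two markers are always adjacent in the real file.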
Regards
Juergen