On 2020-12-23 20:39, John Cordes wrote:
>> I'd start with this ugly monstrosity:
>>
>> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
>> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
>> 'g'), '\n', '', 'g')."<\/div>\n"
>
> I will attempt to deconstruct your 'monstrosity' somewhat later,
Tweaking it so that it only does NOTE items, not generic
continuations:
:%s/^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
'g'), '\n', '', 'g')."<\/div>\n"
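To see what the expression after the \= does to the captured text, here
is a minimal sketch you can :source or paste line by line. The sample
text is made up and merely stands in for whatever submatch(1) would
hold, and the variable names are mine, not part of the command:

```vim
" Hypothetical captured text standing in for submatch(1):
" the rest of the NOTE line plus two "3 CONC" continuation lines.
let captured = "first part \n3 CONC second part \n3 CONC third part\n"

" The inner substitute() joins the CONC lines onto the note,
" the outer substitute() drops any newlines that remain.
let joined = substitute(substitute(captured, '\n3 CONC ', '', 'g'), '\n', '', 'g')

echo '<div class="xxx">' . joined . '</div>'
" → <div class="xxx">first part second part third part</div>
```
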
Breaking it down so hopefully you can swap parts as you see fit:
:%s/^2 NOTE \zs       On every line starting with "2 NOTE "
                      start our replacement here (\zs)
\(                    start capturing the note
                      this will be submatch(1) later
  .*                  everything else on that line
  \n                  and the newline
  \%(                 a non-capturing group for another line that
    \%(\D             starts with either a non-digit
    \|                or
    3 CONC            a literal "3 CONC "
    \)                (end of this OR of things marking a continuation)
    .*\n              followed by the rest of the line
  \)                  (end of this continuation-line)
  \+                  we can have 1 or more continuation lines
\)                    end the capturing
/                     replace it with
\=                    the result of evaluating this expression
'<div class="xxx">'   the literal opening tag
.                     and then the results of
substitute(           remove all the newlines from the results of
  substitute(         removing from
    submatch(1),      the whole set of continuation stuff
    '\n3 CONC ',      the literal newline-followed-by-"3 CONC "
    '',               and replace them with nothing
    'g'               everywhere
  ),                  and in that "\n3 CONC "-less text, replace
  '\n',               newlines with
  '',                 nothing
  'g')                everywhere
.                     and then tack on
"<\/div>\n"           the literal closing </div> followed by a newline
> It's a bit more complicated than I first explained. Two aspects:
> a) I *do* need to search on the "2 NOTE" lines, since there are
> various other chunks of lines with the CONC lines; and
> b) Sometimes the line "2 TYPE tngnote" has a line between it and
> the "2 NOTE". The intervening line can look like this
>
> 2 DATE 18 AUG 1776
> or this
> 2 _SDATE 1802
Given the substitution command above, it should only touch "2 NOTE"
lines with subsequent "3 CONC" lines. It does *every* such "2 NOTE", so
if you need to limit them to just those that follow a "2 TYPE
tngnote" (assuming there aren't any "2 TYPE tngnote" that *don't*
have a NOTE following them), you can tweak that command,
changing that initial "%" to
:g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs…
This looks for all the "2 TYPE tngnote" lines, searches forward
(skipping over any DATE/_SDATE lines or other intervening stuff) for
the "2 NOTE " line following it, and then only performs the
substitution on those particular lines.
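If you want to check which lines that :g range lands on before changing
anything, one option is to swap the :s for a print command. This is only
a dry-run sketch of the same addressing. As far as I know the forward
search honors 'wrapscan', so a "2 TYPE tngnote" with no NOTE after it
could wrap around and land on an earlier one:

```vim
" For each "2 TYPE tngnote" line, search forward to the next
" "2 NOTE " line and print it with its line number.
" Nothing in the buffer is changed.
:g/^2 TYPE tngnote//2 NOTE /#
```

If the lines listed are the ones you expect, the same prefix can go in
front of the s/// command.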
I suspect that the problem snuck in by using \(…\) in your added
conditions which captured that as submatch(1). So you can either
make it non-capturing by adding that "%" before the open-paren:
\%(\_^2 .*DATE.*\)
or change the "submatch(1)" to "submatch(2)".
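The renumbering is easy to see on a plain string with matchlist(). A
small sketch with made-up text (the sample string and the variable
names are only for illustration, not taken from your file):

```vim
" Hypothetical sample: a DATE line followed by the note text.
let text = "2 DATE 18 AUG 1776\nnote body"

" Capturing \(...\) around the DATE part: it becomes group 1,
" pushing the note to group 2.
let with_capture = matchlist(text, '\(^2 .*DATE.*\)\n\(.*\)')[1]

" Non-capturing \%(...\): the note is group 1 again.
let without_capture = matchlist(text, '\%(^2 .*DATE.*\)\n\(.*\)')[1]

echo with_capture
" → 2 DATE 18 AUG 1776
echo without_capture
" → note body
```
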
> Here's an example of one small chunk of
> lines which were transformed by that command:
>
> 1 EVEN
> 2 TYPE tngnote
> 2 DATE 18 AUG 1776
> 2 NOTE <div class="xxx">2 DATE 18 AUG 1776</div>
> 1 EVEN
Note that the content here is what you captured in the first group.
:-)
Hope this helps get you on the right path,
-tim