using vim to add <a href= ...> links to an epub index file


Chris Jones

Nov 23, 2017, 6:53:58 PM
to vim...@googlegroups.com
I am currently in the final stages of putting together an epub version of
Auguste Escoffier's _Le Guide Culinaire_.

Since this is a "cookbook" of sorts, the last step before proofreading
pretty much requires building a working index with html style links to
the text relative to each entry.

In an epub context this can be achieved by wrapping the text of each
entry in something of the form:

<a href="../Text/file.xhtml#p0001">index_entry</a>

where "file.xhtml" is one of the files making up the text of the e-book
and "p0001" has been defined as an id (< ... id="p0001">) within the file.

There are over 6000 entries in this index, which (loudly) suggests that
in this instance it might be worth spending a few hours concocting some
form of automated solution to add all the <a href=...> links to the file in
one fell swoop rather than doing it manually.

The index is a repetition of lines with the following structure:


<div class="ind-01"></div>
<div class="ind-02">Abatis</div>
<div class="ind-03">621</div>

<div class="ind-01"></div>
<div class="ind-02">    —     à la Bourguignonne</div>
<div class="ind-03">621</div>

...


After loading the index file in a vim buffer I have found that:

1. I can match all page entries in a non-ambiguous manner by a search
with the following pattern: "/\d\+<"

The match as highlighted via ":set hlsearch" includes the page number
and nothing else and the cursor sits on the first digit of the page
number.

2. I can invoke the following one-liner from vim with the page number as
an argument and it returns the generated link:


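#!/bin/bash
# Usage: My_script PAGE   (PAGE is zero-padded, e.g. 0621)

# List every p0NNN anchor as "file:anchor", turn each pair into an opening
# <a href=...> tag, then keep only the tag(s) containing the requested page.
grep -o 'p0[0-9][0-9][0-9]' *.htm | \
awk 'BEGIN { FS=":"} {print "<a href=\"../Text/" $1 "#" $2 "\"" ">" }' | \
grep "$1"

exit 0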


... like so:

:r ! My_script 0621

generates the link and writes it to the vim buffer:

<a href="../Text/gc0306.htm#p0621">
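
For comparison, here is a minimal Python sketch of the same lookup. It is not the script from the post: the function names build_links and link_for_page are made up for illustration, and it assumes the text files sit in a single directory and use the same p0NNN ids that the grep pattern targets. Unlike the final grep "$1", it looks the page up by exact id rather than by substring.

```python
import re
from pathlib import Path

# Same pattern as the grep call in the shell script.
ANCHOR_RE = re.compile(r"p0[0-9]{3}")


def build_links(text_dir, glob="*.htm"):
    """Map each anchor id (e.g. 'p0621') to an opening <a href=...> tag."""
    links = {}
    for path in sorted(Path(text_dir).glob(glob)):
        content = path.read_text(encoding="utf-8")
        for anchor in ANCHOR_RE.findall(content):
            # Keep the first file in which an id appears (assumption: ids are unique).
            links.setdefault(anchor, f'<a href="../Text/{path.name}#{anchor}">')
    return links


def link_for_page(links, page):
    """Return the tag for a page given as '621' or '0621', or None if unknown."""
    return links.get(f"p{int(page):04d}")
```

Building the table once and then querying it avoids re-scanning every file for each of the 6000+ entries, which is what calling the shell script per entry would do.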

What I am missing at this point:

1. I need to retrieve the matched string of the current "/\d\+<" search
and place it in some kind of vim variable (?) that I can use to
invoke the script so that it can be done iteratively without having
to type the page number manually:

:r ! My_script $vim_variable

2. I need to find a way to remove any new-line character(s) so that the
output of "My_script $vim_variable" is placed at the right spot in
the buffer: after I invoke the script using ":r ! My_script"... the
output is inserted in column 0 on a new line immediately after the
matching string:


<div class="ind-01"></div>
<div class="ind-02">Abatis</div>
<div class="ind-03">621</div>
<a href="../Text/gc0306.htm#p0621">


3. A third issue is adding the closing "</a>" tag after the targeted
text, thus completing the wrapping of the entry so that the end
result of one iteration looks exactly like this:


<div class="ind-01"></div>
<div class="ind-02"><a href="../Text/gc0306.htm#p0621"> Abatis</a></div>
<div class="ind-03">621</div>


In other words, I need to put together some kind of front-end...
presumably in vimscript (so that I have the ability to navigate the lines
in the buffer)... that does the three things described above:

1. grab the current matched string/page number, pass it to the bash
one-liner to generate the corresponding <a href="..."> and return
the result to vim.

2. move the cursor to the first character of the corresponding index
entry (the text and the page number are vertically aligned so that
hitting "k" on the keyboard does exactly that...) and insert the
generated text before the cursor (in other words, what Shift-P would do).

3. jump to the opening "<" of the closing </div> tag and insert "</a>"
before the cursor.

Another approach I considered might consist in recording a vim macro
that would reproduce manual actions at the keyboard and run it
iteratively against the buffer. But I doubt line-mode commands such as
":r ! ..." would be recorded.

Please let me know if this is at all feasible in vim (and vim might
offer better means of achieving what I am trying to do) or whether
I should look at other options.

Thanks,

CJ

porphyry5

Nov 24, 2017, 12:43:46 PM
to vim_use

Substitute (:h :s) will do all you need. For links and anchors, I adapt this model to the specific situation in each case:

:%s/ \(_\(\w\+\)\)/ <a href="#\1">\2<\/a>/g|:%s/^_\w\+$/<a name="&"><\/a>/

Being simple minded, I just ensure that anchors always occur at the start of lines, and that links never do.
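
To make the effect of the two :s commands concrete, here is a rough Python equivalent using re.sub. This is only an illustrative sketch: the function name add_links_and_anchors is invented, and re.ASCII is used on the assumption that \w should match only letters, digits and underscore, as it does in Vim.

```python
import re


def add_links_and_anchors(text):
    """Mimic the two :s commands on a whole buffer held in one string."""
    # First command: every " _name" becomes ' <a href="#_name">name</a>'
    # (the /g flag in Vim; re.sub replaces all occurrences by default).
    text = re.sub(r" (_(\w+))", r' <a href="#\1">\2</a>', text, flags=re.ASCII)
    # Second command: a line consisting only of "_name" becomes an empty
    # named anchor; \g<0> plays the role of "&" (the whole match).
    text = re.sub(r"^_\w+$", r'<a name="\g<0>"></a>', text,
                  flags=re.ASCII | re.MULTILINE)
    return text
```

The leading space in the first pattern is what keeps the two cases apart: a marker at the start of a line has no space before it, so only the second substitution touches it.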

Chris Jones

Nov 25, 2017, 1:25:32 PM
to vim_use
On Fri, Nov 24, 2017 at 12:43:46PM EST, porphyry5 wrote:
> On Thursday, November 23, 2017 at 3:53:58 PM UTC-8, Chris Jones wrote:

[..]
>
> Substitute (:h :s) will do all you need. In the case of links and
> anchors, I modify this model to the specific situation in each case:
>
> :%s/ \(_\(\w\+\)\)/ <a href="#\1">\2<\/a>/g|:%s/^_\w\+$/<a name="&"><\/a>/

Do you mean using submatch(0) to retrieve what /\d\+< actually matched
in the current iteration?

So far this seems to be the only way to retrieve the string that a regex
actually matches... alas, as per the :help submatch vim manual...
submatch() can only be used in the context of the replacement part of
a :substitute command - which is not what I had in mind.

Just curious. I gave up on the idea of using vim in this instance and
wrote a ~10-line Python script that rewrites the file... adding the
links where relevant.
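
The script itself was not posted. Purely as an illustration of the approach, a sketch along these lines would handle the ind-02/ind-03 structure shown in the original question. Everything named here is an assumption: the function link_index, the two regular expressions, and the links dictionary that maps an id such as "p0621" to its opening <a href=...> tag.

```python
import re

# Entry line: capture the opening div, the entry text and the closing div.
ENTRY_RE = re.compile(r'^(\s*<div class="ind-02">)(.*)(</div>\s*)$')
# Page line: capture the page number.
PAGE_RE = re.compile(r'<div class="ind-03">(\d+)</div>')


def link_index(lines, links):
    """Return a copy of lines with each ind-02 entry wrapped in its link."""
    out = list(lines)
    pending = None  # position of the most recent ind-02 line
    for i, line in enumerate(out):
        if ENTRY_RE.match(line):
            pending = i
            continue
        page = PAGE_RE.search(line)
        if page and pending is not None:
            href = links.get(f"p{int(page.group(1)):04d}")
            if href:  # leave the entry untouched when no anchor is known
                out[pending] = ENTRY_RE.sub(
                    lambda m: f"{m.group(1)}{href}{m.group(2)}</a>{m.group(3)}",
                    out[pending],
                )
            pending = None
    return out
```

Entries whose page has no known anchor are left as they were, so they can be found and fixed by hand afterwards.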

> Being simple minded, I just ensure that anchors always occur at the
> start of lines, and that links never do.

Always try to eat off of a clean plate when you can. The index file as
tidied up by yours truly was nice and clean to start with... My little
script only created ~10 faulty <a href= > links out of the 6,000+...
which took c. 10 minutes to edit.

All the same & just for the hell of it... doing it in vim would have
been more satisfying.

So if you could afford the time... could you explain the vim solution
you had in mind? I'm still interested.

Thanks,

CJ

porphyry5

Nov 26, 2017, 3:26:23 PM
to vim_use

I was referring to the :substitute command, which can use submatch() if need be, although that is usually not necessary.
Entering :h :s<Enter> at the command line invokes the help for :substitute. :s is what is usually typed, being the shortest abbreviation of :substitute that vim recognizes.
There is also an associated function, substitute(), which works almost identically to :substitute.

You really need to read the help chapters usr_27.txt and pattern.txt (:h usr_27 and :h pattern); I cannot possibly give a brief overview of vim's pattern matching and text manipulation abilities.

Largely I correct OCR-ed texts and convert them to .txt, .html and .epub. The pair of :s commands I supplied is literally all I ever need to produce the page number anchors and the links within the index (occasionally I may need a minor modification of the pattern). But I do this early in the conversion process, when it is simple to differentiate links from anchors. You have left yours until much later, so your pattern will be more complex, but the general principles still apply.
