I wrote a plugin for my gwern.net Hakyll script
(https://www.gwern.net/hakyll.hs) which was slightly tricky, and so
might be of interest.

Bringhurst & other typographers recommend using small-caps for
acronyms/initials of 3 or more capital letters because with full
capitals, they look too big and dominate the page (eg Bringhurst 2004,
_Elements_ pg47; cf https://en.wikipedia.org/wiki/Small_caps#Uses ,
http://theworldsgreatestbook.com/book-design-part-5/ ,
http://webtypography.net/3.2.2 ).

This can be done by hand in Pandoc by using the span syntax like
`[ABC]{.smallcaps}`, but that quickly grows tedious. It can also be
done reasonably easily with a query-replace regexp in Emacs, in the
style of this rule (which wraps quoted text in `<q>` tags):
`(query-replace-regexp "\\([^>]\\)\\(\\\".*?\\\"\\)" "\\1<q>\\2</q>" nil begin end)`.
But that still must be done manually, because while almost all uses in
regular text can be smallcaps-fied, a blind regexp will wreck a ton of
things like URLs & tooltips, code blocks, etc.

However, if we walk a Pandoc AST and check for only acronyms/initials
inside a `Str`, where they *can't* be part of a `Link` URL or a
`CodeBlock`, then looking over gwern.net ASTs, they seem to always be
safe to substitute in `SmallCaps` elements. Unfortunately, we can't use
the regular `Inline -> Inline` replacement pattern: `SmallCaps` takes
a `[Inline]` argument, and one `Str` has to be split into several
elements (the text before, the `SmallCaps`, and the text after), so
the rewrite changes the length of the list and has to have the type
`[Inline] -> [Inline]`.

So we instead walk the Pandoc AST, use a regexp to split each `Str`
at the first run of 3+ capital letters, wrap the matched text in
`SmallCaps`, recurse on the rest, and return the concatenated
results.

`bottomUp` is slower than `walk` but appears to be necessary here for
greedy generation; `walk` will do only *some* substitutions, which has
something to do with its tree traversal method, I think? (Regardless,
`smallcapsfy` doesn't seem to add *too* much overhead.)
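
One possible explanation, offered as a guess rather than a confirmed
diagnosis: `smallcapsfy` only inspects the head of the list it is
given, and returns the list unchanged when the head has no match. If
`walk` hands an `[Inline] -> [Inline]` function each list once as a
whole, while the generic `bottomUp` hands it every tail of the list as
well (each tail is itself a `[Inline]`), then only `bottomUp` would
reach acronyms that sit behind a non-matching head. The toy model
below illustrates just that difference; `tagHead`, `applyWhole`, and
`applySuffixes` are hypothetical names, and plain strings stand in for
Pandoc's `Inline`.

```haskell
import Data.Char (isAsciiUpper)

-- Hypothetical stand-in for a head-only rewrite like smallcapsfy:
-- if the first word is 3+ capitals, tag it and continue with the tail;
-- otherwise return the list untouched, never looking at the tail.
tagHead :: [String] -> [String]
tagHead (w : ws)
  | length w >= 3 && all isAsciiUpper w = ("<" ++ w ++ ">") : tagHead ws
tagHead ws = ws

-- Apply f once to the whole list (ASSUMED to be roughly what `walk`
-- does with a list-to-list function).
applyWhole :: ([a] -> [a]) -> [a] -> [a]
applyWhole f = f

-- Apply f to every suffix, innermost first (ASSUMED to be roughly what
-- the generic `bottomUp` does, because each tail is itself a list).
applySuffixes :: ([a] -> [a]) -> [a] -> [a]
applySuffixes f []       = f []
applySuffixes f (x : xs) = f (x : applySuffixes f xs)
```

In GHCi, `applyWhole tagHead ["NSFW","the","GAN"]` gives
`["<NSFW>","the","GAN"]` (only *some* substitutions), while
`applySuffixes tagHead ["NSFW","the","GAN"]` gives
`["<NSFW>","the","<GAN>"]`.
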

The final code:

    import Text.Pandoc
    import Text.Regex.Posix ((=~))

    smallcapsfy :: [Inline] -> [Inline]
    smallcapsfy ((Str []):[]) = []
    -- why `::String` on the regexp pattern? need to specify it, otherwise
    -- hakyll.hs's OverloadedStrings makes it ambiguous & a type error
    smallcapsfy xs@(Str a : x) =
      let (before,matched,after) = a =~ ("[A-Z][A-Z][A-Z]+"::String) :: (String,String,String)
      in if matched==""
         then xs -- no acronym in `a`; the tail `x` is not examined here
         else [Str before, SmallCaps [Str matched]] ++ smallcapsfy [Str after] ++ smallcapsfy x
    smallcapsfy xs = xs

Regexp examples:

    "BigGAN" =~ "[A-Z][A-Z][A-Z]+" :: (String,String,String)
    ~> ("Big","GAN","")
    "BigGANNN BigGAN" =~ "[A-Z][A-Z][A-Z]+" :: (String,String,String)
    ~> ("Big","GANNN"," BigGAN")
    "NSFW BigGAN" =~ "[A-Z][A-Z][A-Z]+" :: (String,String,String)
    ~> ("","NSFW"," BigGAN")
    "BigGANNN BigGAN" =~ "[A-Z][A-Z][A-Z]" :: (String,String,String)
    ~> ("Big","GAN","NN BigGAN")
    "biggan means big" =~ "[A-Z][A-Z][A-Z]" :: (String,String,String)
    ~> ("biggan means big","","")

Function examples:

    smallcapsfy [Str "BigGAN"]
    ~> [Str "Big",SmallCaps [Str "GAN"]]
    smallcapsfy [Str "BigGANNN means big"]
    ~> [Str "Big",SmallCaps [Str "GANNN"],Str " means big"]
    smallcapsfy [Str "biggan means big"]
    ~> [Str "biggan means big"]

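
For comparison, here is a minimal sketch of a variant that recurses
into every element itself, so it should not matter whether the
traversal applies it per tail or once per list. It is not the code used
on gwern.net. To stay runnable with only `base`, it makes two
assumptions: a three-constructor toy `Inline` type stands in for
Pandoc's, and a hand-written `splitAcronym` (hypothetical name)
replaces the regexp, returning the same (before, matched, after)
triple for ASCII capitals. Unlike the original, it also drops empty
`Str ""` elements.

```haskell
import Data.Char (isAsciiUpper)

-- Toy stand-in for Pandoc's Inline (ASSUMPTION: the real type has many
-- more constructors, and newer versions hold Text rather than String).
data Inline = Str String | Emph [Inline] | SmallCaps [Inline]
  deriving (Show, Eq)

-- Split at the first run of 3+ ASCII capitals: (before, matched, after).
-- With no such run, the result is (input, "", "").
splitAcronym :: String -> (String, String, String)
splitAcronym = go ""
  where
    go acc [] = (reverse acc, "", "")
    go acc str@(c : cs)
      | length run >= 3 = (reverse acc, run, rest)
      | null run        = go (c : acc) cs
      | otherwise       = go (reverse run ++ acc) rest  -- run of 1-2 capitals
      where
        (run, rest) = span isAsciiUpper str

-- Rewrite every element of the list, descending into Emph.
smallcapsfyAll :: [Inline] -> [Inline]
smallcapsfyAll = concatMap rewrite
  where
    rewrite (Str s)        = splitStr s
    rewrite (Emph xs)      = [Emph (smallcapsfyAll xs)]
    rewrite (SmallCaps xs) = [SmallCaps xs]  -- already small caps: keep as is

    splitStr s = case splitAcronym s of
      (_, "", _)               -> [Str s | not (null s)]
      (before, matched, after) ->
        [Str before | not (null before)] ++ SmallCaps [Str matched] : splitStr after
```

In GHCi, `smallcapsfyAll [Str "the", Str "NSFW GAN"]` gives
`[Str "the",SmallCaps [Str "NSFW"],Str " ",SmallCaps [Str "GAN"]]`,
even though the first element has no match.
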
Whole-document examples:

    bottomUp smallcapsfy [Str "bigGAN means", Emph [Str "BIG"]]
    ~> [Str "big",SmallCaps [Str "GAN"],Str " means",Emph [Str "",SmallCaps [Str "BIG"]]]

--
gwern