On 2012-11-15 07:50, peter gallagher wrote:
> Oops... I copied it wrong, omitting the underscore before the space
> character in the first regex. The perl string *should* be:
> perl -pe 's/[^\w\.\-_ ]//g' | perl -pe 's/^[0-9\. ]*//g'| perl -pe '$_=lc'
> | perl -pe 's/[ \t]/-/g'

Why not just make all four substitutions in the same
perl process? I'm not saying you're doing wrong, just
wondering why. Also you might want to make sure to
remove all formatting by first piping through pandoc
with plain output, for which reason you'll want to trim
off leading and trailing whitespace as well:
pandoc -w plain | perl -pe's/[^\w\.\-_ ]//g; s/^[0-9\. ]*//g;
$_=lc; s/[ \t]/-/g'
Also, you might want to prepare for Unicode in your input:
perl -Mopen=:utf8,:std -pe'...
As it happens, when the subject of heading ids and
links to them came up here a while ago I wrote a perl
script which collects all ATX headings[^1] in its input
and outputs blank-line separated reference link
definitions with the actual heading texts as link
identifiers and the properly formatted HTML ids as
URLs. Optionally it also adds the heading text as a
title attribute text.
[^1]: Actually all lines beginning with 1-6 hash marks
-- take care to backslash escape any false positives!
I've added some help text/documentation and attached it.
I use it like this:
perl mkd-head-links.pl chap1.mkd chap2.mkd >headlinks.mkd
pandoc chap1.mkd chap2.mkd headlinks.mkd
Run
perl mkd-head-links.pl --help|pandoc -o mkd-head-links.pdf
to see the examples nicely typeset!