Fwd: Decoupling citeproc and highlighting-kate from pandoc

Gwern Branwen

Aug 30, 2013, 11:29:26 AM
to hakyll
How would this proposal affect Hakyll users? I am fond of my source code syntax highlighting, which I use throughout my site (anywhere I need to do a little or a lot of statistics).

Forwarded conversation
Subject: Decoupling citeproc and highlighting-kate from pandoc
------------------------

From: John MacFarlane <fiddlo...@gmail.com>
Date: Sun, Aug 25, 2013 at 3:15 PM
To: pandoc-...@googlegroups.com


Pandoc currently includes built-in support for citation processing
(using citeproc-hs) and syntax highlighting (using highlighting-kate).
This is not ideal, since many users don't use either of these features,
and few, I would imagine, use both.  Both citeproc-hs and
highlighting-kate are big libraries, which bloat the pandoc binary.

So I am moving towards decoupling these components from core pandoc.
As noted earlier, I've added a --filter flag to pandoc, which makes it
easy to run JSON filters.  So, to process citations, you'll now do

    pandoc --filter pandoc-citeproc input.txt

and specify the bibliography and csl file in the document's metadata.
Similarly, to highlight source code using highlighting-kate, you'll do

    pandoc --filter pandoc-highlight input.txt

(Highlighting style will be specified in the metadata rather than on
the command line.) Both pandoc-citeproc and pandoc-highlight will be
optional add-ons. The packages that provide them will also provide
Haskell libraries for processing citations and doing code highlighting
in pandoc documents, for those who use pandoc as a library.

I have already made the citations change; I have not yet made the
highlighting change, and could still back down if there is a lot of
protest.  The main drawback I can see is that performance will be a
*little* worse -- but the difference is pretty negligible.  The
command line will also be a bit longer for those who use highlighting.
But I think this change makes a lot of sense.  It will also make it
easier for users to use their own highlighting libraries; they just need
to write a simple filter.
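
As a rough illustration of what such a filter could look like, here is a minimal sketch in Python. It assumes the JSON layout used by present-day pandoc (elements as objects with a "t" tag and "c" contents, and a CodeBlock carrying its attributes followed by its text); the development version discussed in this message may well differ. The names walk, highlight_code_block, and main are hypothetical, and the "highlighter" is only a placeholder that escapes the code.

```python
import html
import json
import sys


def walk(node, action):
    """Recursively rebuild a JSON tree, letting `action` replace any element."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replacement = action(node)
            if replacement is not None:
                return replacement
        return {key: walk(value, action) for key, value in node.items()}
    return node


def highlight_code_block(element):
    """Turn a CodeBlock into raw HTML; return None to leave other elements alone."""
    if element.get("t") != "CodeBlock":
        return None
    # Assumed layout: [[identifier, classes, key-value attributes], code text]
    (_identifier, classes, _attributes), code = element["c"]
    language = classes[0] if classes else "text"
    # Placeholder highlighter: it only escapes the code and records the language.
    markup = '<pre class="{}"><code>{}</code></pre>'.format(
        html.escape(language, quote=True), html.escape(code))
    return {"t": "RawBlock", "c": ["html", markup]}


def main():
    """Read a JSON document on stdin and write the transformed one to stdout."""
    document = json.load(sys.stdin)
    json.dump(walk(document, highlight_code_block), sys.stdout)
```

Saved as an executable script that ends with a call to main() (say, a hypothetical highlight.py), something like this could be passed to the --filter flag in place of pandoc-highlight, with a real highlighting library substituted for the placeholder.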

Another nice feature I'm adding is inline references.  Instead of
putting your bibliographic references in an external file, you can
now include them directly in the document, in a YAML metadata block.
(This can occur at the end of the document if you like.)
This makes the document more self-contained.  The YAML format is
basically a YAML representation of the JSON format used by citeproc-js.
Here's an example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some claim [@fenner12a].

---
csl: tests/mhra.csl
references:
- id: fenner12a
  title: One-click science marketing
  author:
   family: Fenner
   given: [Martin]
  container-title: Nature Materials
  volume: 11
  url: 'http://dx.doi.org/10.1038/nmat3283'
  doi: '10.1038/nmat3283'
  issue: 4
  publisher: Nature Publishing Group
  page: 261-263
  type: article-journal
  issued:
    year: 2012
    month: 3
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Of course, you can also store the YAML bibliography in a separate file,
and provide it to pandoc on the command line with the main file.
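
For comparison with the citeproc-js JSON mentioned above, the same reference might look roughly like this when written in the CSL-JSON layout. This is a sketch under assumptions: it uses the CSL-JSON conventions as I understand them (authors as a list of name objects, dates as "date-parts", upper-case "URL" and "DOI"), which are not spelled out in this message, and to_csl_json is a hypothetical helper.

```python
import json

# The reference from the YAML example, rewritten in the CSL-JSON layout.
# Assumed conventions: a list of author objects and "date-parts" for dates.
reference = {
    "id": "fenner12a",
    "type": "article-journal",
    "title": "One-click science marketing",
    "author": [{"family": "Fenner", "given": "Martin"}],
    "container-title": "Nature Materials",
    "volume": "11",
    "issue": "4",
    "page": "261-263",
    "publisher": "Nature Publishing Group",
    "URL": "http://dx.doi.org/10.1038/nmat3283",
    "DOI": "10.1038/nmat3283",
    "issued": {"date-parts": [[2012, 3]]},
}


def to_csl_json(references):
    """Serialize a list of reference dictionaries as pretty-printed JSON."""
    return json.dumps(references, indent=2, sort_keys=True)


print(to_csl_json([reference]))
```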

The pandoc-citations package will include an executable, biblio2yaml,
that translates from common bibliographic formats (e.g. bibtex) into
this YAML format.

I'm pretty excited about these changes, which I would like to include
in the 1.12 release.  The release has been held up waiting for Andrea
to release citeproc-hs.  I haven't heard from Andrea in a long time, so
I now have a contingency plan.  I will simply include the code from
citeproc-hs, including a couple of bug fixes, in the pandoc-citations
package.  When Andrea comes back, as I hope he does, I'll remove this
and depend once more on citeproc-hs.  But at least this will let us
go ahead with the release.

If you want to play with the new pandoc-citations stuff, it's on github:
http://github.com/jgm/pandoc-citations
It assumes you've installed a very recent dev version of pandoc and
pandoc-types.

John

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...@googlegroups.com.
To post to this group, send email to pandoc-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/20130825191545.GA80583%40Johns-MacBook-Air-2.local.
For more options, visit https://groups.google.com/groups/opt_out.

----------
From: Joseph Reagle <josep...@reagle.org>
Date: Mon, Aug 26, 2013 at 11:23 AM
To: pandoc-...@googlegroups.com
Cc: John MacFarlane <fiddlo...@gmail.com>


On 08/25/2013 03:15 PM, John MacFarlane wrote:
> and specify the bibliography and csl file in the document's metadata.

Does using the filter require this? Is it because one can't pass
arguments to the filter?

I often already specify CSL in the metadata using my wrapper/build
utilities so that's fine by me. For instance:

~~~~

Title: DRAFT: The Obligation to Know
Author: Joseph Reagle
Date: 20130601
md_opts: --toc --style-csl apa --british-punctuation

~~~~

However, I am concerned with having the bibfile in the metadata because
I generate those dynamically and temporarily to deal with the problem of
my larger bibtex file (~3MB) and pandoc/citeproc-hs's slow processing.
(I extract the keys from the markdown file then use that to generate a
specific and per-document bibtex file.)
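
The key-extraction step described above could be sketched along these lines in Python. This is not the actual tooling, which is not shown in this thread; it assumes pandoc-style citations of the form @key, a key syntax of alphanumerics and underscores with internal punctuation, and BibTeX entries that each start at the beginning of a line. The names cited_keys and select_entries are hypothetical, and @string or @preamble blocks are not handled specially.

```python
import re

# "@" not preceded by a word character or dot (to skip e-mail addresses),
# followed by a key: alphanumerics/underscores with internal punctuation.
CITATION_KEY = re.compile(
    r"(?<![\w.])@([A-Za-z0-9_]+(?:[:.#$%&\-+?<>~/][A-Za-z0-9_]+)*)")

# Start of a BibTeX entry such as "@article{fenner12a,".
ENTRY_START = re.compile(r"^@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def cited_keys(markdown):
    """Return the set of citation keys that appear in a markdown string."""
    return set(CITATION_KEY.findall(markdown))


def select_entries(bibtex, keys):
    """Keep only the BibTeX entries whose key is in `keys`."""
    starts = list(ENTRY_START.finditer(bibtex))
    kept = []
    for index, match in enumerate(starts):
        # An entry runs from its own start to the start of the next entry.
        end = starts[index + 1].start() if index + 1 < len(starts) else len(bibtex)
        if match.group(1) in keys:
            kept.append(bibtex[match.start():end].strip())
    return "\n\n".join(kept) + ("\n" if kept else "")
```

Writing the result of select_entries to a temporary file gives a small per-document bibliography, so the full ~3MB file never has to be parsed by the citation processor.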

----------
From: BP Jonsson <bpjo...@gmail.com>
Date: Tue, Aug 27, 2013 at 8:16 AM
To: "pandoc-...@googlegroups.com" <pandoc-...@googlegroups.com>


I've been meaning to start using the citation feature for quite some time, and this change might actually get me going. When working on other people's documents, as I do quite a bit, I usually get a ready-formatted bibliography section, so the need hardly arises.

As for using either syntax highlighting or citations but not both, it kind of applies, in that I would hardly use both in the same document, but I might well use them both at different times. Anyway, I'm not opposed to the change.

Like Joseph I'm wondering whether it is or will be possible to pass arguments to filters?

/bpj

On Monday, August 26, 2013, Joseph Reagle wrote:


--
/BP

----------
From: Nick Bart <nickba...@gmail.com>
Date: Tue, Aug 27, 2013 at 9:18 AM
To: pandoc-...@googlegroups.com




On Sunday, August 25, 2013 9:15:45 PM UTC+2, fiddlosopher wrote:

> The pandoc-citations package will include an executable, biblio2yaml,
> that translates from common bibliographic formats (e.g. bibtex) into
> this YAML format.

"biblio2yaml" sounds exciting. Could it be used as a filter, too, converting external biblio databases at runtime, and replacing bibutils?

----------
From: John MacFarlane <fiddlo...@gmail.com>
Date: Tue, Aug 27, 2013 at 9:53 AM
To: pandoc-...@googlegroups.com


+++ Nick Bart [Aug 27 13 06:18 ]:
>    On Sunday, August 25, 2013 9:15:45 PM UTC+2, fiddlosopher wrote:
>
>      The pandoc-citations package will include an executable,
>      biblio2yaml,
>      that translates from common bibliographic formats (e.g. bibtex) into
>      this YAML format.
>
>    "biblio2yaml" sounds exciting. Could it be used as a filter, too,
>    converting external biblio databases at runtime, and replacing
>    bibutils?

Yes, I've set it up so it will work as a pipe -- I'm not sure if
that's what you meant.
Note that it uses the bibutils library behind the scenes.

----------
From: John MacFarlane <fiddlo...@gmail.com>
Date: Tue, Aug 27, 2013 at 11:46 AM
To: pandoc-...@googlegroups.com


+++ John MacFarlane [Aug 27 13 06:53 ]:
> +++ Nick Bart [Aug 27 13 06:18 ]:
> >    On Sunday, August 25, 2013 9:15:45 PM UTC+2, fiddlosopher wrote:
> >
> >      The pandoc-citations package will include an executable,
> >      biblio2yaml,
> >      that translates from common bibliographic formats (e.g. bibtex) into
> >      this YAML format.
> >
> >    "biblio2yaml" sounds exciting. Could it be used as a filter, too,
> >    converting external biblio databases at runtime, and replacing
> >    bibutils?
>
> Yes, I've set it up so it will work as a pipe -- I'm not sure if
> that's what you meant.
> Note that it uses the bibutils library behind the scenes.

Note that you can also just include

    bibliography: mybib.bib

in the metadata, and it will grab the bibliography from a bibtex or
biblatex file.  (Other formats can also be used.)


----------
From: Nick Bart <nickba...@gmail.com>
Date: Fri, Aug 30, 2013 at 11:24 AM
To: pandoc-...@googlegroups.com


That's what I wanted to know, thank you.

If bibutils is used, I'm curious how the conversion step from MODS to "YAML representation of CSL variables in the JSON format" is done? If the existing citeproc-hs routines are used, please note that these still contain a number of bugs, and I’d be very happy if these could be ironed out at this opportunity.


On Tuesday, 27 August 2013 16:53:49 UTC+3, fiddlosopher wrote:
> +++ Nick Bart [Aug 27 13 06:18 ]:
> >    On Sunday, August 25, 2013 9:15:45 PM UTC+2, fiddlosopher wrote:
> >
> >      The pandoc-citations package will include an executable,
> >      biblio2yaml,
> >      that translates from common bibliographic formats (e.g. bibtex) into
> >      this YAML format.
> >
> >    "biblio2yaml" sounds exciting. Could it be used as a filter, too,
> >    converting external biblio databases at runtime, and replacing
> >    bibutils?
>
> Yes, I've set it up so it will work as a pipe -- I'm not sure if
> that's what you meant.
> Note that it uses the bibutils library behind the scenes.


--
gwern
http://www.gwern.net

Jorge Peña

Aug 30, 2013, 4:23:06 PM
to hak...@googlegroups.com, gw...@gwern.net
I'm not sure how you'd think it'd affect you. I asked for some clarification on the list, but it sounds like you'll still be able to do highlighting; it'll just require some changes in the code so that you can apply the filter to the code blocks. In other words, it doesn't sound like they're getting rid of highlighting in pandoc once and for all. They're just decoupling it so that it's not packaged along with pandoc, thereby ridding pandoc of some hefty dependencies. So all it will require is end users pulling in the dependency and adding the code to apply the highlighting filter to the document, whereas before it was all built in and on by default.

Gwern Branwen

Aug 30, 2013, 4:39:47 PM
to hakyll
On Fri, Aug 30, 2013 at 4:23 PM, Jorge Peña <jorge...@gmail.com> wrote:
> I'm not sure how you'd think it'd affect you.

I'm not sure how I would think it would not affect me. Right now,
nowhere in my hakyll.hs is anything whatsoever about highlighting or
Kate - because the syntax highlighting is on by default. Presumably by
making it optional for users, I will now have to do something to
enable it, or otherwise my 521 codeblocks over 35 pages will all be
broken.

--
gwern
http://www.gwern.net

Jorge Peña

Aug 30, 2013, 4:49:47 PM
to hak...@googlegroups.com, gw...@gwern.net
Indeed, it'd affect you in that way. I simply got the impression that you thought you'd no longer be able to highlight the code blocks at all, as if Pandoc was ridding itself of the feature completely (instead of simply decoupling it).

You wouldn't be the only person in this position, so rest assured that a fix would become readily available should they decide to follow through with it. It could end up being added to Hakyll as a separate compiler (or maybe even the default). Even if this weren't added to Hakyll itself (to avoid embedding usage assumptions or whatever; for example, I don't use the built-in highlighter but use pygments instead, though I also don't use the default Pandoc compiler in Hakyll), it sounds like they intend to make it very simple to drop in with code.

So don't worry!