extend CSL.Output.Formats in a plugin?


Matt Price

Jan 26, 2015, 9:52:19 PM
to zoter...@googlegroups.com
Hi,

(reposted from the forums -- I hadn't anticipated joining the dev group so this may be a little naive to most of you -- apologies for that, and thanks in advance for forbearance)

I am trying, in the most tentative and exploratory way, to extend Erik Hetzner's zotxt plugin (https://bitbucket.org/egh/zotxt) to output bibliographies formatted in the org-mode (http://orgmode.org) syntax, which is a little bit like markdown, only different.

The code appears to rely on underlying capabilities in citeproc-js, which currently generates output in html, plain-text, and rtf. It looks to me as though it is not so hard to write a new output filter in citeproc, but I don't know how long it would take to get that accepted into citeproc, and then for the changes to propagate up to Zotero.

So I wondered, in the interim, whether it would be possible to just extend CSL.Output.Formats from within a hacked version of the plugin.
So, I would imitate formats.js from citeproc-js (https://bitbucket.org/fbennett/citeproc-js/src/789b5a8a2d694c6afd4073161e4d5941b695e2be/src/formats.js?at=default) and define

CSL.Output.Formats.prototype.org = {
.....
}



at the top of the extension's main file, bootstrap.js, and then allow the plugin to serve up org-syntax output in the relevant functions there.
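
A minimal sketch of what such a definition might look like. The key names are modelled on the ones the html format uses in formats.js; the Org markup assigned to each key, the stand-in `CSL` object, and the `applyDecoration` helper are assumptions made for illustration, and a real format would need more keys than are shown here.

```javascript
// Stand-in for the processor's namespace so the sketch runs on its own;
// inside a plugin, the real CSL object from citeproc-js would be used instead.
var CSL = { Output: { Formats: function () {} } };

// Hypothetical org-mode format. The key names follow the pattern used by the
// html format in formats.js; the Org markup chosen for each key is an assumption.
CSL.Output.Formats.prototype.org = {
    "text_escape": function (text) {
        return text == null ? "" : String(text);
    },
    "bibstart": "",
    "bibend": "",
    "@font-style/italic": "/%%STRING%%/",
    "@font-style/normal": false,
    "@font-weight/bold": "*%%STRING%%*",
    "@font-weight/normal": false,
    "@text-decoration/underline": "_%%STRING%%_",
    "@bibliography/entry": function (state, str) {
        return str + "\n";
    }
};

// Illustrative helper (not part of citeproc-js): fill in a string template
// that carries a %%STRING%% placeholder, or call a function-valued entry.
function applyDecoration(format, key, str) {
    var template = format[key];
    if (typeof template === "function") {
        return template(null, str);
    }
    if (typeof template === "string") {
        // A replacer function keeps "$" sequences in str from being interpreted.
        return template.replace("%%STRING%%", function () { return str; });
    }
    return str; // false or missing: leave the text unchanged
}

var org = new CSL.Output.Formats().org;
console.log(applyDecoration(org, "@font-style/italic", "Journal of Examples"));
// prints: /Journal of Examples/
```

Whether the processor picks up a format added this way from plugin code depends on which `CSL` object Zotero actually instantiates, which is the open question in this message.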

I guess I don't understand JavaScript's object model very well, nor do I quite understand how much of the underlying citeproc code the plugin has access to. I think it pulls in the Zotero functionality with these lines:

if (!z) {
    z = Components.classes["@zotero.org/Zotero;1"].
        getService(Components.interfaces.nsISupports).wrappedJSObject;

    /* these must be initialized AFTER zotero is loaded */
    easyKeyRe = z.Utilities.XRegExp("^(\\p{Lu}[\\p{Ll}_-]+)(\\p{Lu}\\p{Ll}+)?([0-9]{4})?");
    alternateEasyKeyRe = z.Utilities.XRegExp("^([\\p{Ll}_-]+)(:[0-9]{4})?(\\p{Ll}+)?");
}




I don't know if I would have to somehow work my way deep into the Zotero code base and access the citeproc functions from somewhere in the z object?

Anyway, thanks for your help!
Matt

Frank Bennett

Jan 26, 2015, 10:23:36 PM
to zoter...@googlegroups.com
Hi, Matt,

I worked with Erik's code a couple of years ago, for the MLZ book
project, which used an extended version of reStructuredText for
authoring, converted to LaTeX for rendering.

My first thought was to add a reStructuredText output format to the
CSL processor, but I quickly discovered that that won't work. Inline
markup in reStructuredText (and in Markdown - not sure about org-mode)
is unable to handle nesting. If italics are placed inside boldface
(say, as "My ***Aunt*** Sally"), the parsing breaks.

The solution (in Erik's code, and in the version of it that I used for
the book) is to hop over the reST form and its limitations by casting
cites in HTML, then converting them to the more robust
reStructuredText internal XML representation, and then grinding the
reST into output (LaTeX) in the usual way.

If org-mode markup can handle nested inline markup, it should be easy
to add an output mode to the processor - as you've spotted, it would
just be a matter of adding a code block to formats.js. If the syntax
can't handle nested inline, though, we can't add it to the processor
code, since it would not produce correct output in all cases.
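
To make the nesting problem concrete, here is a small sketch of delimiter-style markup applied by plain wrapping, using reStructuredText's `*` (emphasis) and `**` (strong) delimiters. The `delimiters` table and the `wrap` helper are illustrative assumptions, not citeproc-js code.

```javascript
// Illustrative only: delimiter-style inline markup applied by simple wrapping.
// The delimiter table and wrap() helper are assumptions, not citeproc-js code.
var delimiters = { italic: "*", bold: "**" };

function wrap(style, text) {
    var d = delimiters[style];
    return d + text + d;
}

// Italics nested inside boldface.
var inner = wrap("italic", "Aunt");
var nested = wrap("bold", "My " + inner + " Sally");
console.log(nested);
// prints: **My *Aunt* Sally**

// When the nested span sits at the edge of the outer one, the delimiters run
// together and nothing in the string says where one span ends and the next begins.
console.log(wrap("bold", wrap("italic", "Aunt")));
// prints: ***Aunt***
```

An output format built from per-decoration templates composes exactly like `wrap` does, so it can only be correct if the target syntax accepts whatever the composition produces.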
> --
> You received this message because you are subscribed to the Google Groups
> "zotero-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to zotero-dev+...@googlegroups.com.
> To post to this group, send email to zoter...@googlegroups.com.
> Visit this group at http://groups.google.com/group/zotero-dev.
> For more options, visit https://groups.google.com/d/optout.

Frank Bennett

Jan 26, 2015, 10:32:44 PM
to zoter...@googlegroups.com
On Tue, Jan 27, 2015 at 11:52 AM, Matt Price <mopt...@gmail.com> wrote:
> So I wondered, in the interim, whether it would be possible to just extend
> CSL.Output.Formats from within a hacked version of the plugin.
> So, I would imitate formats.js from citeproc-js
> (https://bitbucket.org/fbennett/citeproc-js/src/789b5a8a2d694c6afd4073161e4d5941b695e2be/src/formats.js?at=default)
> and define
>
> CSL.Output.Formats.prototype.org = {
> .....
> }
>
>
>
> at the top of the extension's main file, bootstrap.js ; and then allow the
> plugin to serve up org-syntax output in the relevant functions there.

To hack the processor, you can use the plugin code here as a starting point:

https://bitbucket.org/fbennett/zotero-processor/src

It sets its own alternative citeproc-js source on Zotero, which will
then instantiate it and use it for rendering. You can basically do
anything you like inside the modded citeproc-js source.


Matt Price

Jan 26, 2015, 10:56:09 PM
to zoter...@googlegroups.com
Hi Frank!

Thanks for responding. 

Org will permit some kinds of nesting (italics inside bold is fine, for instance) but, from what I can tell, breaks if one tries to embed a link within a link.  So, it shouldn't be used to provide the description text within an org file (since zotxt generates a custom link type); but it should be fine to use on export, when the custom links disappear. 

I think!

Matt

Matt Price

Jan 26, 2015, 11:02:16 PM
to zoter...@googlegroups.com
Thanks for this!

I am pretty ignorant about FF plugins -- is the very short overlay.js where the action is? Can I just lift it wholesale? (though I notice zotxt has no chrome directory, https://bitbucket.org/egh/zotxt/src/a12d538ae924?at=master):


// Not wrapped in an onLoad function call, because we want this
// in place _before_ Zotero is initialized.
var Zotero = Components.classes["@zotero.org/Zotero;1"]
	.getService(Components.interfaces.nsISupports)
	.wrappedJSObject;
var Components = Components.utils["import"]("resource://gre/modules/XPCOMUtils.jsm");
Components.classes["@mozilla.org/moz/jssubscript-loader;1"]
	.getService(Components.interfaces.mozIJSSubScriptLoader)
	.loadSubScript("chrome://zotero-processor/content/citeproc.js");
Zotero.CiteProc.CSL = CSL;

thanks again,
matt

Frank Bennett

Jan 26, 2015, 11:25:43 PM
to zoter...@googlegroups.com
Ah, right, yes. Bootstrap loading.

In zotxt, the loadZotero() function does the same thing as the overlay
in the zotero-processor plugin. So something along these lines should
work to load the alternative processor source (I think):

function loadZotero () {
    if (!z) {
        z = Components.classes["@zotero.org/Zotero;1"].
            getService(Components.interfaces.nsISupports).wrappedJSObject;

        Components.classes["@mozilla.org/moz/jssubscript-loader;1"]
            .getService(Components.interfaces.mozIJSSubScriptLoader)
            .loadSubScript("chrome://zotero-processor/content/citeproc.js");

        z.CiteProc.CSL = CSL;

        /* these must be initialized AFTER zotero is loaded */
        easyKeyRe = z.Utilities.XRegExp("^(\\p{Lu}[\\p{Ll}_-]+)(\\p{Lu}\\p{Ll}+)?([0-9]{4})?");
        alternateEasyKeyRe = z.Utilities.XRegExp("^([\\p{Ll}_-]+)(:[0-9]{4})?(\\p{Ll}+)?");
    }
}

(If that works, I could adopt the same approach in zotero-processor,
which might make it easier to get it working with Standalone, where it
can't currently be installed.)

Erik Hetzner

Jan 27, 2015, 2:10:58 AM
to zoter...@googlegroups.com, Matt Price
Hi Matt,

I think the easiest way to do this would be to do what I did in
zot4rst, which is parse the (limited subset of) HTML returned by
citeproc and transform it into org syntax.

See below for an elisp code snippet based on:

https://bitbucket.org/egh/zot4rst/src/32fdfc93041fce0e31ad59bab4e434730491a1b1/xciterst/util.py?at=master

I’m not sure what the use case here is, but have you looked at zotxt-emacs?
It includes some support for org-mode, which I have been cleaning up
recently. See http://bitbucket.org/egh/zotxt-emacs. If you write any
code, it would be great if it could be worked into that project.

best, Erik

(require 'pcase)

(defun org-zotxt-parse-htmlstring (html)
  (with-temp-buffer
    (insert html)
    (libxml-parse-html-region (point-min) (point-max))))

(defun org-zotxt-htmlstring2org (html)
  (org-zotxt-htmltree2org (org-zotxt-parse-htmlstring html)))

(defun org-zotxt-htmltree2org (html)
  (pcase html
    ((pred (stringp)) html)
    (`(a ,attrs . ,children)
     (format "[[%s][%s]]" (cdr (assq 'href attrs))
             (org-zotxt-htmltree2org children)))
    (`(i ,attrs . ,children)
     (format "/%s/" (org-zotxt-htmltree2org children)))
    (`(b ,attrs . ,children)
     (format "*%s*" (org-zotxt-htmltree2org children)))
    (`(p ,attrs . ,children)
     (format "%s\n\n" (org-zotxt-htmltree2org children)))
    (`(span ,attrs . ,children)
     (pcase (cdr (assq 'style attrs))
       ("font-style:italic;"
        (format "/%s/" (org-zotxt-htmltree2org children)))
       ("font-variant:small-caps;"
        ; no way?
        (org-zotxt-htmltree2org children))
       (_ (org-zotxt-htmltree2org children))))
    ((or `(html ,attrs . ,children)
         `(body ,attrs . ,children))
     (org-zotxt-htmltree2org children))
    ((or
      ;; list of elements
      (pred (lambda (h) (and (listp h) (stringp (car h)))))
      ;; list of parses
      (pred (lambda (h) (and (listp h)
                             (listp (car h))
                             (symbolp (car (car h)))))))
     (mapconcat #'org-zotxt-htmltree2org html ""))))

(org-zotxt-htmlstring2org "<p><a href=\"http://example.org/\">hello</a> <span style=\"font-style:italic;\">world<br/> foo</span></p>")
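
For anyone who would rather do the conversion on the plugin side, the same tree walk might be sketched in JavaScript as follows. The node shape (`{ tag, attrs, children }`) and the `htmlTreeToOrg` name are hypothetical; a real plugin would build its tree with whatever HTML parser is available to it. Unlike the elisp version, unknown elements here fall through to their text content.

```javascript
// A JavaScript sketch of the same tree walk. The node shape
// ({ tag, attrs, children }) is hypothetical, chosen only for illustration.
function htmlTreeToOrg(node) {
    if (typeof node === "string") {
        return node;
    }
    if (Array.isArray(node)) {
        // A list of children: convert each one and concatenate.
        return node.map(htmlTreeToOrg).join("");
    }
    var attrs = node.attrs || {};
    var inner = htmlTreeToOrg(node.children || []);
    switch (node.tag) {
    case "a":
        return "[[" + attrs.href + "][" + inner + "]]";
    case "i":
        return "/" + inner + "/";
    case "b":
        return "*" + inner + "*";
    case "p":
        return inner + "\n\n";
    case "span":
        // Only the italic style is mapped; other styles pass through.
        return attrs.style === "font-style:italic;" ? "/" + inner + "/" : inner;
    default:
        // html, body and unknown elements: keep the text content only.
        return inner;
    }
}

var tree = {
    tag: "p",
    attrs: {},
    children: [
        { tag: "a", attrs: { href: "http://example.org/" }, children: ["hello"] },
        " ",
        { tag: "span", attrs: { style: "font-style:italic;" }, children: ["world foo"] }
    ]
};
console.log(JSON.stringify(htmlTreeToOrg(tree)));
// prints: "[[http://example.org/][hello]] /world foo/\n\n"
```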

--
Sent from my free software system <http://fsf.org/>.

Matt Price

Jan 27, 2015, 8:08:56 AM
to zoter...@googlegroups.com
thank you Erik, that is really very helpful (I'll have to learn more about pcase!). 

I actually opened an issue on zotxt-emacs yesterday (https://bitbucket.org/egh/zotxt-emacs/issue/20/expose-internal-structure-of-desc); the use-case is mostly figuring out how to get from what zotxt-emacs already does, to working HTML and ODT export. 

If I'm reading the code right, the first step would be to modify zotxt.el so that it accepts the html-encoded bibliographic information from the zotxt extension (I think this would be in zotxt-get-item-bibliography-deferred and zotxt-choose-deferred). The second step would be to add export functions to the zotero link type definition in org-zotxt.el. These would parse the Zotero links, get the item ID of individual items, and then re-query the database, this time retrieving the HTML. The HTML parser would simply pass this on to the exporter, while the ODT would do some relatively complex munging (don't know how to do that yet).

However, I'm not quite sure how to do either one of these yet...

Thanks again for the help!
Matt

Matt Price

Jan 27, 2015, 8:09:36 AM
to zoter...@googlegroups.com
Thanks Frank, I'm hoping to get to this today or tomorrow, awesome!

Matt

Emiliano Heyns

Feb 21, 2015, 5:59:30 PM
to zoter...@googlegroups.com
Actually, writing a new format isn't really hard; I tried to parse & transform the generated HTML, but in the end found it much easier to write the format. You can find mine here: https://github.com/ZotPlus/zotero-better-bibtex/blob/master/chrome/content/zotero-better-bibtex/schomd.coffee ; it's written in CoffeeScript, but you can pass that through the "Try CoffeeScript" online translator at http://coffeescript.org/ to create fairly clean JavaScript.

Matt Price

Nov 5, 2015, 9:00:38 PM
to zoter...@googlegroups.com, emilian...@iris-advies.com
Following up on a long-dead thread. Emiliano, many thanks for this. Is there any reason you haven't added markdown support to the upstream citeproc-js? Wouldn't it be great to have markdown export in citeproc? Anyway, I will build an org-mode translator on this model sometime in the next week or so.

Also seems like it would make sense to have an odt output.  pandoc-citeproc seems to produce odt output; but maybe there are complexities to that process that I don't understand.



Emiliano Heyns

Nov 6, 2015, 1:41:58 AM
to Matt Price, zoter...@googlegroups.com

No particular reason, I hadn't considered people might be interested. I'll put together a pull request.

ODT is a lot more complicated. If this is intended to be pasted in somewhere you'd have to play nice with the existing document styles which you can't know.

Emiliano Heyns

Nov 6, 2015, 4:56:03 AM
to zotero-dev, emilian...@iris-advies.com


On Friday, November 6, 2015 at 3:00:38 AM UTC+1, Matt Price wrote:
Following up on a long-dead thread.  Emiliano; many thanks for this. Is there any reason you haven't added markdown support to the upstream citeproc-js? Wouldn't it be great to have markdown export in citeproc? Anyway I will build an org-mode translator on this model sometime in the next week or so. 


Ah, I remember now why. The markdown generation from BBT is opinionated, as it includes BBT citation keys. I've submitted a pull request that excludes this but it should probably be regarded as starter code rather than a finished product. 

Frank Bennett

Nov 6, 2015, 3:51:59 PM
to zotero-dev, emilian...@iris-advies.com
As Emiliano says, direct-to-odt for fully formatted cites would be hard. As far as I know, the attributes for the formatting elements are document-specific, declared in the document header. So an odt format in citeproc-js probably isn't something to attempt.

What you could do, though, is to slot in templates for live Zotero citations, in the same way that the ODF/RTF Scan plugin does. For that, you could either run the plugin code itself over the finished ODT document (which would require a separate connection to Zotero for the plugin code to work with), or push the field templates into the document directly by extending the org-mode ODT converter itself. If you can get the necessary IDs (library, item) and strings (prefix, suffix, locator) in the Emacs environment, the latter route seems like it might simplify things. (Seems like - I'm talking through my hat here, since I don't know enough of Emacs to have an actual opinion.)

Emiliano Heyns

Nov 6, 2015, 4:08:44 PM
to zotero-dev, emilian...@iris-advies.com
On Friday, November 6, 2015 at 9:51:59 PM UTC+1, Frank Bennett wrote:
As Emiliano says, direct-to-odt for fully formatted cites would be hard. As far as I know, the attributes for the formatting elements are document-specific, declared in the document header. So an odt format in citeproc-js probably isn't something to attempt.


This is correct, and even the IDs for the standard styles aren't reliably available. Really not the way to go. I have no idea how pandoc-citeproc does this; do you have some output I could inspect?
 
What you could do, though, is to slot in templates for live Zotero citations, in the same way that the ODF/RTF Scan plugin does. For that, you could either run the plugin code itself over the finished ODT document (which would require a separate connection to Zotero for the plugin code to work with), or push the field templates into the document directly by extending the org-mode ODT converter itself. If you can get the necessary IDs (library, item) and strings (prefix, suffix, locator) in the Emacs environment, the latter route seems like it might simplify things. (Seems like - I'm talking through my hat here, since I don't know enough of Emacs to have an actual opinion.)


The CAYW picker can cough up these templates fully preformatted. But why would you want to do ODT output in CSL? If it's pastability you're after, you can just go with HTML and paste that, Open/LibreOffice will do the right thing.

Frank Bennett

Nov 6, 2015, 4:40:53 PM
to zotero-dev, emilian...@iris-advies.com

My comment there was more about org-mode export; CSL would not be involved. The idea would be to get valid dynamic cite markup into the ODT export, and let any CSL formatting take place in Zotero (or Juris-M) via the WP plugins.