[...]
> I'm just daydreaming on this, but would that be correct? Or is it six
> of one and half a dozen of the other, in terms of performance?
I don't know, but:
a) the Zotero implementation was first written before there were
macros, so the design likely reflects that
b) take a look at the Haskell implementation, which I believe is
strongly functional:
<http://code.haskell.org/citeproc-hs/>
Bruce
PS - It would be great if at some point we could have a
pure-Javascript CSL processor (rather than E4X).
The best thing to do might be to write something like this from
scratch as Bruce says, as pure javascript, & then to see if it
couldn't be integrated into Zotero. Certainly I found that learning
Zotero internals at the same time that you are trying to write new
code is difficult.
-Erik
On Sun, Jan 25, 2009 at 4:18 AM, Bruce D'Arcus <bda...@gmail.com> wrote:
>
> On Sun, Jan 25, 2009 at 2:15 AM, Frank Bennett <bierc...@gmail.com> wrote:
>
> [...]
>
>> I'm just daydreaming on this, but would that be correct? Or is it six
>> of one and half a dozen of the other, in terms of performance?
[...]
> In my opinion this is a good idea. It would be faster, probably, but
> more importantly it would be easier to maintain. I started work on
> something like what I imagine you have in mind but got bogged down
> with other work & with trying to figure out Zotero's internals.
> Attached if you are interested.
Cool!
> The best thing to do might be to write something like this from
> scratch as Bruce says, as pure javascript, & then to see if it
> couldn't be integrated into Zotero. Certainly I found that learning
> Zotero internals at the same time that you are trying to write new
> code is difficult.
A couple of things:
First, I was thinking that jQuery might help with the sort of basic
parsing and XML support that E4X provides for the current Zotero code
(and in browsers, could help with additional functionality).
Second, WRT the generic vs. Zotero-specific issue, the approach
that most CSL implementations take is to define an independent data
representation and then write different input drivers to map to that,
and different output drivers to get the output (XHTML, ODF, TeX, RTF,
etc.). So, in other words, I'd expect that a rewritten cite.js (or
csl.js) file would not know anything about Zotero.
Bruce
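Here is a minimal JavaScript sketch of the input-driver idea. It rests on stated assumptions: the neutral item shape, both record layouts, and the function names (`fromZoteroLike`, `fromBibtexLike`, `formatItem`) are hypothetical and are not Zotero's or any CSL processor's real API. Two drivers map differently shaped records onto one representation, and the formatter only ever sees that representation.

```javascript
// Hypothetical neutral item: the core formatter only ever sees this shape.
// Variable names loosely follow CSL ("title", "author", "issued") but are
// illustrative only.

// Input driver for a Zotero-like record (field names assumed for illustration).
function fromZoteroLike(rec) {
  return {
    title: rec.title,
    author: rec.creators
      .filter((c) => c.creatorType === "author")
      .map((c) => ({ family: c.lastName, given: c.firstName })),
    issued: rec.date ? { year: Number(rec.date.slice(0, 4)) } : undefined,
  };
}

// Input driver for a BibTeX-like record, authors written "Family, Given and ...".
function fromBibtexLike(rec) {
  const f = rec.fields;
  return {
    title: f.title,
    author: f.author.split(" and ").map((name) => {
      const [family, given] = name.split(",").map((s) => s.trim());
      return { family, given };
    }),
    issued: f.year ? { year: Number(f.year) } : undefined,
  };
}

// Core formatter: knows nothing about Zotero or BibTeX, only the neutral shape.
function formatItem(item) {
  const names = item.author.map((a) => a.family).join(", ");
  const year = item.issued ? item.issued.year : "n.d.";
  return `${names} (${year}). ${item.title}.`;
}
```

An output driver would sit on the other side of `formatItem` in the same way, so the core never has to know whether it is producing XHTML, ODF, TeX or RTF.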
I can't speak for Simon, but, for what it's worth, I don't see us
integrating something into Zotero that had jQuery as a requirement.
Abstracting the processor so that it can more easily be used outside of
Zotero is a worthy (and, I suspect, fairly easy) goal, but E4X is an
ECMA standard that adds a pretty critical language feature, and
replacing it in Zotero with a large third-party JavaScript code base (as
good as it is) wouldn't really make sense.
> I can't speak for Simon, but, for what it's worth, I don't see us
> integrating something into Zotero that had jQuery as a requirement.
> Abstracting the processor so that it can more easily be used outside of
> Zotero is a worthy (and, I suspect, fairly easy) goal, but E4X is an
> ECMA standard that adds a pretty critical language feature, and
> replacing it in Zotero with a large third-party JavaScript code base (as
> good as it is) wouldn't really make sense.
I see your point, but a "standard" isn't particularly relevant unless
it's widely implemented. An E4X-only library effectively means it's
Mozilla-only ATM, right?*
But that's admittedly somewhat orthogonal to Frank's and Erik's
concerns (and certainly yours). I was just expressing the hope that,
if anybody did bother to rewrite the code, there would be room for
it to work more widely. The real problem, I guess, is parsing the CSL
file.
Bruce
* A quick search suggests that there's some movement on implementing
it in WebKit, but I'm not sure of its status. I'd guess that MS will
never implement it.
---------- Forwarded message ----------
From: andrea rossato <andre...@unitn.it>
Date: Sun, Jan 25, 2009 at 5:51 PM
Subject: Re: Fwd: Functional CSL processor?
To: Bruce D'Arcus <xxxx...@gmail.com>
On Sun, Jan 25, 2009 at 04:42:00PM -0500, Bruce D'Arcus wrote:
> Andrea: so how did you handle the substitution case Simon notes below?
Easily... the style elements (names, macros, text, etc.) are parsed
into a recursive data type, Element, defined in Style.hs [1]. Among
the constructors of this data type there is 'Names'. 'Names' holds
the value of the variable attribute of the <names> element, the <name>
and <label> elements, and the list of substitutions, which are Element
values too. If the variables evaluate to nothing (do not produce any
output), then the substitutions are tried. This is the relevant part
of the code, in Eval.hs:
    evalElement :: Element -> State EvalState [Output]
    evalElement el
        [...]
        | Names s n fm d sub <- el = ifEmpty (evalNames s n d)
                                             (withName (getName n) $
                                                       evalElements sub)
                                             (appendOutput fm)
        | Substitute (e:els) <- el = ifEmpty (consuming $ evalElement e)
                                             (getFirst els)
                                             id
        [...]
> I can't quite figure out your code (well, really Haskell) ;-)
Haskell can be quite difficult to read, indeed.
Hope this helps.
Andrea
[1] http://code.haskell.org/citeproc-hs/docs/Text-CSL-Style.html#t%3AElement
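For anyone who, like Bruce, finds the Haskell hard to read, here is a rough JavaScript rendering of the fallback Andrea describes. If the names variables produce nothing, the substitutes are tried in order and the first non-empty result wins. The element objects and function names are hypothetical. The `suppressed` set is only an assumed stand-in for whatever bookkeeping `consuming` does in citeproc-hs. It mirrors CSL's rule that a substituted variable is suppressed in the rest of the output.

```javascript
// Minimal sketch of cs:names with a cs:substitute fallback, in plain JavaScript.
// The element objects ({type, variables, variable, substitute}) are a
// hypothetical representation, not the citeproc-hs data types.

function evalElement(el, item, state) {
  switch (el.type) {
    case "names":
      return evalNames(el, item, state);
    case "text": {
      // A variable already used as a substitute is not rendered again.
      if (state.suppressed.has(el.variable)) return "";
      const value = item[el.variable];
      return value ? String(value) : "";
    }
    default:
      throw new Error("unsupported element: " + el.type);
  }
}

function evalNames(el, item, state) {
  // Render every listed name variable that is present and not suppressed
  // (family names only, to keep the sketch short).
  const rendered = el.variables
    .filter((v) => !state.suppressed.has(v) && Array.isArray(item[v]) && item[v].length > 0)
    .map((v) => item[v].map((n) => n.family).join(", "))
    .join("; ");
  if (rendered !== "") return rendered;

  // The names produced no output: try each substitute in order and keep the
  // first one that does, like ifEmpty/getFirst in the Haskell excerpt.
  for (const sub of el.substitute || []) {
    const out = evalElement(sub, item, state);
    if (out !== "") {
      // Assumed bookkeeping: mark the substitute's variables as used.
      for (const v of sub.variables || [sub.variable]) state.suppressed.add(v);
      return out;
    }
  }
  return "";
}
```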
> While I can't make a firm promise to come up with something useful, I
> would like to take a look at this problem, at least if no one else is
> likely to latch on to it in the short term.
If you manage to make progress, it might be good to put this in a
public SCM repo so that others might contribute as time permits.
Bruce
> Small steps. I see the test suite for Mozilla itself now. It runs from
> the command line; I guess that will be the way to go.
I really know nothing about JS testing, but why would it follow that
you'd use a Mozilla-specific testing option?
BTW, not sure it's relevant, but I recently came across this:
<http://ejohn.org/blog/fireunit/>
Bruce
> The D.O.H. test framework does look good. It's apparently run from
> Rhino in the command line interface, which means JS only, no browser
> functionality. I assume that for this task that's not going to be a
> problem, but if that's mistaken ... please let me know.
Well, my perspective: the CSL-related processing should care nothing
at all about the browser, or the application: it's just taking some
input (CSL file and data, probably as JSON) and generating output.
I would think that for Zotero there'd just be some little driver code
that maps the Mozilla storage stuff to the processing input.
But Dan or Simon obviously know much more about the
Zotero/Firefox-specific details.
Bruce
.....
> Rhino 1.7r1 was able to load and parse a CSL file successfully with
> xml = new XML( ... ); this is all very new to me ... would that be
> E4X at work?
Yeah; there's no such support in regular JS (though libraries like
jQuery can provide similar kinds of convenience).
> I'm thinking that what this should do is generate a compiled CSL
> object, with a method that accepts a data blob with a bunch of string
> attributes (an Item), and spits out a nested list object. The non-
> list elements in the list would be text blobs, each containing a
> string, formatting hints, and an inherited rendering hook. The list
> object would have a method that accepts a spec blob with info on how
> to handle each of the hints, and would apply it to its content by
> walking the list before spitting out a string object on completion.
> This would separate the parsing and evaluation work from the
> generation of the final string output, which should be a little easier
> to build, follow and maintain. Would also make it easier to build
> extensions for export formats other than HTML and RTF.
>
> If that general concept makes sense, I can start building little
> pieces of tested code, which might eventually mature into an actual
> CSL processor.
Your explanation makes sense, but I'd strongly suggest you post
questions like this to the xbib dev list, since Andrea is the expert
in how to do this with a functional approach*, he's got a working
implementation behind him (which I've not yet used a lot, but my
testing shows it's solid, and really fast), and I don't believe he's
on this list ;-)
Bruce
* Though XSL is a functional language too, so I guess I have some
useful experience as well.
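Frank's proposed split between evaluation and final string generation might look something like the following minimal sketch. All names here are hypothetical: `Blob`, `BlobList`, `applyHints`, `htmlSpec`, `plainSpec`, and the `italic`/`bold` hint names. The "inherited rendering hook" he mentions is left out. Evaluation would build the nested list once, and each output format is then just a different spec object passed to `render`.

```javascript
// A Blob carries a string plus formatting hints; a BlobList nests Blobs and
// other BlobLists. Hint names and the spec shape are assumptions.
class Blob {
  constructor(str, hints = []) {
    this.str = str;
    this.hints = hints;
  }
}

class BlobList {
  constructor(items = [], hints = [], delimiter = "") {
    this.items = items;
    this.hints = hints;
    this.delimiter = delimiter;
  }

  // Walk the tree depth-first; leaves are escaped, then wrapped per hint.
  render(spec) {
    const inner = this.items
      .map((it) =>
        it instanceof BlobList ? it.render(spec) : applyHints(spec, it.hints, spec.escape(it.str))
      )
      .filter((s) => s !== "") // empty pieces do not produce stray delimiters
      .join(this.delimiter);
    return applyHints(spec, this.hints, inner);
  }
}

// Apply the spec's wrapper for each hint; hints a format does not know are ignored.
function applyHints(spec, hints, text) {
  if (text === "") return "";
  return hints.reduce((acc, h) => (spec.hints[h] ? spec.hints[h](acc) : acc), text);
}

// Example output specs: HTML escapes markup characters, plain text does nothing.
const htmlSpec = {
  escape: (s) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;"),
  hints: { italic: (s) => `<i>${s}</i>`, bold: (s) => `<b>${s}</b>` },
};
const plainSpec = { escape: (s) => s, hints: {} };
```

Adding an RTF or TeX export would then mean writing one more spec object, without touching the evaluation code.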
> Now that the fundamentals of the output engine are done, I've been
> thinking about how to handle the CSL file, and I've come to the same
> conclusion as Bruce, that this can be done in native Javascript. The
> parsing is hugely simplified by the fact that CSL files contain no
> text nodes; ....
Except, of course, some of the cs:info metadata (title, id, etc.). But
that'd be handled separately from the main macro, etc. stuff.
Bruce
Yeah, you're right; and that will fail on valid CSL files that use
namespace prefixes.
Bruce
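One way around the prefix problem is to identify elements by namespace URI and local name rather than by the tag name as written. That way `<cs:macro>` and an unprefixed `<macro>` in the default CSL namespace are treated alike. Below is a minimal sketch. It assumes any W3C DOM implementation as input, and the `{name, attrs, children}` output shape and the function name `toPlain` are hypothetical.

```javascript
// Namespace URI of CSL styles.
const CSL_NS = "http://purl.org/net/xbiblio/csl";
const ELEMENT_NODE = 1; // W3C DOM Node.ELEMENT_NODE

// Convert a CSL style Element into a plain {name, attrs, children} tree.
// Elements are matched on namespaceURI + localName, so the prefix used in
// the file (if any) makes no difference.
function toPlain(el) {
  const node = { name: el.localName, attrs: {}, children: [] };
  for (let i = 0; i < el.attributes.length; i++) {
    const attr = el.attributes[i];
    // Namespace declarations are not style attributes.
    if (attr.name === "xmlns" || attr.name.startsWith("xmlns:")) continue;
    node.attrs[attr.name] = attr.value;
  }
  for (let i = 0; i < el.childNodes.length; i++) {
    const child = el.childNodes[i];
    // Skip text nodes (whitespace between elements) and non-CSL elements.
    // This also drops the text content of cs:info children such as cs:title,
    // which is assumed to be read separately.
    if (child.nodeType === ELEMENT_NODE && child.namespaceURI === CSL_NS) {
      node.children.push(toPlain(child));
    }
  }
  return node;
}
```

In a browser, the input could come from `new DOMParser().parseFromString(cslSource, "application/xml").documentElement`, where `cslSource` is assumed to hold the style's XML text.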
> Progress here. I've checked in code for a small recursive execution
> function that operates on an E4X object, together with one test as a
> quick demo. Still needs to be thoroughly tested, and there are no
> wrapper functions yet, but you should be able to throw arbitrary
> portions of the CSL object at it, together with an Item object, and
> get back the string (or set up configuration, or whatever) that the
> CSL is meant to generate for that chunk. One thing this will mean is
> that the 150-odd lines of code in CSL.Global that extracts locale
> strings and installs them as JS objects can go away; we can just merge
> the locale into the CSL object prior to execution, and the wrappers
> will grab the correct terms as a matter of course.
Just try to leave room for non-E4X implementations.
I just recalled, for example, that the jQuery project has abstracted
out its selector engine (Sizzle) for possible use in other
frameworks. Here's an (XML) example from the unit tests:
    jQuery.get('data/dashboard.xml', function(xml) {
        var titles = [];
        jQuery('tab', xml).each(function() {
            titles.push(jQuery(this).attr('title'));
        });
        equals( titles[0], 'Location', 'attr() in XML context: Check first title' );
        equals( titles[1], 'Users', 'attr() in XML context: Check second title' );
        start();
    });
So the "jQuery('tab', xml).each" bit iterates through all "tab"
elements in the file, and the "jQuery(this).attr('title')" pulls out
the title attributes.
An interesting integration of tests, BTW.
<http://github.com/jeresig/sizzle/tree/master>
Bruce
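Frank's recursive execution function and the locale merge could also be written without E4X, which is what Bruce asks for. Below is a minimal sketch over a hypothetical plain-object tree `{name, attrs, children}`. These parts are all assumptions: `evaluate`, `makeContext`, the term names, and merging terms into a separate context object rather than into the style tree itself. It handles only cs:text (variable, term, macro, value) and cs:group/cs:macro. It does not implement cs:group's rule for suppressing a group whose variables are all empty.

```javascript
// Build the evaluation context: locale terms first, style-level overrides win.
function makeContext(style, locale) {
  return {
    terms: Object.assign({}, locale.terms, style.terms),
    macros: style.macros,
  };
}

// Recursively evaluate a simplified CSL node against an item.
function evaluate(node, item, ctx) {
  switch (node.name) {
    case "text": {
      const a = node.attrs;
      let out = "";
      if (a.variable) out = item[a.variable] != null ? String(item[a.variable]) : "";
      else if (a.term) out = ctx.terms[a.term] || "";
      else if (a.macro) out = evaluate(ctx.macros[a.macro], item, ctx);
      else if (a.value) out = a.value;
      // Affixes are only added around non-empty output.
      return out === "" ? "" : (a.prefix || "") + out + (a.suffix || "");
    }
    case "group":
    case "macro": {
      // Join the non-empty child outputs with the (optional) delimiter.
      const parts = node.children.map((c) => evaluate(c, item, ctx)).filter((s) => s !== "");
      return parts.join(node.attrs.delimiter || "");
    }
    default:
      throw new Error("unsupported element: " + node.name);
  }
}
```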