On Thu, May 5, 2011 at 7:53 AM, spartan <spart...@gmail.com> wrote:
> I love zotero so much. But got disturbed by the poor support of
> bibtex. So I spent quite some time to improve the functionality of the
> bibtex export. It should be very useful for those who use bibtex/latex
> a lot, in particular for scientists like me. But I don't know where I
> should post my modified codes.
Please post the code to http://gist.github.com/ and post a link to the
list, so the other BibTeX devs can take a look.
Avram
1. Zotero.getPrefs(..), a new function in the sandbox that allows
access to arbitrary preferences from the Zotero section of the user's
prefs (patch to translate.js)
2. Expose the local path of attachments through the "path" attribute
on the attachment object (patch to item_local.js)
I'm a little concerned about the first one, since I don't see a
justification for exposing all the user's preferences to all
translators. This also has some implications for the future
portability of the translator to a server-side or in-connector
translator sandbox. Couldn't we just keep BibTeX key editing internal
to the file and have people edit a constant there?
The second proposed change seems quite reasonable.
- Avram
On Thu, May 5, 2011 at 8:19 PM, spartan <spart...@gmail.com> wrote:
[..]
> If people are really concerned about exposing all zotero-related
> preferences
> to all translators. We can define a more specific function like,
>
> "getBibtexPrefs":function(translate, pref) {
> ...
> return Zotero.Prefs.get("bibtex"+pref);
> }
>
> In this case, only bibtex related preferences get exposed. What do you
> think?
I agree with your intent here, and I hope we can find a workable
solution. If the key generation settings are going to be in the Zotero
preferences, I think we should probably handle it as a
non-BibTeX-specific subtree of the preferences:
"getPref":function(translate, pref) {
...
return Zotero.Prefs.get("translator."+pref);
}
We can imagine more uses for such preferences for other translators as
well, although I think we'd have to be very careful about what uses we
allow in translators that ship with Zotero. The added freedom might
prove useful for internal-use translators being created for specific
users' or institutions' workflows.
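To make the subtree idea concrete, here is a minimal sketch. It assumes a hypothetical in-memory store standing in for the client's preferences; `fakePrefs`, `makeScopedGetPref`, and the preference names are made up for illustration and are not Zotero's actual API.

```javascript
// Hypothetical stand-in for the client's preference store; not Zotero's real API.
const fakePrefs = new Map([
  ["translator.bibtex.keyFormat", "author-year"],
  ["sync.password", "secret"],
]);

// Build a sandbox-facing getter that can only read keys under `subtree`.
function makeScopedGetPref(store, subtree) {
  return function getPref(pref) {
    // Accept only plain dotted names, so a translator cannot step
    // outside the subtree with an odd key.
    if (typeof pref !== "string" || !/^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*$/.test(pref)) {
      throw new Error("invalid preference name: " + pref);
    }
    return store.get(subtree + "." + pref);
  };
}

const getPref = makeScopedGetPref(fakePrefs, "translator");
console.log(getPref("bibtex.keyFormat")); // author-year
console.log(getPref("sync.password"));    // undefined (looks up translator.sync.password)
```

The point of the closure is that the translator never sees the store itself, only a getter that prepends the fixed prefix.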
I flagged this not because I disagree with it, but rather because
changes to the core client go through a different review process, and
I wanted to make sure that the core devs took a look at these specific
proposed changes and weighed in on them.
Avram
>> > > 8. some other minor improvements for bibtex export such as better
>> > > treatment for
>> > > latex special characters like $,\,_,^, etc.
>
> I am opposed to this change. I do realize that some other BibTeX
> users really want it. I've helped them implement something similar in
> their personal copies of the translator. But it runs with a very
> different philosophy than Zotero is made with.
>
> For the clarity of others: the proposed change affects export by
> allowing BibTeX users to use BibTeX-specific markup that will be
> preserved on export. Encouraging this markup would break sharing of
> the references with anything that is not BibTeX, as people would write
> LaTeX entities into zotero fields, rather than using the what-you-say-
> is-what-you-get-everywhere UTF-8 equivalents.
Yes, definitely agree with Rick's position here, and strongly against
in any way treating bibtex as special for purposes of data entry (use
UTF8) or key/label generation (which is relevant for a variety of
formats).
Bruce
...
> Somehow I am feeling a little too much hatred towards bibtex/latex
> in zotero community. Please forgive me if I did not use more polite
> words here or there because I am not a native english speaker.
I have no "hatred" towards that community.
I do get annoyed when people present it as the only significant game
in town though, and don't consider the wider world that Zotero is
attempting to be compatible with (a variety of other formats, the web,
word-processors, etc.).
> Physical sciences all have long history for research and literature.
> Not all of the documents are in modern or fancy formats. But so
> much of the information is good enough and found its way getting
> into my reference database and I am sure, also, a lot of other
> scientists'. Are we asking all the scientific researchers who are
> using zotero and bibtex to manually modify their databases to
> conform "zotero's standard"? Those latex-specific
> markups are scattered everywhere in these people's databases.
So use a BibTeX-dedicated solution?
I have no problem if, for example, Zotero were to add math support in
ways that would allow round-tripping to LaTeX. But that's a different
matter.
Rick has more direct experience working with BibTeX, though, as well
as understanding of Zotero, so I'd tend to defer to him on the
details.
[...]
> Somehow the spirit around zotero is not right. It should be
> more for a cooperative, compatible, user-friendly one. It will
> be really disappointing if it becomes a policing tool to
> tell people what to use and what not.
It really depends on where you stand: "cooperative, compatible,
user-friendly" for whom, after all? Just BibTeX users?
Bruce
The concern is that decisions like handling of LaTeX's math-mode
delimiters or character entities have pretty major consequences for
data exchange. Upon import, $\alpha$ should definitely be converted to
the Greek letter in its Unicode representation. Similarly,
superscripted and subscripted characters and many other symbols have
Unicode representations, and we can and should make the proper
conversions to convert to them when importing BibTeX files. This is
particularly important because BibTeX is a common data interchange
format-- we have site translators like the Google Scholar translator
that rely on BibTeX, and many people import large quantities of items
using the format. These may be people who never actually use
BibTeX/LaTeX for bibliographies and documents, or they could be people
who intend to export back to BibTeX at a later date.
Most of the (La)TeX markup that occurs in bibliographic data is
limited to individual symbols -- these can be stored as Unicode within
Zotero, and imported and exported correctly. Full-fledged math support
is only rarely needed, particularly outside of abstracts. To enter the
individual symbols from within Zotero, the current system requires
that users enter them as Unicode-- probably by using the system
character palette. While this is probably not terribly convenient,
it's nothing that couldn't be streamlined by a small plugin to convert
(La)TeX names into the correct Unicode symbols.
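As a rough illustration of what such a helper might do, the sketch below maps a handful of TeX command names to Unicode. The table is deliberately tiny and the function name `texNamesToUnicode` is made up; a real plugin would need a far larger table and would have to decide how to treat the space TeX swallows after a command.

```javascript
// Small, deliberately incomplete table of TeX command names to Unicode.
const TEX_TO_UNICODE = {
  alpha: "\u03B1", // α
  beta: "\u03B2",  // β
  gamma: "\u03B3", // γ
  pm: "\u00B1",    // ±
  times: "\u00D7", // ×
};

// Replace \name with its Unicode symbol when the name is in the table;
// unknown commands are left untouched.
function texNamesToUnicode(text) {
  return text.replace(/\\([A-Za-z]+)/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(TEX_TO_UNICODE, name)
      ? TEX_TO_UNICODE[name]
      : match
  );
}

console.log(texNamesToUnicode("\\alpha-decay and \\beta\\pm 1")); // α-decay and β± 1
console.log(texNamesToUnicode("\\unknown stays"));                // \unknown stays
```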
If we set up a situation where the recommended workflow is to mark up
bibliographic data in (La)TeX ways, users who take this approach will
lose the full flexibility of Zotero and may no longer be able to use
their data in any non-BibTeX context. No export to RIS, (possibly) no
OpenURL locator lookups, no use of the server API, particularly its
formatted citations, no use of the local integration plugins, from
word processor plugins to emacs org-mode integration.
All that said, I do think we could do better in our handling of
math-mode upon export to BibTeX. The arguments above do not mean that
we should refuse to handle what the user clearly intends to
do-- if we can devise a handling for $...$ that respects user
intentions on export, and doesn't escape the lot of it, but still
doesn't risk biting users who just happen to use dollar signs a lot,
then I think we should put it into the BibTeX export function.
Regards,
Avram
The rest of your argument depends on this assumption, but $\alpha$ is
"garbage" to all non-LaTeX-using members of the Zotero ecosystem. It
shouldn't be there to begin with, and it's a bug if it is. Import
translators should be fixed to ensure that it doesn't end up in a Zotero
field. Batch editing should allow people to fix that data.
Our philosophy on this is pretty well documented by this point, in this
thread and elsewhere. You may feel that BibTeX-using Zotero users "do
not need their databases to look appealing to other guys", but that's
contrary to Zotero's goals. This isn't a "punitive" policy, and it's in
no way specific to BibTeX. Adding support to export translators for
format-specific markup in Zotero fields encourages the entry of bad data
in Zotero fields.
(And, really, it's a little silly to keep suggesting that properly
rendered characters are something that only MS Word users are interested
in.)
> I could understand it if zotero's stand on this issue is based on
> "Non-Doing". But it actually intentionally implemented a piece
> of code to make the exported bib file even messier. The current
> entrance normalization procedure (i.e. import) is far from
> complete. So often, latex or semi-latex like markups easily
> go inside our databases. Instead of export it as it is so we can
> find better solution to normalized it through re-import later, the
> current implementation makes it irrecoverable.
>
> Just like, say, the US is too crowded now so the government
> wants to promote the use of trains for transportation. It will be
> a reasonable measure if they build better railroads and make
> more trains, etc. But it will be outrageous if they start to dig
> holes on the normal roads to prevent people from driving their
> cars. Sadly such kind of policing behavior is what zotero
> treats bibtex now.
This thread is getting off the rails, and I suggest you tone down the
rhetoric. It serves no useful purpose in advancing your aims. Frank
tried to explain the issues in a polite way, and this is how you
respond?
You understand the requirements outlined by various people in this
thread (basically, to normalize text content around UTF8, and HTML),
so how about working on a concrete proposal to improve things based on
those requirements, rather than complaining with overheated rhetoric?
If you can't do that, then I suggest you drop it.
And many of those you seem to derisively call "devs" in this thread are
users too.
Bruce
The contentious behavior is not on import, but on export. When
exporting to BibTeX, Zotero currently makes the assumption that the
user wants the present field contents to not be interpreted as marked
up using *TeX. Therefore, Zotero escapes backslashes, dollar signs,
etc. There is little doubt that some such escaping is necessary in any
potential BibTeX translator.
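For readers who have not looked at the export code, the kind of escaping at issue can be sketched as follows. This is not the BibTeX translator's actual table, just an assumed minimal set of characters with standard LaTeX replacements, and `escapeForBibtex` is a made-up name.

```javascript
// Assumed minimal escape table; the real translator's mapping may differ.
const ESCAPES = {
  "\\": "\\textbackslash{}",
  "$": "\\$",
  "_": "\\_",
  "^": "\\textasciicircum{}",
  "&": "\\&",
  "%": "\\%",
  "#": "\\#",
};

// Escape in a single pass so that backslashes introduced by one
// replacement are never escaped again by another.
function escapeForBibtex(value) {
  return value.replace(/[\\$_^&%#]/g, (ch) => ESCAPES[ch]);
}

console.log(escapeForBibtex("Costs $5 & 10% of a_b")); // Costs \$5 \& 10\% of a\_b
console.log(escapeForBibtex("$\\alpha$"));             // \$\textbackslash{}alpha\$
```

The second call shows why the behavior is contentious: a field that already holds `$\alpha$` comes out as literal text rather than as math.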
The controversial part is that Zotero provides no special handling for
field contents that are already marked up with *TeX, and it escapes them
as with everything else. This is actually a reasonable practice, but
it's certainly a pain for anyone who is trying to preserve the markup
while still keeping items in Zotero.
I think there is room for a BibTeX translator that supports both
escaping- and non-escaping export, with the behavior controlled by a
preference exposed as spartan suggests, or by a variable set in the
translator.
The default behavior, however, and perhaps the only behavior possible
without editing the translator (if only to uncomment a line, say),
should be escaping output. The tenet of data portability matters.
There are also proposed changes to the attachment path exposure, which
have not been properly discussed, mainly because of the separate
dispute over Zotero's escaping procedures on export to BibTeX.
In future messages, spartan, please keep in mind that you are dealing
with a number of people with extensive experience with BibTeX and a
wide variety of systems-- I certainly cut my teeth on LaTeX and
understand the appeal of its expressiveness. The members of this list
are all very dedicated to making researchers' lives (in many cases
their own) easier. And that's about all-- no hidden agenda, no
discrimination.
Avram
2) Let me insist on my proposal and elaborate it a bit further. I know
it does not address exactly the discussion between spartan and others,
but this debate is a good chance to think more generally about a better
bibtex life into zotero. And I'd even go as far as saying that my
suggestion might solve the problem that raised the disagreement of
spartan with the zotero team. I hope this is not considered as thread
hijacking as it is really related to the current discussion.
Here is a suggestion to better integrate bibtex support (and more
generally latex support, or math writing support), into zotero.
A) On bibtex import into zotero. If a pair of $ signs is detected,
either 1) (preferably) transform the content into pure Unicode text if
possible. Example: $\alpha$ is transformed into "α", unicode alpha. 2)
(if no text alternative) transform the content into
<latex>content</latex> markup. Example: $\frac{a}{b}$ would be
transformed into <latex>\frac{a}{b}</latex>.
B) On bibtex export from zotero. Obviously, change back the <latex> and
</latex> tags into dollar signs. Quote dollar signs from zotero (because
these must be considered as real dollar signs).
C) (For more general support of maths.) Transform other translators to
account for these <latex> tags. E.g. transforming latex content into
HTML should be possible by re-using some latex-to-html program.
D) Problem of the currently existing entries that have latex code
enclosed into dollar signs (e.g. non-transformed entries imported from
bibtex). A plugin to zotero (or other appropriate mechanism as suggested
by the zotero team) should be written in order to batch-transform these
entries. Problem: not every $ pair must be transformed into <latex>
tags. Probably ask the user case-by-case for confirmation. It should
thus be possible to run the batch on a subset of entries to permit
correcting only the specific subset of entries the user is interested in
from a large db.
E) Problem of people wanting to use the strings "<latex>" and "</latex>"
in an entry without giving them a specific meaning. Does that exist?
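Points A and B above can be sketched roughly as follows. This is only an illustration under stated assumptions: the symbol table is tiny, the function names `importMath` and `exportMath` are made up, un-escaping of `\$` on import is left out, and the open question in E (literal "<latex>" strings in a field) is not handled.

```javascript
// Tiny assumed table of math snippets that have a plain-text equivalent.
const SYMBOLS = { "\\alpha": "\u03B1", "\\beta": "\u03B2" };

// Point A: on import, turn an unescaped $...$ pair into Unicode when a
// text equivalent is known, otherwise into <latex>...</latex>.
function importMath(field) {
  return field.replace(/(?<!\\)\$((?:[^$\\]|\\.)+)\$/g, (match, body) => {
    const key = body.trim();
    return Object.prototype.hasOwnProperty.call(SYMBOLS, key)
      ? SYMBOLS[key]
      : "<latex>" + body + "</latex>";
  });
}

// Point B: on export, turn the tags back into dollar signs and quote
// every dollar sign that appears outside the tags.
function exportMath(field) {
  return field
    .split(/(<latex>.*?<\/latex>)/)
    .map((part) => {
      const m = /^<latex>(.*)<\/latex>$/.exec(part);
      return m ? "$" + m[1] + "$" : part.replace(/\$/g, () => "\\$");
    })
    .join("");
}

console.log(importMath("Decay of $\\alpha$ particles"));             // Decay of α particles
console.log(importMath("Ratio $\\frac{a}{b}$"));                     // Ratio <latex>\frac{a}{b}</latex>
console.log(exportMath("A <latex>\\frac{a}{b}</latex> for $5"));     // A $\frac{a}{b}$ for \$5
```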
Advantages of the proposal over the current situation:
1) The handling of bibtex imported entries containing latex code is
better for bibtex users, not worse for other ones (in the current
situation they are anyway unhappy with this unexploited and
unexploitable — let us pretend that word exists — latex code inside
their database).
2) Opens a path for better handling of bibtex imported entries, and more
generally of fields containing math, through the writing of appropriate
exporters, even for non bibtex users.
3) The zotero database becomes better because the latex code is clearly
recognizable a) by users, visually (easier to spot problems after
import); b) by automatic tools: easier to batch-change these entries if
e.g. the user wants to get rid of them, if zotero team comes up with a
bright idea on how to deal with mathematics, if writing a dedicated GUI,
etc.
Advantages of the proposal over spartan's proposal:
1) The user may use dollar signs where he wants in zotero to really mean
"dollar sign" and assume these will be exported accordingly to bibtex or
any format (i.e. as a dollar sign rendered in the resulting document,
thus as a quoted dollar sign in the bibtex source file).
2) Latex code is clearly recognizable from zotero (see point 3 here above).
Problems linked to the proposal:
1) ... well, someone has to implement it.
2) ... and I won’t do it.
... But at least having the specs and trying to agree on it makes us
move forward, I feel (when someone has time to implement something for
better bibtex support, he then has some hints about how to do it).
3) A simple and very concrete proposal to spartan (provided zotero team
agrees).
As suggested by I think Avram: spartan, add a user preference that asks
for dollar signs to be exported to bibtex as-is. The preference is by
default to export quoted dollar signs. Add a simple check into your
proposed patch to ensure the user really wants to export dollar signs to
bibtex as-is. Then I think everybody is happy.
Comments?
Olivier
For LaTeX-friendly math display on the web, MathJax seems to be a popular option these days. You might want to look into whether that might provide a path forward?
On a related note, it might be worth distinguishing between:
a) BibTeX junk that is a legacy of pre-unicode days and TeX-specific
hacks (e.g. all those silly curly braces, -- for – (en-dash), etc.)
b) math, in which nothing has caught on like (La)TeX notation. It
would be silly, in my opinion, to prevent users from including TeX
notation for math in abstracts (or even titles?), esp. with products
like MathJax providing an alternate rendering method for TeX math
notation.
(If somebody already made this point, my apologies for the inadvertent
plagiarism, but I haven’t been able to read all the text.)
best, Erik
No, that's a fine distinction.
Keeping that in mind, ASCIIMathML.js
(http://www1.chapman.edu/~jipsen/mathml/asciimath.html) looks
interesting, given that Firefox supports MathML rendering. It's
basically aiming for a user-friendly graphing calculator notation, and
it also supports standard LaTeX notation.
Advantages:
1) We could (in theory—I haven't looked to see if there's support for
this or if we'd have to write it) import LaTeX into a simpler notation
that could be clearer for people who don't know LaTeX and that looks
less like gibberish in places that don't handle LaTeX explicitly.
2) There's also a PHP port, which would let us use it on the server.
3) The ASCII notation could be displayed on the web for browsers that
don't support MathML (without requiring MathJax, which may be a bit heavy).
Downsides:
1) It's basically just a different arbitrary format, though it's at
least aiming to be as simple and intuitive as possible, and it seems to
have a fairly wide adoption in wikis and other tools.
2) Given that it supports LaTeX notation, it doesn't really address our
core concerns with privileging LaTeX in Zotero data fields, but, as Erik
notes, if math simply can't be represented as Unicode, there may not be
a better alternative.
I also haven't checked whether there's something to convert their ASCII
notation to LaTeX directly for use in the BibTeX export, though given
that it produces MathML we could get there one way or another.
If we implemented ASCIIMathML, the parsed math would obviously be
exported as proper LaTeX. That doesn't mean the escaping wouldn't remain
for non-math LaTeX (i.e., things for which there are Unicode equivalents).
It's good to hear that ASCIIMathML has worked for you before. Seems
promising.
To be honest I am not sure why Zotero would support ASCIIMathML
instead of TeX math. Everybody already knows TeX math.
By the way, given my previous remarks about distinguishing between
math/non-math in BibTeX, I would like to point out that $\alpha$ is a
tricky case. Is the user using $\alpha$ as a workaround because they
were not able to enter α (U+03B1 GREEK SMALL LETTER ALPHA), perhaps in
a paper about the greek alphabet? Or did they want 𝛼 (U+1D6FC,
MATHEMATICAL ITALIC SMALL ALPHA)? In which case should it be imported
as that unicode character or as <tex>$\alpha$</tex>? An importer will
always need to make some guesses about user intention, as will an
exporter.
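To show that the two candidates really are different characters to software, here is a small sketch using only standard JavaScript string functions. The variable names are mine, and the NFKC check relies on the runtime shipping Unicode normalization data.

```javascript
const greekAlpha = "\u03B1";                     // GREEK SMALL LETTER ALPHA
const mathAlpha = String.fromCodePoint(0x1D6FC); // MATHEMATICAL ITALIC SMALL ALPHA

// The two are distinct strings, and the math variant lies outside the
// Basic Multilingual Plane, so it takes two UTF-16 code units.
console.log(greekAlpha === mathAlpha); // false
console.log(greekAlpha.length);        // 1
console.log(mathAlpha.length);         // 2

// Compatibility normalization folds the math variant back to the plain
// Greek letter, which is one way an exporter could treat both alike.
console.log(mathAlpha.normalize("NFKC") === greekAlpha); // true
```

Whichever character the importer picks, a search or comparison on the other one will not match unless something like this normalization is applied.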
best, Erik
You seem to be under the impression that we don't understand you. I
assure you we do. There's no need to keep repeating this.
I'm telling you that, if we implemented ASCIIMathML, parsed math
obviously wouldn't be escaped when exporting to BibTeX. The same would
quite possibly not be true, for the reasons we've outlined, for anything
that didn't go through the math parser.
Well, mainly because TeX math looks like code, and ASCIIMathML looks
(more) like math. The graphing calculator notation supported by
ASCIIMathML seems to be the better analogue to the Unicode
representations—i.e., natural, non-code-like representations—of other
LaTeX symbols that we're trying to use. And while I'll certainly defer
to others on this, it doesn't seem that "everybody" knows TeX math, or
else graphing calculators would use it and ASCIIMathML wouldn't be as
popular as it seems to be. Given spartan's example, it seems even people
who know TeX math appreciate the simpler syntax in some contexts.
I'd like to hear other opinions on this, though.
Thanks for your work on this, but we're not going to just pref anything
that's controversial.
- We have a possible solution (ASCIIMathML) to the escaping issue.
- The local path issue needs to be discussed further.
- It's not clear that we need to expose prefs to translators if the
above two issues are addressed without prefs. But this merits further
discussion as well.
You mention that other possibilities, possibly (or probably) better,
exist to solve the problem spartan is addressing here. True. But it's
not a reasonably small change to ask spartan to implement; it's a
completely different path. He mentioned he won't do it (that's
understandable), and AFAIU we are not close to finding someone to do it. So
that suggestion amounts to leaving the issue unresolved.
So to recap, spartan worked to improve the bibtex support. He proposed a
solution, and, after some (tough) discussions he changed his proposal
to include most reasonable (in terms of required amount of work)
suggestions that he was offered. Importantly, he added a pref to make
sure that the default is as before and if the user really wants no
quote, he can. This means that spartan's proposal is a superset of the
current functionalities. Now it appears that the offered improvement
will perhaps not make it into zotero because some possibly better
solution that no one in the near future is going to implement exists?
Let me be clear: I favor (as my previous posts showed) thinking about a
better approach to solve that math problem. But I also favor being
realistic about the time, amount of work, probability of finding someone
to do it, etc., issues about that better approach. I'd suggest
separating long term and short term here.
>
> - The local path issue needs to be discussed further.
>
> - It's not clear that we need to expose prefs to translators if the
> above two issues are addressed without prefs. But this merits further
> discussion as well.
It seems reasonable to me to have per-translator preferences. Most
probably other translators could use these, e.g. because of other
formats ambiguities that the user could have to specify how to solve.
Olivier
...
>> - The local path issue needs to be discussed further.
>>
>> - It's not clear that we need to expose prefs to translators if the above
>> two issues are addressed without prefs. But this merits further discussion
>> as well.
>
> It seems reasonable to me to have per-translator preferences. Most probably
> other translators could use these, e.g. because of other formats ambiguities
> that the user could have to specify how to solve.
So if other translators can use preferences for key generation and
paths, say, then why are the preferences specific to one translator?
I'm guessing this would be Dan's question at least.
Bruce
I had suggested at the very start of this discussion that the
preferences say something like "translator", not "bibtex". We'd only
be talking about a subtree of the prefs, if getPref is added to the
sandbox. If the key generation preference is shared by one or more of
file renaming, citation labels (like [JAbD99]), and BibTeX export,
then we should probably put it in a different part of the prefs and
give it its own function in the sandbox.
- Avram
Again, I don't know what Dan had in mind in his comment, but my point
would be that the key and path issues are not specific to BibTeX:
they're relevant to most output formats.
Bruce
I am willing to accept the notion that mathematical expressions are a
kind of structured inline data of sorts. What I object to is
privileging TeX as the only way to represent that data.
To back up, my concerns are in part about the user-facing display
issues, but also about data portability across different formats
(HTML, RTF, Word, etc.).
Certainly you'd prefer to have the HTML output of your math-inclusive
titles be properly rendered as a math expression, than raw TeX?
Certainly it's feasible that in some data exchange contexts it might
make sense to have those expressions encoded as MathML rather than
TeX?
Those are the sorts of issues that are behind my own position on this.
Perhaps MathJax is worth looking at in the future, as it seems to be
able to a) properly represent math expressions, and b) convert among
TeX and MathML.
Bruce
best Alex
This is a re-post of my post at zotero-forum:
http://forums.zotero.org/discussion/17847/improving-bibtex-export/
I love zotero so much. But got disturbed by the poor support of
bibtex. So I spent quite some time to improve the functionality of the
bibtex export. It should be very useful for those who use bibtex/latex
a lot, in particular for scientists like me. But I don't know where I
should post my modified codes.
Here is the brief list of improvements from my hard work:
1. much more flexible (user control) bibtex key generation: from
author names, initials, title, journal, year, volume, pages ,...etc
2. the suffix for key collisions can be numeric or alphabetic
3. key format strings can be read from prefs
(extention.zotero.bibtex....)
4. attachments like pdf files can be exported with real path links
only. now you can easily get the resulting bibtex file work with other
external applications like JabRef and Mendeley.
5. unicode conversions for greek letters
6. user specified field like callNumber to export pre-stored keys
... and other improvements.
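For anyone curious what items 1 and 2 might look like in practice, here is a simplified sketch, not the posted patch itself. The item shape (`lastName`, `year`), the fixed author-plus-year format, and the function names are assumptions made for illustration; format strings read from prefs are not shown.

```javascript
// Turn 1, 2, ..., 26, 27 into "a", "b", ..., "z", "aa"; 0 gives "".
function alphaSuffix(n) {
  let s = "";
  while (n > 0) {
    n -= 1;
    s = String.fromCharCode(97 + (n % 26)) + s;
    n = Math.floor(n / 26);
  }
  return s;
}

// Returns a function that issues keys and disambiguates repeats with
// either an alphabetic ("a", "b", ...) or a numeric ("-1", "-2", ...) suffix.
function makeKeyGenerator(style) {
  const seen = new Map(); // base key -> how many times it was issued
  return function keyFor(item) {
    // Simplification: non-ASCII letters are dropped rather than transliterated.
    const base = (item.lastName + item.year).toLowerCase().replace(/[^a-z0-9]/g, "");
    const count = seen.get(base) || 0;
    seen.set(base, count + 1);
    if (count === 0) return base;
    return base + (style === "alpha" ? alphaSuffix(count) : "-" + count);
  };
}

const keyFor = makeKeyGenerator("alpha");
console.log(keyFor({ lastName: "Smith", year: 2011 })); // smith2011
console.log(keyFor({ lastName: "Smith", year: 2011 })); // smith2011a
console.log(keyFor({ lastName: "Smith", year: 2011 })); // smith2011b
```

This sketch does not guard against a suffixed key colliding with some other item's unsuffixed key.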
It's been working pretty well for me. But I'd like to share it with
you guys.
Please drop me a note if somebody knows how to get these improvements
added to the next release of zotero.
spartan