Lossless export


Conal Tuohy

Aug 3, 2015, 11:35:48 PM
to zotero-dev
I need to export and process a Group Library, and I would like to export it in a form which does not lose any of the metadata.

From the wiki page <https://www.zotero.org/support/kb/field_mappings>, it seems that some fields are not exported in certain export formats:
Only specific Zotero metadata fields are compatible with fields in other formats (i.e. RIS, MODS). Depending on which format is chosen for import/export, specific data will translate across formats while others will not.
Are any of the export formats strictly lossless? Zotero RDF, perhaps? Any of the standard interchange formats?

Thanks in advance for any advice!

Conal

If not, can anyone advise me of

Aurimas Vinckevicius

Aug 4, 2015, 2:35:02 AM
to zoter...@googlegroups.com
Zotero RDF is fairly close, but it's not strictly lossless (I don't recall exactly what's not exported. Item relations, for one)

From the standard formats, we try to do our best with RIS, but it's not a very good format, so there is some loss.

Depending on your needs, exporting to CSV should give you most of the metadata in a table format. No attachments, though.

--
You received this message because you are subscribed to the Google Groups "zotero-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zotero-dev+...@googlegroups.com.
To post to this group, send email to zoter...@googlegroups.com.
Visit this group at http://groups.google.com/group/zotero-dev.
For more options, visit https://groups.google.com/d/optout.

Conal Tuohy

Aug 4, 2015, 2:55:27 AM
to zoter...@googlegroups.com

Thanks Aurimas! There's some kind of backup system, isn't there? Does it have an XML serialization?

Dan Stillman

Aug 4, 2015, 3:00:10 AM
to zoter...@googlegroups.com
On 8/4/15 2:54 AM, Conal Tuohy wrote:
>
> There's some kind of backup system isn't there? Does it have an XML
> serialization?
>

Backing up just means copying the data directory, including the SQLite
database that stores the data. Depending on your needs, you can of
course pull data directly from the database, but that might be more
complicated than you want.
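
As a rough illustration of the direct-database route, the sketch below opens a copy of the SQLite file read-only and lists its tables, using only Python's standard `sqlite3` module. Working on a copy, and the function name `list_tables`, are assumptions made for this example; the code deliberately assumes nothing about Zotero's table names, since the schema is not described in this thread.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, opened read-only."""
    # mode=ro makes SQLite refuse writes, so the file cannot be changed by accident.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Calling `list_tables()` on a copy of the database from the data directory is a reasonable first step before writing any queries; everything past that point depends on the schema, which is where the complexity lies.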

In addition to Zotero RDF and CSV from the client, which are close to
lossless, you can also use the web API to get truly lossless data:

https://www.zotero.org/support/dev/web_api/v3/

(You'll be able to get API JSON from the client in Zotero 5.0.)

Aurimas Vinckevicius

Aug 4, 2015, 3:00:38 AM
to zoter...@googlegroups.com
https://www.zotero.org/support/zotero_data It's an SQLite database (binary)

Conal Tuohy

Aug 4, 2015, 6:25:59 AM
to zoter...@googlegroups.com

On 4 August 2015 at 17:00, Dan Stillman <dsti...@zotero.org> wrote:
you can also use the web API to get truly lossless data:

https://www.zotero.org/support/dev/web_api/v3/

Truly lossless export is exactly what I want, thanks Dan!

But I'm having difficulty with this; should this URL not return me a list of items in this public group library?

https://api.zotero.org/groups/patrick_white_notebooks/items?v=3

I just get an HTTP 500 Internal Server Error with an entity content of:

An error occurred

I note too that I can request a bunch of different formats, including several that appear to be the same as the lossy formats I could get through the client. Which format, if any, is actually lossless?

Con

--
@conal_tuohy

Conal Tuohy

Aug 4, 2015, 6:48:41 AM
to zoter...@googlegroups.com
OK I have sussed it - I found the group ID for my library by sniffing the HTTP traffic as I browsed the Group Library on the zotero.org website. The web page for the library made an AJAX request with the URL I needed. Incidentally, is there some more convenient way to find the group ID for a given group library?

When I request the list of items in atom XML format, I get this list which I think is exactly what I need:

https://api.zotero.org/groups/300568/items?v=3&format=atom

Sebastian Karcher

Aug 4, 2015, 9:40:47 AM
to zoter...@googlegroups.com
There are a bunch of ways -- what I do is to click on the "subscribe to
this feed" button in the library view of your group:
https://www.zotero.org/groups/patrick_white_notebooks/items
and that will give you the group id (it will actually give you a full
API call, which is useful as a starting point anyway).
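
Once you have such an API URL, the numeric group ID can be pulled out of it mechanically. The following is only a sketch: the helper name `group_id_from_url` is made up for illustration, and it assumes the ID is the run of digits directly after `/groups/` in the path, as in the URLs quoted in this thread.

```python
import re
from urllib.parse import urlparse


def group_id_from_url(url):
    """Extract the numeric group ID from a URL whose path contains /groups/<id>."""
    path = urlparse(url).path
    match = re.search(r"/groups/(\d+)(?:/|$)", path)
    if match is None:
        # e.g. a URL that uses the group's name instead of its numeric ID
        raise ValueError(f"no numeric group ID in {url!r}")
    return int(match.group(1))
```

A URL that uses the group name, such as the `patrick_white_notebooks` one above, raises `ValueError` here instead of silently producing a request the API will reject.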

Hth,
Sebastian



Emiliano Heyns

Aug 4, 2015, 11:01:04 AM
to zotero-dev
The debug translator included in Better BibTeX (called BetterBibTeX JSON) should be lossless -- I think it only strips out duplicate fields without otherwise changing the reference.

Dan Stillman

Aug 4, 2015, 2:38:07 PM
to zoter...@googlegroups.com
On 8/4/15 6:48 AM, Conal Tuohy wrote:
> When I request the list of items in atom XML format, I get this list
> which I think is exactly what I need:
>
> https://api.zotero.org/groups/300568/items?v=3&format=atom

If you're processing it and want complete data, you want the default
JSON, which you'll get without the format=atom.
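
For anyone scripting this, here is a minimal sketch of paging through a group's items in the default JSON format, using only the Python standard library. The `start` and `limit` query parameters and the page size of 100 are my assumptions about the v3 API rather than anything stated in this thread, so check them against the documentation; the function names are invented for the example.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://api.zotero.org"


def items_url(group_id, start=0, limit=100):
    """Build the URL for one page of a group's items (no format= means JSON)."""
    query = urlencode({"v": 3, "start": start, "limit": limit})
    return f"{API_ROOT}/groups/{group_id}/items?{query}"


def http_get_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fetch_all_items(group_id, get_json=http_get_json, limit=100):
    """Collect every item by requesting pages until a short page comes back."""
    items = []
    start = 0
    while True:
        page = get_json(items_url(group_id, start, limit))
        items.extend(page)
        if len(page) < limit:
            return items
        start += limit
```

`fetch_all_items(300568)` would then return a list of item objects for the public group discussed here. The `get_json` parameter exists only so that the paging logic can be exercised without touching the network.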

Conal Tuohy

Aug 4, 2015, 11:21:03 PM
to zotero-dev
I'm actually after XML, rather than JSON, because I'm intending to process the data with XML tools.

It looks to me like the data expressed in the XHTML contains all the bibliographic metadata. Am I wrong?

The only difference I've seen so far is that in the JSON format, the author's firstName and lastName are distinct, whereas in the XML they are joined together into a single "Author" value, i.e.

JSON:
{
  "creatorType": "author",
  "firstName": "Patrick",
  "lastName": "White"
}

XML:
          <tr class="creator">
            <th style="text-align: right">Author</th>
            <td>Patrick White</td>
          </tr>

... but I think I can live with that.

Dan Stillman

Aug 5, 2015, 3:07:35 AM
to zoter...@googlegroups.com
On 8/4/15 11:21 PM, Conal Tuohy wrote:
> I'm actually after XML, rather than JSON, because I'm intending to
> process the data with XML tools.
>
> It looks to me like the data expressed in the XHTML contains all the
> bibliographic metadata. Am I wrong?

The Atom doesn't give you tags, relations, or collections, but if you
can live with that, it should be fine.

Conal Tuohy

Aug 5, 2015, 10:01:06 PM
to zoter...@googlegroups.com
Ah OK ... I do need at least some of those things so I guess I will need to parse the JSON data into an XML form. Thank you very much, Dan!
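
A minimal sketch of what such a JSON-to-XML step could look like, using Python's `xml.etree.ElementTree`. The element and attribute names (`item`, `creator`, `tag`, `collection`, `relation`, `field`) are invented for this illustration and belong to no standard schema, and the assumed shape of the `tags`, `collections` and `relations` values is my reading of the API JSON rather than something stated in this thread.

```python
import xml.etree.ElementTree as ET


def item_to_xml(item):
    """Convert one API JSON item (a dict) into a generic XML element."""
    data = item.get("data", item)  # accept the full item or just its "data" part
    root = ET.Element("item", {"key": str(data.get("key", ""))})
    for name, value in data.items():
        if name == "creators":
            for creator in value:
                el = ET.SubElement(root, "creator", {"type": creator.get("creatorType", "")})
                # Keep the name parts separate, as the JSON does.
                for part in ("firstName", "lastName", "name"):
                    if part in creator:
                        ET.SubElement(el, part).text = creator[part]
        elif name == "tags":
            for tag in value:
                ET.SubElement(root, "tag").text = tag.get("tag", "")
        elif name == "collections":
            for key in value:
                ET.SubElement(root, "collection", {"key": key})
        elif name == "relations" and isinstance(value, dict):
            for predicate, target in value.items():
                targets = target if isinstance(target, list) else [target]
                for t in targets:
                    ET.SubElement(root, "relation", {"predicate": predicate}).text = t
        elif name != "key" and isinstance(value, (str, int, float)):
            # Every other scalar field is carried over under its own name.
            ET.SubElement(root, "field", {"name": name}).text = str(value)
    return root
```

`ET.tostring(item_to_xml(item), encoding="unicode")` serializes the result, which can then be handed to XSLT or other XML tools without losing the tags, relations and collections that the Atom output omits.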

Conal Tuohy

Aug 30, 2015, 2:48:49 AM
to zoter...@googlegroups.com
I wrote some code to extract a shared library from Zotero's web API (in JSON format, since my goal was to get an exhaustive dump of the data) and convert the result into EAD.

The code is here: https://github.com/Conal-Tuohy/ZoteroEADConverter

I blogged some thoughts about it here: http://conaltuohy.com/blog/zotero-web-api-data-format/

Thanks to all who responded to my earlier questions!

Conal

Bruce D'Arcus

Aug 30, 2015, 12:36:48 PM
to zoter...@googlegroups.com
Not sure I agree with your conclusions on XML as a solution. Have you
tried the rdf_bibliontology output? I'd say if that's lacking somehow,
then it makes more sense to improve that, rather than create a new
format.

Conal Tuohy

Aug 31, 2015, 12:17:55 AM
to zoter...@googlegroups.com
I did consider creating an export translator, or more likely modifying one of the existing export functions. But I chose to work with the web API instead, because although it wasn't an issue for my current task at hand, it made more sense to work with the web API than with a client-side JS API, if I wanted to build Zotero data into other web based systems (e.g. producing linked data) in future. That's where I think the web API has a strategic significance that goes beyond the individual export functions built into the client (because it is more open).

The key thing a web API needs is to be explicit about what the data means. If you look at the way that the rdf_bibliontology export represents Zotero properties, for instance, a lot of Zotero's data model is represented in terms of externally defined ontologies (e.g. Dublin Core), but there are still quite a few Zotero properties which lack a definition in the RDF, for example <http://www.zotero.org/namespaces/export#itemType>, <http://www.zotero.org/namespaces/export#automaticTag>, etc. So even if ported to the web API, that translator would still need further work to deal with the issue I blogged about at <http://conaltuohy.com/blog/zotero-web-api-data-format/#formal-format>.

Mapping Zotero's model to a range of bibliographic ontologies is not a trivial or uncontroversial task, especially as new ones are popping up all the time. It would be much simpler to start by offering a web API which can produce an output that fully and faithfully (and self-descriptively!) represents the internal model of Zotero; then that's a platform that e.g. linked data or other web APIs could be built on top of.







Sebastian Karcher

Aug 31, 2015, 1:07:00 AM
to zoter...@googlegroups.com
FWIW, aurimasv.github.io/z2csl/typeMap.xml does have an exhaustive
mapping of the data part of API JSON to Zotero fields (we've started
documenting the recommended use of Zotero fields, but that's a difficult
task and since the human-entered part of Zotero data isn't going to be
input by catalogers but by users, it'll only ever be prescriptive, not
necessarily descriptive documentation).

Using existing export translators (including Zotero and Bibliontology
RDF) is possible via web API
https://www.zotero.org/support/dev/web_api/v3/basics#export_formats
though I understand why you'd want to work with 100% lossless data which
neither is.
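
To make the export route concrete, a small sketch: the same items endpoint takes a `format` parameter (as with `format=atom` earlier in this thread), and the export formats page linked above lists the accepted values. The function names and the `limit` value are assumptions made for this example.

```python
import urllib.request
from urllib.parse import urlencode


def export_url(group_id, export_format, limit=100):
    """Build a web API URL that runs an export translator over a group's items."""
    query = urlencode({"v": 3, "format": export_format, "limit": limit})
    return f"https://api.zotero.org/groups/{group_id}/items?{query}"


def fetch_export(group_id, export_format):
    """Download one page of exported records as text (performs a network request)."""
    with urllib.request.urlopen(export_url(group_id, export_format)) as response:
        return response.read().decode("utf-8")
```

For example, `export_url(300568, "rdf_bibliontology")` builds the request for the Bibliontology RDF output mentioned above.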

I disagree on treating Zotero API JSON as a data format independent of
Zotero (https://xkcd.com/927/), and I do think it's a highly human-readable
data format for anyone familiar with Zotero, but full documentation
would certainly be nice.

Philipp Zumstein

Aug 31, 2015, 2:39:41 AM
to zoter...@googlegroups.com
As far as I understand it, one main obstacle is that there is no documentation of the zotero namespace https://www.zotero.org/namespaces/export# , neither in human-readable form nor in machine-readable form (I guess that should be some RDFS, OWL modelling). Any reason not to do this?

Documentation should explain the structure, e.g. what comes inside a "z:Attachment", provide a label, maybe in different languages, and define domain, range, subproperty, etc.; any description or comments further help in understanding the intention of an element (e.g. http://dublincore.org/2012/06/14/dcterms.rdf). Additionally, application profiles describe how the format is used in a specific application (see for example the application profiles of Dublin Core), but this does not need to be part of the documentation.


Kieren Diment

Aug 31, 2015, 4:09:30 AM
to zoter...@googlegroups.com
> might be more complicated than you want

Having worked with the database layer a little (a while ago), it's definitely non-trivial. So in this instance I would read "might" as "will".
