Export Translators - Attachment Export Capabilities

94 views
Skip to first unread message

Alexander Craig

unread,
Jun 27, 2011, 11:43:04 AM6/27/11
to zoter...@googlegroups.com
Hello,

I'm currently working on an export translator to allow Zotero to interface with a custom reference management system that I'm developing. The system uses an XML based format for citation data which is easy enough to generate through an export translator, but file attachments are causing me some difficulties. Ideally what I'd like to do is only export PDF format attachments from Zotero entries, and move all the PDF files to the same directory as the exported XML file (ignoring name conflict issues for now). 

As I understand it, this functionality can not be supported through the current Zotero translator sandbox. If the "exportFileData" option is set, all attachments for the exported entries are exported to the user's selected export location in separate sub-directories. The "attachment.path" attribute of each attachment is set to the relative location form the user's selected export directory, but neither the absolute path to the attachment or the path to the user's selected export location are exposed to the translator code.

If anyone could provide some guidance as to how I could support this feature I would be very appreciative. I'd like to maintain compatibility with standard Zotero installs, so if there really is no way to further modify attachment exports through translator code perhaps this feature could be considered as an addition to the base package?

Alexander Craig

unread,
Aug 4, 2011, 9:41:01 AM8/4/11
to zotero-dev
As per a recommendation I received on the general forums, I've
implemented my requested changes against the trunk and made a patch
available at:

http://www.thecraigrepository.ca/files/up2p/ZoteroExportAttachments.patch

If this can be considered for inclusion in the next Zotero release I
would be happy to make any further required changes myself.

Simon

unread,
Aug 4, 2011, 12:42:50 PM8/4/11
to zoter...@googlegroups.com
Copying files and then deleting them is probably not the right way of doing this. Instead, I'd suggest that we add a saveFile() method to the item object returned by Zotero.nextItem(). The function would take a path relative to the current save directory, create any intermediate directories if they don't exist, and copy the file there (along with supporting files in the case of snapshots). We could then modify the few existing translators that support exporting files to call this function. These modifications should take place in translate_item.js (exportFiles and _attachmentToArray); translate.js should have as little Firefox-specific code as possible. How does this sound to you?

Simon

Avram Lyon

unread,
Aug 4, 2011, 3:06:33 PM8/4/11
to zoter...@googlegroups.com

From the point of view of a translator author / maintainer, this
sounds like a reasonable workflow. Alexander? Would you be up for
trying to recast the patch to work like this? And would this still
address the use case that's motivating the patch?

Avram

Alexander Craig

unread,
Aug 4, 2011, 4:28:29 PM8/4/11
to zotero-dev
I agree that the proposed workflow is better for the general case than
my own, and I could definitely implement the changes.

I think these changes should cover my use case, with the exception of
filename conflicts. In my case I'm expecting to move a lot of PDF
attachment files to the same directory, and naming conflicts will
potentially be an issue. The automatic name mangling that I
implemented in the current patch is definitely hacky, and should be
replaced. Ideally I think the saveFile() call should have an optional
parameter that can be used to specify a new filename for the
attachment, and an error should be thrown if a file already exists at
the specified path / filename. This way it would be up to individual
translator authors to determine how they want to handle any possible
conflicts. In the case of snapshots the supporting files likely can't
change names, but at least an error could be thrown if any of the
supporting files would cause a name conflict. This would allow the
translator author to retry the save with a new directory (but not new
filenames). If anyone has any better ideas to deal with naming issues,
I'd be more than happy to hear them.

Alex

Simon

unread,
Aug 4, 2011, 6:21:48 PM8/4/11
to zoter...@googlegroups.com
On Thursday, August 4, 2011 4:28:29 PM UTC-4, DBrickShaw wrote:
I agree that the proposed workflow is better for the general case than
my own, and I could definitely implement the changes.

I think these changes should cover my use case, with the exception of
filename conflicts. In my case I'm expecting to move a lot of PDF
attachment files to the same directory, and naming conflicts will
potentially be an issue. The automatic name mangling that I
implemented in the current patch is definitely hacky, and should be
replaced. Ideally I think the saveFile() call should have an optional
parameter that can be used to specify a new filename for the
attachment, and an error should be thrown if a file already exists at
the specified path / filename. This way it would be up to individual
translator authors to determine how they want to handle any possible
conflicts. In the case of snapshots the supporting files likely can't
change names, but at least an error could be thrown if any of the
supporting files would cause a name conflict. This would allow the
translator author to retry the save with a new directory (but not new
filenames). If anyone has any better ideas to deal with naming issues,
I'd be more than happy to hear them.

I was thinking saveFile() would take a full path to the file (i.e., for the current structure, attachment.itemID+"/"+attachment.filename). A translator could thus keep track of the files it has saved, and mangle filenames itself as necessary to prevent duplicates. We might also want to consider making saveFile() throw by default if a file of the same name already exists, and add an optional argument to overwrite existing files.

Simon
Message has been deleted
Message has been deleted

Alexander Craig

unread,
Aug 5, 2011, 3:13:52 PM8/5/11
to zotero-dev
I've put together a new patch implementing the Simon's suggestions,
and it can be downloaded at:

http://www.thecraigrepository.ca/files/up2p/ZoteroAttachSaveItem.patch.

Some implications of these changes:
- ItemGetter._exportFileDirectory now references the base directory of
the export, and a "files" sub-directory is no longer automatically
generated.
- attachment.path still points to the default export path, but no file
is automatically created at this location. However, the result of this
is that "attachment.saveItem(attachment.path)" performs the same
operations as exporting files previously did, so conversion of the
translators should be fairly pain free.
- An exception is thrown if any of the files to be copied would
overwrite an existing file (including support files for snapshots). An
optional parameter is provided to force overwriting of existing files.

Here's a snippet from my translator to give an idea of what translator
code will look like:

if(attachment.mimeType == "application/pdf") {
var filename = attachment.filename;
var renameCounter = 2;
var saveSuccess = false;
while(!saveSuccess) {
try {
attachment.saveItem(filename);
saveSuccess = true;
} catch (e) {
filename = "[" + renameCounter + "]" +
attachment.filename;
renameCounter++;
}
}
} else if (attachment.mimeType == "text/html") {
attachment.saveItem(attachment.filename);
}

In this case, I'm copying all the PDF files to the base directory (and
renaming as conflicts occur), while HTML snapshots are being exported
to the default export location.

Alex

Dan Stillman

unread,
Aug 5, 2011, 3:14:36 PM8/5/11
to zoter...@googlegroups.com
On 8/5/11 3:02 PM, Alexander Craig wrote:
> Here's a snippet from my translator to give an idea of what translator
> code will look like:
> if(attachment.mimeType == "application/pdf") {
> var filename = attachment.filename;
> var renameCounter = 2;
> var saveSuccess = false;
>
> while(!saveSuccess) {
> try {
> attachment.saveItem(filename);
> saveSuccess = true;
> } catch (e) {
> filename = "[" + renameCounter + "]" + attachment.filename;
> renameCounter++;
> }
> }
>
> storeField("file", "file:" + attachment.filename,
> itemFieldsMap, mapKeys);

> } else if (attachment.mimeType == "text/html") {
> attachment.saveItem(attachment.filename);
> }

If saveItem() throws an error for another reason (e.g., another Mozilla
file error), the code above will result in an infinite loop. (If it
managed to save before the error, you could even fill up the filesystem
with duplicate files.) You should either add a fail-safe counter or,
better yet, throw and test for a specific error (see error.js and
examples in sync.js).

Alexander Craig

unread,
Aug 5, 2011, 3:17:15 PM8/5/11
to zotero-dev
Very true, and I probably should have clarified that this is a
stripped down version of my translator code just intended to showcase
how the new call would be used. The saveItem call will thrown an Error
object if a duplicate is detected before copying, whereas any other
error will rethrow whatever exception the IO operation generated.

Dan Stillman

unread,
Aug 5, 2011, 3:47:21 PM8/5/11
to zoter...@googlegroups.com
On 8/5/11 3:17 PM, Alexander Craig wrote:
> Very true, and I probably should have clarified that this is a
> stripped down version of my translator code just intended to showcase
> how the new call would be used. The saveItem call will thrown an Error
> object if a duplicate is detected before copying, whereas any other
> error will rethrow whatever exception the IO operation generated.

OK, but this has implications for the patch, which is why I bring it up.
What are you planning to test against to know whether you need to
increment the counter or just re-throw the error? You're throwing an
Error object in a few places in saveItem (which isn't necessary, by the
way�throwing a string is fine if you're not using the other Error
properties), so all you can do is look for that particular
human-readable error string ("A file already exists at the specified
file path"), which isn't particularly good practice. Using an object
with a specific property is probably overkill (and I guess translators
won't have access to Zotero.Error anyway), but you should at least
include a specific code in the thrown string (e.g., "ERROR_FILE_EXISTS")
that won't change and that translators can look for.

Alexander Craig

unread,
Aug 5, 2011, 4:09:41 PM8/5/11
to zotero-dev
In my current code I'm doing this:

try {
// Save item
} catch (e) {
if(e instanceof Error) {
// Rename and try again
} else {
throw e;
}
}

My reasoning is that for any exception that I haven't explicitly
thrown an Error object for the translator code would have no way to
deal with it anyway. The other main error cases I can think of are
when the attachment has become corrupted or moved in local storage,
when the file system has run out of disk space, or perhaps some kind
of permissions issue. In any of these cases there's no change that can
be made to the saveItem() parameters to make it work, and the
translator code lacks the permissions to deal with the real problem.

I can definitely see the value in providing a constant code in the
error though. Where would you recommend this be defined?

Alex

On Aug 5, 3:47 pm, Dan Stillman <dstill...@zotero.org> wrote:
> On 8/5/11 3:17 PM, Alexander Craig wrote:
>
> > Very true, and I probably should have clarified that this is a
> > stripped down version of my translator code just intended to showcase
> > how the new call would be used. The saveItem call will thrown an Error
> > object if a duplicate is detected before copying, whereas any other
> > error will rethrow whatever exception the IO operation generated.
>
> OK, but this has implications for the patch, which is why I bring it up.
> What are you planning to test against to know whether you need to
> increment the counter or just re-throw the error? You're throwing an
> Error object in a few places in saveItem (which isn't necessary, by the
> way throwing a string is fine if you're not using the other Error

Alexander Craig

unread,
Aug 5, 2011, 4:18:45 PM8/5/11
to zotero-dev
On further inspection that doesn't actually work as I expected it to,
since (as you said) I can't compare against Zotero.Error. I'll
definitely implement a better way to identify the error and post a new
patch.

Alex

Dan Stillman

unread,
Aug 5, 2011, 4:46:55 PM8/5/11
to zoter...@googlegroups.com
On 8/5/11 4:09 PM, Alexander Craig wrote:
> In my current code I'm doing this:
>
> try {
> // Save item
> } catch (e) {
> if(e instanceof Error) {
> // Rename and try again
> } else {
> throw e;
> }
> }
>
> My reasoning is that for any exception that I haven't explicitly
> thrown an Error object for the translator code would have no way to
> deal with it anyway.

But my point is that "e instanceof Error" will match all cases of Error
being thrown�not just the file-exists error�as well as potentially other
JS errors, so saveItem() needs to include something more specific to
test against. (For example, in your current code, an empty path, which
could easily happen by accident, would result in an endless loop, as you
throw an Error there too. Same if some other Error was added to the
function at a later point.) The easiest approach, I'm suggesting, is
just to add "ERROR_FILE_EXISTS" to the error string, and translators can
look for that.

(As a side note, despite my earlier claim, Simon points out that
throwing an Error object (vs. a string) does have the advantage of
automatically including a line number and stack trace.)

Alexander Craig

unread,
Aug 8, 2011, 10:08:26 AM8/8/11
to zotero-dev
I've implemented a new version with your suggested changes to
exceptions, and the patch is available here:

http://www.thecraigrepository.ca/files/up2p/ZoteroAttachSaveItem2.patch

The error message in the case of a naming conflict is set to
"ERROR_FILE_EXISTS <filename>", and when an empty or undefined path is
provided the error message is set to "ERROR_EMPTY_PATH". I decided to
keep using Error objects because of the stack trace benefit you
mentioned. A new example of what calling code would look like:

if(attachment.mimeType == "application/pdf") {
var filename = attachment.filename;
var renameCounter = 2;
var saveSuccess = false;

while(!saveSuccess) {
try {
attachment.saveItem("attachDir/" + filename);
saveSuccess = true;
} catch (e) {
if(e.message.indexOf("ERROR_FILE_EXISTS") == 0) {
filename = "[" + renameCounter + "]" +
attachment.filename;
renameCounter++;
} else {
throw e;
}
}
}
}

Is there somewhere beyond translate_item.js that these error codes
should be documented? Also, my code assumes that forward slashes will
be used as separators in the paths regardless of OS since this was the
format previously provided in attachment.path.

Cheers,
Alex

On Aug 5, 4:46 pm, Dan Stillman <dstill...@zotero.org> wrote:
> On 8/5/11 4:09 PM, Alexander Craig wrote:
>
> > In my current code I'm doing this:
>
> > try {
> >      // Save item
> > } catch (e) {
> >      if(e instanceof Error) {
> >          // Rename and try again
> >      } else {
> >          throw e;
> >      }
> > }
>
> > My reasoning is that for any exception that I haven't explicitly
> > thrown an Error object for the translator code would have no way to
> > deal with it anyway.
>
> But my point is that "e instanceof Error" will match all cases of Error
> being thrown not just the file-exists error as well as potentially other
Reply all
Reply to author
Forward
0 new messages