sync tool 4.2.0 temporary space


Dan Pritts

Jan 20, 2017, 11:48:58 AM
to DuraCloud Users
Hi,

Just upgraded to synctool 4.2.0 and am running a full sync on our space.

It is leaving files in /tmp, and in our case they are large files.
This may be happening when /tmp comes close to running out of space,
which has been happening.

The log files don't contain these temporary file names, nor any mention
of /tmp. The console doesn't report any errors.

So, two questions:

1) In the short term, how do I specify a different directory for
temporary files? Didn't see anything on the wiki about it.

2) How can I debug why these files are getting left around? I'll see
if I can figure out where the original files are and search the logs for
them. Probably not too hard if I search by file size.

--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan

Daniel Bernstein

Jan 23, 2017, 6:27:02 PM
to Dan Pritts, DuraCloud Users
Hi Dan, 

With the release of synctool 4.2.0 we made some improvements to the way we handle large file uploads. Namely, we have added support for automatic retries at the chunk level rather than at the entire file level. Additionally, we now compute a checksum for each chunk before transferring it. If the chunk already exists in DuraCloud with the expected checksum, we do not spend the time and bandwidth transferring it. To accomplish these changes, we now write the chunk to be transferred to disk before transferring it. Upon completion (be it successful or otherwise) we delete the temp file.

There is one important implication to this change. If you are transferring files over 1GB, you must have at least 1GB of extra disk space per worker thread (i.e., the number specified in the -t setting) in addition to space for storing your sync tool logs. It is those very temp chunks that you are seeing in the temp dir. So if you want to run 10 simultaneous threads, you'll need at least 10GB of space plus additional space for logs. As long as the JVM shuts down in an orderly way, those files will be removed.
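
As a rough illustration of the sizing rule above (1GB of temp space per worker thread, plus room for logs), here is a minimal Python sketch of a pre-flight check. It is not part of the sync tool: the function names and the `log_allowance_bytes` parameter are hypothetical, and treating "1GB" as 2**30 bytes is an assumption.

```python
import shutil

# Assumption: "1GB" in this thread is treated as 2**30 bytes.
GIB = 1024 ** 3


def required_temp_bytes(threads, max_chunk_bytes=GIB, log_allowance_bytes=0):
    """Worst case: every worker thread has one full-size chunk on disk at once."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * max_chunk_bytes + log_allowance_bytes


def has_enough_temp_space(tmpdir, threads, max_chunk_bytes=GIB, log_allowance_bytes=0):
    """Compare free space on the filesystem holding tmpdir with the estimate."""
    free = shutil.disk_usage(tmpdir).free
    return free >= required_temp_bytes(threads, max_chunk_bytes, log_allowance_bytes)


if __name__ == "__main__":
    import tempfile

    # Hypothetical run: 10 worker threads against the system temp directory.
    print(required_temp_bytes(10) // GIB, "GiB needed for 10 threads")
    print("enough space:", has_enough_temp_space(tempfile.gettempdir(), 10))
```

Running the check against the directory you intend to use for temp files, before starting a large sync, would flag an undersized /tmp up front.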

So to answer your first question, you can set the temp directory by adding -Djava.io.tmpdir=/path/to/your/tmpdir to the command line parameters.

I'm not sure I can give you a good answer for 2. You can certainly remove the files from your tmpdir if the app is not running. If you see files in the tmpdir when the application is not running, it is likely that the JVM did not shut down in an orderly way and thus did not have a chance to clean up, which it should do under normal circumstances. If the tool is still running, it is possible that you are simply seeing the chunks that are currently being transferred.
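
For spotting leftovers while the tool is stopped, something like the following Python sketch may help. It is only an illustration: the thread does not say how the temp chunks are named, so this filters on age and size alone, and `find_stale_files` and its thresholds are hypothetical.

```python
import os
import time


def find_stale_files(tmpdir, min_age_seconds=3600, min_size_bytes=0, now=None):
    """Return (path, size, age_seconds) for regular files that look abandoned.

    Files are selected only by modification age and size, because the naming
    pattern of the sync tool's temp chunks is not given in this thread.
    """
    now = time.time() if now is None else now
    stale = []
    with os.scandir(tmpdir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            info = entry.stat(follow_symlinks=False)
            age = now - info.st_mtime
            if age >= min_age_seconds and info.st_size >= min_size_bytes:
                stale.append((entry.path, info.st_size, age))
    # Largest files first, since the leftover chunks are expected to be big.
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import tempfile

    # Hypothetical thresholds: older than a day and at least 100 MiB.
    for path, size, age in find_stale_files(
        tempfile.gettempdir(), min_age_seconds=24 * 3600, min_size_bytes=100 * 1024 ** 2
    ):
        print(f"{size:>14d} bytes  {age / 3600:6.1f} h  {path}")
```

This only lists candidates. Review them and confirm the sync tool is not running before deleting anything.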

If you have any additional questions/issues, please send them to sup...@duracloud.org.

Best regards, 

Daniel Bernstein
DuraCloud Support


************************************
Daniel Bernstein
Software Engineer, Duraspace
707.874.2045 (office)




Dan Pritts

Jan 23, 2017, 7:34:03 PM
to Daniel Bernstein, DuraCloud Users
Ah, that is an important (and useful) change.   I must have missed it in the release notes.

The server where we run the sync tool didn't have anywhere near enough space in /tmp. It looks like the sync tool attempted to copy a 1.7GB file to /tmp, but the copy failed at about 300MB. The disk was definitely filling up, so that part's very plausible. There were a boatload of retries and failures transferring the file, which may explain why it stayed around so long (over a day, which seemed like an awfully long time).

All that said, I didn't find an explicit log message complaining about a failed copy. 

Seems like verbosely reporting disk-full conditions somewhere outside the log files would be a big win.  E.g., in the output of the "s"/status command in the console. 

In any event, I've added much more space to /tmp on the server and ran a successful full sync over the weekend, so we're good for now.

thanks
danno

Daniel Bernstein

Jan 23, 2017, 7:41:20 PM
to Dan Pritts, DuraCloud Users
Glad to hear it.  Just for clarity:  if you are uploading a single 200GB file,  you would never need more than 1GB of tmp space for that file.  I just wanted to make sure that was clear.

Best, 

Daniel




Daniel Bernstein

Jan 23, 2017, 7:42:19 PM
to Dan Pritts, DuraCloud Users
That is assuming your max chunk size is set to 1GB (which is the default).
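
To make the arithmetic concrete, here is a small Python sketch of the point that a single large file never needs more than one chunk's worth of temp space per thread. The `chunk_plan` helper is hypothetical, and treating "1GB" as 2**30 bytes is an assumption. The thread does not say whether files smaller than the chunk size are staged on disk at all, so the second value should be read as an upper bound.

```python
# Assumption: the default max chunk size of "1GB" is treated as 2**30 bytes.
GIB = 1024 ** 3


def chunk_plan(file_size_bytes, max_chunk_bytes=GIB):
    """Return (number_of_chunks, peak_temp_bytes) for one file on one thread.

    peak_temp_bytes is an upper bound: only one chunk of the file is assumed
    to be on disk at a time, so it never exceeds max_chunk_bytes.
    """
    if file_size_bytes < 0 or max_chunk_bytes < 1:
        raise ValueError("sizes must be non-negative and chunk size positive")
    # Integer ceiling division, with at least one chunk for an empty file.
    chunks = max(1, -(-file_size_bytes // max_chunk_bytes))
    peak_temp = min(file_size_bytes, max_chunk_bytes)
    return chunks, peak_temp


if __name__ == "__main__":
    chunks, peak = chunk_plan(200 * GIB)
    print(f"200 GiB file: {chunks} chunks, at most {peak // GIB} GiB of temp space")
```

With the default chunk size, the 200GB file splits into 200 chunks but stages only one of them at a time, and a 1.7GB file splits into 2 chunks with the same 1GB ceiling.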



