File size changed


HAIK LI

Feb 27, 2021, 11:11:31 AM
to Discuss
Hello all!

I am a new user of Globus.
I just found that when I transfer data between two endpoints, file sizes may change.

I was trying to transfer a .fastq.gz file, ~20GB. When it was downloaded onto my lab computer, the data size became ~19GB. I am not sure if the file got damaged. I never saw a change of file size when I was using wget or scp to download data on a terminal before.

Any advice or feedback would be appreciated!

Best

Alan Sill

Feb 27, 2021, 11:54:59 AM
to HAIK LI, Alan Sill, Discuss
By default, Globus Connect uses checksums to verify file integrity after transfer. When you initiate your transfer, you may select the “Transfer Sync Options” pull-down menu to see that this option is selected by default.

File systems and operating systems can differ in how they report file size. If you can list the file size on each system in exact bytes, rather than as a rounded GB figure, you can compare the two sizes in more detail.
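
As a rough sketch of what a byte-level listing could look like on Linux, the helper below prints both the apparent size (bytes of content) and the space allocated on disk, since the two can differ between file systems. It assumes GNU coreutils `stat`; the function name `size_report` and the file name in the usage comment are placeholders, not anything from Globus.

```shell
# size_report FILE: print apparent size and allocated size, both in bytes.
# Assumes GNU coreutils stat (typical on Linux); BSD/macOS stat uses other flags.
size_report() {
    apparent=$(stat -c '%s' "$1")      # bytes of file content
    blocks=$(stat -c '%b' "$1")        # number of blocks allocated on disk
    block_size=$(stat -c '%B' "$1")    # size in bytes of each block counted by %b
    echo "apparent=$apparent allocated=$((blocks * block_size))"
}

# Usage (placeholder file name):
#   size_report foo.fastq.gz
```

Only the apparent size is expected to match across systems; the allocated figure depends on how each file system stores the file.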


--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@globus.org.

Stephen Rosen

Feb 28, 2021, 4:37:59 PM
to Alan Sill, HAIK LI, Discuss
Alan's answer is spot-on!
The "verify file integrity" option refers to checksumming, which is enabled by default in the web interface and globus-cli, and ensures that if file corruption occurs, the file will be retransferred.

However, if you're not using the Globus webapp or globus-cli to submit your Transfer task, the option (verify_checksum) might not be set or might be set to false.
So if you're using some 3rd party tool (e.g. a script specific to your lab), you may want to double check that it's either implemented in terms of the globus-cli or setting this flag.


The possibility that some tools are showing you "Gigabytes" (1 GB = 10^9 bytes) and that others are showing you "Gibibytes" (1 GiB = 2^30 bytes) is, sadly, very real.

Aside: yes, GibiByte is a real word. Do not worry, I've never met anyone who says "gibibyte" either. :-)
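
To make the two conventions concrete, here is a small sketch that prints the same byte count in both units. The 20,000,000,000-byte figure is an assumed round number for illustration, not the actual size of the file in this thread, and the helper names `to_gb` and `to_gib` are made up for the example.

```shell
# Hypothetical size for illustration: exactly 20 * 10^9 bytes.
bytes=20000000000

# Decimal gigabytes: divide by 10^9.
to_gb()  { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000000 }'; }

# Binary gibibytes: divide by 2^30 = 1024 * 1024 * 1024.
to_gib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'; }

echo "GB:  $(to_gb "$bytes")"     # prints "GB:  20.00"
echo "GiB: $(to_gib "$bytes")"    # prints "GiB: 18.63"
```

Under that assumption, a tool that reports binary units and rounds to whole numbers could plausibly show ~19 for a file that another tool shows as ~20.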


The other explanation which comes to mind is that you may be viewing a file which has not finished transferring yet. If the task is in progress, the file will be visible on the destination but will be incomplete.

If you want to view the exact number of bytes in the file, on a typical Linux system I'd suggest using the stat command:
  $ stat -c '%s' foo.fastq.gz

This shows the exact number of bytes in the file. Run the same command on both ends and check that the output is the same.
You can also checksum manually with commands like sha256sum.
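
One way to combine the two checks is a small helper that prints the byte count and the SHA-256 digest on a single line, so the output from each end can be compared at a glance. This is only a sketch assuming GNU `stat` and `sha256sum` are installed on both systems; `fingerprint` is a made-up name.

```shell
# fingerprint FILE: print "<size in bytes> <sha256 hex digest>" on one line.
# Assumes GNU stat and sha256sum (typical on Linux).
fingerprint() {
    size=$(stat -c '%s' "$1")
    # Read from stdin so the output is "<digest>  -" regardless of file name.
    digest=$(sha256sum < "$1" | cut -d ' ' -f 1)
    echo "$size $digest"
}

# Usage: run on both the source and the destination, then compare the lines.
#   fingerprint foo.fastq.gz
```

If the two lines are identical, the files have the same length and the same contents. Hashing a ~20GB file reads the whole file, so expect it to take a while.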


Best,
-Stephen

Prentice Bisbal

Mar 1, 2021, 12:26:40 PM
to dis...@globus.org

Different filesystem types may report different usage statistics for how much disk space the same file uses. It has to do with the details of how the file is written to disk by different filesystems. If you want to make sure the files are really the same on each system, you can use a checksum command like md5sum or sha256sum. These commands return a checksum that will match only if the files are identical, like this:

$ md5sum test_file
b6d81b360a5672d80c27430f39153e2c  test_file

$ sha256sum test_file
30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58  test_file
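
Rather than comparing the hex strings by eye, the checksum can be saved to a file on the source and checked on the destination with the `-c` option. The sketch below simulates both steps in one scratch directory; `test_file` is a throwaway stand-in created by the script, and in practice the `.sha256` file would be transferred alongside the data.

```shell
# Scratch directory with a throwaway stand-in for the real file.
workdir=$(mktemp -d)
printf 'example data\n' > "$workdir/test_file"

# On the source: record the checksum next to the file.
( cd "$workdir" && sha256sum test_file > test_file.sha256 )

# On the destination, after transferring both files: verify.
# Prints "test_file: OK" and exits with status 0 when the contents match.
( cd "$workdir" && sha256sum -c test_file.sha256 )
```

If the contents differ, `sha256sum -c` reports FAILED for that file and exits with a non-zero status, which makes it easy to use in a script.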

sha256sum is considered more secure than md5sum, but md5sum is still widely used. For more information, see:

https://en.wikipedia.org/wiki/Md5sum

https://en.wikipedia.org/wiki/Sha1sum

Prentice
