Mismatch between expected and downloaded (mm10)

Mike Rightmire

Feb 26, 2018, 1:33:17 PM
to UCSC Genome Browser Mailing List
Good Morning,

I've been downloading the full complement of mm10 files and I've noticed a mismatch between the expected download size and the actual size.

The total expected size:

    # rsync -hna --stats rsync://hgdownload-euro.soe.ucsc.edu/gbdb/mm10/ | egrep --color=auto 'Number of files:|total size is'
       Number of files: 892 (reg: 869, dir: 23)
       total size is 972.50G  speedup is 39,866,286.50 (DRY RUN)

Actual sizes:   

    # du -sh mm10/
       906G mm10/

    # find  ./mm10/ -type f | wc -l
       869

However, according to the rsync output (command used below), the download thinks it's complete.

    # export HGDOWNLOAD="hgdownload-euro.soe.ucsc.edu"
    # export RSYNCOPTS=""
    # export GBDBDIR="/gbdb/"
    # export db="mm10"
    # rsync --progress -avp $RSYNCOPTS $HGDOWNLOAD::gbdb/$db/ $GBDBDIR/$db/ 2>&1

Is there an issue here?

Thanks!
Mike


--

Universitäts Klinikum Heidelberg - University Hospital Heidelberg

Section of Bioinformatics and Systems Cardiology
Analysezentrum III - Klaus Tschira Institute

Mike Rightmire 

Bioinformatics and IT

Im Neuenheimer Feld 669

69120 Heidelberg

Tel.: +49 6221 56 - 34213
Fax.: +49 6221 56 - 6868

Cell: +49 176 7131 8758

Michael....@uni-heidelberg.de
http://www.klinikum.uni-heidelberg.de

Jairo Navarro Gonzalez

Mar 2, 2018, 2:23:05 PM
to t...@ix.urz.uni-heidelberg.de, UCSC Genome Browser Mailing List

Hello Mike,

Thank you for using the UCSC Genome Browser and for your inquiry.

You may find the following website useful in learning what the rsync command is doing when you use the --stats argument.

https://www.explainshell.com/explain?cmd=rsync+-hna+--stats

From that page, rsync's stats argument calculates a count of all files and a count of files to be transferred:
  • Number of files is the count of all "files" (in the generic sense), which includes directories, symlinks, etc.
  • Number of files transferred is the count of normal files that were updated via rsync’s delta-transfer algorithm, which does not include created dirs, symlinks, etc.

The following output you provided reflects these numbers:

Number of files: 892 (reg: 869, dir: 23)

The number of regular files (not directories or symlinks) is 869, which is the total number of files minus the directories:

892 - 23 = 869
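
To cross-check that breakdown against the local copy, here is a minimal sketch. The helper name count_by_type is hypothetical, and it assumes file names contain no newlines, since it counts one line per entry:

```shell
# Hypothetical helper: count the entries of one type under a directory.
# $1 = directory to scan
# $2 = find type letter: f (regular file), d (directory), l (symlink)
count_by_type() {
    find "$1" -type "$2" | wc -l | tr -d ' '
}
```

Assuming the download lives in ./mm10/, running count_by_type ./mm10/ f, count_by_type ./mm10/ d and count_by_type ./mm10/ l gives the local counts of regular files, directories and symlinks, which can then be compared with the "reg" and "dir" figures in the rsync output.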

The total file size calculated includes the sizes of any symlinks, which are not transferred by the rsync program. This could explain the size difference between rsync --stats and du.
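
Another factor that may account for the gap, offered as an assumption rather than something confirmed for this download: with a single -h, rsync 3.1 and later report sizes in units of 1000, while du -h uses units of 1024. A minimal sketch of the conversion follows (the helper name gb_to_gib is hypothetical):

```shell
# Hypothetical helper: convert a size given in decimal gigabytes (10^9 bytes)
# to binary gibibytes (2^30 bytes), rounded to the nearest whole number.
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.0f\n", gb * 1000000000 / (1024 * 1024 * 1024) }'
}

# The total reported by rsync --stats above, treated as decimal gigabytes.
gb_to_gib 972.50
```

This prints 906, which matches the 906G reported by du -sh. If that assumption about units holds, the two totals describe the same amount of data.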

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro 
UCSC Genomics Institute

Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining
