hg38 Assembly Accession Number Confusion

McDonald, James

Apr 2, 2018, 5:58:09 PM
to gen...@soe.ucsc.edu
UCSC Genome Browser Staff,

I'm James McDonald, a postdoc in Dr. Chiappinelli's lab at George Washington University. We are trying to download and use hg38, but I am confused by the assembly accession numbers.

From the hg38 gateway page, I followed the link to download via FTP (ftp://hgdownload.soe.ucsc.edu/goldenPath/hg38/).

I checked both of the following files: 1) /goldenPath/hg38/database/README.txt and 2) /goldenPath/hg38/bigZips/README.txt. Both state that the directory contains hg38 (GRCh38) with the genome assembly accession GCA_000001405.2. When I search for this accession in NCBI's Assembly database, it matches GRCh37.p1.

However, when I dig down into /goldenPath/hg38/bigZips/analysisSet/README.txt to get the actual files I need for indexing sequence aligners, that README says the files there correspond to the GRCh38 accession GCA_000001405.15. This does match the GRCh38 entry in the NCBI Assembly database.

I don't understand why the two parent directories for the hg38 data list a different accession number than the analysisSet directory. I am especially concerned because the GRCh37.p1 accession number suggests I might be downloading hg19 data instead of hg38.

Is this just a typo of the accession number in the README files? Can you point me to the correct data for the hg38 Dec. 2013 release that I can use for next-generation sequence analysis?

Thank you very much,
James

Christopher Lee

Apr 3, 2018, 12:45:21 PM
to McDonald, James, UCSC Genome Browser Discussion List
Hi James,

Thank you for your question about accession numbers. This was indeed a
typo and should be fixed up now. Please let us know if you have any
further questions!

Thanks,

Christopher Lee
UCSC Genomics Institute
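
For readers who want to run the same sanity check on a downloaded README, here is a minimal Python sketch. It only scans text for GenBank assembly accessions and flags any that differ from the one expected for GRCh38 (GCA_000001405.15, as cited in the analysisSet README discussed above). The function names and the sample strings are hypothetical illustrations, not actual README contents or UCSC tooling.

```python
import re

# Matches GenBank assembly accessions such as GCA_000001405.15
ACCESSION_RE = re.compile(r"\bGCA_(\d{9})\.(\d+)\b")

# Accession for GRCh38 as cited in this thread (assumed to be the one you expect)
EXPECTED = "GCA_000001405.15"


def find_accessions(text):
    """Return unique (accession, version) pairs found in text, sorted by version."""
    found = {
        (f"GCA_{m.group(1)}.{m.group(2)}", int(m.group(2)))
        for m in ACCESSION_RE.finditer(text)
    }
    return sorted(found, key=lambda pair: pair[1])


def check_readme(text, expected=EXPECTED):
    """Return every accession in text that differs from the expected one."""
    return [acc for acc, _ in find_accessions(text) if acc != expected]


if __name__ == "__main__":
    # Illustrative strings only; paste in the real README text to check it.
    samples = {
        "parent README (before fix)": "genome assembly accession GCA_000001405.2",
        "analysisSet README": "GRCh38 accession GCA_000001405.15 files",
    }
    for name, text in samples.items():
        mismatches = check_readme(text)
        status = "OK" if not mismatches else f"unexpected: {mismatches}"
        print(f"{name}: {status}")
```

An empty result from check_readme means every accession mentioned in the text matches the expected one; it does not by itself verify the sequence files.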