UCSC Genome Browser Staff,
I'm James McDonald, a postdoc in Dr. Chiappinelli's lab at George Washington University. We are trying to download and use hg38, but I am confused about the assembly accession numbers.
I checked both of the following files: 1) /goldenPath/hg38/database/README.txt and 2) /goldenPath/hg38/bigZips/README.txt. Both indicate that the directory contains hg38 (GRCh38) with the genome assembly accession GCA_000001405.2. When I search for this accession in NCBI's Assembly database, it matches GRCh37.p1.
However, when I dig down into /goldenPath/hg38/bigZips/analysisSet/README.txt to get the actual files I need for indexing with sequence aligners, the README indicates that the files there are from GRCh38, with accession number GCA_000001405.15. This accession does match GRCh38 in the NCBI Assembly database.
I do not understand why the two parent directories for the hg38 data list a different accession number from the analysisSet directory. I am especially concerned because the GRCh37.p1 accession number suggests I might be downloading hg19 data instead of hg38.
Is this just a typo of the accession number in the README files? Can you point me to the correct data for the hg38 Dec. 2013 release that I can use for next-generation sequence analysis?
Thank you very much,
James