hg19 contigs dropped in hg38

C T

Mar 8, 2021, 12:20:36 PM
to gen...@soe.ucsc.edu
Hello,

I saw that some regions in hg19 were dropped in the hg38 assembly, and I'm curious why. Some posts mention that the hg19 sequences were obsolete, but I'm not sure exactly what that means. I don't really understand how those regions can no longer exist in hg38. I understand that hg19 is a single representation of multiple genomes. Are these regions now in the alternate sequences?

Thank you in advance for your clarification.

Gerardo Perez

Mar 12, 2021, 4:28:43 PM
to C T, genome

Hello,

Thank you for your interest in the UCSC Genome Browser and your question about hg19/hg38 regions.

The updated reference genome hg38 contains many improvements over the hg19 assembly, including contigs that were merged together or placed where there were previously sequence gaps. You can view these updates by turning on the assembly track, the 'hg19 diff' (or hg38 diff) track, the GRC Patches track, and the GRC Incident track. For example, you can load the following session to view the region chr10:45,234,036-49,262,210 on hg38:

http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=jnavarr5&hgS_otherUserSessionName=hg38.MLQ.21338

This area is full of changes and updates to the sequence. If you have any specific regions of interest, we may be able to offer additional information as to why they have changed.

The Genome Reference Consortium (GRC) publishes the assembly, including the updates and corrections. For example, the latest hg38 patch was GRCh38.p13, and the following blog post covers details on this patch release:
https://genomeref.blogspot.com/2019/03/grch38p13-has-been-released.html

Once a certain number of fixed sequences accumulate, the GRC releases a new and more accurate reference genome.

If you want to read more about the differences in the assemblies, there are plenty of thorough explanations online. Here is one from NCBI: https://www.ncbi.nlm.nih.gov/assembly/basics/

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute



C T

Mar 15, 2021, 1:55:32 PM
to Gerardo Perez, genome
Hi Gerardo,

Thank you for your reply.
I am looking specifically at these two hg19 regions for the EWS/FLI translocation: chr11:128,567,502-128,767,502 and chr22:29,584,301-29,784,301.
Looking at the genome browser, it seems that only the chr22 region has a dropped-regions annotation in hg38 and that the chr11 region has no differences in hg38. Is that correct? Also, what exactly do these region annotations mean?


Thank you,
Cenny

Gerardo Perez

Mar 18, 2021, 8:18:22 PM
to C T, genome

Hello Cenny,

Thank you for your interest in the UCSC Genome Browser and your follow-up question about the hg19/hg38 regions.

Yes, part of the hg19 chr22 EWSR1 gene was dropped in the hg38 assembly. In the image or session below, under the label “GRC Incident Database”, there is an annotation indicating a reported and resolved assembly problem, HG-346, in the region of the EWSR1 gene. Under the label “Contigs Dropped or Changed from GRCh37(hg19) to GRCh38(hg38)”, there is a brown annotation in the region of the EWSR1 gene, which means that different portions of the same contig were used in the construction of the hg38 and hg19 assemblies.

hg19_contigs_hg38_MLQ.jpg

http://genome.ucsc.edu/s/gperez2/hg19_contigs_hg38_MLQ

We don’t see annotations in the GRC Incident track or the Hg38 Diff track at the hg19 chr11 FLI gene, so yes, the FLI gene should not have differences in the hg38 assembly.
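Regions like these can also be checked outside the browser by converting them to BED and running them through the liftOver tool. The following is a minimal sketch in Python, not a UCSC utility: the helper name `region_to_bed` and the region labels are assumptions made for illustration. It relies only on the fact that browser positions are 1-based and inclusive, while BED is 0-based and half-open.

```python
import re

# Browser-style position, e.g. "chr22:29,584,301-29,784,301".
_REGION = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def region_to_bed(region, name="."):
    """Convert a 1-based, inclusive browser position into a
    0-based, half-open BED line (chrom, start, end, name).

    Hypothetical helper for illustration only.
    """
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"not a browser-style position: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    # Only the start shifts by one; the end value is unchanged.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


# The two hg19 regions from this thread; the labels are made up.
regions = {
    "FLI_chr11": "chr11:128,567,502-128,767,502",
    "EWSR1_chr22": "chr22:29,584,301-29,784,301",
}
bed_lines = [region_to_bed(pos, label) for label, pos in regions.items()]
print("\n".join(bed_lines))
```

Saved to a file, these lines could be passed to liftOver together with the hg19ToHg38.over.chain.gz file from the UCSC download server. A region that maps cleanly appears in the output file, while one that cannot be converted is written to the unmapped file along with a short reason.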


Gerardo Perez
UCSC Genomics Institute

C T

Mar 23, 2021, 12:20:55 PM
to Gerardo Perez, genome
Hi Gerardo,

Thank you again for clarifying that part of EWSR1 was dropped in the hg38 assembly. I have a related follow-up question. What tools were used to assess the differences between hg19 and hg38? I have a draft human cancer genome that I would like to compare to hg19, and I am wondering whether I can use the same tools to assess the differences.

Thank you!
Cenny 

Hiram Clawson

Mar 23, 2021, 12:29:49 PM
to C T, Gerardo Perez, genome
Good Morning Cenny:

We call the process for comparing two human genomes the 'Same Species' liftOver procedure.
It is a series of 'blat' processes followed by 'chain/net' processing to
determine similar sequences between the two genomes.

The process is performed by a series of scripts on our cluster process
management system:
http://genomewiki.ucsc.edu/index.php/DoSameSpeciesLiftOver.pl

You would need to adjust this procedure depending upon the type
of computer cluster you have available. Note the final sentence
on that wiki page:
"The script can be used in a -debug mode where it does nothing but
construct the required shell scripts."

Use this -debug mode to get the shell scripts established, then work through
each one in turn, substituting your cluster process when it comes to that
point.
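To give a rough idea of what the blat and chain/net steps look like, here is a simplified dry-run sketch in Python. It is not what the doSameSpeciesLiftOver.pl script generates: the function name, file names, and option values are assumptions for illustration, and the splitting of the query into pieces for cluster jobs, which the real procedure handles, is left out. Like the -debug mode, it only builds command lines and runs nothing.

```python
from shlex import quote


def same_species_plan(target_2bit, query_2bit, target_sizes,
                      query_sizes, query_fa, prefix="targetToQuery"):
    """Return shell command strings for a simplified blat -> chain -> net
    run. Nothing is executed (dry run).

    Hypothetical helper; option values below are illustrative assumptions.
    The target is the assembly you lift from, the query the one you lift to.
    """
    t2, q2, ts, qs, fa, p = (quote(x) for x in (
        target_2bit, query_2bit, target_sizes, query_sizes, query_fa, prefix))
    return [
        # 1. Align the query sequences to the target with high identity.
        f"blat {t2} {fa} -minScore=100 -minIdentity=98 {p}.psl",
        # 2. Join the PSL alignments into chains.
        f"axtChain -linearGap=medium -psl {p}.psl {t2} {q2} {p}.raw.chain",
        # 3. Sort the chains by score (chainMergeSort writes to stdout).
        f"chainMergeSort {p}.raw.chain > {p}.sorted.chain",
        # 4. Net the chains; the query-side net is not needed here.
        f"chainNet {p}.sorted.chain {ts} {qs} {p}.net /dev/null",
        # 5. Keep only the chain pieces used in the net: the liftOver chain.
        f"netChainSubset {p}.net {p}.sorted.chain {p}.over.chain",
    ]


# Example with made-up file names for a draft genome compared to hg19.
plan = same_species_plan("hg19.2bit", "draft.2bit", "hg19.chrom.sizes",
                         "draft.chrom.sizes", "draft.fa", prefix="hg19ToDraft")
for command in plan:
    print(command)
```

The resulting .over.chain file is the kind of file the liftOver tool takes as its map. For whole assemblies the blat step is what needs a cluster, so that is the step to substitute with your own job system.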

--Hiram
