Pfam errors on GRCh38

Brendan Veeneman

Jun 14, 2016, 6:20:52 PM
to gen...@soe.ucsc.edu
Hi UCSC genome browser support,

I'm using the "Pfam in UCSC Gene" track that you provide for GRCh38/hg38, but I've encountered what I believe is an erroneous annotation.

As an example, please examine FLI1:
hg18 : chr11:128,069,199-128,187,521
hg19 : chr11:128,563,811-128,683,162
hg38 : chr11:128,693,770-128,813,267

I believe FLI1 actually has two domains, a pointed domain ("SAM_PNT") and an ETS domain ("Ets"), as shown in hg18 and hg19. In hg38, however, the domains appear duplicated and, most strangely, the new Ets domain lands in the middle of the SAM_PNT domain. When I ran domain prediction on the protein sequence under the new Ets annotation, it was predicted as "SAM_PNT", as expected.
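One way to catch this kind of problem programmatically is to flag any two domain annotations on the same protein whose residue ranges overlap, as the new Ets domain does with SAM_PNT here. The following is a minimal Python sketch, assuming the hits have already been parsed into (name, start, end) records in protein coordinates; the DomainHit type and overlapping_pairs function are hypothetical and not part of any UCSC or hmmer tool.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One domain annotation on a protein (hypothetical record type)."""
    name: str
    start: int  # 1-based, inclusive, protein coordinates
    end: int    # 1-based, inclusive


def overlapping_pairs(hits: list[DomainHit]) -> list[tuple[DomainHit, DomainHit]]:
    """Return every pair of hits whose intervals share at least one residue."""
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                # Hits are sorted by start, so no later hit can overlap a.
                break
            pairs.append((a, b))
    return pairs
```

A nested Ets hit inside a SAM_PNT hit would show up as a flagged pair, which could then be reviewed by hand or filtered by score.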

I've previously encountered other baffling domain annotations, but I was usually less familiar with those genes than I am with this one, so I suspect the problem isn't isolated to this case. My guess is that there was an error in a liftover step involving transcript coordinates. I would ask that you please correct this, not just for this one gene, but for the general case.

Thank you very much for your support!
Brendan


Luvina Guruvadoo

Jun 22, 2016, 12:20:27 PM
to Brendan Veeneman, gen...@soe.ucsc.edu
Hello Brendan,

Thank you for bringing this to our attention. One of our engineers looked into it. He says this isn't an error, because hmmer does actually find this ETS domain within the SAM_PNT domain; the reason we probably shouldn't display it is that its domain E-value is too high to consider the hit real. He has made our pipeline more restrictive, and the next release of this table, expected within the next week or so, will show only the one high-scoring ETS domain. You can confirm that hmmer reports two ETS domains on the EBI site with this link:
https://www.ebi.ac.uk/Tools/hmmer/results/D6138C22-3284-11E6-8C55-5780D26C98AD/score
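The pipeline change described above amounts to dropping domain hits whose per-domain E-value exceeds a cutoff. The sketch below is only an illustration: the thread does not say which of hmmer's per-domain E-values (conditional or independent) the pipeline uses, nor what threshold was adopted, so it assumes the independent E-value and an arbitrary cutoff of 1e-3. The ScoredHit type, DEFAULT_MAX_EVALUE constant, and filter_by_domain_evalue function are hypothetical names.

```python
from typing import NamedTuple


class ScoredHit(NamedTuple):
    """A domain hit plus its per-domain E-value (hypothetical record type)."""
    name: str
    start: int
    end: int
    i_evalue: float  # assumed: hmmer's independent (per-domain) E-value


# Assumed cutoff for illustration only; not the value UCSC actually chose.
DEFAULT_MAX_EVALUE = 1e-3


def filter_by_domain_evalue(
    hits: list[ScoredHit], max_evalue: float = DEFAULT_MAX_EVALUE
) -> list[ScoredHit]:
    """Keep only hits whose domain E-value is at or below max_evalue.

    Lower E-values are more significant, so a weak hit nested inside a
    stronger domain is discarded when its E-value is above the cutoff.
    """
    return [h for h in hits if h.i_evalue <= max_evalue]
```

Raising max_evalue makes the filter more permissive; lowering it keeps only the most confident domain calls.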

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Regards,
Luvina

--
Luvina Guruvadoo
UCSC Genome Browser

http://genome.ucsc.edu



