Pfam domains raw data input fruitfly


Daniel Gonzalez

Sep 15, 2022, 12:09:39 PM
to gen...@soe.ucsc.edu
Hello,

I am interested in how the Pfam domains track for fruit fly was generated. In the help pages, you state:

"The proteins associated with the transcripts in the refGene table (see RefSeq Genes description page) are submitted to the set of Pfam-A HMMs which annotate regions within the predicted peptide that are recognizable as Pfam protein domains. These regions are then mapped to the transcripts themselves using the pslMap utility."

I would like to get the raw Pfam domain identification data, before the domains are mapped with pslMap. I have browsed the discussion forums without finding related information, and on the FTP site I have only found the SQL commands and tables with the Pfam domains already mapped to the genome. Thank you.

Best regards,

Daniel.




Gerardo Perez

Sep 20, 2022, 8:23:51 PM
to Daniel Gonzalez, gen...@soe.ucsc.edu

Hello, Daniel.

Thank you for your interest in the Genome Browser and your question about raw data of Pfam domain identification.

We have a file that has the mapping of domains to the dm6 RefSeq proteins:
https://hgwdev-gperez2.gi.ucsc.edu/~gperez2/mlq/mlq_30019/data/dm6/bed/pfam/ucscPfam.tab

If you were looking for something different, then we would encourage you to reach out to the Pfam helpdesk (pfam...@ebi.ac.uk) and CC us (genom...@soe.ucsc.edu). We annotate the genome, not the proteins.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute



Daniel Gonzalez

Oct 3, 2022, 2:33:59 PM
to Gerardo Perez, gen...@soe.ucsc.edu
Hi Gerardo,

Thank you for this information. I asked this question because I found differences between the Pfam domains shown for UCSC genes and the results reported by NCBI BLAST or InterProScan. I assumed this was because UCSC and the other tools use different versions of Pfam. However, after checking the file you provided, I have also found odd cases where Pfam domains are identified in your file but are not reported in the UCSC browser, so I am wondering whether this relates to pslMap filtering. Here are some examples:

example 1:
transcript: FBtr0342783 (RefSeq: NM_166413.3), protein NP_726006 has two Pfam domains (PF00046 and PF03826) identified by InterProScan, by NCBI, and in the file you provided, but they are not reported in the UCSC browser.


example 2:
transcript: FBtr0110966 (RefSeq: NM_001043090.3), protein NP_001036555 has one Pfam domain (PF00447) identified by InterProScan, by NCBI, and in the file you provided, but two Pfam domains are reported in the UCSC browser.

There are several genes like these. Do you know what might be happening? Is this a bug? Thank you.
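
For readers who want to repeat this kind of check across many proteins, here is a minimal sketch of the comparison, assuming the domain accessions from each source have already been collected into sets. The function name `compare_domains` is hypothetical, and the input values are taken from example 1 above (domains listed in the file versus an empty set for the browser).

```python
def compare_domains(expected, observed):
    """Report domain accessions present in one source but not the other."""
    expected, observed = set(expected), set(observed)
    return {
        "missing": sorted(expected - observed),      # in the file, not in the browser
        "unexpected": sorted(observed - expected),   # in the browser, not in the file
    }

# Example 1: NP_726006 has PF00046 and PF03826 in the file, none in the browser.
result = compare_domains({"PF00046", "PF03826"}, set())
print(result)  # {'missing': ['PF00046', 'PF03826'], 'unexpected': []}
```

A non-empty "unexpected" list, as in example 2, does not by itself point to a bug; it only flags the protein for a closer look.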

Best regards,

Daniel.




Luis Nassar

Oct 14, 2022, 7:50:28 PM
to Daniel Gonzalez, Gerardo Perez, gen...@soe.ucsc.edu
Hello, Daniel.

Thank you for sharing those examples.

The first example, NM_166413.3, looks to be due to a bug in how we generate the alignments for this track. We have regenerated the track and it should now contain the two expected domains.

As for the second example, that is an expected result of how the track is generated. The pipeline calculates the domains for all the transcripts and then creates a final unique list, which is the Pfam domains track. As you say, NP_001036555 has only a single domain, but the isoform NM_057227.5/NP_476575.1 has the two domains seen on the Pfam track, which can be verified on InterProScan.
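
As a rough illustration of why a unique list can show more domains at a locus than any single isoform has, here is a minimal sketch. It is not the actual UCSC pipeline code: the `Hit` record, the choice of deduplication key (genomic location plus domain name), and all identifiers and coordinates are invented for the example.

```python
from collections import namedtuple

# Hypothetical record: one domain hit already mapped to genome coordinates.
Hit = namedtuple("Hit", ["transcript", "chrom", "start", "end", "domain"])

def unique_domain_track(hits):
    """Collapse per-transcript hits into one entry per (location, domain)."""
    seen = {}
    for h in hits:
        key = (h.chrom, h.start, h.end, h.domain)
        seen.setdefault(key, []).append(h.transcript)
    return sorted(seen.items())

# Invented data: isoform_A has one domain, isoform_B has two.
hits = [
    Hit("isoform_A", "chrX", 100, 400, "domain_1"),
    Hit("isoform_B", "chrX", 100, 400, "domain_1"),
    Hit("isoform_B", "chrX", 900, 1200, "domain_2"),
]
track = unique_domain_track(hits)
for (chrom, start, end, domain), transcripts in track:
    print(chrom, start, end, domain, ",".join(transcripts))
```

In this toy case the track holds two domains at the locus even though isoform_A alone has only one, which mirrors the situation in the second example.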

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

Daniel Gonzalez

Oct 19, 2022, 2:46:22 PM
to Luis Nassar, Gerardo Perez, gen...@soe.ucsc.edu
Hi Lou,

Thank you very much. I was working with several genes and wondering what had happened. This is very helpful.

Best regards,

Daniel.

