Summary table for RepeatMasker output?


Jeremy Johnson

Aug 7, 2017, 3:40:16 PM
to gen...@soe.ucsc.edu
Hi,

I'm interested in a summary table from the RepeatMasker run done on the elephant genome (rmsk.txt.gz file located here: hgdownload.soe.ucsc.edu/goldenPath/loxAfr3/database/rmsk.txt.gz).

I believe this file summarizes the % of sequence found by RepeatMasker specific to the different classes of repeats (LINE/SINE/LTR etc.)

I realize I could write a script that derives that data from the rmsk.txt.gz file; I just thought I'd ask you first instead of reinventing the wheel.
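
For anyone who does want to script it, here is a minimal sketch in Python. It assumes the standard UCSC rmsk column layout, where the 0-based fields 5, 6, 7 and 11 are genoName, genoStart, genoEnd and repClass, with half-open coordinates. It counts annotations per repClass and merges overlapping annotations so each base is counted once per class, which is assumed (not confirmed) to approximate how the Table Browser reports coverage. All function names here are hypothetical.

```python
import gzip
import urllib.request
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed 0-based column positions in UCSC rmsk rows:
# bin, swScore, milliDiv, milliDel, milliIns, genoName, genoStart, genoEnd,
# genoLeft, strand, repName, repClass, repFamily, repStart, repEnd, repLeft, id
CHROM_COL, START_COL, END_COL, REP_CLASS_COL = 5, 6, 7, 11

RMSK_URL = "https://hgdownload.soe.ucsc.edu/goldenPath/loxAfr3/database/rmsk.txt.gz"


def download_rmsk(dest: str = "rmsk.txt.gz") -> str:
    """Fetch the loxAfr3 rmsk dump to a local file (never called automatically)."""
    urllib.request.urlretrieve(RMSK_URL, dest)
    return dest


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def summarize_rmsk(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map repClass -> (number of annotations, bases covered)."""
    counts: Dict[str, int] = defaultdict(int)
    # repClass -> chromosome/scaffold -> list of (start, end)
    intervals = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= REP_CLASS_COL:
            continue  # skip blank or malformed rows
        rep_class = fields[REP_CLASS_COL]
        counts[rep_class] += 1
        intervals[rep_class][fields[CHROM_COL]].append(
            (int(fields[START_COL]), int(fields[END_COL]))
        )
    return {
        rep_class: (counts[rep_class], sum(merged_length(iv) for iv in by_chrom.values()))
        for rep_class, by_chrom in intervals.items()
    }


def summarize_rmsk_file(path: str = "rmsk.txt.gz") -> Dict[str, Tuple[int, int]]:
    """Stream a gzipped rmsk dump from disk and summarize it."""
    with gzip.open(path, "rt") as fh:
        return summarize_rmsk(fh)
```

Usage would be something like `summary = summarize_rmsk_file(download_rmsk())`, then printing each class with its count and covered bases. This sketch keeps every interval in memory, which for several million rows may need a substantial amount of RAM.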

Thanks!

~Jeremy

Jairo Navarro Gonzalez

Aug 8, 2017, 6:59:51 PM
to Jeremy Johnson, UCSC Genome Browser Mailing List

Hello Jeremy,

Thank you for using the UCSC Genome Browser and for your question about obtaining a summary table for the RepeatMasker track.

You can get summary information from the RepeatMasker track by using the Table Browser.

Step 1: Configure the Table Browser

clade: mammal
genome: Elephant
assembly: Jul. 2009 (Broad/loxAfr3)
group: Variation and Repeats
track: RepeatMasker
table: rmsk
region: genome

Step 2: Click summary/statistics

Clicking the summary/statistics button will take you to a new page where you can view information such as the item count or how much of the genome is covered by these annotations. For example, the RepeatMasker track contains 5,770,660 annotations, which cover about 47.68% of the elephant genome excluding gaps.
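
The "excluding gaps" percentage is presumably the covered bases divided by the genome size minus the bases in assembly gaps. A minimal sketch of that arithmetic, assuming the assembly's chromInfo and gap table dumps are available alongside rmsk.txt.gz with the usual UCSC column layouts (chromInfo: chrom, size, fileName; gap: bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge). The function names are hypothetical, and the Table Browser's exact formula is an assumption here.

```python
from typing import Iterable


def total_chrom_size(chrom_info_lines: Iterable[str]) -> int:
    """Sum the size column (2nd field) of chromInfo rows: chrom, size, fileName."""
    return sum(int(line.split("\t")[1]) for line in chrom_info_lines if line.strip())


def total_gap_size(gap_lines: Iterable[str]) -> int:
    """Sum chromEnd - chromStart (3rd and 4th fields) of gap rows:
    bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge."""
    total = 0
    for line in gap_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        total += int(fields[3]) - int(fields[2])
    return total


def percent_excluding_gaps(covered_bases: int, genome_bases: int, gap_bases: int) -> float:
    """Covered bases as a percentage of the non-gap genome (assumed definition)."""
    non_gap = genome_bases - gap_bases
    if non_gap <= 0:
        raise ValueError("genome size must exceed total gap size")
    return 100.0 * covered_bases / non_gap
```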

Step 3: Create a filter

You can create a filter to get statistics for each of the different classes of repeats.
To create a filter, click "create" next to the "filter:" setting on the main Table Browser page.

You will be redirected to a new page where you can filter the results by a particular field.

For example, you can limit the statistics to only the SINE class of repeats by changing "repClass does match *" to

repClass does match SINE

and then click submit.
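
The default "repClass does match *" suggests the filter takes a wildcard pattern. A rough local equivalent for the rmsk.txt.gz rows is sketched below, assuming shell-style wildcards (`*` for any run of characters, `?` for a single character) and case-sensitive matching; the Table Browser's own matching rules may differ, and `filter_rep_class` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator, List

REP_CLASS_COL = 11  # assumed: repClass is the 12th column of rmsk rows


def filter_rep_class(lines: Iterable[str], pattern: str) -> Iterator[List[str]]:
    """Yield split rmsk rows whose repClass matches a shell-style wildcard
    pattern, e.g. "SINE" (exact name) or "*" (every row)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > REP_CLASS_COL and fnmatchcase(fields[REP_CLASS_COL], pattern):
            yield fields
```

Because `?` is itself a wildcard in this sketch, a pattern such as `SINE*` would also pick up any class labels written with a trailing question mark, if the table contains them, whereas `SINE` matches only the exact class name.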

Step 4: Click summary/statistics again

Once you are back on the main Table Browser page, click the summary/statistics button once again. This time, only statistics for items that passed through the filter will be displayed. For example, the SINE repeat class has 1,962,142 annotations, which cover about 8.67% of the elephant genome excluding gaps.
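
Repeating Steps 3 and 4 once per class (LINE, SINE, LTR and so on) yields one item count and one covered-bases figure per class. A minimal sketch of collecting those figures into a tab-separated summary table is below; the function name and input shape are hypothetical, and because annotations of different classes may overlap, the per-class percentages are not guaranteed to add up to the track-wide total.

```python
from typing import Dict, List, Tuple


def format_summary_table(per_class: Dict[str, Tuple[int, int]], non_gap_bases: int) -> List[str]:
    """Render repClass -> (item count, covered bases) as tab-separated rows,
    largest coverage first, with coverage as a percent of the non-gap genome."""
    if non_gap_bases <= 0:
        raise ValueError("non_gap_bases must be positive")
    rows = ["repClass\titems\tcovered_bases\tpercent_of_genome"]
    # Sort by descending coverage, then by class name for a stable order.
    for rep_class, (items, covered) in sorted(per_class.items(), key=lambda kv: (-kv[1][1], kv[0])):
        pct = 100.0 * covered / non_gap_bases
        rows.append(f"{rep_class}\t{items}\t{covered}\t{pct:.2f}")
    return rows
```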

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro 
UCSC Genomics Institute


Jeremy Johnson

Aug 9, 2017, 12:16:27 PM
to Jairo Navarro Gonzalez, UCSC Genome Browser Mailing List
Hi Jairo,

This is perfect, thanks so much. I'm trying it this afternoon; I'll be in touch if I run into trouble (which I doubt I will, given how detailed your email is).

~Jeremy