Access to repeat elements data

qanita khalid

Sep 15, 2017, 11:33:38 AM
to gen...@soe.ucsc.edu
Hello
I want to access the interspersed repeat elements of human genome assembly 38, that is, coordinate data for interspersed repeats such as SINEs, LINEs, and Alus. But I didn't find any way to download it. Kindly guide me on how to access the repeat element data.
Please respond as soon as possible.


Regards
Qanta khalid
M.Phil. Scholar
Comparative Evolutionary and Genomics Lab
National Center for Bioinformatics
Quaid-i-Azam University Islamabad, Pakistan

Matthew Speir

Sep 15, 2017, 12:15:06 PM
to qanita khalid, gen...@soe.ucsc.edu
Hi Qanta,

Thank you for your question about repeat elements in the human genome assembly hg38/GRCh38.

You can find a file containing the genomic coordinates for all of the repeats identified by the RepeatMasker software on our download server: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz. The track description page contains information about how this track was constructed: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=rmsk ; from this page, you can click "View table schema" to see a description of all of the columns in the track and file.
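As a rough illustration of working with the downloaded file, here is a minimal Python sketch that keeps only rows of selected repeat classes or families (for example SINE, LINE, or Alu). It assumes the 17-column tab-separated layout shown on the "View table schema" page; the function names and the local file path are hypothetical, so please check the column order against the schema before relying on it.

```python
import gzip
from collections import Counter

# Column order assumed from the rmsk table schema (17 tab-separated fields).
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd",
    "repLeft", "id",
]


def parse_rmsk(lines):
    """Yield one dict per well-formed rmsk row; skip rows with a different field count."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(RMSK_COLUMNS):
            continue
        yield dict(zip(RMSK_COLUMNS, fields))


def select_repeats(rows, classes=("SINE", "LINE"), families=()):
    """Keep rows whose repClass is in `classes` or whose repFamily is in `families`."""
    for row in rows:
        if row["repClass"] in classes or row["repFamily"] in families:
            yield row


def count_by_class(path, classes=("SINE", "LINE")):
    """Count selected repeats per class in a gzipped rmsk file (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        selected = select_repeats(parse_rmsk(handle), classes)
        return Counter(row["repClass"] for row in selected)
```

With the file saved locally, a call such as count_by_class("rmsk.txt.gz") would stream through the file once without loading it all into memory; passing families=("Alu",) to select_repeats narrows the selection to Alu elements.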

In case it is helpful, here is an answer to a previous mailing list question that contains instructions on filtering the repeats by their intersection with specific genes: https://groups.google.com/a/soe.ucsc.edu/forum/#!msg/genome/vzUmdis_iFk/cn8RbnedCgAJ.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
Matthew Speir

Oct 5, 2017, 5:18:37 PM
to qanita khalid, gen...@soe.ucsc.edu
Hi Qanta,

Thank you for your follow-up question about the format of the RepeatMasker (rmsk) table in the UCSC Genome Browser.

The file that I linked includes the location of the repeats in terms of reference genome coordinates. The three columns genoName, genoStart, and genoEnd (columns 6, 7 and 8, respectively) are the location of the repeat on the reference genome.
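To make the column positions concrete, the following Python sketch turns one rmsk row into a BED-style line containing only the genome coordinates of the repeat. It is an illustration under stated assumptions: the function name rmsk_to_bed is hypothetical, the strand and repName positions (columns 10 and 11) are taken from the table schema, and the BED score is simply set to 0. UCSC database tables use 0-based, half-open coordinates like BED, so no coordinate shift is applied.

```python
def rmsk_to_bed(line):
    """Convert one tab-separated rmsk row to a BED6 line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    geno_name = fields[5]   # column 6: chromosome of the repeat
    geno_start = fields[6]  # column 7: start on the reference genome (0-based)
    geno_end = fields[7]    # column 8: end on the reference genome
    strand = fields[9]      # column 10: strand
    rep_name = fields[10]   # column 11: repeat name, e.g. L2a
    # The BED score is not taken from the table here; 0 is a placeholder.
    return "\t".join([geno_name, geno_start, geno_end, rep_name, "0", strand])
```

The repStart, repEnd, and repLeft columns are deliberately ignored in this sketch because they describe positions within the repeat sequence rather than positions on the genome.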

The columns repStart and repEnd record which portion of the Repbase repeat sequence the identified genomic repeat covers, since the genomic repeat may not include the entire repeat sequence in Repbase. These coordinates are relative to the Repbase repeat sequence, not to the genome. The "RepeatMasker Viz" track in the Genome Browser provides a good way to visualize this partial alignment idea for repeats: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg38.spot.


Matthew Speir
UCSC Genome Bioinformatics Group


On 9/27/17 11:21 PM, qanita khalid wrote:
Hi Matthew
I want to access only the repeat element coordinate data for human genome assembly 38. You gave me this link for downloading the data: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz. But in this file I am unable to understand where a repeat element starts or ends, because the file has chromosome coordinates alongside repeat element coordinates. Could you tell me whether the repeat element coordinates fall within those chromosome coordinates or not? For example, one line for chromosome 1 is:

chr1         82051251       82051667     -166904755     +     L2a      LINE      L2      2935     3387      0

In the above line, 2935 is the repeat start and 3387 is the repeat end. Is there any way to get only the repeat element coordinates in terms of genome assembly coordinates?
Please reply as soon as possible.

Best Regards
Qanta khalid

On Sun, Sep 17, 2017 at 5:19 PM, qanita khalid <qanitak...@gmail.com> wrote:
Thanks sir

Best Regards
Qanta khalid

qanita khalid

Nov 14, 2017, 11:42:57 AM
to gen...@soe.ucsc.edu

---------- Forwarded message ----------
From: qanita khalid <qanitak...@gmail.com>
Date: Thu, Nov 9, 2017 at 11:11 AM
Subject: Re: [genome] Access to repeat elements data
To: Matthew Speir <msp...@soe.ucsc.edu>


Hi Matthew,
Earlier I asked about the repeat-masked elements of the hg38 assembly of the human genome, and you gave me a link for downloading the data: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz. Can you please tell me which patch of the hg38 assembly this repeat element file belongs to: patch 7, 10, or 11?
Kindly respond as soon as possible.

Best Regards
Qanta khalid

On Sat, Oct 7, 2017 at 4:28 PM, qanita khalid <qanitak...@gmail.com> wrote:
I was just a little bit confused about this. Thanks for helping me to understand.

Best Regards
Qanta khalid

Brian Lee

Nov 14, 2017, 1:46:13 PM
to qanita khalid, gen...@soe.ucsc.edu
Dear Qanta,

Thank you for using the UCSC Genome Browser and for your question about the rmsk.txt.gz file for hg38 available on the downloads page: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database

The track data was generated against the original hg38 assembly. In the future we may release more recent RepeatMasker data for hg38, but the default track that we will display (the rmsk table in question) will be what was used to mask the assembly originally. Subscribing to our genome-announce mailing list (Email genome-announce+subscribe@soe.ucsc.edu) would be one way to learn when we may add the new data.

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UC Santa Cruz Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining
