Interpretation of file wgEncodeCrgMapabilityAlign100mer.bw


陈大洋

Jul 5, 2018, 3:41:08 PM
to gen...@soe.ucsc.edu
Dear editor,
I am a student from China, and my name is Max Chen. I am confused about how to interpret the file wgEncodeCrgMapabilityAlign100mer.bw on this website (https://genome.ucsc.edu/cgi-bin/hgTables). I don't know what the four columns mean, especially the third column. I have read the references and explanations, but I still don't quite understand them. I hope you have some time to explain it. Thank you in advance.
Good luck.
Yours Sincerely.

Christopher Lee

Jul 6, 2018, 1:39:30 PM
to 陈大洋, UCSC Genome Browser Discussion List

Hello Max,

Thank you for your question about the ENCODE CRG Mappability track. Did you get output from the track for a specific region, like the example below, and are you unsure how to read it?

track type=wiggle_0 name="CRG Align 100" description="Alignability of 100mers by GEM from ENCODE/CRG(Guigo)" 
#bedGraph section chr21:32976481-33136142
chr21    33029853    33033699    1
chr21    33033699    33033700    0.5
chr21    33033700    33033701    0.166667

If so, the columns follow the bedGraph format, which is explained here:
https://genome.ucsc.edu/goldenpath/help/bedgraph.html

Briefly, the columns indicate the chromosome, start and end coordinates, and then a value over that genomic range. For more information on what the value means in this case, you can read the Alignability sub-section under the Methods section on the track description page:
https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeMapability
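
As a rough illustration of the four columns, here is a minimal Python sketch that reads the example output above. The names `Interval` and `parse_bedgraph` are made up for this example and are not part of any UCSC tool, and the sketch assumes plain-text bedGraph lines with whitespace-separated fields.

```python
from typing import Iterator, NamedTuple


class Interval(NamedTuple):
    """One bedGraph data line (hypothetical helper type)."""
    chrom: str    # column 1: chromosome name
    start: int    # column 2: start coordinate
    end: int      # column 3: end coordinate
    score: float  # column 4: value over that range


def parse_bedgraph(text: str) -> Iterator[Interval]:
    """Yield one Interval per data line, skipping header and comment lines."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "track ..." header, and "#" comments.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, score = line.split()[:4]
        yield Interval(chrom, int(start), int(end), float(score))


example = """\
track type=wiggle_0 name="CRG Align 100"
#bedGraph section chr21:32976481-33136142
chr21    33029853    33033699    1
chr21    33033699    33033700    0.5
chr21    33033700    33033701    0.166667
"""

intervals = list(parse_bedgraph(example))
for iv in intervals:
    print(iv.chrom, iv.start, iv.end, iv.score)
```

Each printed row corresponds to one data line; the header and comment lines are ignored.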

If your question is about something different, please let us know and we'd be happy to offer further explanation!

Thanks,

Christopher Lee
UCSC Genomics Institute

Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining

Thank you again for your inquiry and using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.




陈大洋

Jul 10, 2018, 10:07:57 AM
to Christopher Lee, UCSC Genome Browser Discussion List
Hello Christopher,
  

Thank you for your response. I took a closer look at the two links you sent and wrote down my interpretation. I hope you can check whether it is correct.


For the window:

chr21 33029853 33033699 1

K-mer number: 33033699 - 33029853 + 1 = 3847

It means that the 3847 k-mers found in the window each align uniquely within the whole genome.


For the window:

chr21 33033699 33033700 0.5

K-mer number: 33033700 - 33033699 + 1 = 2

It means that the 2 k-mers found in the window each align to 2 (1/0.5) regions within the whole genome.


I look forward to your response.

Good luck.

Yours Sincerely.

Jairo Navarro Gonzalez

Jul 10, 2018, 6:15:18 PM
to 陈大洋, UCSC Genome Browser Discussion List

Hello Max,

Thank you for using the UCSC Genome Browser and sending your follow-up question.

It is not necessary to add 1 when computing the length, because bedGraph tracks use zero-based, half-open coordinates: the start position is included in the range and the end position is not. If you want to learn more about our coordinate systems, you can read the following blog post: "The UCSC Genome Browser Coordinate Counting Systems".

For the window:

chr21 33029853 33033699 1

K-mer number: 33,033,699 - 33,029,853 = 3,846

This means that each base in this range is the beginning of a 100-mer, and if you align each of those 3,846 100-mers to the genome, each of them only maps to a single location. Hence, the entire range gets a score of 1/1, or 1. The bedGraph could instead be written as:

chr21 33029853 33029854 1
chr21 33029854 33029855 1
chr21 33029855 33029856 1
chr21 33029856 33029857 1
...
...
chr21 33033698 33033699 1

But since the 4th column is the same, it can be compressed to a single range.
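
The counting and the expansion described above can be sketched in a few lines of Python. This is only an illustration under the half-open convention; `interval_length` and `expand_per_base` are hypothetical helper names, not functions from any UCSC utility.

```python
from typing import Iterator, Tuple

Row = Tuple[str, int, int, float]


def interval_length(start: int, end: int) -> int:
    """Number of bases in a half-open range: end minus start, no +1."""
    return end - start


def expand_per_base(chrom: str, start: int, end: int, score: float) -> Iterator[Row]:
    """Yield one single-base bedGraph row per position in [start, end)."""
    for pos in range(start, end):
        yield (chrom, pos, pos + 1, score)


n_kmers = interval_length(33029853, 33033699)
rows = list(expand_per_base("chr21", 33029853, 33033699, 1.0))

print(n_kmers)   # 3846
print(rows[0])   # ('chr21', 33029853, 33029854, 1.0)
print(rows[-1])  # ('chr21', 33033698, 33033699, 1.0)
```

The number of expanded rows equals the interval length, which is why the compressed single-line form carries the same information.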

For this other 100-mer:

chr21 33033699 33033700 0.5

If you take the 100-mer starting in this position and align it to the genome, it will align to two locations, and thus gets a score of 1/2, or 0.5.
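
If it helps, the relationship between the score and the number of alignment locations can be inverted with a small Python sketch. It assumes, as in the examples in this thread, that the score is 1 divided by the number of locations; `hits_from_score` is a hypothetical name, and rounding is used because the file stores values such as 0.166667 rather than an exact 1/6.

```python
def hits_from_score(score: float) -> int:
    """Estimate how many genomic locations a 100-mer aligns to.

    Assumes score = 1 / (number of locations), so the count is the
    reciprocal rounded to the nearest integer.
    """
    if score <= 0:
        # A non-positive score cannot be inverted under this assumption.
        raise ValueError("score must be positive")
    return round(1 / score)


print(hits_from_score(1))         # 1
print(hits_from_score(0.5))       # 2
print(hits_from_score(0.166667))  # 6
```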

If you would like to know more about how this data is generated, you should contact the data provider, Paolo Ribeca:

paolo....@gmail.com

I hope this is helpful.

Jairo Navarro 
UCSC Genome Browser


陈大洋

Jul 11, 2018, 10:53:14 AM
to Jairo Navarro Gonzalez, UCSC Genome Browser Discussion List
Hello Jairo,
Your explanation matches what I expected. I will use it to correct for genome mappability and then call CNVs. I expect to finish the work in about a month. Thank you for your help!
Good luck.
Yours Sincerely.