Enable reads extension: how is the extension made?

Winston Dallas

Dec 6, 2012, 1:20:29 AM
to seqm...@googlegroups.com
Dear Tao,

Since I work at very high resolution (my reference peak is only 1000 bp), I need to know exactly how the read extension is performed.
I ran several tests using a selected reference set (only a few genes from mouse mm9) and a control tag file that I built from the TSS coordinates, with slight changes.
I eventually came to the conclusion that the read extension is done in the following manner:

Tools->Options->enable reads extension size: Default=200 (we have ticked this option).

Now we load an aligned reads file with these columns: "Chrom", "Start", "End".
Based on what I concluded, seqMINER will calculate the middle point between
"Start" and "End" and place it on the heatmap relative to the middle of the reference peak.
(So in each case the middle point is used: the middle point of the reference peak and the middle point of the tag.)

Next, from this calculated position the extension (200 by default) is applied evenly in both directions, with half of the extension going upstream and half downstream:

For example, suppose the middle of our reference peak is 1000 (the reference peak "Start" is 500 and its "End" is 1500)
and the tag file (BED format) contains a tag with these coordinates:
chr1, Start=1100, End=1200

In that case the middle point of the tag is (1100+1200)/2=1150. This middle point is therefore located at +150 relative to the middle of the reference peak (=1000).
Now, if the read extension is left at the default (=200), the resulting bin on the heatmap will span +50 to +250
(+150 - 100 = +50 and +150 + 100 = +250).
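
The arithmetic of this hypothesis can be sketched in Python. This only illustrates the centred-extension model I describe above; it is not seqMINER's code, and the function name `centered_window` and its parameters are hypothetical.

```python
def centered_window(tag_start, tag_end, ref_start, ref_end, extension=200):
    """Offsets (left, right) of the extended tag relative to the middle of
    the reference peak, under the hypothesised centred extension."""
    ref_mid = (ref_start + ref_end) / 2   # middle of the reference peak
    tag_mid = (tag_start + tag_end) / 2   # middle of the tag
    offset = tag_mid - ref_mid            # tag position relative to the peak centre
    half = extension / 2                  # half of the extension on each side
    return offset - half, offset + half


# The example from this post: peak 500..1500, tag 1100..1200, extension 200.
print(centered_window(1100, 1200, 500, 1500))  # (50.0, 250.0)
```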

I would like to check this logic with you and confirm that this is indeed the case.
The read seems to be extended evenly in both directions (half of the extension value on each side) from the middle of the tag,
and NOT simply extended downstream from the Start coordinate of the tag.
Is this correct?

Thanks a lot for your attention and help!

Dali
 

Tao

Aug 22, 2013, 11:39:58 AM
to seqm...@googlegroups.com
Dear Dali,

Sorry for the late response. I hope this can still help you.

I didn't try to find the middle of the read, because the middle of the read means nothing. I extend the read to try to represent the real fragment. So if it is a "+" strand read, it is extended from its start site to 200 bp by default. A "-" strand read is extended from its end site back 200 bp upstream.
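
A minimal Python sketch of this strand-aware rule, for illustration only. It is not the seqMINER source: the function name `extend_read` is hypothetical, the coordinates are treated as simple integers, and clamping at 0 is an added assumption to avoid negative positions.

```python
def extend_read(start, end, strand, extension=200):
    """Return (new_start, new_end) for a read extended to `extension` bp.

    "+" reads grow downstream from their start site;
    "-" reads grow upstream from their end site.
    """
    if strand == "+":
        return start, start + extension
    if strand == "-":
        # Assumption: clamp at 0 so the interval never leaves the chromosome.
        return max(0, end - extension), end
    raise ValueError(f"unknown strand: {strand!r}")


# Dali's tag (1100..1200) under this rule, with the default extension of 200:
print(extend_read(1100, 1200, "+"))  # (1100, 1300)
print(extend_read(1100, 1200, "-"))  # (1000, 1200)
```

Relative to a reference peak centred at 1000, the "+" case covers +100 to +300 and the "-" case covers 0 to +200, so neither matches the +50 to +250 window of the centred model.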
best,
Tao

On Thursday, December 6, 2012 at 7:20:29 AM UTC+1, Winston Dallas wrote:

Julian Rozenberg

Jul 3, 2014, 12:50:50 PM
to seqm...@googlegroups.com
Dear Tao,

In that case, is it possible to artificially set a shorter read length?
For example, my reads are 100 bp and I am interested in the distribution of read-end densities.

My concern is that when I change this parameter I see no change in the generated distributions, so I suspect a bug.

Thank you very much for a beautiful program.

Julian Rozenberg