fragment length


Stephen

Sep 23, 2011, 8:43:14 AM
to RSEM Users
Hi,

I just have a quick query about the option --fragment-length-mean.
The documentation says it is for use with single-end data only, so is
it referring to the mean read length?

Cheers,

Steve

b...@cs.wisc.edu

Sep 23, 2011, 12:30:42 PM
to rsem-...@googlegroups.com
Hi Steve,

It is the mean fragment length, not the mean read length. In RNA-Seq,
you fragment the library into pieces and then sequence them. But you
cannot read the whole fragment, perhaps only the first 75 bp. Then 75 bp
is the read length, while the fragment itself might be around 200 bp;
200 bp is the fragment length. By knowing this fragment length, RSEM can
estimate expression levels more accurately.

Best,
Bo

bill

Oct 21, 2011, 2:01:25 PM
to RSEM Users
Bo

I have 6 libs that have a mean fragment length of ~250. The DNA
length is spread over about 200 bases and the peak is not centered
but somewhat to the shorter end. Also, there is a second peak of
incomplete shearing that is longer in length, smaller in height, yet
broader. This second peak is very likely less in area than the first.
What do I use as the --fragment-length-sd? The sd of the 6 sample
means (which is small, ~7 bases), or use an eyeball estimate for a
peak that is now quite normal, maybe 50 bases? And what relevance does
the second peak have to your model?

Wanting to understand your model better,

Bill

bill

Oct 21, 2011, 2:03:00 PM
to RSEM Users
I meant to say that the peak was "not quite normal".
Bill

Colin Dewey

Oct 21, 2011, 3:25:15 PM
to rsem-...@googlegroups.com
Hi Bill,

First, I just want to check that you have single-end reads. If you have paired-end reads, then you don't need to worry about these parameters; RSEM will learn them automatically (and the learned distribution need not be Gaussian).

Assuming you have single-end reads, the --fragment-length-mean and --fragment-length-sd options specify the distribution of the fragment lengths for all reads that you are providing as input to RSEM. RSEM doesn't currently understand the concept of multiple libraries, so it will treat your 6 combined libraries as one library. RSEM currently assumes a single Gaussian distribution for the fragment length, so it won't be able to model the second peak that you are seeing, but that may not be a problem given the size of that peak. Note that the --fragment-length-sd is not the sd of the means of each library, but is simply the sd of the fragment lengths of all libraries combined.
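To illustrate the distinction between the sd of the library means and the sd of all fragment lengths combined, here is a minimal Python sketch. The library names and fragment lengths are made up for illustration, and the use of the sample standard deviation (n-1 denominator) is an assumption; the thread does not say which convention RSEM expects.

```python
import statistics

# Hypothetical fragment lengths (bp) for three libraries; illustrative only.
libraries = {
    "lib1": [200, 250, 300],
    "lib2": [205, 255, 305],
    "lib3": [195, 245, 295],
}

def pooled_fragment_stats(libs):
    """Mean and sample sd of all fragment lengths pooled across libraries."""
    pooled = [length for lengths in libs.values() for length in lengths]
    return statistics.mean(pooled), statistics.stdev(pooled)

def sd_of_library_means(libs):
    """Sample sd of the per-library means (not what --fragment-length-sd wants)."""
    means = [statistics.mean(lengths) for lengths in libs.values()]
    return statistics.stdev(means)

pooled_mean, pooled_sd = pooled_fragment_stats(libraries)
means_sd = sd_of_library_means(libraries)

print(f"pooled mean = {pooled_mean}, pooled sd = {pooled_sd:.1f}")
print(f"sd of library means = {means_sd:.1f}")
```

With these made-up numbers the library means are nearly identical, so their sd is small, while the pooled sd reflects the actual spread of fragment lengths. The pooled value is the one described above for a combined run.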

In the future, we may implement the ability to specify (or automatically learn) a different fragment length distribution for each library.

Hope that helps,
Colin

Spollen, William G.

Oct 21, 2011, 3:29:44 PM
to rsem-...@googlegroups.com
Yes they are single end, 50 base reads. Since I must eyeball the sd for the major peak at 250 for each of the 6 samples, is it better to err on the long side or the short side?

Thanks

Bill

Colin Dewey

Oct 21, 2011, 3:32:21 PM
to rsem-...@googlegroups.com
Hi Bill,

I would err on the long side because you have that secondary peak.

Best,
Colin

Leonardo

Aug 21, 2014, 5:00:44 PM
to rsem-...@googlegroups.com
Hi Colin,
I'm pretty confused about calculating the fragment length mean and sd. First of all, a hypothetical example of libraries for RNA-seq:
lib 1 has fragments of 260 bp, 250 bp, 270 bp
lib 2 has fragments of 260 bp, 280 bp, 300 bp

The mean and standard deviation of lib 1 are 260 and 10 bp; for lib 2 they are 280 and 20 bp. However, the sd of all fragment sizes together is 17.89.

It seems from your explanation that --fragment-length-mean is simply the mean of each library that was sequenced. But is --fragment-length-sd also a per-library value, or a value from all libraries? That is, should I set: lib 1 --fragment-length-mean 260 --fragment-length-sd 10 and lib 2 --fragment-length-mean 280 --fragment-length-sd 20, OR lib 1 --fragment-length-mean 260 --fragment-length-sd 17.89 and lib 2 --fragment-length-mean 280 --fragment-length-sd 17.89?

Could you also explain how RSEM takes these values into account in its calculations?
Thanks very much!

Bo Li

Aug 26, 2014, 3:02:29 AM
to rsem-...@googlegroups.com
Hi Leonardo,

If you run RSEM on the two libraries separately, you should use 260
and 10 for the first library, and 280 and 20 for the second.
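As a rough check of those numbers, the per-library values can be computed from the hypothetical fragment lengths in the question. This is only a sketch: `fragment_length_options` is a made-up helper name, and the sample standard deviation (n-1 denominator) is assumed because it reproduces the 10 and 20 quoted in the question.

```python
import statistics

# Fragment lengths (bp) from the hypothetical example in the question.
libraries = {
    "lib1": [250, 260, 270],
    "lib2": [260, 280, 300],
}

def fragment_length_options(lengths):
    """Build the two single-end fragment-length options for one library."""
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths)  # sample sd; an assumption, see above
    return [
        "--fragment-length-mean", f"{mean:g}",
        "--fragment-length-sd", f"{sd:g}",
    ]

# One set of options per library, for separate RSEM runs.
options = {name: fragment_length_options(lengths)
           for name, lengths in libraries.items()}

for name, opts in options.items():
    print(name, " ".join(opts))
```

Each library gets its own mean and sd because each is quantified in its own run; the combined sd would only apply if both libraries were given to RSEM as a single input.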

Best,
Bo

Leonardo

Aug 26, 2014, 10:14:12 AM
to rsem-...@googlegroups.com
Thanks Bo!