It is the mean fragment length, not the mean read length. In RNA-Seq,
the library is fragmented into pieces, which are then sequenced. But the
sequencer usually cannot read a whole fragment; it might read only the
first 75 bp. In that case 75 bp is the read length, while the fragment
itself might be around 200 bp, and that 200 bp is the fragment length.
Knowing the fragment length lets RSEM estimate expression levels more
accurately.
Best,
Bo
First, I just want to check that you have single-end reads. If you have paired-end reads, then you don't need to worry about these parameters; RSEM will learn the fragment length distribution automatically (and the learned distribution need not be Gaussian).
Assuming you have single-end reads, the --fragment-length-mean and --fragment-length-sd options specify the distribution of the fragment lengths for all reads that you are providing as input to RSEM. RSEM doesn't currently understand the concept of multiple libraries, so it will treat your 6 combined libraries as one library. RSEM currently assumes a single Gaussian distribution for the fragment length, so it won't be able to model the second peak that you are seeing, but that may not be a problem given the size of that peak. Note that the value for --fragment-length-sd is not the sd of the per-library means; it is simply the sd of the fragment lengths of all libraries combined.
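To make the distinction between the two standard deviations concrete, here is a minimal Python sketch. It assumes you have per-library fragment length measurements available as plain lists of numbers (for example, exported from a size-selection trace); the function names and the toy values are hypothetical and are not part of RSEM. It uses the population standard deviation, which is also an assumption; with real library sizes the difference from the sample standard deviation is negligible.

```python
import statistics


def pooled_mean_sd(libraries):
    """Mean and sd of the fragment lengths of all libraries combined.

    `libraries` is a list of lists of fragment lengths in bp
    (hypothetical input format, one inner list per library).
    """
    all_lengths = [length for lib in libraries for length in lib]
    return statistics.mean(all_lengths), statistics.pstdev(all_lengths)


def sd_of_library_means(libraries):
    """The sd of the per-library means: NOT the value to pass to RSEM."""
    means = [statistics.mean(lib) for lib in libraries]
    return statistics.pstdev(means)


if __name__ == "__main__":
    # Toy fragment lengths (bp) for two made-up libraries.
    libs = [[180, 200, 220], [190, 210, 230]]
    mean, sd = pooled_mean_sd(libs)
    print(f"--fragment-length-mean {mean:.1f} --fragment-length-sd {sd:.1f}")
    print(f"sd of library means (too small): {sd_of_library_means(libs):.1f}")
```

The pooled sd is the larger of the two here because it includes the spread within each library as well as the spread between libraries, which is what the single Gaussian has to cover.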
In the future, we may implement the ability to specify (or automatically learn) a different fragment length distribution for each library.
Hope that helps,
Colin
Thanks
Bill
I would err on the long side because you have that secondary peak.
Best,
Colin