Downsampling ChIP / control


dario

Feb 23, 2012, 6:35:36 AM
to RSEG Users
Hello,

A question about the difference in read counts between the ChIP sample and
the control: is it wise to downsample the control if it has many more reads
(say, more than twice as many) than the ChIP sample? In other words, should
the control and the ChIP sample have roughly the same number of reads for the
algorithm to perform better?

Many thanks

Dario

Bob

Mar 20, 2012, 5:44:02 PM
to RSEG Users
Hello everyone,

I am exploring RSEG and comparing it with the tools that I am
currently using.
The posted question more or less applies to my case. But I noticed
that the RSEG mailing list is not very active. Is it always like that?

Cheers,
Robert

Song, Qiang

Mar 21, 2012, 12:47:27 AM
to rseg-s...@googlegroups.com
Dear Dario and Robert,

To a certain extent, RSEG is able to adapt to differences in sequencing
depth between the test sample and the control sample. In our model,
the read count difference in the basal state is not necessarily zero. The
difference in sequencing depth between the samples should be
mostly captured by the basal state. Therefore RSEG does not do an
"internal normalization" to make the test sample and the control have
the same number of reads.

However, the issue you mentioned is indeed important: if the control
sample has far more reads than the test sample and shows large
variance, some true signal in the test sample may be overwhelmed.

In your case, to be safe, I would do the analysis several times with
control samples of different sizes. 
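
One way to prepare such control samples is to subsample the control reads at random to a few target depths and run the analysis once per subset. The sketch below is only an illustration: the function names, the depth ratios, and the representation of reads as a list of BED-like lines are assumptions, not part of RSEG.

```python
import random

def downsample(reads, target_size, seed=0):
    """Return a random subset of `reads` with `target_size` items, keeping input order."""
    if target_size >= len(reads):
        return list(reads)  # nothing to remove
    rng = random.Random(seed)  # fixed seed so each subset is reproducible
    keep = sorted(rng.sample(range(len(reads)), target_size))
    return [reads[i] for i in keep]

def control_subsets(control_reads, chip_size, ratios=(1.0, 2.0, 4.0), seed=0):
    """Map each control/ChIP depth ratio (hypothetical choices) to a downsampled control."""
    return {
        ratio: downsample(control_reads, int(round(ratio * chip_size)), seed)
        for ratio in ratios
    }

# Toy data standing in for the lines of a control BED file.
control = ["chr1\t%d\t%d" % (i, i + 36) for i in range(1000)]
subsets = control_subsets(control, chip_size=200)
for ratio, reads in sorted(subsets.items()):
    print(ratio, len(reads))
```

Each subset can then be written to its own file and passed to the analysis as the control, so that the resulting domain calls can be compared across depths.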

Dario, sorry for missing your last email. Robert: we usually respond
to emails quickly. Feel free to contact us if you have any other
questions.

Regards, 
Song Qiang

dario

Mar 21, 2012, 4:38:43 AM
to RSEG Users
OK, thanks for the reply, Qiang. I'll keep this in mind. I was asking
because MACS (which, admittedly, uses a different approach) produces rather
unreliable FDRs if the control and the sample are very different in size.

All the best

Dario

Robert Faryabi

Mar 21, 2012, 8:47:51 AM
to rseg-s...@googlegroups.com
Thanks Qiang,

I don't see how RSEG mitigates this problem. The difference in size between the two libraries affects the parameter estimation for the NBDiff distribution. A larger library size means a larger success number, which in turn would skew the mean toward the larger library. Don't you think that would skew the NBDiff distribution?

Thanks,
Robert    

Song, Qiang

Mar 22, 2012, 1:39:44 AM
to rseg-s...@googlegroups.com
Hi Robert,

You are right that library size affects the parameter estimation, but it affects
the NBDiff distributions of both the foreground and the basal state in the same direction,
i.e., with a larger control library, both the basal state mean and the foreground
mean are shifted to the left. Therefore, when we try to determine whether a given
observation belongs to the basal state or the foreground, that shift is accounted for
by the non-zero mean of the basal state.
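
The shift can be written out explicitly. The following is a sketch under simplifying assumptions that are introduced here for illustration and are not taken from the RSEG documentation: the per-bin test and control counts are independent, each state has its own means, and multiplying the control depth by a factor c multiplies the control mean by c in every state. The symbols are likewise hypothetical notation.

```latex
% Assumed notation: X = test count and Y = control count in a bin, D = X - Y;
% s \in \{b, f\} denotes the basal or foreground state;
% \mu_X^{(s)}, \mu_Y^{(s)} are the state means at the reference depth;
% c > 0 is the factor by which the control depth is scaled.
\begin{align*}
\mathbb{E}[D \mid s] &= \mu_X^{(s)} - c\,\mu_Y^{(s)}, \\
\mathbb{E}[D \mid f] - \mathbb{E}[D \mid b]
  &= \bigl(\mu_X^{(f)} - \mu_X^{(b)}\bigr) - c\,\bigl(\mu_Y^{(f)} - \mu_Y^{(b)}\bigr).
\end{align*}
```

Raising c from 1 moves the mean of each state to the left by (c - 1) times that state's control mean. If the control mean is about the same in both states, the second term vanishes and the gap between the foreground and basal means does not depend on c.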

As I explained in my last email, what concerns me is the possible increase of variance
with larger control samples, which leads to larger variance for the basal state NBDiff
and may cause false negatives. RSEG currently does not consider this issue.
Addressing it will require a better understanding of the relationship between the mean and variance
in the control sample. Therefore I proposed exploring this effect empirically through multiple runs
with different control library sizes.
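
A rough feel for the size of this effect can be had from the moments of a difference of two independent negative binomial counts. The sketch below assumes the common mean/size parameterization (variance = mean + mean^2 / size) and assumes that the size parameter stays fixed while the control mean scales with depth. Both are illustrative assumptions, as are the function names and the toy numbers. How the variance really changes with depth is exactly the open question, so this does not replace the empirical runs.

```python
def nb_variance(mean, size):
    """Variance of a negative binomial with the given mean and size (dispersion)."""
    return mean + mean * mean / size

def nbdiff_moments(mu_test, mu_control, size_test, size_control, depth_scale=1.0):
    """Mean and variance of (test - control) for independent counts.

    `depth_scale` multiplies the control mean; the size parameter is held
    fixed, which is an assumption made only for this illustration.
    """
    scaled_control = depth_scale * mu_control
    mean = mu_test - scaled_control
    variance = nb_variance(mu_test, size_test) + nb_variance(scaled_control, size_control)
    return mean, variance

# Toy parameters: same control mean in both states, enriched test mean in foreground.
for scale in (1.0, 2.0, 4.0):
    basal_mean, basal_var = nbdiff_moments(5.0, 5.0, 10.0, 10.0, scale)
    fg_mean, fg_var = nbdiff_moments(15.0, 5.0, 10.0, 10.0, scale)
    # Gap between state means, in units of the basal standard deviation.
    separation = (fg_mean - basal_mean) / basal_var ** 0.5
    print(scale, basal_mean, basal_var, fg_mean, fg_var, separation)
```

With these toy numbers the gap between the two state means stays constant while the basal variance grows with the control depth, so the separation in standard-deviation units shrinks. That is one way the false negatives described above could arise.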

I hope this helps you better understand our model and its limitations.

Best,
Song Qiang