Downsampling chip / control

dario

Feb 23, 2012, 6:35:36 AM

to RSEG Users

Hello,

A question about the difference in the number of reads between the ChIP
sample and the control: is it wise to downsample the control if its
number of reads is much larger than (say, more than twice) that of the
ChIP sample? In other words, should the control and the ChIP sample have
more or less the same number of reads for the algorithm to perform better?

Many thanks

Dario

Bob

Mar 20, 2012, 5:44:02 PM

to RSEG Users

Hello the group,

I am exploring RSEG and comparing it with the tools that I am
currently using.
The posted question more or less applies to my case. But I noticed
that the RSEG mailing list is not very active. Is it always like that?

Cheers,
Robert

Song, Qiang

Mar 21, 2012, 12:47:27 AM

to rseg-s...@googlegroups.com

Dear Dario and Robert,

To a certain extent, RSEG is able to adapt to differences in sequencing
depth between the test sample and the control sample. In our model,
the read count difference in the basal state is not necessarily zero. The
difference in sequencing depth between the samples should be
mostly captured by the basal state. Therefore, RSEG does not do an
"internal normalization" to make the test sample and the control have
the same number of reads.

However, the issue you mentioned is indeed important: if the control
sample has far more reads than the test sample and shows large
variance, some true signal in the test sample may be overwhelmed.

In your case, to be safe, I would run the analysis several times with
control samples of different sizes.
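
One way to prepare such control samples is to downsample the control reads to a few multiples of the ChIP read count before each run. The sketch below is only an illustration and is not part of RSEG; the function names, the toy BED-like tuples, and the chosen ratios are all assumptions made for the example.

```python
import random


def downsample(reads, n_keep, seed=0):
    """Draw n_keep reads uniformly without replacement, keeping input order."""
    if n_keep >= len(reads):
        return list(reads)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(reads)), n_keep))
    return [reads[i] for i in keep]


def control_subsets(control_reads, chip_read_count, ratios=(1.0, 2.0, 4.0), seed=0):
    """Map each control:ChIP read ratio to a downsampled control read list."""
    subsets = {}
    for ratio in ratios:
        n_keep = int(round(ratio * chip_read_count))
        subsets[ratio] = downsample(control_reads, n_keep, seed)
    return subsets


# Toy control library: 1000 hypothetical reads as (chrom, start, end) tuples.
control = [("chr1", i * 10, i * 10 + 36) for i in range(1000)]

# Pretend the ChIP library has 200 reads; build controls at 1x, 2x and 4x.
subsets = control_subsets(control, chip_read_count=200)
for ratio, reads in subsets.items():
    print(f"control at {ratio}x ChIP depth: {len(reads)} reads")
```

Each subset would then be written out and passed to a separate run, so that the called domains can be compared across control sizes. A fixed seed keeps the subsets reproducible.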

Dario, sorry for missing your last email. Robert: we usually respond
to emails quickly. Feel free to contact us if you have any other
questions.

Regards,

Song Qiang

dario

Mar 21, 2012, 4:38:43 AM

to RSEG Users

OK, thanks for the reply, Qiang. I'll keep this in mind. I was asking
because MACS (which uses a different approach anyway) produces rather
unreliable FDRs if the control and the sample are very different in size.

All the best

Dario

Robert Faryabi

Mar 21, 2012, 8:47:51 AM

to rseg-s...@googlegroups.com

Thanks Qiang,

I couldn't see how RSEG mitigates this problem. The difference between the sizes of the two libraries affects the parameter estimation for the NBDiff distribution. A larger library size means a larger success number, which in turn would skew the mean toward the larger library. Don't you think that would skew the NBDiff distribution?

Thanks,

Robert

Song, Qiang

Mar 22, 2012, 1:39:44 AM

to rseg-s...@googlegroups.com

Hi Robert,

You are right that library size affects the parameter estimation, but it affects
the NBDiff distributions of both the foreground and the basal state in the same
direction, i.e., with a larger control library, both the basal state mean and the
foreground mean are shifted to the left. Therefore, when we try to determine whether
a given observation belongs to the basal state or the foreground, that shift is
accounted for by the non-zero mean in the basal state.

As I explained in my last email, the concern is the possible increase of variance
in larger control samples, which leads to a larger variance for the basal state NBDiff
and may cause false negatives. RSEG currently does not consider this issue.
Addressing it will require a better understanding of the relationship between the mean
and the variance in the control sample. Therefore, I proposed exploring this effect
empirically through multiple runs with different control library sizes.
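
The two effects described above (a common shift of the means and a growing basal-state variance) can be illustrated with a small numeric sketch. It is not RSEG code. It assumes independent negative binomial counts in a mean/size parameterization with variance mean + mean^2/size, which need not match RSEG's internal parameterization, and all numbers are hypothetical.

```python
import math


def nb_var(mean, size):
    """Variance of a negative binomial with the given mean and size (dispersion)."""
    return mean + mean * mean / size


def nbdiff_moments(test_mean, control_mean, size):
    """Mean and standard deviation of (test - control) for independent NB counts."""
    mean = test_mean - control_mean
    sd = math.sqrt(nb_var(test_mean, size) + nb_var(control_mean, size))
    return mean, sd


# Hypothetical per-bin means at equal sequencing depth (illustrative only).
BASAL_TEST, FOREGROUND_TEST, CONTROL = 2.0, 6.0, 2.0
SIZE = 5.0  # assumed common dispersion parameter

rows = []
for scale in (1, 2, 4, 8):
    control_mean = CONTROL * scale  # a deeper control scales its per-bin mean
    basal_mean, basal_sd = nbdiff_moments(BASAL_TEST, control_mean, SIZE)
    fg_mean, _ = nbdiff_moments(FOREGROUND_TEST, control_mean, SIZE)
    rows.append((scale, basal_mean, fg_mean, basal_sd))
    print(f"control x{scale}: basal mean {basal_mean:+.1f}, "
          f"foreground mean {fg_mean:+.1f}, basal sd {basal_sd:.2f}")
```

Under these assumptions the gap between the foreground and basal means stays constant as the control grows, while the basal-state standard deviation keeps increasing, so the two states become harder to separate.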

I hope this helps you better understand our model and its limitations.

Best,

Song Qiang
