How does DaPars deal with multiple replicates?

Zhixia Xiao

May 21, 2018, 4:34:30 AM

to DaPars

Hi Xia,

Thanks for developing such a useful tool.

I'm writing to ask how DaPars deals with multiple replicates when estimating long and short transcript expression.

From my understanding, the algorithm takes the arg min of a sum of squares to estimate the long and short expression; please correct me if I'm wrong. If so, are biological replicates of different conditions treated equally in the equation? Many thanks~

Bests,

Zhixia

Zhixia Xiao

May 21, 2018, 5:15:59 AM

to DaPars

And for multiple groups, such as time-series data, does DaPars support multiple comparisons? Thanks.

puttyx

Aug 17, 2018, 7:34:48 PM

to DaPars

Hi Zhixia,

I have a closely related question about multiple replicates.

From Eq. 1, it seems that each patient (with matched normal and tumour samples) is treated independently, so the P* obtained after optimization could differ between patients. I wonder, then, how their corresponding w variables can be integrated across patients?

From the description in the Methods, it seems DaPars doesn't take the different P* values into consideration; the downstream calculations after Eq. 1 depend only on those w values.

Zhixia, since the author isn't very active, do you have any comments on my question? I tried to find out how this is handled in the source code, https://github.com/ZhengXia/dapars/blob/44a3552276c5c12426cd3765d5c71c6e8cf5fe53/src/DaPars_main.py, but the code is too difficult to read.

ZhengXia

Aug 17, 2018, 7:46:24 PM

to alfred...@gmail.com, DaPars

No multiple comparisons yet. But you can put the 2 normal samples in one group and the 2 tumor samples in another group for comparison.

Thanks,

Zheng

Zhixia Xiao

Aug 19, 2018, 10:33:58 PM

to DaPars

Hi puttyx,

Sorry for the late reply.

I'm afraid I don't fully understand your question. What do you mean by Eq. 1 and the w value?

The expression of the long and short isoforms is estimated independently, as is the PDUI value ( long_exp/(long_exp+short_exp) ). Then the long and short expression of the samples (or patients in your design? Sorry, I don't work with human data) in each group is averaged and used as input for a Fisher exact test (a 2x2 table of groupA_short, groupA_long, groupB_short, groupB_long). P-values are adjusted with the BH method (if I remember correctly; it might be another adjustment method. This accounts for multiple p-values and controls the overall false discovery rate). The PDUI values of each group are also averaged and output as Group_A_Mean_PDUI and Group_B_Mean_PDUI, which are used to calculate PDUI_Group_diff and apply its cutoff.

Hope this helps.
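The group-level step described above can be sketched roughly as follows. This is only an illustration, not DaPars code: the function names, the rounding of averaged expression to integer counts, and the hand-written Fisher test are all assumptions made for the example, and the BH adjustment across genes is left out.

```python
from math import comb
from statistics import mean


def pdui(long_exp, short_exp):
    """PDUI for one sample: long_exp / (long_exp + short_exp)."""
    total = long_exp + short_exp
    return long_exp / total if total > 0 else float("nan")


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def compare_groups(group_a, group_b):
    """Each group is a list of (long_exp, short_exp) tuples, one per sample."""
    # Assumption: averaged expression is rounded so the test sees integer counts.
    a_long = round(mean(l for l, _ in group_a))
    a_short = round(mean(s for _, s in group_a))
    b_long = round(mean(l for l, _ in group_b))
    b_short = round(mean(s for _, s in group_b))
    mean_a = mean(pdui(l, s) for l, s in group_a)
    mean_b = mean(pdui(l, s) for l, s in group_b)
    return {
        "p_value": fisher_exact_two_sided(a_short, a_long, b_short, b_long),
        "Group_A_Mean_PDUI": mean_a,
        "Group_B_Mean_PDUI": mean_b,
        "PDUI_Group_diff": mean_a - mean_b,
    }
```

For example, `compare_groups([(30, 10), (20, 20)], [(10, 30), (10, 30)])` averages two samples per group before testing, so the replicates within a group only enter through their means.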

Bests,

Zhixia

alfred...@gmail.com

Aug 19, 2018, 10:50:51 PM

to DaPars

Eq. 1 is the equation in the Methods section (https://www.nature.com/articles/ncomms6274#methods) used to estimate the expression of the short isoform and the position of the proximal cleavage site. The URL of the equation (I couldn't post the image directly here in Google Groups) is https://media.nature.com/full/nature-assets/ncomms/2014/141120/ncomms6274/images/ncomms6274-m1.gif

My question applies to mice too. Let's say you have two mice (i.e. replicates), and each mouse has two samples, one normal and one tumour. That's four samples in total. The equation above considers two samples at a time (see the 2 above the Sigma sign), so applying it to each mouse's two samples would result in two sets of values for the variables on the left-hand side; you then calculate PDUI for each sample with the obtained values. My question is: P* could be different for the two mice, so how does DaPars aggregate the PDUI from the two samples? Is averaging still meaningful when the shorter isoforms are actually different for the two samples?

Let me know if that's clearer; if not, I will try to clarify my question further.
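For readers who cannot open the image, a least-squares objective of the kind discussed in this thread has roughly the following shape. This is a sketch rather than a verified transcription: the symbols C_i (coverage of sample i), I_L and I_P (indicator functions for the long and proximal regions) and L (length of the longest 3'UTR) are assumptions chosen for illustration, so please check the paper for the exact notation.

```latex
\left(W_L^{1*}, W_S^{1*}, W_L^{2*}, W_S^{2*}, P^{*}\right)
  = \operatorname*{arg\,min}_{W_L^{i},\, W_S^{i} \ge 0,\; 1 < P < L}
    \sum_{i=1}^{2} \left\lVert C_i - \left(W_L^{i} I_L + W_S^{i} I_P\right) \right\rVert_2^2
```

Note that the w values carry a sample index while P does not, which is the point at issue here: one P* on the left-hand side, but separate w values for each sample.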

zhixia xiao

Aug 20, 2018, 7:51:17 AM

to alfred...@gmail.com, dap...@googlegroups.com

According to the code, for each search point (candidate P*), the author calculates the mean squared error for each sample and then averages these across samples. When selecting P*, the sample-averaged mean squared errors are compared and the candidate with the minimum value is chosen; that is how P* is determined. For the selected P*, the expression of the long and short isoforms is also calculated for each sample and reported in the output.

In short, although there is a 2 above the Sigma sign, all the samples listed in the configuration file are taken into consideration (they all enter the calculation of the averaged mean squared error for each P*, regardless of whether they are replicates or come from different conditions). Hope this helps.

Thanks,

Zhixia

zxue....@gmail.com

Aug 20, 2018, 10:44:35 AM

to DaPars

Regarding "According to code, for each searching point (P*), the author calculate mean squared error for each sample", could you please point me to the lines of code you are referring to in https://github.com/ZhengXia/dapars/blob/master/src/DaPars_main.py?

zhixia xiao

Aug 20, 2018, 9:57:08 PM

to zxue....@gmail.com, dap...@googlegroups.com

Please refer to the last loop section (lines 421-432) of the function De_Novo_3UTR_Coverage_estimation_Genome_for_TCGA_multiple_samples, thanks.

puttyx

Aug 23, 2018, 3:22:11 PM

to DaPars

Hi Zhixia,

Thank you! After inspecting the lines of code you pointed to, what you said makes a lot of sense to me now. But it doesn't seem to be exactly what Eq. 1 (https://media.nature.com/full/nature-assets/ncomms/2014/141120/ncomms6274/images/ncomms6274-m1.gif) in the Methods section of the paper suggests, does it? To summarize the algorithm for searching for P*:

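mse_for_all_p = []
for each p:                          # candidate proximal site
    mse_per_p = []
    for each sample:                 # all samples, regardless of group
        calculate w_L, w_S, mse
        mse_per_p.append(mse)
    mse_for_all_p.append(average(mse_per_p))
p* = argmin(mse_for_all_p)           # the candidate p with the smallest averaged mse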


Then, based on p*, get the corresponding w_L and w_S for each sample.
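To check my understanding, here is a small runnable version of the search above. It is a sketch under assumptions, not the DaPars implementation: `fit_step_model` and `search_shared_breakpoint` are hypothetical names, and the per-sample fit is a simple two-level step (positions before p carry long plus short reads, positions from p onward carry long reads only) standing in for whatever the real code does.

```python
from statistics import mean


def fit_step_model(coverage, p):
    """Fit one sample's 3'UTR coverage with a step at index p (assumed model)."""
    w_long = mean(coverage[p:])
    # The short-isoform abundance cannot be negative, so clip at zero.
    w_short = max(0.0, mean(coverage[:p]) - w_long)
    residuals = [c - (w_long + w_short) for c in coverage[:p]]
    residuals += [c - w_long for c in coverage[p:]]
    mse = mean(r * r for r in residuals)
    return w_long, w_short, mse


def search_shared_breakpoint(samples, candidates):
    """Pick one p* for all samples by minimising the sample-averaged MSE."""
    avg_mse = {
        p: mean(fit_step_model(cov, p)[2] for cov in samples)
        for p in candidates
    }
    p_star = min(avg_mse, key=avg_mse.get)
    # With p* fixed, each sample still gets its own (w_L, w_S).
    per_sample = [fit_step_model(cov, p_star)[:2] for cov in samples]
    return p_star, per_sample


samples = [
    [10, 10, 10, 10, 4, 4, 4, 4],
    [8, 8, 8, 8, 2, 2, 2, 2],
]
p_star, per_sample = search_shared_breakpoint(samples, range(1, 8))
print(p_star, per_sample)
```

In this toy input both samples drop at index 4, so the shared p* is 4 while w_L differs between the samples (4 and 2).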

I wonder whether the matching information is used at all in DaPars when calculating w_L, w_S and P* for each sample, since DaPars requires matched samples from two conditions (normal and tumour), right? I understand that the samples from different conditions are compared later to calculate DeltaPDUI, but is the matching used before PDUI is calculated?

zhixia xiao

Aug 23, 2018, 10:50:07 PM

to Zhuyi Xue, dap...@googlegroups.com

Hi puttyx,

The group information is not used when estimating P* or isoform expression. After P*, w_L and w_S have been determined, the whole table of these values is used for filtering (coverage depth, number of samples passing, etc., as defined in the config file) and for the Fisher exact test (function DaPars_Filtering in the code). Only in this step is the group information taken into consideration. Thanks.
