Hi team,
Hope you are doing well.
I would like to ask about batch effect correction and then some doubts on voila view commands please.
1. Batch1 has 3 controls and 1 case. Batch2 has 3 controls and 2 cases. I want to graph the pre- and post-correction data like a PCA plot.
Can you please suggest how to go about this and if there is an already available code for it?
I combined this question with the voila view doubt because the samples in the resulting violin plots are distorted. This could mean the batch correction step was inefficient. Here are my commands. Please let me know if I should do something differently. I had an
intercept in my model matrix and removed batch2. There is a group column
indicating 0 for control and 1 for case.
majiq build /proj/env2/gencode.v39.chr_patch_hapl_scaff.annotation.gff3 -c config.ini -o /proj/env2/polyA_M/SJdir/denovo_disable --disable-denovo --disable-denovo-ir --simplify 0.01 -j 9
python moccasin.py /proj/env2/polyA_M/model_matrix.tsv /proj/env2/polyA_M/SJdir/Latest /proj/env2/polyA_M/BatchCorrectedFiles batch_1 1 Group 1
One such example in the violin plot are the LSVs for ENSG00000162298.19:t:65131470-65131596 coordinates. The control samples are spread across the plot, so how to interpret this specific result? Do you think this is due to some batch corrected step error?
2. Last point regarding voila, the LSVs were quantified using heterogen and I mentioned a limit for the min reads.
majiq heterogen -o /pr/env2/polyA_M/Voila_heterogen/Batch_corrected --minreads 50 -n Control Case -grp1 AMP05_8.scf_adjusted.majiq AMP08_8.scf_adjusted.majiq AMP09_8.scf_adjusted.majiq AMP05_9.scf_adjusted.majiq AMP08_9.scf_adjusted.majiq AMP09_9.scf_adjusted.majiq -grp2 3690.scf_adjusted.majiq 3690_1.scf_adjusted.majiq 3690_2.scf_adjusted.majiq
Since I do not have replicates for my sample, my config file looks like this (first three are case):
[info]
bamdirs=/proj/env2/polyA_M/BAMdir
sjdirs=/proj/env2/polyA_M/SJdir/denovo_disable
genome=hg38
strandness=reverse
[experiments]
3690_8=3690
3690_1_9=3690_1
3690_2_9=3690_2
AMP05_8=AMP05_8
AMP08_8=AMP08_8
.
.
.
Not sure if I am right but shouldn't the --min reads parameter remove the junction reads within that threshold for all samples? why do we see that red LSV there with 1 read?
Please let me know, thanks!
Kind regards,
Swethaa