Hi all,
I'm trying to compare and contrast cellranger and kallisto/bustools. I have 4 replicates that I aligned using:
for sam in samp_list:
    print("Processing %s (%s)" % (id2samp[sam], sam))
    fq_str = "%s_*_R*.fastq.gz" % sam
    print(fq_str)
    out_str = "%s/%s" % (run_dir, id2samp[sam])
    !kallisto bus -i GRCh38.p13_rna_wMT.idx \
        -o $out_str -x 10xv3 -t 4 $(find fastq_path/H3YV2BGXJ -name $fq_str) | sort
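One thing that may be worth ruling out with the command above is the order of the FASTQ arguments. `find` makes no guarantee about the order in which it lists files, and the trailing `| sort` applies to the output of `kallisto bus`, not to the file list inside `$(...)`. As far as I understand, `kallisto bus` expects 10x files as R1/R2 pairs in that order. Below is a minimal sketch of building an explicitly ordered list. The helper name `paired_fastqs` is hypothetical, and it assumes bcl2fastq-style names containing `_R1_` / `_R2_`, with both mates in the same directory.

```python
import re
from pathlib import Path


def paired_fastqs(fastq_dir, sample_id):
    """Return FASTQ paths ordered as R1, R2, R1, R2, ... (hypothetical helper).

    Assumes file names contain _R1_ / _R2_ and that both mates of a pair
    sit in the same directory.
    """
    # Same glob as in the loop above, searched recursively and sorted by path.
    files = sorted(Path(fastq_dir).rglob(f"{sample_id}_*_R*.fastq.gz"))
    r1 = [f for f in files if re.search(r"_R1(?=[_.])", f.name)]
    r2 = [f for f in files if re.search(r"_R2(?=[_.])", f.name)]
    if len(r1) != len(r2):
        raise ValueError(f"{sample_id}: {len(r1)} R1 files but {len(r2)} R2 files")

    ordered = []
    for read1, read2 in zip(r1, r2):
        # Check that each R2 really is the mate of the R1 it is paired with.
        expected = re.sub(r"_R1(?=[_.])", "_R2", read1.name, count=1)
        if read2.name != expected:
            raise ValueError(f"unpaired files: {read1.name} / {read2.name}")
        ordered += [str(read1), str(read2)]
    return ordered
```

In the notebook, something like `fq_args = " ".join(paired_fastqs("fastq_path/H3YV2BGXJ", sam))` followed by `$fq_args` in place of the `$(find ...)` substitution would pass the files in a fixed order. Printing the list for each sample would also show whether the four samples are currently being read in different orders.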
I built my own index from the GRCh38.p13 annotation, and I have also tried the prebuilt Ensembl index that can be downloaded with the kb package. The percentage of reads pseudoaligned is about the same with either index (slightly lower with Ensembl, in fact), but in every case it is far below the percentage of reads that cellranger confidently maps to the transcriptome. I would expect kallisto to come in a little lower, but not by this much:
V02_rep1: 16.9% pseudoaligned; 55.8% confidently mapped to the transcriptome by cellranger
V02_rep2: 51.1% pseudoaligned; 55.7% confidently mapped to the transcriptome by cellranger (this is what I would expect)
V03_rep1: 37.1% pseudoaligned; 62.3% confidently mapped to the transcriptome by cellranger
V03_rep2: 37.4% pseudoaligned; 62.4% confidently mapped to the transcriptome by cellranger
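To make the size of the discrepancy explicit, here is a small sketch that computes the gap in percentage points for each sample from the numbers listed above. The dictionary and the function name `mapping_gaps` are just illustrative.

```python
# (percent pseudoaligned by kallisto, percent confidently mapped by cellranger)
samples = {
    "V02_rep1": (16.9, 55.8),
    "V02_rep2": (51.1, 55.7),
    "V03_rep1": (37.1, 62.3),
    "V03_rep2": (37.4, 62.4),
}


def mapping_gaps(samples):
    """Return cellranger minus kallisto, in percentage points, per sample."""
    return {name: round(cr - kb, 1) for name, (kb, cr) in samples.items()}


for name, gap in mapping_gaps(samples).items():
    print(f"{name}: cellranger is {gap} points higher")
```

Only V02_rep2 shows the small gap I would expect (4.6 points). The two V03 replicates agree closely with each other (about 25 points), and V02_rep1 is the clear outlier (38.9 points).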
These samples were run through 4 channels of the same chip at the same time and were all pooled onto a NextSeq run, and nothing in the FASTQ demultiplexing QC metrics suggests that only one sample is good. Any ideas about what could cause such a stark difference between samples? Tips on troubleshooting are greatly appreciated.
best,
-deanna