low pseudo-alignment rate


Deanna Church

Aug 3, 2021, 1:37:23 PM

to kallisto and applications

Hi all,

I'm trying to compare Cell Ranger and kallisto/bustools. I have 4 samples (two replicates each of V02 and V03), which I pseudoaligned with kallisto using:

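for sam in samp_list:
    print("Processing %s (%s)" % (id2samp[sam], sam))
    # glob pattern matching this sample's R1/R2 FASTQ files
    fq_str = "%s_*_R*.fastq.gz" % sam
    print(fq_str)
    out_str = "%s/%s" % (run_dir, id2samp[sam])
    # Jupyter/IPython shell escape: `!` runs the line in a shell and $var
    # substitutes the Python variable. `| sort` sorts kallisto's stdout;
    # the FASTQ order kallisto receives is whatever order `find` returns.
    !kallisto bus -i GRCh38.p13_rna_wMT.idx \
        -o $out_str -x 10xv3 -t 4 $(find fastq_path/H3YV2BGXJ -name $fq_str) |sort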
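For `-x 10xv3`, kallisto bus reads its FASTQ arguments as consecutive pairs, lane by lane: the read 1 file (cell barcode + UMI) followed by the read 2 file (cDNA). In the loop above, that order is whatever `find` happens to return. Below is a minimal sketch that orders the files explicitly and fails loudly if a lane is missing a read. It assumes the standard bcl2fastq naming `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`. `order_fastqs` and `kallisto_bus_cmd` are hypothetical helper names, not part of kallisto or kb.

```python
import re
from pathlib import Path

# Assumed bcl2fastq naming: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_RE = re.compile(r"_L(\d{3})_R([12])_\d{3}\.fastq\.gz$")


def order_fastqs(paths):
    """Return paths ordered lane by lane as R1, R2, R1, R2, ...

    Raises ValueError if a name does not match the assumed pattern,
    if a lane has a duplicated read, or if a lane lacks R1 or R2.
    """
    by_lane = {}
    for p in paths:
        m = FASTQ_RE.search(Path(p).name)
        if m is None:
            raise ValueError(f"unrecognized FASTQ name: {p}")
        lane, read = m.groups()
        lane_reads = by_lane.setdefault(lane, {})
        if read in lane_reads:
            raise ValueError(f"duplicate R{read} for lane L{lane}: {p}")
        lane_reads[read] = str(p)

    ordered = []
    for lane in sorted(by_lane):  # "001" < "002" < ... as strings
        reads = by_lane[lane]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"lane L{lane} is missing R1 or R2: {reads}")
        ordered += [reads["1"], reads["2"]]  # barcode/UMI read, then cDNA read
    return ordered


def kallisto_bus_cmd(index, out_dir, fastqs, technology="10xv3", threads=4):
    """Build the same kallisto bus argument list as the loop, with ordered FASTQs."""
    return ["kallisto", "bus", "-i", index, "-o", out_dir,
            "-x", technology, "-t", str(threads), *order_fastqs(fastqs)]
```

The returned list could then be run with `subprocess.run(cmd, check=True)`. Parsing lane and read from each name, rather than relying on a plain `sorted()`, also catches a missing or duplicated file instead of silently shifting every pair.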

I built my own index from the GRCh38.p13 annotations, and I've also tried the prebuilt Ensembl index that can be downloaded with the kb package. The pseudoalignment rate is about the same with either index (slightly lower with Ensembl), but for three of the four samples it is far lower than the percentage of reads Cell Ranger reports as confidently mapped to the transcriptome. I'd expect kallisto to be somewhat lower, but not by this much:

V02_rep1: 16.9% pseudoaligned; 55.8% confidently mapped to the transcriptome by Cell Ranger
V02_rep2: 51.1% pseudoaligned; 55.7% confidently mapped to the transcriptome by Cell Ranger (this is what I would expect)
V03_rep1: 37.1% pseudoaligned; 62.3% confidently mapped to the transcriptome by Cell Ranger
V03_rep2: 37.4% pseudoaligned; 62.4% confidently mapped to the transcriptome by Cell Ranger
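To compare the two tools sample by sample, it can help to pull each rate straight from the output files and express kallisto's rate as a fraction of Cell Ranger's. The sketch below assumes that the `run_info.json` written by kallisto bus has a `p_pseudoaligned` field and that Cell Ranger's `outs/metrics_summary.csv` has a `Reads Mapped Confidently to Transcriptome` column; the helper names are hypothetical. The two metrics are not defined identically, so the ratio is only a rough yardstick. The final loop simply uses the numbers reported above.

```python
import csv
import json
from pathlib import Path


def kallisto_pseudoaligned_pct(bus_out_dir):
    """Percent of reads pseudoaligned, read from kallisto bus's run_info.json.

    Assumes run_info.json contains a "p_pseudoaligned" field (a percentage).
    """
    info = json.loads((Path(bus_out_dir) / "run_info.json").read_text())
    return float(info["p_pseudoaligned"])


def cellranger_transcriptome_pct(metrics_csv):
    """Percent of reads confidently mapped to the transcriptome.

    Assumes Cell Ranger's metrics_summary.csv has one data row and a
    "Reads Mapped Confidently to Transcriptome" column such as "55.8%".
    """
    with open(metrics_csv, newline="") as fh:
        row = next(csv.DictReader(fh))
    return float(row["Reads Mapped Confidently to Transcriptome"].rstrip("%"))


def relative_rate(kallisto_pct, cellranger_pct):
    """kallisto's rate as a fraction of Cell Ranger's for the same sample."""
    return kallisto_pct / cellranger_pct


# Rates reported above: (kallisto % pseudoaligned, Cell Ranger % confidently mapped).
rates = {
    "V02_rep1": (16.9, 55.8),
    "V02_rep2": (51.1, 55.7),
    "V03_rep1": (37.1, 62.3),
    "V03_rep2": (37.4, 62.4),
}

for sample, (k, cr) in rates.items():
    print(f"{sample}: {relative_rate(k, cr):.2f}")
```

With the numbers above this prints 0.30, 0.92, 0.60 and 0.60. The two V03 replicates agree closely with each other while the two V02 replicates do not, and all four used the same index, which suggests looking at something specific to V02_rep1.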

These samples were run through 4 channels of the same chip at the same time and pooled onto a single NextSeq run, and nothing in the FASTQ demultiplexing QC metrics suggests that only one sample is good. Any ideas about the stark difference between samples? Tips on troubleshooting are greatly appreciated.
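Since the demultiplexing metrics look fine, one quick check that doesn't depend on either tool is the read length in each sample's FASTQ files. For 10x v3, read 1 is the 16 bp cell barcode plus the 12 bp UMI (28 bp) and read 2 is the cDNA, so a sample whose R1 or R2 lengths differ from the other samples would stand out. The sketch below counts sequence lengths over the first records of a gzipped FASTQ; `read_length_counts` and the default of 10,000 reads are arbitrary choices for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_gz, n_reads=10000):
    """Count sequence lengths among the first n_reads records of a gzipped FASTQ."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as fh:
        # FASTQ records are 4 lines: header, sequence, '+', qualities.
        for i, line in enumerate(islice(fh, 4 * n_reads)):
            if i % 4 == 1:  # the sequence line of each record
                counts[len(line.rstrip("\r\n"))] += 1
    return counts
```

Running `read_length_counts(path)` on the R1 and R2 files of V02_rep1 and of V02_rep2 and comparing the results would show whether the files for the low-rate sample look different from those of its replicate.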

best,

-deanna
