using sliced fastq for quantification

Maurizio Podda

May 8, 2023, 5:24:01 AM
to kallisto and applications

Dear all,

I have a question regarding the quantification of gene expression from FASTQ files. Specifically, I have FASTQ files containing only the reads for a single gene and its transcripts, and I have noticed that the estimated counts (est_counts) change when I use these files compared to the FASTQ files for the entire transcriptome.

I was wondering whether it is possible to accurately calculate est_counts and transcripts per million (TPM) starting from the FASTQ files that contain only the reads for a single gene, using some kind of reverse kallisto formula or method that takes into account the library size (which I have). The idea is to treat it like a local assembly and count the reads for each transcript of the gene, without the reads for every other transcript in the original FASTQ.

Any help or advice would be greatly appreciated. Thank you in advance for your time and expertise.

Best regards,
M. Podda

Pall Melsted

May 9, 2023, 6:34:52 AM
to kallisto and applications
There are two things that will affect the est_counts and tpm with your sliced dataset.

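First, we use the first 10K cleanly mapped reads to estimate the insert size of the library. With the sliced dataset that estimate comes from a different set of reads, so it can differ from the full run. I would recommend supplying it directly using the -l and -s parameters (mean and standard deviation of the fragment length); you can use the numbers reported when you process the full dataset.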
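As a rough illustration of fixing the fragment length distribution, here is a minimal Python sketch that assembles a `kallisto quant` call with `-l` and `-s` set explicitly. The file names and the numbers 180.0 and 25.0 are placeholders, not values from this thread; substitute the mean and standard deviation reported for your full dataset. The helper `build_quant_command` is a hypothetical name introduced only for this example.

```python
import shlex

def build_quant_command(index, out_dir, fastqs, frag_mean, frag_sd):
    """Build the argument list for `kallisto quant` with a fixed
    fragment length mean (-l) and standard deviation (-s)."""
    return [
        "kallisto", "quant",
        "-i", str(index),      # index built from the full transcriptome
        "-o", str(out_dir),    # output directory for abundance.tsv etc.
        "-l", str(frag_mean),  # mean fragment length taken from the full run
        "-s", str(frag_sd),    # fragment length sd taken from the full run
        *map(str, fastqs),     # the sliced FASTQ file(s)
    ]

# Placeholder inputs (assumptions, not values from the thread).
cmd = build_quant_command(
    index="full_transcriptome.idx",
    out_dir="sliced_out",
    fastqs=["sliced_1.fastq.gz", "sliced_2.fastq.gz"],
    frag_mean=180.0,
    frag_sd=25.0,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps the same `-l`/`-s` values easy to reuse for both the full and the sliced run.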

Second, when you process the sliced reads, are you using the same full-transcriptome index, or an index built from only the single gene? If you use the full index (which I recommend), you can essentially use the TPM to compare the distribution of the transcripts within the same gene, and scale between the full and sliced library.
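One possible way to do that comparison is sketched below in Python: read the `tpm` column of kallisto's `abundance.tsv`, keep only the transcripts of the gene of interest, and normalise so the within-gene fractions sum to 1. The transcript IDs and numbers are made-up toy data, and the final rescaling step assumes you know the gene's total TPM from the full run (here an arbitrary 40.0); both function names are hypothetical.

```python
import csv
import io

def within_gene_fractions(abundance_tsv, transcript_ids):
    """Return each transcript's share of the gene's total TPM.

    abundance_tsv: open file-like object in kallisto abundance.tsv format
    transcript_ids: target_id values belonging to the gene of interest
    """
    wanted = set(transcript_ids)
    tpm = {}
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        if row["target_id"] in wanted:
            tpm[row["target_id"]] = float(row["tpm"])
    total = sum(tpm.values())
    if total == 0:
        return {t: 0.0 for t in tpm}
    return {t: v / total for t, v in tpm.items()}

def rescale_to_full(fractions, gene_tpm_full):
    """Spread the gene's total TPM from the full run over its transcripts
    according to the fractions measured in the sliced run."""
    return {t: f * gene_tpm_full for t, f in fractions.items()}

# Toy abundance.tsv from a sliced run (made-up values for illustration).
toy = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1500\t1321\t900\t600000\n"
    "tx2\t1200\t1021\t300\t200000\n"
    "other\t2000\t1821\t100\t200000\n"
)
fractions = within_gene_fractions(toy, ["tx1", "tx2"])
scaled = rescale_to_full(fractions, gene_tpm_full=40.0)  # 40.0 is an assumed value
print(fractions)
print(scaled)
```

The fractions can be computed the same way for the full run, which gives a direct check of how closely the sliced quantification reproduces the isoform distribution.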

- Pall