Use of long-read kallisto output

Annika Müller

Jun 18, 2026, 4:03:41 PM
to kallisto and applications
Heya,

I used lr-kallisto (v0.52.0) for pseudoalignment and quantification of long reads (bulk RNA-seq) and plan to run a differential expression analysis with DESeq2.
I have used kallisto with short-read data in the past, where I got an abundance.h5 file that I could import into R using tximport. Since this file does not exist for the long-read approach, I am wondering whether I have understood correctly how to proceed.

My understanding is that I take the .mtx file, skip the first two lines, and combine it with the transcripts.txt file: the number in column 2 of each .mtx line is the line number of the corresponding transcript in transcripts.txt. So if one line says "1 23 1.41", I ignore the 1 (because I have bulk sequencing and therefore no cell barcodes) and assign the estimated abundance of 1.41 to the transcript on line 23 of transcripts.txt. Is that right?
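To make the question concrete, here is a minimal sketch of the join as I understand it, not a confirmed recipe. It assumes the file in question is matrix.abundance.mtx, that it is in Matrix Market coordinate format (comment lines starting with %, then one size line, then "row column value" entries), and that column 2 is a 1-based line number in transcripts.txt. The function name mtx_to_tsv is made up for illustration.

```shell
# Hypothetical helper: join a Matrix Market file with transcripts.txt.
# Assumption: column 2 of each entry is a 1-based line number in transcripts.txt.
mtx_to_tsv() {
    local mtx="$1" transcripts="$2"
    awk -v OFS='\t' '
        NR == FNR { name[FNR] = $0; next }   # first file: transcript name per line number
        /^%/ { next }                        # skip Matrix Market header/comment lines
        !seen_dims { seen_dims = 1; next }   # skip the "rows columns nonzeros" size line
        { print name[$2], $3 }               # column 2 = transcript index, column 3 = value
    ' "$transcripts" "$mtx"
}
```

It would be called as: mtx_to_tsv "${output}/matrix.abundance.mtx" "${output}/transcripts.txt" > abundance.tsv. Any transcript without an entry in the sparse matrix is simply absent from the resulting table, so it would have to be filled in with zeros before building a count matrix in R.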

I am also wondering why there is no merged output file, since anything done manually is more prone to error, and this is part of why I am unsure I have it right. I took a look at the notebook, but it is all Python and I am working with a combination of bash and R, so I could not fully verify my understanding against it.
Any help would be greatly appreciated!

apptainer exec \
        lr-kallisto.sif \
        kallisto index -k 63 -i "${CDS}.idx" "${CDS}"

apptainer exec \
        lr-kallisto.sif \
        kallisto bus -t 32 --long --threshold 0.7 -x bulk \
        -i "${CDS}.idx" -o "${output}" "${reads}/${1}.flnc.fastq.gz"

bustools sort -t 32 "${output}/output.bus" \
        -o "${output}/sorted.bus"

bustools count "${output}/sorted.bus" \
        -t "${output}/transcripts.txt" \
        -e "${output}/matrix.ec" \
        -o "${output}/count" --cm -m \
        -g "${CDS}.t2g"

apptainer exec \
        lr-kallisto.sif \
        kallisto quant-tcc -t 32 \
        --long -P ONT -f "${output}/flens.txt" \
        "${output}/count.mtx" -i "${CDS}.idx" \
        -e "${output}/count.ec.txt" \
        -o "${output}"

The output is the following: 
total 14M
-rw-r--r--. 1 17 Jun 18 21:08 count.barcodes.txt
-rw-r--r--. 1 162K Jun 18 21:08 count.ec.txt
-rw-r--r--. 1 123K Jun 18 21:08 count.mtx
-rw-r--r--. 1 104K Jun 18 13:45 flens.txt
-rw-r--r--. 1 473K Jun 18 13:45 index.saved
-rw-r--r--. 1 119K Jun 18 21:16 matrix.abundance.mtx
-rw-r--r--. 1 169K Jun 18 21:16 matrix.abundance.tpm.mtx
-rw-r--r--. 1   7 Jun 18 13:45 matrix.cells
-rw-r--r--. 1 162K Jun 18 13:45 matrix.ec
-rw-r--r--. 1 104K Jun 18 21:16 matrix.efflens.mtx
-rw-r--r--. 1    6 Jun 18 21:16 matrix.fld.tsv
-rw-r--r--. 1   17 Jun 18 13:45 matrix.sample.barcodes
-rw-r--r--. 1  178 Jun 18 13:45 novel.fastq
-rw-r--r--. 1  11M Jun 18 13:45 output.bus
-rw-r--r--. 1  687 Jun 18 13:45 run_info.json
-rw-r--r--. 1 410K Jun 18 21:08 sorted.bus
-rw-r--r--. 1 322K Jun 18 21:16 transcript_lengths.txt
-rw-r--r--. 1 219K Jun 18 21:16 transcripts.txt
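
Since I am worried about mistakes in a manual join, one sanity check I could run on these files is sketched below. It assumes that the second number on the size line of matrix.abundance.mtx is the number of transcripts and should therefore equal the number of lines in transcripts.txt; that is my reading of the format, not something I have confirmed. The function name check_mtx_dims is made up for illustration.

```shell
# Hypothetical check: the column count on the Matrix Market size line
# should equal the number of lines in transcripts.txt (assumption).
check_mtx_dims() {
    local mtx="$1" transcripts="$2"
    local ncol nlines
    # The first non-comment line holds "rows columns nonzeros"; take the columns field.
    ncol=$(awk '!/^%/ { print $2; exit }' "$mtx")
    # Count lines with awk so a missing trailing newline does not drop the last one.
    nlines=$(awk 'END { print NR }' "$transcripts")
    if [ "$ncol" = "$nlines" ]; then
        echo "OK: $ncol transcripts"
    else
        echo "MISMATCH: mtx has $ncol columns, transcripts.txt has $nlines lines" >&2
        return 1
    fi
}
```

It would be called as: check_mtx_dims "${output}/matrix.abundance.mtx" "${output}/transcripts.txt". A mismatch would suggest that the .mtx columns index something other than transcripts.txt.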