Heya,
I used lr-kallisto (v 0.52.0) for pseudoalignment and quantification of long reads (bulk RNA-seq) and am planning on performing differential expression analysis using DESeq2.
I have used kallisto with short-read data in the past, and there I got an abundance.h5 file which I could import into R using tximport. Since this file does not exist for the long-read approach, I am wondering whether I have understood correctly how to proceed.
My understanding is that I take the .mtx file, skip the first two lines, and combine it with the transcripts.txt file: the number in column 2 of the .mtx file is the line number of the corresponding transcript in transcripts.txt. So if one line reads 1 23 1.41, I ignore the 1 (because I have bulk sequencing and therefore no cell barcodes) and assign the estimated abundance of 1.41 to the transcript on line 23 of transcripts.txt. Is that right?
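To make the question concrete, this is roughly what I have in mind in bash. It is only a sketch of my reading of the format, not a confirmed recipe: it assumes that matrix.abundance.mtx is the file to use, that column 2 of each entry is the 1-based line number in transcripts.txt, and that transcripts without an entry have an abundance of 0. The function name mtx_to_tsv is my own. Instead of skipping a fixed two lines, it skips every line starting with % and then the first remaining line (the dimensions).

```shell
# Sketch: turn a single-sample Matrix Market file into a transcript/value table.
# Assumption: column 2 of each entry is the 1-based line number in transcripts.txt.
mtx_to_tsv() {
  local mtx="$1" transcripts="$2"
  awk '
    # First file (transcripts.txt): remember each name by its line number.
    NR == FNR { name[FNR] = $1; n = FNR; next }
    # Second file (.mtx): skip "%" header and comment lines.
    /^%/ { next }
    # The first non-comment line holds the dimensions: rows, columns, entries.
    !seen_dims { seen_dims = 1; next }
    # Remaining lines: row (sample), column (transcript index), value.
    { value[$2] = $3 }
    END {
      # The format is sparse, so transcripts without an entry get 0.
      for (i = 1; i <= n; i++)
        printf "%s\t%s\n", name[i], ((i in value) ? value[i] : 0)
    }
  ' "$transcripts" "$mtx"
}
```

I would call it as mtx_to_tsv matrix.abundance.mtx transcripts.txt > sample.tsv and then read the resulting two-column table into R.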
I am also wondering why there is no merged file, since anything done manually is more prone to error. I took a look at the notebook, but it is all Python and I work with a combination of bash and R, so I am not sure I followed it correctly.
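One check I could run to catch an obvious mix-up is to compare the column count on the dimension line of the .mtx file with the number of lines in transcripts.txt. This is a sketch under the assumption that the first line not starting with % reads "rows columns entries" and that the columns are the transcripts; check_mtx_dims is a name I made up.

```shell
# Sketch: check that the matrix has as many columns as transcripts.txt has lines.
check_mtx_dims() {
  local mtx="$1" transcripts="$2"
  local n_cols n_tx
  # The first line not starting with "%" is the dimension line: rows columns entries.
  n_cols=$(awk '!/^%/ { print $2; exit }' "$mtx")
  # Count lines with awk so a missing trailing newline does not lose the last one.
  n_tx=$(awk 'END { print NR }' "$transcripts")
  if [ "$n_cols" = "$n_tx" ]; then
    echo "OK: $n_cols transcripts"
  else
    echo "MISMATCH: matrix has $n_cols columns, transcripts.txt has $n_tx lines" >&2
    return 1
  fi
}
```

If check_mtx_dims matrix.abundance.mtx transcripts.txt reports a mismatch, I would know that I am pairing the wrong files (for example count.mtx, whose columns I assume are equivalence classes rather than transcripts).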
Any help would be greatly appreciated!
These are the commands I ran:
apptainer exec \
lr-kallisto.sif \
kallisto index -k 63 -i "${CDS}.idx" "${CDS}"
apptainer exec \
lr-kallisto.sif \
kallisto bus -t 32 --long --threshold 0.7 -x bulk \
-i "${CDS}.idx" -o ${output} "${reads}/${1}.flnc.fastq.gz"
bustools sort -t 32 ${output}/output.bus \
-o ${output}/sorted.bus; \
bustools count ${output}/sorted.bus \
-t ${output}/transcripts.txt \
-e ${output}/matrix.ec \
-o ${output}/count --cm -m \
-g ${CDS}.t2g;
apptainer exec \
lr-kallisto.sif \
kallisto quant-tcc -t 32 \
--long -P ONT -f ${output}/flens.txt \
${output}/count.mtx -i "${CDS}.idx" \
-e ${output}/count.ec.txt \
-o ${output};
The output is the following:
total 14M
-rw-r--r--. 1 17 Jun 18 21:08 count.barcodes.txt
-rw-r--r--. 1 162K Jun 18 21:08 count.ec.txt
-rw-r--r--. 1 123K Jun 18 21:08 count.mtx
-rw-r--r--. 1 104K Jun 18 13:45 flens.txt
-rw-r--r--. 1 473K Jun 18 13:45 index.saved
-rw-r--r--. 1 119K Jun 18 21:16 matrix.abundance.mtx
-rw-r--r--. 1 169K Jun 18 21:16 matrix.abundance.tpm.mtx
-rw-r--r--. 1 7 Jun 18 13:45 matrix.cells
-rw-r--r--. 1 162K Jun 18 13:45 matrix.ec
-rw-r--r--. 1 104K Jun 18 21:16 matrix.efflens.mtx
-rw-r--r--. 1 6 Jun 18 21:16 matrix.fld.tsv
-rw-r--r--. 1 17 Jun 18 13:45 matrix.sample.barcodes
-rw-r--r--. 1 178 Jun 18 13:45 novel.fastq
-rw-r--r--. 1 11M Jun 18 13:45 output.bus
-rw-r--r--. 1 687 Jun 18 13:45 run_info.json
-rw-r--r--. 1 410K Jun 18 21:08 sorted.bus
-rw-r--r--. 1 322K Jun 18 21:16 transcript_lengths.txt
-rw-r--r--. 1 219K Jun 18 21:16 transcripts.txt