StringTie readLength when using long reads


Majd Abdulghani

Sep 20, 2023, 7:00:19 AM
to IsoformSwitchAnalyzeR
Hi!

I have matched short- and long-read cDNA-seq data. I used StringTie to assemble the transcriptome and quantify my samples. I used stringtie --mix, which allows me to leverage both datasets to get accurate sequence and structure of transcripts. When I tried to import the data into IsoformSwitchAnalyzeR, I got the following message:

> quant <- importIsoformExpression(
    parentDir = "merged_gtf"
)
Step 1 of 3: Identifying which algorithm was used...
    The quantification algorithm used was: StringTie
Error in importIsoformExpression(parentDir = "merged_gtf") :
  When importing StringTie results the 'readLength' argument must be specified.
 This argument must be set to the number of base pairs sequenced (e.g. if the
 quantified data is 75 bp paired ends 'readLength' should be set to 75.

I'm not sure what read length I'm supposed to use here, given I have a mixture of short and long reads. Should I just use the read length of my short-read dataset (150 bp)?

Thanks,
Majd

Kristoffer Vitting-Seerup

Sep 21, 2023, 5:00:29 AM
to IsoformSwitchAnalyzeR
You will have to use the average read length of all reads mapped - both long and short.
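As a rough sketch of what "average read length of all reads mapped" could mean in practice: total mapped bases divided by total mapped reads, pooled over both libraries. The read counts and mean lengths below are made-up placeholders, not values from this thread; the real numbers would come from your own alignment statistics.

```r
# Hypothetical per-library mapping summaries (all numbers are illustrative only).
libs <- data.frame(
  library       = c("short_reads", "long_reads"),
  mapped_reads  = c(40e6, 2e6),   # number of mapped reads per library
  mean_read_len = c(150, 1200)    # mean length (bp) of the mapped reads
)

# Pooled average read length = total mapped bases / total mapped reads.
total_bases     <- sum(libs$mapped_reads * libs$mean_read_len)
total_reads     <- sum(libs$mapped_reads)
avg_read_length <- total_bases / total_reads

avg_read_length
```

With these placeholder numbers the pooled average is dominated by the far more numerous short reads, so it lands much closer to 150 than to 1200.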

The problem is that StringTie does not actually count the number of reads mapped. Instead, it reports the coverage - and then we use readLength to "back-calculate" the estimated number of reads - which is naturally highly inaccurate (especially when you have both long and short reads!).
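To illustrate why the back-calculation is so sensitive, here is a minimal sketch. It assumes the usual approximation reads ~ coverage * transcript length / read length; the exact formula used inside IsoformSwitchAnalyzeR may differ in details such as rounding. The function name estimate_reads and all input values are hypothetical.

```r
# Assumed approximation: each read contributes `read_length` bases of coverage,
# so reads ~ coverage * transcript_length / read_length.
estimate_reads <- function(coverage, transcript_length, read_length) {
  coverage * transcript_length / read_length
}

# One hypothetical transcript: 2000 bp long with an average coverage of 30.
coverage          <- 30
transcript_length <- 2000

# The same coverage back-calculated with three different read-length assumptions.
estimates <- c(
  short_150   = estimate_reads(coverage, transcript_length, 150),
  blended_200 = estimate_reads(coverage, transcript_length, 200),
  long_1200   = estimate_reads(coverage, transcript_length, 1200)
)

estimates
```

The estimated count for the same transcript changes several-fold depending only on the read length plugged in, and with mixed data no single value is right for every transcript.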

I'd suggest you quantify the transcripts you have identified via StringTie with a tool that outputs counts. I'd probably go with kallisto/Salmon using just the short reads or IsoQuant/Bambu using just the long reads.

Cheers
Kristoffer

Majd Abdulghani

Sep 22, 2023, 4:00:32 AM
to IsoformSwitchAnalyzeR
Wow, that is more complicated than I expected it to be. It's kind of a shame that I have to stick with just one of the technologies for quantification. Oh well. Thank you so much for responding and helping me with this, Kristoffer! 

Majd Abdulghani

Sep 23, 2023, 6:48:44 AM
to IsoformSwitchAnalyzeR
Hi again Kristoffer,

Sorry, can you please explain which part of the analysis will be affected by the wrong read length? Are the isoform estimates going to be inaccurate, or just the number of reads mapped?

Thank you!
Majd
