No overlap between annotation and quantification

Caleb
Apr 11, 2023, 4:31:25 PM
to IsoformSwitchAnalyzeR
I am trying to broadly follow the protocol in this paper, i.e. HISAT2 -> StringTie -> TACO -> Kallisto. After running Kallisto on the FASTQ files, I import the results into IsoformSwitchAnalyzeR with importIsoformExpression(). When I then run importRdata(), I get an error saying that no transcripts overlap between the annotation and the quantification. For the exon annotation (the isoformExonAnnoation argument) I am providing the assembly.gtf produced by TACO merge. I am using Ensembl data and made sure to build my HISAT2 index with the .chr_patch_hapl_scaff.gtf file.

Here is my R code if it helps:

library(IsoformSwitchAnalyzeR)

EFTUD2quant = importIsoformExpression(parentDir = "C:/Users/Caleb/BioinfoData/IsoformSwitch/ENCODE_EFTUD2_deNovo",
                                      addIsofomIdAsColumn = TRUE,
                                      showProgress = TRUE) # reads the quantification files (e.g. Kallisto's abundance.tsv) in each sub-directory
names(EFTUD2quant)
tail(EFTUD2quant$abundance)
tail(EFTUD2quant$counts)

# Build the switchAnalyzeRlist from the quantification, design and annotation
myDesign = data.frame(sampleID = c("SRR4421357","SRR4421358","SRR4422087","SRR4422088"),
                      condition = c("KD","KD","WT","WT"))
SwitchList = importRdata(isoformCountMatrix = EFTUD2quant$counts,
                         isoformRepExpression = EFTUD2quant$abundance,
                         designMatrix = myDesign,
                         isoformExonAnnoation = "C:/Users/Caleb/BioinfoData/IsoformSwitch/ENCODE_EFTUD2_deNovo/assembly.gtf",
                         isoformNtFasta = "C:/Users/Caleb/BioinfoData/IsoformSwitch/Homo_Sapiens.GRCh38.cdna.all.fa.gz",
                         showProgress = TRUE,
                         ignoreAfterPeriod = TRUE)

Kristoffer Vitting-Seerup
Apr 12, 2023, 3:07:44 AM
to IsoformSwitchAnalyzeR
My guess is that your FASTA file (Homo_Sapiens.GRCh38.cdna.all.fa.gz) does not contain the sequences of the de novo assembled transcripts, since the Ensembl cDNA file only covers Ensembl's own transcript models, and hence the import fails.
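One way to test this guess before re-running importRdata() is to compare the transcript IDs in the assembly GTF with the IDs in the FASTA headers. The base-R sketch below assumes a standard GTF with a `transcript_id "<id>"` attribute and FASTA headers whose first whitespace-delimited token is the transcript ID (as in Ensembl cDNA files). The helpers gtf_transcript_ids() and fasta_ids() are hypothetical, not part of IsoformSwitchAnalyzeR, and the inputs are made-up toy data.

```r
# Hypothetical helpers (not part of IsoformSwitchAnalyzeR) for checking
# whether the transcript IDs in a GTF also appear in a FASTA file.

# Pull the unique transcript_id values out of GTF lines.
gtf_transcript_ids <- function(gtf_lines) {
  gtf_lines <- gtf_lines[!startsWith(gtf_lines, "#")]  # drop comment/header lines
  hits <- regmatches(gtf_lines, regexpr('transcript_id "[^"]+"', gtf_lines))
  unique(sub('transcript_id "([^"]+)"', "\\1", hits))
}

# Take the first whitespace-delimited token of each FASTA header as its ID.
fasta_ids <- function(fa_lines) {
  headers <- fa_lines[startsWith(fa_lines, ">")]
  sub("^>([^[:space:]]+).*$", "\\1", headers)
}

# Made-up toy inputs: one de novo transcript in the GTF,
# one Ensembl-style transcript in the FASTA.
gtf <- c(
  '1\tassembly\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "TU1";',
  '1\tassembly\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "TU1";'
)
fa <- c(">ENST00000456328.2 cdna", "ACGTACGT")

# An empty intersection means the FASTA cannot supply sequences for the GTF transcripts.
shared <- intersect(gtf_transcript_ids(gtf), fasta_ids(fa))
cat("Transcripts in both GTF and FASTA:", length(shared), "\n")
```

On the real data the inputs would be readLines() on assembly.gtf and readLines(gzfile()) on the compressed cDNA file, and the same intersect() can be run against EFTUD2quant$counts$isoform_id. If the IDs carry version suffixes, strip them first (for example with sub("\\..*$", "", ids)) so the comparison matches the ignoreAfterPeriod = TRUE setting in spirit.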

Try simply removing the isoformNtFasta argument. IsoformSwitchAnalyzeR has built-in functionality for extracting the sequences later in the workflow :-)