Import R data error: 'X' Contains Missing Values

Caleb

unread,

Jul 27, 2023, 3:12:24 PM7/27/23

to IsoformSwitchAnalyzeR

I am attempting to run importRdata() but I keep getting an error just for this specific dataset. I have analyzed other datasets from the same source (ENCODE) and have not had any issues. The only difference in terms of data structure that I'm aware of is that other datasets have 2 runs each for the WT and KD conditions, while this dataset has 2 WT and 4KD runs (as seen in the design dataframe below). importIsoformExpression() ran normally.

Here is the code I'm trying to run:

myDesign = data.frame(sampleID = c("SRR14837933","SRR14837934","SRR14844613","SRR14844614","SRR14844615","SRR14844616"),
condition = c("WT","WT","KD","KD","KD","KD"))
SwitchList = importRdata(isoformCountMatrix = PRPF4quant$counts,
isoformRepExpression = PRPF4quant$abundance,
designMatrix = myDesign,
isoformExonAnnoation = "C:/Users/Caleb/BioinfoData/IsoformSwitch/ENCODE_PRPF4_ISAR/st_merged.gtf",
isoformNtFasta = "C:/Users/Caleb/BioinfoData/IsoformSwitch/ENCODE_PRPF4_ISAR/PRPF4_transcripts.fa",
showProgress = TRUE)

And here is the output of importRdata():

Step 1 of 10: Checking data...
Step 2 of 10: Obtaining annotation...
importing GTF (this may take a while)...
148689 ( 50.76%) isoforms were removed since they were not expressed in any samples.
Step 3 of 10: Fixing StringTie gene annoation problems...
11238 isoforms were assigned the ref_gene_id and gene_name of their associated gene_id.
This was only done when the parent gene_id were associated with a single ref_gene_id/gene_name.
3896 isoforms were assigned the ref_gene_id and gene_name of the most similar
annotated isoform (defined via overlap in genomic exon coordinates).
This was only done if the overlap met the requriements
indicated by the three fixStringTieViaOverlap* arguments.
We were unable to assign 458 isoforms (located within annotated genes) to a known ref_gene_id/gene_name.
These were removed to enable analysis of the rest of the isoform from within the merged genes.
2191 gene_ids which were associated with multiple ref_gene_id/gene_names
were split into mutliple genes via their ref_gene_id/gene_names.
32074 genes_id were assigned their original gene_id instead of the StringTie gene_id.
This was only done when it could be done unambiguous.
Step 4 of 10: Calculating expression estimates from count data...
Skipped as user supplied expression via the "isoformRepExpression" argument...
Step 5 of 10: Testing for unwanted effects...
Error in density.default(x, adjust = adj) : 'x' contains missing values
In addition: Warning messages:
1: In importRdata(isoformCountMatrix = PRPF4quant$counts, isoformRepExpression = PRPF4quant$abundance, :
We found 8976 (2.97%) unstranded transcripts.
These were removed as unstranded transcripts cannot be analysed
2: In importRdata(isoformCountMatrix = PRPF4quant$counts, isoformRepExpression = PRPF4quant$abundance, :
No CDS annotation was found in the GTF files meaning ORFs could not be annotated.
(But ORFs can still be predicted with the analyzeORF() function)

Kristoffer Vitting-Seerup

unread,

Jul 31, 2023, 8:28:08 AM7/31/23

to IsoformSwitchAnalyzeR

Thanks for reporting this issue. Which version of IsoformSwitchAnalyzeR are you using?

Caleb

unread,

Jul 31, 2023, 10:45:18 AM7/31/23

to IsoformSwitchAnalyzeR

I am using version 2.01.03

Caleb

unread,

Aug 3, 2023, 4:43:16 PM8/3/23

to IsoformSwitchAnalyzeR

I managed to get it to run by catenating the FASTQ files from the same KD biological replicate so that I was only analyzing two KD and two WT samples. For example, I merged SRR14844613 with SRR14844614 before conducting any of the pre-Isoform Switch AnalyzeR steps of alignment and quantification. When giving importRdata() two runs for each condition it ran normally and gave outputs like % of unstranded transcripts similar to what I've seen in other datasets.

Reply all

Reply to author

Forward