Mycoplasma contamination and genome generation


Joshua Mincer

May 4, 2023, 4:17:11 PM
to rna-star
Hello everyone!
I have some data that, I have been told, comes from mycoplasma-contaminated cell lines. The odd GC curves in the FASTQ reports confirm this.

Setting aside the other issues that come with contamination, we want to assess whether our data is recoverable. My strategy is to combine a few common mycoplasma genomes with the human genome during the genome generation step. The rationale is that reads coming directly from the contaminant will be "sucked up" by the contaminant genomes.

My question is: would I then need to alter the GTF file used for genome generation? I don't care too much about reads that map to these genomes, and even if we wanted to check mapping statistics, we would probably look into it from the genome perspective rather than the gene/transcript level. 
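
To illustrate what checking mapping statistics "from the genome perspective" could look like, here is a minimal sketch. It assumes the alignments have been sorted and indexed and that `samtools idxstats` was run on the BAM, which prints one tab-separated line per reference sequence (name, length, mapped reads, unmapped reads). The function name, the contig names, and the read counts are all made up for illustration.

```python
def reads_per_genome(idxstats_text, contaminant_contigs):
    """Sum mapped reads for host vs. contaminant contigs.

    idxstats_text: text output of `samtools idxstats`
                   (columns: name, length, mapped, unmapped).
    contaminant_contigs: set of contig names that belong to the
                         mycoplasma genomes (hypothetical names here).
    """
    totals = {"host": 0, "contaminant": 0}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The "*" row only holds reads with no reference assigned.
            continue
        key = "contaminant" if name in contaminant_contigs else "host"
        totals[key] += int(mapped)
    return totals


# Toy input with invented numbers, only to show the expected shape.
example = "chr1\t1000\t900\t0\nchr2\t800\t100\t0\nmyco_contig_1\t500\t250\t0\n*\t0\t0\t40\n"
totals = reads_per_genome(example, {"myco_contig_1"})
print(totals)
```

The fraction `totals["contaminant"] / (totals["host"] + totals["contaminant"])` then gives a rough idea of how much of the mapped data went to the contaminant genomes.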

Alexander Dobin

May 4, 2023, 4:26:27 PM
to rna-star
Hi,

You only need a GTF if either (i) there are annotated splice junctions (not the case for mycoplasma, I guess), or (ii) you need to count reads per gene.
So it seems you can use just the GTF for the main species.
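
As a rough sketch of the resulting genome generation call: several FASTA files can be passed to `--genomeFastaFiles`, while `--sjdbGTFfile` points only at the human annotation. All file and directory names below are hypothetical, and the `--sjdbOverhang` and thread values are placeholder assumptions to adapt to your read length and machine.

```python
import shlex


def build_genome_generate_cmd(genome_dir, fasta_files, gtf_file,
                              sjdb_overhang=100, threads=8):
    """Assemble a STAR genomeGenerate command as an argument list.

    fasta_files: host FASTA plus any contaminant FASTAs.
    gtf_file: annotation for the main species only.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", *fasta_files,
        "--sjdbGTFfile", gtf_file,
        "--sjdbOverhang", str(sjdb_overhang),
    ]


# Hypothetical paths, for illustration only.
cmd = build_genome_generate_cmd(
    genome_dir="star_index_human_plus_myco",
    fasta_files=["human.fa", "mycoplasma_a.fa", "mycoplasma_b.fa"],
    gtf_file="human.gtf",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. The contig names in the mycoplasma FASTA files should not clash with the human chromosome names.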

Joshua Mincer

May 4, 2023, 4:27:31 PM
to rna-star
Great, thanks for the confirmation!