Hello everyone!
I have some data that I was told comes from mycoplasma-contaminated cell lines. The contamination is also supported by odd GC-content curves in the FASTQ quality-control reports.
Setting aside the other problems that contamination causes, we want to assess whether our data are recoverable. My strategy is to add a few common mycoplasma genomes to the human genome at the genome generation step. The rationale is that reads originating from the contaminant will be "sucked up" by the mycoplasma sequences instead of aligning spuriously to the human genome.
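Combining references mostly comes down to producing one FASTA that holds both the human and the mycoplasma sequences. The one thing that can silently go wrong is a sequence name appearing in more than one input. Below is a minimal sketch of that step, assuming the FASTA files are small enough to hold in memory as strings. The function names `read_fasta` and `combine_references` and the contig name `myco_contig1` are hypothetical, made up for illustration. Headers are trimmed to their first word, which is the part most aligners use as the sequence name.

```python
from typing import Dict, List, Optional, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs; name = first word of the header."""
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def combine_references(sources: Dict[str, str]) -> str:
    """Concatenate several FASTA texts (label -> FASTA text) into one FASTA,
    raising ValueError if two sequences share a name."""
    seen: Dict[str, str] = {}
    out_lines: List[str] = []
    for label, text in sources.items():
        for name, seq in read_fasta(text):
            if name in seen:
                raise ValueError(
                    f"sequence name {name!r} appears in both {seen[name]!r} and {label!r}"
                )
            seen[name] = label
            out_lines.append(f">{name}")
            # Re-wrap the sequence at 60 columns.
            out_lines.extend(seq[i:i + 60] for i in range(0, len(seq), 60))
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    human = ">chr1 toy\nACGT\nACGT\n"
    myco = ">myco_contig1\nGGCC\n"
    print(combine_references({"human": human, "mycoplasma": myco}), end="")
```

For full-size genomes, plain concatenation of the FASTA files gives the same result, provided every file ends with a newline. In that case the sketch is mainly useful as a check that no sequence name is shared between the human and mycoplasma files.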
My question is: do I then also need to modify the GTF file used for genome generation? I don't care much about the reads that map to these genomes, and even if we wanted to check their mapping statistics, we would probably do so at the genome level rather than at the gene/transcript level.
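This sketch assumes the aligner is STAR, which uses the term "genome generation" for this step. My understanding is that the GTF only has to be consistent with the FASTA. Every seqname (column 1) in the GTF should exist in the FASTA. FASTA sequences with no GTF records, such as the added mycoplasma contigs, are simply unannotated. The first two functions list GTF seqnames that have no matching FASTA sequence.

For the genome-level view, the third function works from `samtools idxstats` output. That output is tab-separated: reference name, sequence length, mapped count and unmapped count. The function computes the fraction of mapped reads that landed on the contaminant contigs. idxstats counts alignment records, so multimapping reads with secondary alignments may be counted more than once. All function names and the contig name `myco_contig1` are hypothetical.

```python
from typing import Iterable, Set


def gtf_seqnames(gtf_text: str) -> Set[str]:
    """Collect column 1 (seqname) of every non-comment GTF line."""
    names: Set[str] = set()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_gtf_seqnames(gtf_text: str, fasta_names: Iterable[str]) -> Set[str]:
    """GTF seqnames with no matching FASTA sequence (the mismatches worth fixing).
    FASTA sequences absent from the GTF are not reported; they are just unannotated."""
    return gtf_seqnames(gtf_text) - set(fasta_names)


def contaminant_fraction(idxstats_text: str, contaminant_names: Set[str]) -> float:
    """Fraction of mapped records on contaminant contigs, from `samtools idxstats`
    output (name, length, mapped, unmapped; tab-separated)."""
    total = 0
    contaminant = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # the '*' row only carries unmapped reads with no reference
        total += int(mapped)
        if name in contaminant_names:
            contaminant += int(mapped)
    return contaminant / total if total else 0.0


if __name__ == "__main__":
    idx = "chr1\t1000\t90\t0\nmyco_contig1\t500\t10\t0\n*\t0\t0\t5\n"
    print(contaminant_fraction(idx, {"myco_contig1"}))
```

If `unmatched_gtf_seqnames` returns an empty set, the human GTF should, under the assumption above, be usable unchanged with the combined FASTA.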