mismatch in chromosome names between the data file and the IGV genome


Ana Ruiz Manzano

Jun 3, 2022, 10:17:20 AM
to igv-help

Hi, I'm working with the E. coli chromosome and RNA-seq data. I loaded my own genome (Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.dna.chromosome.Chromosome.fa) and the corresponding gff3 file (Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.49.gff3) to see the genes.

When I try to load my sam data files, I get an error saying that the sequence names do not match the genome. The gff3 file uses gene names and my data uses transcript IDs (see pic).

I created several chromosome name alias files, pairing either the gene names or the locations with the transcript IDs, but none of them fixes the problem.

My IGV version is 2.12.3

Any help will be awesome. Thanks, ana

Screen Shot 2022-06-03 at 09.12.32.png
igv0.log
Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.dna.chromosome.Chromosome.fa
Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.dna.chromosome.Chromosome_alias.tab

igv-help

Jun 3, 2022, 12:22:12 PM
to igv-help

Hi,

The error message is about a mismatch between the bam file and the reference genome sequence. What reference genome did the aligner use when the bam file was created? You need to load that as your reference genome. IGV renders the reads at the chromosome/contig and locus specified in the bam file. An alias file can be used if the chromosome/contig names in the bam file are not the same as in the reference genome sequence, for example if one has chromosomes named chr1, chr2, chr3, ... and the other has chromosomes named 1, 2, 3, ... However, the alias file cannot be used to map between contigs/chromosomes of different types/lengths.
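
To illustrate the naming case described above, here is a minimal Python sketch that builds alias lines for the chr1/1 example. It assumes the alias file is tab-delimited with one line per sequence listing that sequence's synonyms; please check the IGV documentation for the exact layout. The helper name `alias_lines` is hypothetical.

```python
def alias_lines(synonym_groups):
    """Return one tab-delimited line per sequence.

    Each group lists alternative names for the SAME sequence
    (assumed alias-file layout; verify against the IGV docs).
    """
    return ["\t".join(group) for group in synonym_groups]


# Hypothetical example: chr1/chr2/chr3 in one file, 1/2/3 in the other.
groups = [(f"chr{i}", str(i)) for i in range(1, 4)]
lines = alias_lines(groups)
print("\n".join(lines))
```

Every line pairs two names for one and the same sequence, which is why this approach cannot bridge transcript names and a whole chromosome.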

Helga

Ana Ruiz Manzano

Jun 3, 2022, 12:55:40 PM
to igv-help
Hi Helga, thanks for replying.
The loaded reference genome is the same one I used to align and create the bam files with minimap2. What type of alias would you use in this case?
ana

igv-help

Jun 3, 2022, 7:12:00 PM
to igv-help
Hi ana,

If you are sure that is the case, there is some problem here we haven't seen before. According to the screenshot, the chromosome names in your bam file are of this form: AAD13438, etc. If you loaded the same fasta as your reference that you used to do the alignments, there should be no mismatch: the chromosome (or more technically, sequence) names in your fasta should also be AAD13438, etc. However, the chromosome (sequence) name you have selected in the pulldown is simply "Chromosome", which is a bit unusual but is a legal name.

To help debug this further, you can extract the lines starting with ">" in your fasta file (or just open it in a text editor and look) and see what the chromosome (sequence) names are. What do you find there? The name is the string following ">" up to the first whitespace.
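
The check described above can be sketched in a few lines of Python. This is only an illustration: the function name `fasta_sequence_names` is hypothetical, and the header text after the name in the example is made up.

```python
import io


def fasta_sequence_names(handle):
    """Return the name on each '>' line: the text after '>' up to the first whitespace."""
    names = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


# Made-up two-line fasta; with a real file, pass open("your_genome.fa") instead.
example = io.StringIO(">Chromosome free-text description\nACGTACGT\n")
names = fasta_sequence_names(example)
print(names)
```

For a real file, pass an open file handle instead of the `io.StringIO` object.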

igv-help

Jun 3, 2022, 7:18:38 PM
to igv-help
Apologies, I just noticed you included the fasta. It has a single sequence with the name "Chromosome", which matches your screenshot. It does not look like your bam file was aligned to this fasta; my guess is it was aligned to a fasta file containing transcript sequences. This is not an aliasing issue: aliasing refers to different names for the same sequence. You will need to align your data to the DNA sequence if that is what you want to view it against. IGV does not do alignments, it merely displays them.
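
A quick way to spot this kind of mismatch before loading anything is to compare the reference names in the sam header (the SN values on @SQ lines) with the names in the fasta. The Python sketch below is a minimal illustration; the function names are hypothetical and the LN value and read in the example are made up.

```python
import io


def sam_reference_names(handle):
    """Collect the SN values from the @SQ header lines of a sam file."""
    names = []
    for line in handle:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def unmatched_names(sam_names, fasta_names):
    """Return the sam reference names that are absent from the fasta."""
    known = set(fasta_names)
    return [name for name in sam_names if name not in known]


# Made-up header and read, using the names reported in this thread.
sam_text = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "@SQ\tSN:AAD13438\tLN:1200\n"
    "read1\t0\tAAD13438\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
)
sam_names = sam_reference_names(io.StringIO(sam_text))
missing = unmatched_names(sam_names, ["Chromosome"])
print(missing)
```

If many transcript-like names come back as unmatched, as here, the alignments were most likely made against a different fasta and an alias file will not help.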

Ana Ruiz Manzano

Jun 3, 2022, 7:54:44 PM
to igv-...@googlegroups.com

Yes, you were totally right, those bam files were created with the cdna version of that chromosome. It makes total sense, but I couldn't figure it out because I was so focused on the alias file.

I ran the alignment with the chromosome file and now I can see the reads. My sequence doesn't have any genes assigned, unlike the chromosome references that IGV provides, but I can see the gene names and their locations if I display the gff3 file. If you know of a way to make that genomic sequence display gene names, let me know!

Thanks again

ana

--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/a2236da1-83f2-4661-8dc2-27403514c116n%40googlegroups.com.

James Robinson

Jun 3, 2022, 8:04:06 PM
to igv-...@googlegroups.com
Glad you got that working. The gene names you see in IGV come from annotation files, like your gff3; there are no gene names associated with the sequence per se apart from annotation files. When you load a hosted genome in IGV, it automatically loads an annotation file. If you want a file that does the same with your fasta + gff, you can create one as described at https://github.com/igvteam/igv/wiki/JSON-Genome-Format and then load the resulting "json" file from the genome menu. Note that most of the properties in the json file are optional. Also note that the "url" fields can be local file paths, either absolute or relative to the "json" file location.
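
As a rough illustration of such a genome file, the Python sketch below assembles a small json description for the fasta and gff3 named in this thread. The property names (`id`, `name`, `fastaURL`, `indexURL`, `tracks`) are assumptions based on the wiki page linked above and should be checked against it. The id, the display name, and the `.fai` index file are also assumptions; the index would have to be created first, for example with `samtools faidx`.

```python
import json


def genome_json(genome_id, name, fasta_path, index_path, gff_path):
    """Build a minimal genome description (property names assumed, see the wiki)."""
    return {
        "id": genome_id,
        "name": name,
        "fastaURL": fasta_path,
        "indexURL": index_path,  # assumes a .fai index exists next to the fasta
        "tracks": [
            {"name": "Genes", "url": gff_path},  # annotation track loaded with the genome
        ],
    }


fasta = "Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.dna.chromosome.Chromosome.fa"
gff = "Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.49.gff3"

# The id and display name are hypothetical; paths are relative to the json file.
config = genome_json("ecoli_k12_mg1655", "E. coli K-12 MG1655", fasta, fasta + ".fai", gff)
print(json.dumps(config, indent=2))
```

Saving the printed text as a ".json" file in the same folder as the fasta and gff3 keeps the relative paths valid.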

Ana Ruiz Manzano

Jun 3, 2022, 8:05:56 PM
to igv-...@googlegroups.com

Thanks! That will be helpful

