Sequence name doesn't match regex

lailasa...@gmail.com

Jan 22, 2020, 3:31:41 AM
to igv-help
Hi,
I have loaded my genome in IGV (the same file I used for read mapping), but when I try to load my BAM files, I get the following error message:

Error loading C:\Skrivbordet\ISP-D56-B_merge_sorted.bam: An error occurred while accessing: C:\Skrivbordet\ISP-D56-B_merge_sorted.bam
Error loading BAM file: htsjdk.samtools.SAMException: Sequence name 'gi|0|lcl|HPV-m16031680Anr.1|HumanPapillomavirusm16031680A(HPV-m16031680A),completegenome' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'  

I can see that the chromosome name in the reference genome (gi|0|lcl|HPV-m16031680Anr.1|HumanPapillomavirusm16031680A(HPV-m16031680A),completegenome) and the chromosome name in the BAM file are identical.
I didn't have this problem with an older version. Now I am using v2.7.2.
Any ideas how to solve this? Thanks.

James Robinson

Jan 22, 2020, 11:45:02 AM
to igv-help
This error is coming from htsjdk, the library used to parse BAM files. It looks like it is enforcing the rules for sequence (chromosome) names from the SAM/BAM specification, section 1.4 here (RNAME): https://samtools.github.io/hts-specs/SAMv1.pdf
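To see which characters trip the check, here is a minimal Python sketch that tests a name against the pattern quoted in the htsjdk error message above. It assumes that pattern is exactly what htsjdk enforces; the helper names are made up for illustration, and the SAM specification linked above remains the authoritative definition.

```python
import re
from typing import List

# Pattern copied from the htsjdk error message in this thread (assumed to be what htsjdk checks).
SEQ_NAME_RE = re.compile(r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*")

# Characters accepted after the first position; the first position additionally rejects '*' and '='.
ALLOWED_CHAR_RE = re.compile(r"[0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def is_valid_seq_name(name: str) -> bool:
    """True if the entire name matches the pattern."""
    return SEQ_NAME_RE.fullmatch(name) is not None


def offending_chars(name: str) -> List[str]:
    """Characters that are not allowed anywhere in a name, in order of first appearance."""
    found: List[str] = []
    for ch in name:
        if ALLOWED_CHAR_RE.fullmatch(ch) is None and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    reported = ("gi|0|lcl|HPV-m16031680Anr.1|HumanPapillomavirusm16031680A"
                "(HPV-m16031680A),completegenome")
    print(is_valid_seq_name(reported))  # False
    print(offending_chars(reported))    # ['(', ')', ',']
```

For the name in the original report this flags the parentheses and the comma; the pipe characters are allowed by the pattern.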

I suggest opening an issue here, since any solution will involve the htsjdk team. Let me know when that's done and I will comment on it and follow it. Also, if you could post the first few lines of your FASTA file, that would be helpful.

--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/38a7c020-2f2a-4f3c-b004-c1edc9e482c9%40googlegroups.com.

Helga Thorvaldsdottir

Jan 22, 2020, 5:00:47 PM
to igv-help
Here's where you can file the htsjdk issue: https://github.com/samtools/htsjdk/issues 

lailasa...@gmail.com

Jan 23, 2020, 4:19:04 AM
to igv-help
Hi,

Now I have seen that the problem comes from the parentheses in my chromosome names. Here is an example from my FASTA file:

>gi|0|lcl|HPV-mEV03c09nr.1|HumanPapillomavirus_mEV03c09(HPV-mEV03c09),completegenome
atggatttaaagacagtacaggatctcagaaaacatttacaggtacctacagaggacctgttggtagcttgtaatttttg
tggacactttttaacttttctagagcttcttgaatttgataataaaactttaaatcttatttggaaagaagggcatgctt
ttggatgttgctcacgctgtgcagcagcagtagctaaaatagaatttgagaatttttatgagaaaactgtaataggaaag
gaaatagaagctgagagtagaagtttgctttgttgtatagctgtaaggtgctgttactgtttaagatttttagattatct
agaaaagttgtatatttgctctcacaatttacaattttataaagtgagaggttgttggaaaggtgtttgcagacattgta
gacagatatgattgggaaagaagtaactataccagacattgagcttgagctgcaagaccttgtccagcccattgacctgc
attgtgacgaagtgctacctgaagagtcagaaaacctgtcagaatcttcacaagcagaggtggagcctgaaagaatcctc
ttcaagattgttgctccgtgtggaggctgtgaaacccgtctgaaaattcacatcgcgtctactcgttttggaattcgttg
cttggaagaattgttgctgtctgaactttgtttgctgtgtcctgtgtgcagaaatggcagacaatagaggtactgatcct

>gi|0|lcl|HPV-mEV03c104nr.1|HumanPapillomavirus_mEV03c104(HPV-mEV03c104),completegenome
atggctgcttattttcctcgtagtttagatgaatattgtaaatattttgagattgatttttttagtttacgactgcgttg
cattttttgtttattttatacaagctttgaggaccttgctgcttttcatactaagaaacttaatatagtttggagaagta
atgtaccttttgtatgctgtacaaaatgttgtagacattcagctttaatagagaaacagaaatactttcagtgctctgtc
aaatgcagaattttagatgctgttgttggtaaacctataactgaaatttatatgcggtgtacttggtgctttgcattact
ggattgtgctgaaaaagtagatttgtgtgccagagatgatttagcacttttaattcgtggctattggagagctgattgca
gaaactgctgcataaaagaatgagaggtcacgcaccagatataaaagacatagaattagattttcatgacttaattttac
cagcaaatgttcttgctccagaagagtcattgtcactagacgaagaaccagaggaggagccaaaagaaccttatagggtt
gacacctcttgtggagtttgcggaacaggtgtaaggattattgttatagccacttgtgcttcagtgcgtacagtgcagca
attacttttaagagaattgtcttttgtttgcgtcgagtgctacaggaccaggattcatcatgggggatcccagtaaaggt
actgataaaaatacttatgaaaacagtgaaagtgttgattggtttattgtgcatgaagctgactgtgtagatgatgactt
aaatgcgttagaaaatatatttgaagaaagcaacagtgatactgatatttcgaaccttatagacgatgatcaagtggatc
agggaaattccctggcactcttcaatactcagctagcaaatgattgtgagagagctttactagatctaaaacgaaagtat
attccaagccctgaaagatctattgcagatctgagtccgaggcttgctgctgtacatatttctccacaaagacaaatcaa
gaaaagattgtttgaggacagtggcgtagttgaagatgaagctgaaagtactaatgaaaatgtgcaggtagaaccaccac
gtgatagccgccatactgaaattcaaagtggtggtcaaataattgagttattaaaatgttctaactttaaagctttattt
ttagctaaatttaaagaactatttggggtttcgtaccatgatctgactagaacattcaaaagtgataaaacatgttgtga

I can easily remove them and rename the chromosomes for further analysis, but I'm not sure whether there is a workaround that would let me view previously analyzed data in IGV.
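One way to do the renaming is a small Python filter that rewrites the FASTA headers, replacing every character outside the pattern from the error message with an underscore. This is a sketch under stated assumptions: the sequence name is taken to be the header text up to the first whitespace, '_' is an arbitrary choice of replacement, and the script name in the usage comment is hypothetical. It stops with an error if two different names would collapse to the same new name.

```python
import re
import sys
from typing import Dict, Iterable, Iterator

# Any character outside the allowed set from the htsjdk error message (assumption: '_' as replacement).
DISALLOWED_RE = re.compile(r"[^0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def sanitize_name(name: str, replacement: str = "_") -> str:
    """Replace disallowed characters; also replace a leading '*' or '=', which the pattern rejects."""
    clean = DISALLOWED_RE.sub(replacement, name)
    if clean[:1] in ("*", "="):
        clean = replacement + clean[1:]
    return clean


def sanitize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sanitized sequence names; sequence lines pass through unchanged."""
    new_to_old: Dict[str, str] = {}
    for line in lines:
        if not line.startswith(">"):
            yield line
            continue
        # Name = header text up to the first whitespace; any description after it is kept as-is.
        match = re.match(r"(\S*)(.*)", line[1:].rstrip("\n"))
        name, rest = match.group(1), match.group(2)
        new = sanitize_name(name)
        if new_to_old.get(new, name) != name:
            raise ValueError(f"{name!r} and {new_to_old[new]!r} would both become {new!r}")
        new_to_old[new] = name
        yield ">" + new + rest + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python sanitize_fasta.py genome.fa > genome.renamed.fa
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(sanitize_fasta(handle))
```

The renamed FASTA (re-indexed, e.g. with samtools faidx) would then have to be used both for the mapping, or for renaming the existing BAMs, and as the genome loaded in IGV, so that the names match.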

Thank you very much for the answer.

katiev...@gmail.com

Apr 7, 2020, 8:37:25 AM
to igv-help
Hi James, Helga, 

I have the same regex error. In my case it's due to a comma in the header, e.g. 'gi|545903863|ref|NZ_BATA01000117.1|:1938-2201,2205-2258', and I've also seen it before with single quotes, as in 5' or 3'. I don't see an issue opened under htsjdk. Did anyone come up with a solution, or shall I open an issue?

Regards,
Katie.

Laila Sara Arroyo Mühr

Apr 7, 2020, 9:06:05 AM
to igv-...@googlegroups.com
I didn’t find a way to fix it. What I did was rename my headers to avoid the “()” that caused the issue, and it worked perfectly. The downside is that I had to redo the mapping...

Best

Sara

--
Laila Sara Arroyo

James Robinson

Apr 7, 2020, 11:41:55 AM
to igv-help
Hi,
Do these BAMs load in samtools? If so, I would suggest opening an issue with htsjdk, since the two are inconsistent. However, if the sequence names are out of spec, they might consider it a samtools problem; htsjdk is stricter with respect to the spec.

If samtools can read the file, you could fix the BAM with a script instead of redoing the alignments: bam -> sam -> rename script -> bam -> index
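A rename script for that pipeline could look like the following Python sketch. It assumes SAM text with the header on standard input (e.g. from samtools view -h) and renames the SN: values in @SQ lines, the RNAME and RNEXT columns, and the reference names inside SA:Z tags; any other tag that embeds sequence names is left untouched. The substitution rule (disallowed characters become '_') is an assumption and must match whatever renaming is applied to the reference FASTA loaded in IGV. The script name is hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, Iterator, Optional

# Assumed substitution rule: characters outside the allowed set become '_'.
DISALLOWED_RE = re.compile(r"[^0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def sanitize_name(name: str, replacement: str = "_") -> str:
    clean = DISALLOWED_RE.sub(replacement, name)
    if clean[:1] in ("*", "="):  # not allowed as the first character
        clean = replacement + clean[1:]
    return clean


def rename_sa(value: str, mapping: Dict[str, str]) -> str:
    """Rename the reference name at the start of each ';'-separated SA entry.

    Names may contain commas, so rather than splitting on ',' this looks for the
    longest known old name followed by ','. Names containing ';' are not handled.
    """
    entries = []
    for entry in value.split(";"):
        best: Optional[str] = max(
            (old for old in mapping if entry.startswith(old + ",")), key=len, default=None
        )
        if best is not None:
            entry = mapping[best] + entry[len(best):]
        entries.append(entry)
    return ";".join(entries)


def rename_sam(lines: Iterable[str]) -> Iterator[str]:
    """Rename reference sequences in a SAM stream whose header comes first."""
    mapping: Dict[str, str] = {}  # old name -> new name, collected from @SQ lines
    in_header = True
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if in_header and fields[0].startswith("@"):
            if fields[0] == "@SQ":
                for i, field in enumerate(fields):
                    if field.startswith("SN:"):
                        mapping[field[3:]] = sanitize_name(field[3:])
                        fields[i] = "SN:" + mapping[field[3:]]
        else:
            in_header = False
            if len(fields) >= 11:
                fields[2] = mapping.get(fields[2], fields[2])  # RNAME ('*' is left alone)
                fields[6] = mapping.get(fields[6], fields[6])  # RNEXT ('=' and '*' are left alone)
                for i in range(11, len(fields)):
                    if fields[i].startswith("SA:Z:"):
                        fields[i] = "SA:Z:" + rename_sa(fields[i][5:], mapping)
        yield "\t".join(fields) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Pass "-" to read SAM from stdin (as in a pipe), or a path to a SAM file.
    if sys.argv[1] == "-":
        sys.stdout.writelines(rename_sam(sys.stdin))
    else:
        with open(sys.argv[1]) as handle:
            sys.stdout.writelines(rename_sam(handle))
```

With file names of your choice, the pipeline would then be something like: samtools view -h in.bam | python rename_sam.py - | samtools view -b -o renamed.bam -, followed by samtools index renamed.bam.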

Laila Sara Arroyo Mühr

Apr 7, 2020, 12:14:41 PM
to igv-...@googlegroups.com
Hi,

Thanks for answering! I have used samtools on these files to sort them, extract only the mapped reads, index them, etc., and it worked fine. The problem only came up when loading the reference in IGV.

Best

Sara


lailasa...@gmail.com

Apr 7, 2020, 12:16:18 PM
to igv-help
Great! Thanx!

Katie Lennard

Apr 8, 2020, 4:55:09 AM
to igv-...@googlegroups.com
Same here: samtools doesn't have a problem with the headers; the error only appears when loading into IGV.

--

Katie

Bioinformatician
Division of Computational Biology
Institute of Infectious Diseases & Molecular Medicine
University of Cape Town
South Africa

Katie Lennard

Apr 8, 2020, 4:59:15 AM
to igv-...@googlegroups.com

John Marshall

Apr 8, 2020, 5:38:20 AM
to igv-help
On Tuesday, April 7, 2020, James Robinson wrote:
Does samtools load these bams?  If so I would suggest opening an issue in the htsjdk as they are inconsistent.   However if the sequence name is out of spec they might consider it a samtools problem, the htsjdk is stricter with respect to the spec.

Samtools does not, in general, enforce rules like this when merely reading files, so it would happily load the BAM files. Nonetheless, it considers such sequence names out of spec, and it would likely complain or fail to find the expected sequence if you, e.g., queried for regions in out-of-spec-named sequences with samtools view.

You can use this to your advantage: use the samtools reheader command to change the problematic characters directly in your BAM files (unless the records have SA tags or other tags that contain sequence names as text), or go via samtools view -> SAM -> sed -> samtools view -> BAM as already suggested (which will also handle SA tags, etc.).
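For the reheader route, the new header can be produced by a short Python filter over the samtools view -H output. This works because BAM records refer to their reference sequence by its position in the header rather than by name, so only the @SQ SN: values need to change. It is a sketch assuming the same '_' substitution as the renamed FASTA; as noted above, SA and similar text tags are not touched this way, and the script name is hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator

# Assumed substitution rule: characters outside the allowed set become '_'.
DISALLOWED_RE = re.compile(r"[^0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def sanitize_name(name: str, replacement: str = "_") -> str:
    clean = DISALLOWED_RE.sub(replacement, name)
    if clean[:1] in ("*", "="):  # not allowed as the first character
        clean = replacement + clean[1:]
    return clean


def rename_header(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite SN: values on @SQ lines of a SAM header; all other lines pass through unchanged."""
    for line in lines:
        if not line.startswith("@SQ\t"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields = ["SN:" + sanitize_name(f[3:]) if f.startswith("SN:") else f for f in fields]
        yield "\t".join(fields) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Pass "-" to read the header from stdin, or a path to a saved header file.
    if sys.argv[1] == "-":
        sys.stdout.writelines(rename_header(sys.stdin))
    else:
        with open(sys.argv[1]) as handle:
            sys.stdout.writelines(rename_header(handle))
```

For example: samtools view -H in.bam | python rename_header.py - > new_header.sam, then samtools reheader new_header.sam in.bam > renamed.bam and samtools index renamed.bam.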

    John

Katie Lennard

Apr 8, 2020, 6:12:15 AM
to igv-...@googlegroups.com
Thanks John!

Laila Sara Arroyo Mühr

Apr 8, 2020, 6:14:08 AM
to igv-...@googlegroups.com
Thanks! Very much appreciated. 
