Non-numeric chromosome name

Ashley Curtis

Dec 9, 2013, 6:27:08 PM
to tas...@googlegroups.com
Hello,

I am having difficulty trying to build my reference genome using BWA without any non-numeric names. When I get to the SAMConverterPlugin I keep getting the error:

SAMConverterPlugin detected a non-numeric chromosome name: >gi|212551328|ref|NW_002198269.1| Taeniopygia guttata isolate Black17 chromosome 2 genomic scaffold, Taeniopygia_guttata-3.2.4 Tgu2_WGA213_1, whole genome shotgun sequence

Please change the FASTA headers in your reference genome sequence to integers (>1, >2, >3, etc.) OR to 'chr' followed by an integer (>chr1, >chr2, >chr3, etc.)

Running grep on my FASTA file shows that there are thousands of these names. I am new to Unix/Linux systems. I found the sed command online, but because the numeric part of the name differs for each scaffold, I thought I would have to change each scaffold separately. That seemed like a ridiculous amount of work, and I figured there must be an easier way. Instead, I tried changing only the non-numeric parts of the scaffold names, since they are the same for every scaffold. For example, I used
sed 's/>gi/1/g'
to change the >gi portion into a number. Once I had done this for all the non-numeric parts of the names, I rebuilt the reference index in BWA and aligned my contigs to it. When I got to the SAMConverterPlugin, however, I ran into the same problem, only now it says:
SAMConverterPlugin detected a non-numeric chromosome name: 155357600021973620

The chromosome name it reports as non-numeric is composed only of digits, so I'm not sure what the problem is. I suspect I am doing something wrong with the sed command, but I am so new to Unix-based systems that I don't know how to fix it. If you can tell me where I'm going wrong, or if there is a different method I should be using to change my chromosome names to numeric ones, I would greatly appreciate it. I downloaded the chromosome sequences from NCBI over FTP and then concatenated them with the cat command. I am using the zebra finch reference genome (and if I can get it to work, I'd also like to try one or two other bird genomes).

Please let me know if you require any further information from me and thank you for any help you can offer.
Ashley

Kanokwan

Aug 26, 2014, 4:50:43 PM
to tas...@googlegroups.com
Hi Ashley, 

I am very new to TASSEL and the GBS pipeline. I got the same error with SAMConverterPlugin:

"SAMConverterPlugin detected a non-numeric chromosome name: 1GATCGATGCTCCTGTGCTCCCAACCCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCT"

My reference genome headers are in the requested format: >1, >2, >3, etc., or 'chr' followed by an integer (>chr1, >chr2, >chr3, etc.).

I tried changing the headers to numeric names in Linux, but I still get the same error.

Do you have any suggestions? Thank you.

Kanokwan

Kanokwan

Aug 30, 2014, 10:19:41 PM
to tas...@googlegroups.com
I solved this problem by checking the reference FASTA file with a hex editor. When you edit the FASTA headers to >chr1, >chr2, >chr3, ... in a text editor such as Notepad, make sure that you press the "Enter" key to insert a line break after each edited header line.
Read more: http://www.ehow.com/how_6158421_convert-txt-file-fasta.html
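
The missing line break described above can also be found without a hex editor. What follows is a minimal Python sketch, not part of TASSEL or BWA: it reports every header line that is not exactly >N or >chrN, which is the form the error message asks for. The function name and the example data are hypothetical. A header with sequence attached to it (as in the "1GATCGATG..." name above) is reported, and so is a header carrying a stray carriage return, since the sketch makes no assumption about whether the pipeline tolerates one.

```python
import re

# The form requested by the error message: an integer, or 'chr' plus an integer.
HEADER_OK = re.compile(r">(chr)?[0-9]+")


def find_bad_headers(lines):
    """Return (line_number, text) for each header line that is not exactly >N or >chrN."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")  # drop only the newline; a leftover '\r' stays visible
        if text.startswith(">") and not HEADER_OK.fullmatch(text):
            bad.append((number, text))
    return bad


# Hypothetical four-line FASTA: the second header has lost its line break.
example = [
    ">chr1\n",
    "GATCGATGCTCC\n",
    ">2GATCGATGCTCC\n",
    "AACCCTAAACCC\n",
]
print(find_bad_headers(example))  # [(3, '>2GATCGATGCTCC')]
```

To check a real file, the same function can be given an open file object, for example find_bad_headers(open("reference.fa")), where the file name is a placeholder.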

Peri Bolton

May 27, 2015, 10:35:06 PM
to tas...@googlegroups.com
Hi Ashley,

I am encountering the exact same problem, but only when using bwa. Did you manage to solve it for bwa? In fact, I am using the same reference genome.

I can convert it into a .topm when I use bowtie2, but then I can't get the final SNP calls using TagsToSNPSByAlignment or DiscoverySNPCaller.

Cheers,

Peri

Lynn Carol Johnson

May 28, 2015, 6:54:49 AM
to tas...@googlegroups.com
To clarify: you are using the old GBS pipeline, not the GBSv2 pipeline? Support for non-numeric chromosome names is only implemented in GBSv2. The old pipeline expects the chromosome names to be as shown in your error message below. Sorry for the confusion; I should have been clearer earlier.


Jeff Glaubitz

May 28, 2015, 2:19:07 PM
to tas...@googlegroups.com

> SAMConverterPlugin detected a non-numeric chromosome name: 155357600021973620
> The chromosome name it tells me is non-numeric is only composed of numbers so I'm not sure what the problem is.

The number is too big. The maximum value of a Java int is 2,147,483,647 (2^31 - 1).

Best,
Jeff

 

--
Jeff Glaubitz
Project Manager
Biology of Rare Alleles in Maize and its Wild Relatives
National Science Foundation award IOS-1238014
http://www.panzea.org
Institute for Genomic Diversity
Cornell University
175 Biotechnology Bldg
Ithaca, NY 14853
Phone: 607-255-1386
jcg...@cornell.edu
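
Jeff's point about the integer limit can be checked before running the pipeline. The Python sketch below assumes, as his reply implies, that the old pipeline reads a chromosome name into a signed 32-bit Java int. The function name is hypothetical and the check is an illustration, not TASSEL code.

```python
JAVA_INT_MAX = 2**31 - 1  # 2,147,483,647, the largest value a Java int can hold


def fits_java_int(name):
    """True if name consists only of ASCII digits and its value is at most JAVA_INT_MAX."""
    return name.isascii() and name.isdigit() and int(name) <= JAVA_INT_MAX


print(fits_java_int("155357600021973620"))  # False: the name from the error message
print(fits_java_int("2147483647"))          # True: exactly the limit
print(fits_java_int("2147483648"))          # False: one past the limit
```

The name in the error message has 18 digits, so it fails the range check even though every character in it is a digit.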

Peri Bolton

May 28, 2015, 7:47:05 PM
to tas...@googlegroups.com, lc...@cornell.edu
Hi Lynn,

I don't know what version of the pipeline I am using. I have been using both TASSEL 3.0 and TASSEL 5.0 to implement this, and I have been using the flowchart on the website as my guide (https://bytebucket.org/tasseladmin/tassel-5-source/wiki/docs/TasselPipelineGBS.pdf).

Cheers,

Peri

Lynn Carol Johnson

May 28, 2015, 9:30:37 PM
to Peri Bolton, tas...@googlegroups.com
Looks like you are using the old pipeline. I believe Jeff identified the problem as a chromosome number that is out of range.
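
The renaming asked about at the start of the thread can be done in one pass. Stripping the non-numeric parts of each header with sed leaves the remaining digits joined into one very large number, which is the out-of-range name reported above. Numbering the headers 1, 2, 3, ... in file order avoids that. The Python sketch below assumes sequential integers are acceptable names for the old pipeline. The function names, file paths and example headers are hypothetical and not part of TASSEL or BWA.

```python
def renumber_headers(lines):
    """Replace each FASTA header with >1, >2, ... and return (new_lines, mapping)."""
    new_lines = []
    mapping = []  # (new_name, original_header_text) pairs, in file order
    for line in lines:
        if line.startswith(">"):
            new_name = str(len(mapping) + 1)
            mapping.append((new_name, line[1:].strip()))
            new_lines.append(">" + new_name + "\n")  # header always ends with a line break
        else:
            new_lines.append(line)  # sequence lines are copied unchanged
    return new_lines, mapping


def renumber_file(in_path, out_path, map_path):
    """Renumber a FASTA file and write a tab-separated table of new and old names."""
    with open(in_path) as src:
        new_lines, mapping = renumber_headers(src)
    with open(out_path, "w") as dst:
        dst.writelines(new_lines)
    with open(map_path, "w") as table:
        for new_name, old_header in mapping:
            table.write(new_name + "\t" + old_header + "\n")


# Hypothetical two-record FASTA, used only to show the behaviour.
example = [
    ">scaffold_A first made-up description\n",
    "ACGTACGT\n",
    ">scaffold_B second made-up description\n",
    "TTGACCAA\n",
]
new_lines, mapping = renumber_headers(example)
print(new_lines)
print(mapping)
```

The mapping table matters because the new numbers carry no biological meaning: it is the only way to translate SNP positions back to the original scaffold names. The sketch holds the whole file in memory, so a genome-sized reference may call for a version that writes each line as it is read. A call such as renumber_file("reference.fa", "reference.numeric.fa", "name_map.tsv") uses placeholder paths, and the BWA index has to be rebuilt from the renumbered file afterwards.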