all chromosomes in chromInfo.tab not loading into chromInfo mysql table

6 views
Skip to first unread message

dv...@bio.fsu.edu

unread,
Aug 11, 2014, 11:26:16 AM8/11/14
to genome...@soe.ucsc.edu
Hello,

I'm trying to load a genome with 731,921 scaffolds into a browser instance, but only ~40 of the scaffolds are loaded into the chromInfo sql table.

Fasta can be found here:

Here's what I'm doing in bash:

$ echo $MYSQL
mysql -uroot -ppassword
$ db=triAes
$ wget http://ftp.gramene.org/current/data/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSP1.22.dna.genome.fa.gz
$ gunzip Triticum_aestivum.IWGSP1.22.dna.genome.fa.gz
$ faToTwoBit Triticum_aestivum.IWGSP1.22.dna.genome.fa genome.2bit
$ twoBitInfo genome.2bit stdout | awk "{printf \"%s\t%s\tgenome.2bit\n\", \$1,\$2}" > chromInfo.tab
$ echo "$(wc -l chromInfo.tab | awk '{print $1}') scaffolds found"
731921 scaffolds found
$ $MYSQL -e "create database $db;"
$ $MYSQL $db < /path/to/kent/src/hg/lib/chromInfo.sql
$ $MYSQL -e "load data local infile \"chromInfo.tab\" into table $db.chromInfo;"
$ $MYSQL -e "select * from $db.chromInfo;" | wc -l
42

This process seems to work fine for other genomes I've loaded, but is failing on this one. Any advice would be appreciated.

Thanks,

Dan

Hiram Clawson

unread,
Aug 11, 2014, 11:38:59 AM8/11/14
to dv...@bio.fsu.edu, genome...@soe.ucsc.edu
Good Morning Dan:

This is due to size of the INDEX defined in chromInfo.sql:
PRIMARY KEY(chrom(16))

compared to the naming pattern in your genome sequence:
IWGSC_CSS_2AL_scaff_6336469 70808
IWGSC_CSS_4AS_scaff_5969124 70057
IWGSC_CSS_4AS_scaff_5933163 66956
...
IWGSC_CSS_7DS_scaff_1212969 200
IWGSC_CSS_7DS_scaff_1542448 200
IWGSC_CSS_7DS_scaff_2458508 200

The names are not unique to the size of the index (16)
28 is the maximum name length, you could use a key of 28:
PRIMARY KEY(chrom(28))

--Hiram

dv...@bio.fsu.edu

unread,
Aug 11, 2014, 12:11:51 PM8/11/14
to Hiram Clawson, genome...@soe.ucsc.edu
Makes perfect sense.

Changing primary key length to 28 worked perfectly.

Thanks Hiram,

Dan


Reply all
Reply to author
Forward
0 new messages