Error in creating a custom genome


翰欽鄭

Jul 6, 2017, 11:02:05 PM
to biovalidation
Hi,

I am trying to create a custom genome, but it returns the errors below. It seems that the process fails while creating the hisat2 index. How do I solve this?

Note: the attached file is the log of a second run with the same command.

Command:
bcbio_setup_genome.py -f CP020914.gbk.fna -c 5 --gff3 -g CP020914.gb.gff -n Denitratisoma_sp_DHT3 -b D_DHT3 -i bowtie bowtie2 bwa novoalign star rtg snap star ucsc seq hisat2

Error messages:
Creating the hisat2 index.
Traceback (most recent call last):
  File "/home/zhenghank/zhenghank/bin/bcbio/tools/bin/bcbio_setup_genome.py", line 297, in <module>
    indexed[index] = index_fn(fasta_file)
  File "/mnt/nfsfile/NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/cloudbio/biodata/genomes.py", line 425, in decorator
    return func(*args, **kwargs)
  File "/mnt/nfsfile/NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/cloudbio/biodata/genomes.py", line 751, in _index_hisat2
    _index_w_command(dir_name, cmd, ref_file, pre=pre_func)

[Screenshot: files in the created genome dir]

[Attachment: log]

翰欽鄭

Jul 7, 2017, 12:27:05 AM
to biovalidation
Hello again,

I have another problem. In my command, I pass a GFF file that I converted from an NCBI GenBank record with BioPerl's bp_genbank2gff.pl.
However, in my "/custom genome dir/rnaseq" directory, the ref-transcripts.gtf file is empty. What happened, and how do I deal with it?

Thank you.

翰欽鄭

Jul 7, 2017, 4:23:16 AM
to biovalidation
I found the reason the ref-transcripts.gtf file is empty.
The GFF-to-GTF conversion step runs after the hisat2 index is built. When hisat2 index building fails, the main process stops, so the GFF-to-GTF step never executes.

When I run the command without building the hisat2 index, I get an error in the GFF-to-GTF step instead.
How do I deal with this error?

Thank you.

Command:
bcbio_setup_genome.py -f CP020914.gbk.fna -c 5 --gff3 -g CP020914.gb.gff -n Denitratisoma_sp_DHT3 -b D_DHT3 -i bowtie bowtie2 bwa novoalign star rtg snap star ucsc seq

Error messages:

Creating gffutils database for /NAS6/zhenghank/bin/bcbio/genomes/Denitratisoma_sp_DHT3/D_DHT3/tmpcbl/ref-transcripts.gtf.
Traceback (most recent call last):
  File "/NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/utils/prepare_tx_gff.py", line 821, in <module>
    main(args.org_build, args.gtf, args.fasta, genome_dir, args.cores)
  File "/NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/utils/prepare_tx_gff.py", line 286, in main
    db = _get_gtf_db(gtf_file)
  File "/NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/utils/prepare_tx_gff.py", line 757, in _get_gtf_db
    disable_infer_transcripts, disable_infer_genes = guess_disable_infer_extent(gtf)
  File "/NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/utils/prepare_tx_gff.py", line 729, in guess_disable_infer_extent
    db = _create_tiny_gffutils_db(gtf_file)
  File "/NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/utils/prepare_tx_gff.py", line 700, in _create_tiny_gffutils_db
    disable_infer_transcripts=True)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/gffutils/create.py", line 1273, in create_db
    c.create()
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/gffutils/create.py", line 488, in create
    self._populate_from_lines(self.iterator)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/gffutils/create.py", line 609, in _populate_from_lines
    raise ValueError("No lines parsed -- was an empty file provided?")
ValueError: No lines parsed -- was an empty file provided?
Traceback (most recent call last):
  File "/home/zhenghank/zhenghank/bin/bcbio/tools/bin/bcbio_setup_genome.py", line 304, in <module>
    subprocess.check_call(cmd.format(**locals()), shell=True)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/home/zhenghank/zhenghank/bin/bcbio/anaconda/bin/python /NAS6/zhenghank/bin/bcbio/bcbio_test/test_data/test_genome/cloudbiolinux/utils/prepare_tx_gff.py --cores 5 --genome-dir /NAS6/zhenghank/bin/bcbio/genomes --gtf /NAS6/zhenghank/bin/bcbio/genomes/Denitratisoma_sp_DHT3/D_DHT3/rnaseq/ref-transcripts.gtf Denitratisoma_sp_DHT3 D_DHT3' returned non-zero exit status 1
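The ValueError above is gffutils reporting that ref-transcripts.gtf had no lines it could parse. As a minimal sketch, a pre-flight check can confirm whether the GTF contains any feature records before prepare_tx_gff.py is rerun. The helper names count_feature_lines and check_gtf are hypothetical (not part of bcbio or gffutils), and treating "9 tab-separated columns" as a feature record is a simplifying assumption.

```python
import sys


def count_feature_lines(lines):
    """Count lines that look like GTF/GFF feature records (9 tab-separated columns)."""
    n = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        if len(line.rstrip("\n").split("\t")) == 9:
            n += 1
    return n


def check_gtf(path):
    """Return the number of feature lines in `path`; 0 if it is missing, unreadable or empty."""
    try:
        with open(path) as handle:
            return count_feature_lines(handle)
    except IOError:  # covers a missing file on both Python 2.7 and 3
        return 0


if __name__ == "__main__":
    # Hypothetical default; pass the real rnaseq/ref-transcripts.gtf path as the first argument
    gtf = sys.argv[1] if len(sys.argv) > 1 else "ref-transcripts.gtf"
    n = check_gtf(gtf)
    if n == 0:
        print("%s has no feature lines; gffutils would likely raise 'No lines parsed'" % gtf)
    else:
        print("%s: %d feature lines" % (gtf, n))
```

Run it with the rnaseq/ref-transcripts.gtf path from the failing command as its argument. A result of 0 points back at the GFF-to-GTF conversion (or the input GFF) rather than at gffutils itself.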



On Friday, July 7, 2017 at 12:27:05 PM UTC+8, 翰欽鄭 wrote:

Rory Kirchner

Jul 7, 2017, 10:24:10 AM
to 翰欽鄭, biovalidation
Hi,

Sorry about the problems. I'm guessing that the GenBank conversion produces a GFF3 file that doesn't have some of the features we need. If you could pass along your .fna file and the GFF file, I can take a look and see if we can figure out how to parse it correctly. You can email them directly to me at rory.k...@gmail.com. Thanks!
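Following the guess that the converted GFF3 lacks some needed features, one way to see what bp_genbank2gff.pl actually produced is to tally the feature types in column 3. This is a rough sketch, not bcbio's own validation: the EXPECTED set (gene, mRNA, exon, CDS) is an assumption, since this thread does not say which features bcbio requires, and the default path CP020914.gb.gff is taken from the command above. The function names are hypothetical.

```python
import sys
from collections import Counter

# Assumption: typical transcript-level feature types; bcbio's exact
# requirements are not stated in this thread.
EXPECTED = ("gene", "mRNA", "exon", "CDS")


def feature_type_counts(lines):
    """Tally GFF3 column 3 (feature type), stopping at an embedded ##FASTA section."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 may embed sequence after this directive
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


def missing_types(counts, expected=EXPECTED):
    """Return the expected feature types that never occur in `counts`."""
    return [t for t in expected if counts.get(t, 0) == 0]


def report(path):
    """Print per-type counts and missing types for `path`; return the Counter or None."""
    try:
        with open(path) as handle:
            counts = feature_type_counts(handle)
    except IOError:
        print("cannot read %s" % path)
        return None
    for ftype, n in counts.most_common():
        print("%s\t%d" % (ftype, n))
    print("missing: %s" % (", ".join(missing_types(counts)) or "none"))
    return counts


if __name__ == "__main__":
    report(sys.argv[1] if len(sys.argv) > 1 else "CP020914.gb.gff")
```

If, for example, the output shows gene and CDS but no mRNA or exon lines, that would be consistent with the converter emitting only coding features, which could leave the downstream GTF empty.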

Best,

Rory

--
You received this message because you are subscribed to the Google Groups "biovalidation" group.
Visit this group at https://groups.google.com/group/biovalidation.
