Question about reference genome install

翰欽鄭

Jul 11, 2017, 04:22:12
to biovalidation
Hello:

I have a question to ask and some bugs to report.

Question:
In bcbio_nextgen_install.py, I saw that genomes for several species can be installed,

such as: "GRCh37", "hg19", "hg38", "hg38-noalt", "mm10", "mm9", "rn6", "rn5", "canFam3", "dm3", "galGal4", "phix", "pseudomonas_aeruginosa_ucbpp_pa14", "sacCer3", "TAIR10", "WBcel235", "xenTro3", "GRCz10"

Do the installed references for these species support all bcbio functions (e.g. RNA-seq)?
I ask because I didn't see an "rnaseq" directory for phiX174 or pseudomonas_aeruginosa_ucbpp_pa14, but I did find one for GRCh37.

Thank you


Bug report:
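TAIR10
command:
bcbio_nextgen.py upgrade --genomes TAIR10 > tair10_install_log 2> tair10_install_log

bug:
The link below is unavailable.
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000001735.3_TAIR10/GCF_000001735.3_TAIR10_genomic.gff.gz

Correct link:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.3_TAIR10/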

dm3
command:
bcbio_nextgen.py upgrade --genome dm3 > dm3_log 2> dm3_errorlog

bug:
The wget command below does not pass "--no-check-certificate" when it fetches from an encrypted (https) website.
wget https://jaist.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_BDGP5.75.zip

PS: The attached files are the logs from installing the genomes (dm3 and TAIR10).

dm3_log
tair10_install_log
dm3_errorlog
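
The dm3 report above comes down to one missing flag on an https download. A minimal sketch of how such a command line could be assembled in Python follows. `build_wget_cmd` is a hypothetical helper for illustration only, not the actual bcbio or CloudBioLinux code. Note that `--no-check-certificate` turns off TLS certificate verification, so it trades safety for convenience.

```python
from urllib.parse import urlparse


def build_wget_cmd(url, out_file):
    """Return a wget argument list, skipping certificate checks for https URLs.

    Hypothetical helper: it only mirrors the workaround discussed in this thread.
    """
    cmd = ["wget", "-O", out_file]
    if urlparse(url).scheme == "https":
        # Disables TLS verification; only reasonable for mirrors you trust.
        cmd.append("--no-check-certificate")
    cmd.append(url)
    return cmd


url = "https://jaist.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_BDGP5.75.zip"
print(build_wget_cmd(url, "snpEff_v4_3_BDGP5.75.zip"))
```

The list form can be handed to `subprocess.check_call` without going through a shell, which avoids quoting problems with unusual file names.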

Rory Kirchner

Jul 12, 2017, 10:40:32
to 翰欽鄭, biovalidation
Hi,

Sorry about that. Regarding the question: not all of the genomes are set up to run RNA-seq; for the most part we just support the main model organisms. We have a script that can take a FASTA sequence of an organism and a GTF file and set them up for RNA-seq, if you need that for an unsupported organism.
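
To see which installed builds already carry RNA-seq data, one rough check is to look for the "rnaseq" directory mentioned in the question. This is a minimal sketch, assuming the on-disk layout genomes/<species>/<build>/ that appears in paths elsewhere in this thread (for example genomes/Hsapiens/GRCh37). The function name is hypothetical and this is not part of bcbio.

```python
import os


def builds_with_rnaseq(genomes_dir):
    """Map "<species>/<build>" to True/False for the presence of an rnaseq directory.

    Assumes the layout genomes/<species>/<build>/; adjust if yours differs.
    """
    result = {}
    for species in sorted(os.listdir(genomes_dir)):
        species_dir = os.path.join(genomes_dir, species)
        if not os.path.isdir(species_dir):
            continue
        for build in sorted(os.listdir(species_dir)):
            build_dir = os.path.join(species_dir, build)
            if os.path.isdir(build_dir):
                key = species + "/" + build
                result[key] = os.path.isdir(os.path.join(build_dir, "rnaseq"))
    return result
```

Calling `builds_with_rnaseq` on the genomes directory of an installation returns a dictionary that can be printed or filtered for the builds that report `False`.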

Thanks for the bug report and providing the fixes, that is super helpful. 

I added --no-check-certificate to the snpEff download call here:


and moved the erroneous URL for the mirbase setup here:


Hope that gets you going. Thanks again for the awesome bug report!

Best,

Rory

--
You received this message because you are subscribed to the Google Groups "biovalidation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biovalidatio...@googlegroups.com.
To post to this group, send email to bioval...@googlegroups.com.
Visit this group at https://groups.google.com/group/biovalidation.
For more options, visit https://groups.google.com/d/optout.

翰欽鄭

Jul 12, 2017, 21:53:39
to biovalidation, hank...@gmail.com
Hello:

After updating bcbio with the upgrade command, I re-ran the genome install command and an error occurred.
The message below is short. Please fix it, thank you.


[zhenghank@r910a bcbio]$ bcbio_nextgen.py upgrade --genome TAIR10
Traceback (most recent call last):
  File "/home/zhenghank/zhenghank/bin/bcbio/tools/bin/bcbio_nextgen.py", line 33, in <module>
    from bcbio import install, utils, workflow
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 226
    subprocess.check_call(["wget", "-O", req_file, "--no-check-certificate", REMOTES["requirements"]]) subprocess.check_call([conda_bin, "install", "--update-deps", "--quiet", "--yes",
                                                                                                                ^
SyntaxError: invalid syntax





On Wednesday, July 12, 2017 at 10:40:32 PM UTC+8, Rory Kirchner wrote:

Brad Chapman

Jul 13, 2017, 03:31:22
to 翰欽鄭, biovalidation, hank...@gmail.com

Sorry about the issue, this was a typo in the update fixes. You might have to
grab a new version of the install file, since it won't allow you to update
through the regular mechanism:

wget --no-check-certificate https://raw.githubusercontent.com/chapmanb/bcbio-nextgen/master/bcbio/install.py
mv install.py /home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py
bcbio_nextgen.py upgrade -u development --genome TAIR10

Thanks again for the report and hope this gets it working for you,
Brad
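
The SyntaxError in the traceback above came from two statements fused onto one line. Before overwriting an installed module with a downloaded copy, a quick parse check can catch that class of problem. This is a minimal sketch using only the standard library. `python_syntax_error` is a hypothetical helper, and it checks against the grammar of whichever interpreter runs it, which may differ from the Python 2.7 used by the installation in this thread.

```python
import ast


def python_syntax_error(source, filename="install.py"):
    """Return None if source parses, else a short description of the error."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return "%s:%s: %s" % (filename, exc.lineno, exc.msg)
    return None


# Two calls fused on one line, similar in shape to the line in the traceback.
broken = 'check_call(["wget", "-O", req_file]) check_call([conda_bin, "install"])\n'
fixed = 'check_call(["wget", "-O", req_file])\ncheck_call([conda_bin, "install"])\n'

print(python_syntax_error(broken))  # a message describing the syntax error
print(python_syntax_error(fixed))   # None
```

Parsing does not execute the file, so undefined names such as `req_file` in the sample strings are not a problem here.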


>> On Jul 11, 2017, at 4:22 AM, 翰欽鄭 <hank...@gmail.com <javascript:>> wrote:
>>
>> Hello:
>>
>> I have a question to ask and some bugs to report.
>>
>> *Question:*
>> In bcbio_nextgen_insall.py, I saw several species genome which can be
>> installed.
>>
>> such as: GRCh37", "hg19", "hg38", "hg38-noalt", "mm10", "mm9", "rn6",
>> "rn5","canFam3", "dm3", "galGal4",
>> "phix","pseudomonas_aeruginosa_ucbpp_pa14","sacCer3", "TAIR10", "WBcel235",
>> "xenTro3", "GRCz10
>>
>> Does these installed references of these species support all Bcbio
>> functions? (i.e. RNA-seq)
>> Because, I didn't see the "rnaseq" directory in phiX174,
>> pseudomonas_aeruginosa_ucbpp_pa14. But I found it in GRCh37.
>>
>> Thank you
>>
>>
>> *Bug report:*
>> *Tair*
>> command:
>> bcbio_nextgen.py upgrade --genomes TAIR10 > tair10_install_log 2>
>> tair10_install_log
>>
>> bug:
>> The link below is unavailable.
>> ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000001735.3_TAIR10/GCF_000001735.3_TAIR10_genomic.gff.gz
>>
>> Correct link:
>>
>> ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.3_TAIR10/
>>
>>
>> *dm3*
>> command
>> bcbio_nextgen.py upgrade --genome dm3 > dm3_log 2> dm3_errorlog
>>
>> bug:
>> The wget command below didn't add the "--no-check-certificate" when links
>> to "Encrypted website (https)".
>>
>> wget
>> https://jaist.dl.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_BDGP5.75.zip
>>
>> PS: The attach file are the log of installing genome (dm3 and tair10).

翰欽鄭

Jul 13, 2017, 05:07:03
to biovalidation, hank...@gmail.com
Hi:

After updating, install.py works fine.

However, I found what looks like a typo in ggd-recipes/TAIR10/mirbase.yaml.
The line "GCF/000/001/735/GCF_000001735.3_TAIR10/" (line 14) seems to be a typo, because the file containing it fails in the YAML scanner (/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/scanner.py). The error message is listed below.

In addition, I tried to re-update bcbio after updating install.py, but an error occurred.

UnsatisfiableError: The following specifications were found to be in conflict:
  - bioconductor-deseq2
  - bioconductor-iranges 2.4.7*
  - r-tibble
Use "conda info <package>" to see the dependencies for each package.


Fatal error: local() encountered an error (return code 1) while executing '/NAS6/zhenghank/bin/bcbio/anaconda/bin/conda install --quiet -y -c bioconda -c conda-forge -c r age-metasv bamtools bamutil ............................(omitted)


The detailed messages are in the attached files.


Please solve it, thank you.

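The contents of ggd-recipes/TAIR10/mirbase.yaml:
Before modification:
<https://lh3.googleusercontent.com/-JUDmOpvk-bo/WWcvJj5IZSI/AAAAAAAAL_8/Z-6DocjMFs8eYtKBakYgsvlv-xY8o0c_gCLcBGAs/s1600/mirbase_yaml_beforemodify.png>

After modification:
<https://lh3.googleusercontent.com/-5S5B9M5neto/WWcvSuPFtgI/AAAAAAAAMAA/J_RaQsRS2LUjEvnzhL578AnuzXR8fzc7wCLcBGAs/s1600/mirbase_yaml_modify.png>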

The error message:

INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'genome_indexes': ['bwa', 'bowtie2', 'rtg', 'bowtie'], 'genomes': [{'name': 'Arabidopsis thaliana (TAIR10)', 'dbkey': 'TAIR10', 'annotations': ['mirbase']}], 'install_uniref': False}'): Arabidopsis thaliana (TAIR10)
Traceback (most recent call last):
  File "/home/zhenghank/zhenghank/bin/bcbio/tools/bin/bcbio_nextgen.py", line 215, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 96, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 292, in upgrade_bcbio_data
    cbl_deploy.deploy(s)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
    _setup_vm(options, vm_launcher, actions)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
    configure_instance(options, actions)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
    setup_biodata(options)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
    install_proc(options["genomes"], ["ggd", "s3", "raw"])
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 338, in install_data
    _prep_genomes(env, genomes, genome_indexes, ready_approaches)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 467, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 812, in _install_with_ggd
    ggd.install_recipe(env.cwd, recipe_file, gid)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 23, in install_recipe
    recipe = _read_recipe(recipe_file)
  File "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 86, in _read_recipe
    recipe = yaml.safe_load(in_handle)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/composer.py", line 110, in compose_sequence_node
    while not self.check_event(SequenceEndEvent):
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/parser.py", line 382, in parse_block_sequence_entry
    if self.check_token(BlockEntryToken):
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/scanner.py", line 115, in check_token
    while self.need_more_tokens():
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/scanner.py", line 149, in need_more_tokens
    self.stale_possible_simple_keys()
  File "/home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/scanner.py", line 289, in stale_possible_simple_keys
    "could not find expected ':'", self.get_mark())
yaml.scanner.ScannerError: while scanning a simple key
  in "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/ggd-recipes/TAIR10/mirbase.yaml", line 14, column 1
could not find expected ':'
  in "/NAS6/zhenghank/bin/bcbio/tmpbcbio-install/cloudbiolinux/ggd-recipes/TAIR10/mirbase.yaml", line 24, column 20
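
The traceback above ends in `yaml.safe_load`, so a recipe file can be checked on its own before re-running the whole install. This is a minimal sketch using PyYAML, the parser shown in the traceback. The recipe fragment is made up to imitate a URL tail that has wrapped onto its own line at column 1; it is not the real mirbase.yaml, and `yaml_problem` is a hypothetical helper.

```python
import yaml  # PyYAML, the same parser that appears in the traceback


def yaml_problem(text):
    """Return None if text parses as YAML, else a one-line problem report."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Marked errors carry a position; fall back gracefully if they do not.
        mark = getattr(exc, "context_mark", None) or getattr(exc, "problem_mark", None)
        where = "line %d" % (mark.line + 1) if mark is not None else "unknown position"
        return "%s: %s" % (where, getattr(exc, "problem", exc))
    return None


# Made-up fragment: the fourth line should have been part of the URL above it.
broken_recipe = """\
recipe:
  cmds:
    - wget ftp://host.example/genomes/all/
GCF/000/001/735/GCF_000001735.3_TAIR10/
  outfiles:
    - out.gff
"""

print(yaml_problem(broken_recipe))
print(yaml_problem("recipe:\n  cmds:\n    - echo ok\n"))
```

A stray line at column 1 is read as the start of a new top-level key, which is why the scanner then complains about a missing ':'.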




On Thursday, July 13, 2017 at 3:31:22 PM UTC+8, Brad Chapman wrote:
update_error_log
update_log

Brad Chapman

Jul 13, 2017, 17:03:05
to 翰欽鄭, biovalidation, hank...@gmail.com

Apologies about the continued issues. We had some git merge/rebase issues that
removed a few older fixes you ran into. If you do:

rm -rf /NAS6/zhenghank/bin/bcbio/tmpbcbio-install

and re-run hopefully it will work cleanly for you now. Thanks for the reports
and the patience testing this,
Brad


> Hi:
>
> After updating, the install.py works fine.
>
> However, I find a place that looks like a typo in
> ggd-recipes/TAIR10/mirbase.yaml
> <https://github.com/chapmanb/cloudbiolinux/commit/a0261e62acdb32486ffd4a4ab2e95bb1dc6e1306#diff-5abbfd77989c0716c7543c4a179b7ec4>
> .
> The line "GCF/000/001/735/GCF_000001735.3_TAIR10/" (in row 14) seems to be
> a typo. Because, the file with this line can't pass the scanner (
> /home/zhenghank/zhenghank/bin/bcbio/anaconda/lib/python2.7/site-packages/yaml/scanner.py)(The
> error messages lists below) .
>
> Otherwise, I have try to re-update bcbio after I update the install.py. But
> the error occurred.
>
> UnsatisfiableError: The following specifications were found to be in
> conflict:
> - bioconductor-deseq2
> - bioconductor-iranges 2.4.7*
> - r-tibble
> Use "conda info <package>" to see the dependencies for each package.
>
>
> Fatal error: local() encountered an error (return code 1) while executing
> '/NAS6/zhenghank/bin/bcbio/anaconda/bin/conda install --quiet -y -c
> bioconda -c conda-forge -c r age-metasv bamtools bamutil
> ............................(omitted)
>
>
> The detail messages presents in attach files.
>
>
> Please solve it, thank you.
>
> *The information of ggd-recipes/TAIR10/mirbase.yaml
> <https://github.com/chapmanb/cloudbiolinux/commit/a0261e62acdb32486ffd4a4ab2e95bb1dc6e1306#diff-5abbfd77989c0716c7543c4a179b7ec4>.*
> *Before modify*
>
> <https://lh3.googleusercontent.com/-JUDmOpvk-bo/WWcvJj5IZSI/AAAAAAAAL_8/Z-6DocjMFs8eYtKBakYgsvlv-xY8o0c_gCLcBGAs/s1600/mirbase_yaml_beforemodify.png>
>
> *After Modify:*
>
> <https://lh3.googleusercontent.com/-5S5B9M5neto/WWcvSuPFtgI/AAAAAAAAMAA/J_RaQsRS2LUjEvnzhL578AnuzXR8fzc7wCLcBGAs/s1600/mirbase_yaml_modify.png>
>
> *The error message:*
> [ update_error_log: application/octet-stream ]
> [ update_log: application/octet-stream ]

翰欽鄭

Jul 14, 2017, 05:48:39
to biovalidation, hank...@gmail.com

I am indebted to you for your kind help. The genome install through the bcbio_nextgen.py upgrade function now works fine, except for GRCh37.
I tested ggd-run.sh (/mnt/nfsfile/NAS6/zhenghank/bin/bcbio/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh) on its own, and it works fine.
So I guess that the main process fails to catch the return signal of ggd-run.sh.
The details are listed below.

Command:
bcbio_nextgen.py upgrade --genomes GRCh37 > GRCh37_log 2> GRCh37_log_errorlog

Log file:
GRCh37_log, GRCh37_log_errorlog
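
One way to probe the guess about ggd-run.sh is to run the script by hand and record its exit status, since the installer errors quoted in this thread report failures by return code. This is a minimal sketch. `run_script` is a hypothetical helper and is not how bcbio or CloudBioLinux invoke the script.

```python
import subprocess


def run_script(path):
    """Run a shell script with sh and return (exit_code, stdout, stderr)."""
    proc = subprocess.run(
        ["sh", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return proc.returncode, proc.stdout, proc.stderr
```

An exit code of 0 together with an empty stderr would suggest the script itself is healthy and the problem sits in the calling process; a non-zero code points back at the script.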



I also tested installing a genome from local files (bcbio_setup_genome.py). It failed at the hisat2 index build and at the GFF3 translation.
However, I lack the ability to track down the origin of the bugs, so I can't offer precise advice beyond the log files.
Please solve the problem, thank you.

The details are listed below.

Material:
The GFF3 and FASTA files were downloaded from NCBI.

NCBI link:


Command:
bcbio_setup_genome.py -f sequence.fasta --gff3 -g sequence.gff3 -n Sulfolobus_islandicus -b LAL14_1 -i bowtie bowtie2 bwa novoalign star rtg snap star ucsc seq hisat2 > log_hisat2 2> log_hisat2_error
bcbio_setup_genome.py -f sequence.fasta --gff3 -g sequence.gff3 -n Sulfolobus_islandicus -b LAL14_1 -i bowtie bowtie2 bwa novoalign star rtg snap star ucsc seq > log_gff 2> log_gff_error
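
Both commands above name `star` twice after `-i`. Whether the duplicate matters to bcbio_setup_genome.py is not established in this thread; the sketch below only shows how the same arguments could be assembled with each index listed once. It uses just the flags that appear in the commands above, and the helper name is hypothetical.

```python
def build_setup_genome_cmd(fasta, gff3, name, build, indexes):
    """Assemble a bcbio_setup_genome.py argument list with unique index names."""
    unique = []
    for idx in indexes:
        if idx not in unique:  # keep the first occurrence, preserve order
            unique.append(idx)
    return (["bcbio_setup_genome.py", "-f", fasta, "--gff3", "-g", gff3,
             "-n", name, "-b", build, "-i"] + unique)


cmd = build_setup_genome_cmd(
    "sequence.fasta", "sequence.gff3", "Sulfolobus_islandicus", "LAL14_1",
    ["bowtie", "bowtie2", "bwa", "novoalign", "star", "rtg", "snap", "star",
     "ucsc", "seq", "hisat2"],
)
print(" ".join(cmd))
```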



Attach file:
log_gff and log_gff_error are the logs of the GFF3-to-GTF conversion.

log_hisat2 and log_hisat2_error are the logs of the hisat2 index build.



On Friday, July 14, 2017 at 5:03:05 AM UTC+8, Brad Chapman wrote:
log_gff_error
log_hisat2
log_hisat2_error
log_gff
GRCh37_log
GRCh37_log_errorlog