Re: Biopieces Genomes

Martin Asser Hansen

Jan 21, 2014, 3:15:07 PM

to Gisle Vestergaard, Daniel Fernandez, biop...@googlegroups.com

list_genomes requires that some genomes have been run through format_genome:

https://code.google.com/p/biopieces/wiki/format_genome

I'm afraid I haven't used format_genome myself for years, and I hope it is not broken. If you have problems, let me know.

CC Biopieces mailing list

Cheers,

Martin

On Tue, Jan 21, 2014 at 9:03 PM, Gisle Vestergaard <gisleves...@gmail.com> wrote:

On 21-01-2014 20:53, Daniel Fernandez wrote:

Hey guys, any chance you could give us some advice here? No pressure, but having an answer would help us move our project forward.

Thanks!

On Sun, Jan 19, 2014 at 3:29 PM, Daniel Fernandez <dfe...@gmail.com> wrote:

Hi,

I managed to install the biopieces software using the .sh installer (biopieces_installer-0.51.sh). Awesome and fast! (once I got all dependencies installed).

However, if I do:

[tin.broadinstitute.org]biopieces_installer> list_genomes

ERROR!

Program 'list_genomes' failed: Could not open dir "/home/unix/dfernand/biopieces/BP_DATA/genomes": No such file or directory.

Is there a way to get the mm9, mm10, and hg19 genomes, indexed and ready for biopieces?

Let me know where and how to create the genomes.

A follow-up, assuming I have the genomes.

I mainly want to use:

get_genome_phastcons

How should I use it to get the conservation scores for the regions in a BED file (in mm9 coordinates, for example)? From the documentation it seems to run region by region, so I guess I'd have to write a wrapper that goes through the BED file line by line?
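A minimal sketch of such a line-by-line wrapper, in Python. It only parses the BED file into regions; how each region is then handed to get_genome_phastcons is left open, because the options of that program are not shown in this thread. The names Region and read_bed_regions are hypothetical, and the sketch assumes whitespace-separated BED lines with at least three columns.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Region(NamedTuple):
    chrom: str
    beg: int  # 0-based start, as stored in BED
    end: int  # exclusive end, as stored in BED


def read_bed_regions(handle: TextIO) -> Iterator[Region]:
    """Yield one Region per data line of a BED stream."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield Region(fields[0], int(fields[1]), int(fields[2]))


if __name__ == "__main__":
    demo = io.StringIO("track name=demo\nchr1\t100\t200\tpeak1\nchr2\t5\t50\n")
    for region in read_bed_regions(demo):
        # A real wrapper would look up the conservation scores for this region here.
        print(region.chrom, region.beg, region.end)
```

Keeping the parsing separate from the lookup means the same loop can drive whichever per-region command turns out to be appropriate.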

Let me know if you can help here.

Thanks!

--
Daniel F.


yaximik

Feb 3, 2014, 3:15:06 PM

to biop...@googlegroups.com, Gisle Vestergaard, Daniel Fernandez, ma...@maasha.dk

I encountered a related problem. I downloaded and installed the bowtie indices as advised in the man page, but an attempt to extract a specific sequence

get_genome_seq -g hg19 -c chr1 -b554327 -e 560167

resulted in an error saying that hg19.fna is needed:

Program 'get_genome_seq' failed: Could not read-open file "/home/yaximik/BP_DATA/genomes/hg19/fasta/hg19.fna": No such file or directory.

I added a link to hg19.fna, but then got this error:

Program 'get_genome_seq' died->can't open /home/yaximik/BP_DATA/genomes/hg19/fasta/hg19.index:

But I already have the bowtie indices installed! OK, so then I ran

read_2bit -i /media/FantomHD2/RefSeq/Human/hg19/hg19.2bit | format_genome -f fasta -x

However, I got

ERROR!

Program 'read_2bit' failed: 2bit file signature didn't match - inverse bit order?.

hg19.2bit was downloaded from UCSC, and it can be converted to hg19.fa with their twoBitToFa tool... So I am stuck...

Martin Asser Hansen

Feb 3, 2014, 3:23:28 PM

to biop...@googlegroups.com, Gisle Vestergaard, Daniel Fernandez

These are two different problems.

First error:

get_genome_seq works by random access to the FASTA file, using an index that is created with format_genome. That is the reason for the first error: the index is missing, so you need to run format_genome (as stated in the docs :o).
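To illustrate the general idea of index-based random access, here is a minimal Python sketch. It is not the index format that format_genome writes; build_index and fetch are hypothetical names, and the sketch assumes that every sequence line within a record has the same width (except the last) and that the requested range lies inside the sequence.

```python
import os
import tempfile
from typing import Dict, Tuple

# name -> (byte offset of the first base, bases per line, bytes per line)
Index = Dict[str, Tuple[int, int, int]]


def build_index(path: str) -> Index:
    """Scan a FASTA file once and record where each sequence starts."""
    index: Index = {}
    pending = None  # (name, offset) of a header whose first sequence line is not seen yet
    with open(path, "rb") as fh:
        for line in iter(fh.readline, b""):
            if line.startswith(b">"):
                pending = (line[1:].split()[0].decode(), fh.tell())
            elif pending is not None:
                name, offset = pending
                index[name] = (offset, len(line.rstrip(b"\r\n")), len(line))
                pending = None
    return index


def fetch(path: str, index: Index, name: str, beg: int, end: int) -> str:
    """Return bases [beg, end) (0-based) of `name` by seeking, not by reading the whole file."""
    offset, bases_per_line, bytes_per_line = index[name]
    # Translate base coordinates into byte positions, accounting for newline bytes.
    start = offset + (beg // bases_per_line) * bytes_per_line + beg % bases_per_line
    stop = offset + (end // bases_per_line) * bytes_per_line + end % bases_per_line
    with open(path, "rb") as fh:
        fh.seek(start)
        chunk = fh.read(stop - start)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">chrA demo\nACGTACGTAC\nGGGGCCCCTT\nAA\n")
    print(fetch(tmp.name, build_index(tmp.name), "chrA", 8, 12))
    os.unlink(tmp.name)
```

Without the index, every lookup would have to scan the FASTA file from the start, which is why a missing index file stops the lookup even when the FASTA file itself is in place.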

Second error:

There is a magic number built into the 2bit format that is used to check whether your system uses big-endian or little-endian encoding. Somehow this check goes wrong. What architecture is your computer?
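A quick way to see what the file itself says is to read its first four bytes. The Python sketch below assumes the signature value 0x1A412743 from the UCSC description of the 2bit format; it does not reproduce how read_2bit performs its check, and the function names are hypothetical.

```python
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number expected at the start of a 2bit file


def twobit_byte_order(header: bytes) -> str:
    """Classify the first four bytes of a file as 'little', 'big', or 'invalid'."""
    if len(header) < 4:
        return "invalid"
    (as_little,) = struct.unpack("<I", header[:4])
    (as_big,) = struct.unpack(">I", header[:4])
    if as_little == TWOBIT_SIGNATURE:
        return "little"
    if as_big == TWOBIT_SIGNATURE:
        return "big"
    return "invalid"


def check_file(path: str) -> str:
    """Read the signature from a file on disk and classify it."""
    with open(path, "rb") as fh:
        return twobit_byte_order(fh.read(4))


if __name__ == "__main__":
    print(twobit_byte_order(struct.pack("<I", TWOBIT_SIGNATURE)))
    print(twobit_byte_order(struct.pack(">I", TWOBIT_SIGNATURE)))
```

If check_file reports 'invalid' for a file that other tools read without trouble, the file may be truncated or may not be a 2bit file at all; if it reports a valid byte order, the problem is more likely in the reader.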

Cheers,

Martin

--
You received this message because you are subscribed to the Google Groups "biopieces" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biopieces+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

yaximik

Feb 3, 2014, 3:29:19 PM

to biop...@googlegroups.com, Gisle Vestergaard, Daniel Fernandez, ma...@maasha.dk

Linux 64 bit.

But I just hit another failure that I do not understand:

[yaximik@G5NNJN1 fasta]$ read_fasta -i /media/FantomHD2/RefSeq/Human/hg19/hg19.fa | format_genome -f fasta -x

ERROR!

Program 'format_genome' failed: Argument --genome is mandatory.

So it does not read the stdout of read_fasta? Yet

read_fasta -n 100 -i /media/FantomHD2/RefSeq/Human/hg19/hg19.fa

spews out the sequence to stdout.

Martin Asser Hansen

Feb 3, 2014, 3:33:02 PM

to biop...@googlegroups.com

Yeah, the --genome switch is missing from the examples. Fixing ...

Martin

Benedikt Kirchner

Jun 23, 2015, 6:44:27 AM

to biop...@googlegroups.com, gisleves...@gmail.com, dfe...@gmail.com, ma...@maasha.dk

Hi everyone!
I have a quick question concerning the genome requirements for plotting karyograms. I'm trying to plot some BED files on the bovine genome but always run into the same error:

read_bed *.bed | plot_karyogram -xvg bta -o karyogram.svg

ERROR!

Program 'plot_karyogram' failed: Argument to --genome bta is not allowed.

Routine                                File                                                         Line
-------                                ----                                                         ----
                                       /home/ngs-benedikt/biopieces/code_perl/Maasha/Biopieces.pm   664
Maasha::Biopieces::check_allowed ...   /home/ngs-benedikt/biopieces/code_perl/Maasha/Biopieces.pm   355
Maasha::Biopieces::parse_options ...   /home/ngs-benedikt/biopieces/bp_bin/plot_karyogram           40

The output of list_genomes is:

list_genomes
Genome    blast    bowtie    fasta    phastcons    vmatch
bta    yes    yes    yes    yes    yes

Any help is much appreciated.
Benedikt

Gisle Vestergaard

Jun 23, 2015, 6:51:56 AM

to Benedikt Kirchner, biop...@googlegroups.com, Daniel Fernandez, Martin Asser Hansen

According to https://code.google.com/p/biopieces/wiki/plot_karyogram only the options hg18 and mm9 are allowed:

[-g <string> | --genome=<string>]      #  Genome layout of karyogram: hg18|mm9

Benedikt Kirchner

Jun 23, 2015, 7:16:51 AM

to biop...@googlegroups.com, benedikt...@web.de, dfe...@gmail.com, ma...@maasha.dk

Thanks, I always thought these were just examples. Is there a specific reason why only murine and human karyograms are allowed? The seq_lengths should be easily available for every genome.
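For reference, a minimal Python sketch of collecting sequence lengths from a FASTA file. The function name fasta_lengths is hypothetical and this is not how Biopieces computes lengths; it also yields only chromosome lengths, not any banding information.

```python
import io
from typing import Dict, TextIO


def fasta_lengths(handle: TextIO) -> Dict[str, int]:
    """Return the number of bases per sequence, keyed by the first word of each header."""
    lengths: Dict[str, int] = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    demo = io.StringIO(">chr1 demo\nACGTAC\nGT\n>chr2\nGGCC\n")
    for name, length in fasta_lengths(demo).items():
        print(name, length)
```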

Martin Asser Hansen

Jun 23, 2015, 7:20:24 AM

to Benedikt Kirchner, biop...@googlegroups.com, Daniel Fernandez, Martin A. Hansen

The karyogram band data was lifted from the UCSC Genome Browser database at a time when this data was only available for a few genomes.

Martin

Benedikt Kirchner

Jun 23, 2015, 8:16:37 AM

to biop...@googlegroups.com, ma...@maasha.dk, dfe...@gmail.com, benedikt...@web.de

Is there an easy way for me to incorporate other genomes (like cow) into this function?

Martin Asser Hansen

Jun 23, 2015, 8:54:51 AM

to biop...@googlegroups.com, Martin A. Hansen, Daniel Fernandez, Benedikt Kirchner

I have not looked at plot_karyogram for years, and I am afraid it is rather hard-coded, so it will not be easy to include a new species :o(.

The code is here:

https://code.google.com/p/biopieces/source/browse/trunk/code_perl/Maasha/Plot.pm

The data is here:

https://code.google.com/p/biopieces/source/browse/#svn%2Ftrunk%2Fbp_data

Martin
