Re:

Scott Cain

Feb 8, 2013, 12:11:48 PM
to Block,Andrew, he...@gmod.org, GMOD Schema/Chado List
Hi Andrew,

Since this email is mostly related to databases and Chado, I'm going
to cc it to the schema mailing list. Future responses can trim the
help email address off the cc list. I'm going to do my best to answer
in line below.

Scott


On Thu, Feb 7, 2013 at 1:24 PM, Block,Andrew <Andrew...@colostate.edu> wrote:
> Hello GMOD,
>
> My lab is starting to create a database for our gene of interest. Since I am
> new to databases, I looked for models of great databases, which led me to GMOD.
> We study RNA-dependent RNA polymerases (RdRp) and have collected many
> different mutants for different virus species. The goal of the database
> is to analyze structure, sequence, and molecular data and keep track of
> storage through a webpage. I have created a database in PostgreSQL
> using Chado's general, controlled vocabulary, and stock modules. I have also
> created a simplified version of the sequence module with just feature,
> location and relationship. I have several questions on the next steps for
> my database.
>
> The RdRp have several motifs (like introns and exons) that we would like to
> have new sequences automatically align to. Would Apollo or any of the other
> software offered be able to do this? Can I use the simplified database or
> would I have to use the full Chado database? What would be the easiest or
> simplified way I could go about setting this up?

I'm not sure what you mean by "automatically align to." When you're
talking about doing anything automatically, I would think you're talking
about some sort of pipeline to do the analysis and then parse the
results into some usable format (like GFF3). You might want to look
at MAKER for that, though since you're working with viral sequences,
it may not be appropriate. Once you have GFF3, you can load it into
Chado using either Tripal (if you're using that) or the GFF bulk
loader that comes with Chado, gmod_bulk_load_gff3.pl.
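
To make the "parse the results into GFF3" step concrete, here is a minimal
sketch in Python. It assumes your pipeline already produces motif hits as
simple records; the sequence name, source label, feature type, and attribute
values below are all hypothetical placeholders, not output from any real
tool. The loader generally expects the type column to be a Sequence Ontology
term that is present in your Chado instance, so pick whichever term fits
your data.

```python
from urllib.parse import quote


def gff3_line(seqid, source, ftype, start, end,
              strand="+", score=".", phase=".", **attrs):
    """Format one GFF3 feature line (1-based, inclusive coordinates)."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    # Column 9 holds ';'-separated tag=value pairs; characters such as
    # ';', '=', '&' and ',' inside a value are percent-encoded.
    attr_text = ";".join(
        f"{tag}={quote(str(value), safe=' ')}" for tag, value in attrs.items()
    ) or "."
    cols = [seqid, source, ftype, str(start), str(end),
            str(score), strand, str(phase), attr_text]
    return "\t".join(cols)


def write_gff3(hits):
    """Return a complete GFF3 document for a list of hit dictionaries."""
    lines = ["##gff-version 3"]
    lines += [gff3_line(**hit) for hit in hits]
    return "\n".join(lines) + "\n"


# Hypothetical motif hit on a hypothetical mutant sequence.
hits = [
    {"seqid": "RdRp_mutant_01", "source": "motif_scan",
     "ftype": "polypeptide_motif", "start": 120, "end": 134,
     "ID": "motifA_1", "Name": "Motif A"},
]

print(write_gff3(hits), end="")
```

The resulting file is what you would then hand to Tripal or to
gmod_bulk_load_gff3.pl.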

If it's not really automatic, meaning you need a human to look at the
sequence and assign features, then yes, Apollo can do it. A new
editor, WebApollo, was just released. It is focused on protein-coding
genes in its initial release, but that seems like it would work fine
for you. Setting it up is still a bit of a chore, but is certainly
doable (I set up a simple WebApollo instance in an afternoon last
week--but that was without it being connected to Chado). The
Apollo/WebApollo mailing list is apo...@lists.lbl.gov.

Also, when you're talking about alignments (and thus computed
results), in Chado you need to store them in the companalysis module.
While I applaud your efforts to make something "simplified" from
Chado, it may make more sense to just use Chado if you want to use
other GMOD tools with it. Having several empty tables won't hurt
anything.
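
As a rough illustration of what the companalysis module adds, here is a
sketch using Python's built-in sqlite3 with heavily cut-down stand-ins for
the analysis and analysisfeature tables. The column subset is an assumption
made for brevity; the real Chado DDL has more columns and constraints, and
in real Chado the hit would normally be a match feature with featureloc
rows as well. The program name, version, and scores are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL
);
-- One row per run of an analysis program (simplified).
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    program TEXT NOT NULL,
    programversion TEXT NOT NULL,
    sourcename TEXT NOT NULL,
    UNIQUE (program, programversion, sourcename)
);
-- Links a computed feature to the analysis that produced it.
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    analysis_id INTEGER NOT NULL REFERENCES analysis (analysis_id),
    rawscore REAL,
    identity REAL,
    UNIQUE (feature_id, analysis_id)
);
""")


def record_hit(conn, hit_name, program, version, source, rawscore, identity):
    """Store one alignment hit and tie it to its analysis run."""
    cur = conn.cursor()
    # Reuse the analysis row if this program/version/source was seen before.
    cur.execute(
        "INSERT OR IGNORE INTO analysis (program, programversion, sourcename) "
        "VALUES (?, ?, ?)", (program, version, source))
    analysis_id = cur.execute(
        "SELECT analysis_id FROM analysis "
        "WHERE program = ? AND programversion = ? AND sourcename = ?",
        (program, version, source)).fetchone()[0]
    cur.execute("INSERT INTO feature (uniquename) VALUES (?)", (hit_name,))
    feature_id = cur.lastrowid
    cur.execute(
        "INSERT INTO analysisfeature "
        "(feature_id, analysis_id, rawscore, identity) VALUES (?, ?, ?, ?)",
        (feature_id, analysis_id, rawscore, identity))
    conn.commit()
    return analysis_id, feature_id


# Two hypothetical hits from the same hypothetical aligner run.
record_hit(conn, "hit_0001", "my_aligner", "0.1", "rdrp_motifs", 87.0, 92.5)
record_hit(conn, "hit_0002", "my_aligner", "0.1", "rdrp_motifs", 64.0, 80.1)
```

The point of the split is that scores and provenance live with the analysis
run rather than on the feature itself, which is what lets other GMOD tools
tell computed results apart from curated ones.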

Finally, I am working on a new instance of GMOD in the Cloud, which is
several GMOD components (Chado, GBrowse, JBrowse, Tripal, and
WebApollo) installed on an Amazon Web Services AMI. While this
requires some money to run, it saves much of the hassle of setting up
and maintaining software (and makes it so you don't have to buy new
server hardware). The current version is 1.3, but that doesn't have
Apollo or WebApollo on it. People frequently worry about costs
getting out of control with AWS, but I can tell you that WormBase, a
large and heavily used model organism database, uses AWS and it costs
them around $2000/year.

>
> We are looking at single nucleotide polymorphisms (SNPs). Can GBrowse or
> any of the other visualization programs display SNPs? We would like to find
> the same SNP across many different species of viruses. Do any of your
> software packages have the ability to do this or do I need to custom code
> this?

Anything that can be localized on a reference sequence with
coordinates can be displayed by either GBrowse or JBrowse. Lots of
people use them for displaying SNPs. GMOD doesn't really do
"analysis", except to wrap analysis programs written by other groups
into pipelines, like MAKER does, so no, there isn't a tool provided by
GMOD to do that, but I wouldn't be surprised if something like that
already exists. Once you have the results relating SNPs, you
could store that in Chado, and we can talk about how best to do that
in another email. It would almost certainly require custom code to
load the data into the database.
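
If you do end up writing that custom code, the core of "find the same SNP
across species" might look something like the sketch below. It assumes the
sequences have already been aligned to a common reference by some external
aligner (so every string has the same length, with "-" for gaps); the
sequence names and bases are invented for illustration.

```python
def find_snps(reference, aligned):
    """Map (ref_position, ref_base, alt_base) -> names of sequences carrying it.

    reference: gapped reference sequence from a multiple alignment.
    aligned:   dict of name -> gapped sequence of the same length.
    Positions are 1-based coordinates on the ungapped reference.
    """
    for name, seq in aligned.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not the same length as the reference")
    snps = {}
    ref_pos = 0
    for col, ref_base in enumerate(reference):
        if ref_base == "-":
            continue  # insertion relative to the reference: no coordinate
        ref_pos += 1
        for name, seq in aligned.items():
            base = seq[col]
            # Skip gaps and ambiguous calls; only report substitutions.
            if base not in ("-", "N") and base != ref_base:
                snps.setdefault((ref_pos, ref_base, base), []).append(name)
    return snps


def shared_snps(snps, min_sequences=2):
    """Keep only SNPs seen in at least min_sequences sequences."""
    return {snp: names for snp, names in snps.items()
            if len(names) >= min_sequences}


reference = "ATG-CCGTA"
aligned = {
    "virusA": "ATGACCTTA",
    "virusB": "ATG-CCTTA",
    "virusC": "ACG-CCGTA",
}

snps = find_snps(reference, aligned)
shared = shared_snps(snps)
print(shared)  # {(6, 'G', 'T'): ['virusA', 'virusB']}
```

Because each SNP comes out with a reference coordinate, it can be written
as a feature with a start and end, which is all GBrowse or JBrowse need to
display it.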

>
> The lab is also interested in the structure of each RdRp. I have looked through your
> list of websites using GMOD and could not find any structural databases, so
> I do not know how familiar you are with a database like this. The goal
> would be to create a database similar to the RCSB Protein Data Bank, but
> that can link to the sequence database. I would like to go from
> GBrowse to the structure side with one link. Would this be possible?

Yes, GBrowse is extremely flexible in how it's configured, and having
it generate links from features to anywhere else on the web is fairly
straightforward. Take a look at the GBrowse tutorial for this:
generating simple links is one of the first sections, and links of
arbitrary complexity can be created with Perl callbacks in the
GBrowse configuration file. More questions on this can be directed
to the GBrowse mailing list, gmod-g...@lists.sourceforge.net.

>
> I would use Tripal and Chado::AutoDBI to help create the website. Is there
> any other software you would recommend using in our database?
>

I would stick with Tripal and avoid Chado::AutoDBI unless you really
need it for something. While I still generate the classes for
Chado::AutoDBI, it is rather archaic technology and I don't think many
people use it anymore. Tripal is extremely flexible and likely to do
everything you need it to do (and the Tripal developers are quite
clever--if you can't figure out how to do something, ask them and
you're likely to get a good answer).

>
> Thank you,
>
> Andrew Block
>
> Research Associate
> -------------------------------------------------------
> Colorado State University
> Dept. of Biochemistry & Molecular Biology
> Molecular and Radiological Biosciences 139
> 1870 Campus Delivery
> Colorado State University
> Ft. Collins, CO 80523-1870
> -------------------------------------------------------
> (970) 491-0271 Lab
> Andrew...@colostate.edu
> -------------------------------------------------------
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

Suzanna Lewis

Feb 8, 2013, 12:46:39 PM
to Scott Cain, Block,Andrew, GMOD Schema/Chado List, he...@gmod.org
Just a brief update. I believe that Carson has made some new additions that have MAKER automatically set up and launch Apollo at the end of its run in JBrowse mode (i.e. read/view only). Then you just have to set up permissions to enable editing. Carson could give more details better than I can, though.

-S
