Many years ago (two?), I had a biology teacher who told me that we
didn't know the synthesis pathways for neurotransmitters. I thought
this was rather unfortunate, given how useful they are: many medical
issues could be addressed with neurotransmitters, agonists, and
antagonists, and other interesting biotechnologies could be developed.
So the other day I spent a few hours looking and found that we do, in
fact, know the synthesis pathway of, say, dopamine.
So I pulled the human pathway from reactome.org, looked up what E. coli
has in KEGG, and compared what the human genome has that E. coli
doesn't. It looks like there are only three main genes that E. coli
would need in order to synthesize dopamine (not counting promoters).
Here are my notes and some DNA sequences of the relevant genes:
http://heybryan.org/~bbishop/docs/dopamine/synthesis_of_dopamine.txt
There's a problem, though: one of the genes is 100k+ bp, which isn't
really reasonable to send to the online DNA synthesis services (they
only do about 3k bp per strand at the moment). Maybe there's an
equivalent enzyme to tyrosinase that isn't such a huge beast? Maybe
from another organism?
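For a rough sense of scale, here's the fragment count as a quick Python
sketch. The ~3 kb per-strand limit is the figure quoted above; the
overlap_bp parameter is a hypothetical knob for the overlap adjacent
pieces would need in order to be assembled back together, and it isn't
tied to any particular synthesis service.

```python
import math


def fragments_needed(length_bp, max_fragment_bp=3_000, overlap_bp=0):
    """How many synthesized pieces it takes to cover a sequence.

    n pieces of max_fragment_bp, each sharing overlap_bp bases with the
    next, cover n * (max_fragment_bp - overlap_bp) + overlap_bp bases.
    overlap_bp is a hypothetical assembly parameter.
    """
    if overlap_bp >= max_fragment_bp:
        raise ValueError("overlap must be shorter than a fragment")
    if length_bp <= max_fragment_bp:
        return 1
    step = max_fragment_bp - overlap_bp
    return math.ceil((length_bp - overlap_bp) / step)


print(fragments_needed(100_000))  # gene with introns → 34
print(fragments_needed(2_000))    # coding sequence only → 1
```

So the intron-containing gene would be dozens of separate orders, while
an intron-free coding sequence of a couple of kb fits in one.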
Anyway, this could go somewhere interesting. Doing the genetic
dependency checks by hand kind of sucked, though; that's something
software should be doing. It would compare the genomes of the two
organisms, find everything the goal product depends on in order to be
made, and import that into the target genome, which is essentially
what I was doing manually.
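A minimal Python sketch of that dependency check, assuming a reaction
table that maps each reaction ID to its substrate and product compound
IDs. The entries below reuse the KEGG IDs from the notes later in this
thread, with cofactors left out. The host's reaction and compound sets
are hypothetical inputs, not real E. coli data. The sketch walks
backwards from the goal compound and, for each compound the host can't
already make, lists every reaction that could produce it. Alternatives
all show up, and picking one is left to whoever reads the output.

```python
from collections import deque

# Toy reaction table: reaction ID -> (substrate IDs, product IDs).
# Entries follow the KEGG shorthand in this thread's notes; cofactors
# (O2, water, etc.) are deliberately left out.
REACTIONS = {
    "R00031": ({"C00082"}, {"C00355"}),  # L-tyrosine -> L-DOPA
    "R00731": ({"C00082"}, {"C00355"}),
    "R01815": ({"C00082"}, {"C00355"}),
    "R02080": ({"C00355"}, {"C03758"}),  # L-DOPA -> dopamine
}


def missing_steps(goal, reactions, host_reactions, host_compounds):
    """Walk backwards from `goal` and report what the host can't make.

    host_reactions: reaction IDs the host already has (hypothetical input).
    host_compounds: compounds the host already makes; the walk stops there.
    Returns {compound: [candidate reaction IDs]} for every compound that no
    host reaction produces. An empty list means no known producer at all.
    """
    missing = {}
    seen = set()
    queue = deque([goal])
    while queue:
        compound = queue.popleft()
        if compound in seen or compound in host_compounds:
            continue
        seen.add(compound)
        producers = [r for r, (_, products) in reactions.items() if compound in products]
        if not any(r in host_reactions for r in producers):
            missing[compound] = sorted(producers)
        # Queue the inputs of every candidate producer, since we don't
        # know which route will end up being used.
        for r in producers:
            queue.extend(reactions[r][0])
    return missing


# Hypothetical host: has none of these reactions, but already makes L-tyrosine.
print(missing_steps("C03758", REACTIONS, host_reactions=set(), host_compounds={"C00082"}))
# → {'C03758': ['R02080'], 'C00355': ['R00031', 'R00731', 'R01815']}
```

Mapping the missing reactions back to the genes that encode their
enzymes would be the next step. That needs KEGG's reaction-to-enzyme
and enzyme-to-gene links, which this sketch doesn't model.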
Is that 100kb with introns or without introns?
-Cory
That's with introns. I think the way to get rid of the introns would
be to find the protein (amino acid) sequence of tyrosinase, and then
figure out what the DNA sequence should be. Right?
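Here's a minimal Python sketch of that reverse-translation step. It
assumes the standard genetic code and picks one valid codon per amino
acid. The codon choices in CODON_FOR are illustrative and not a real
codon-optimization table for E. coli. The result is just one of many
DNA sequences that encode the same protein, not the native human
sequence.

```python
# One valid codon per amino acid (standard genetic code). These choices
# are illustrative only; a real design would pick codons from a
# codon-usage table for the host organism.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein, add_stop=True):
    """Return an intron-free coding sequence for a one-letter protein sequence."""
    try:
        codons = [CODON_FOR[aa] for aa in protein.upper()]
    except KeyError as err:
        raise ValueError(f"unknown amino acid code: {err.args[0]}") from None
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


print(back_translate("MKY"))  # → ATGAAATATTAA
```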
Yes, absolutely. Basically, what I wanted to do is come up with some
software that would compare two lists of genes from two genomes. We
can get the data by pulling it from NCBI's servers or the KEGG
database, although I wasn't able to find a KEGG API or some "friendly
way to download all of the data". KEGG has what reactome.org has,
except with coded IDs for each of the proteins/enzymes that interact
and the genes that produce them. So that's what we should be looking at.
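One way to sketch the gene-list comparison in Python. It assumes each
organism's genes have been dumped as two tab-separated columns (gene
ID, KO ID). That layout and the IDs in the example are assumptions for
illustration, so check the actual KEGG download format before relying
on it. Comparing KO IDs rather than gene names matters because the same
enzyme has different gene IDs in each organism.

```python
def read_gene_to_ko(lines):
    """Collect KO IDs from 'gene<TAB>ko' lines (assumed two-column format)."""
    kos = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _gene, ko = line.split("\t", 1)
        kos.add(ko)
    return kos


def missing_in_target(source_lines, target_lines):
    """KO IDs the source organism has and the target organism lacks."""
    return read_gene_to_ko(source_lines) - read_gene_to_ko(target_lines)


# Hypothetical placeholder data, not real KEGG entries.
HUMAN = ["hsa:g1\tko:K_A", "hsa:g2\tko:K_B", "hsa:g3\tko:K_C"]
ECOLI = ["eco:b1\tko:K_A"]
print(sorted(missing_in_target(HUMAN, ECOLI)))  # → ['ko:K_B', 'ko:K_C']
```

With real files, an open file handle can be passed straight in, since
iterating over it yields lines.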
http://www.genome.jp/kegg/soap/
http://www.genome.jp/kegg/download/ftp.html
http://www.genome.jp/kegg/download/kegtools.html
http://www.genome.jp/kegg/xml/
-Cory
This should do it:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000372.4
Please double check.
I'm not sure, but I think they do not; in other words, it's a drop-in
package, though I have no testing to confirm that. I was looking over
that data the other day. Here's where to look:
http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Mus%20musculus&ID=379391&
More importantly:
R00031, R00731, R01815: C00082 (L-tyrosine) -> 2 C00355 (L-DOPA)
R02080: C00355 (L-DOPA) -> C03758 (dopamine)
http://www.genome.jp/dbget-bin/www_bget?md+M00075
Yeah, the coding sequence is only ~2 kb. That's easily PCR-able. If
you had enough human cells that express this mRNA, then making that
sequence, minus the introns, is a piece of cake with RT-PCR (reverse
transcription PCR, not real-time PCR). Otherwise, you could order the
cDNA clone and PCR off of that (most human cDNAs are available for
purchase). That would be much cheaper than synthesizing it (~$90, I
think).
http://openwetware.org/wiki/Finding_cDNA_clones
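If you go the PCR route, the simplest starting point for primers is to
take the two ends of the coding sequence. The Python sketch below does
just that and adds a Wallace-rule melting temperature (2 °C per A/T,
4 °C per G/C, only a rough guide for short oligos). The 20-base default
length is an assumption. Real primer design also checks Tm matching,
hairpins, and primer dimers, and usually adds restriction sites or
tails for cloning. None of that is handled here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough Tm in °C by the Wallace rule: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def end_primers(cds, length=20):
    """Forward primer = first `length` bases; reverse = reverse complement of the last `length`."""
    if len(cds) < 2 * length:
        raise ValueError("sequence is shorter than two primers")
    return cds[:length], reverse_complement(cds[-length:])


# Made-up 18-base "coding sequence" with 6-base primers, just to show the mechanics.
fwd, rev = end_primers("ATGAAACCCGGGTTTTAA", length=6)
print(fwd, wallace_tm(fwd), rev, wallace_tm(rev))  # → ATGAAA 14 TTAAAA 12
```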
-Cory
So I'm trying to make sense of the XML metabolic pathway data. Take a
look at this excerpt from the pathways dataset:
<pathway name="path:ko00720" org="ko" number="00720"
         title="Reductive carboxylate cycle (CO2 fixation)"
         image="http://www.genome.jp/kegg/pathway/ko/ko00720.gif"
         link="http://www.genome.jp/dbget-bin/show_pathway?ko00720">
  <entry id="1" name="path:ko00720" type="map"
         link="http://www.genome.jp/dbget-bin/get_linkdb?pathway+ko00720">
    <graphics name="TITLE:Reductive carboxylate cycle (CO2 fixation)"
              fgcolor="#000000" bgcolor="#FFFFFF"
              type="roundrectangle" x="309" y="67" width="499" height="25"/>
  </entry>
  <reaction name="rn:R00345" type="irreversible">
    <substrate name="cpd:C00011"/>
    <substrate name="cpd:C00074"/>
    <product name="cpd:C00036"/>
  </reaction>
</pathway>
The reaction element makes sense. But how do I go about finding
R00345? Which dataset might it be in? There's a full download
available in the FTP directory, so I'll just run wget -m -np,
untar/unzip everything, and grep for R00345, but it kind of sucks that
that's the quickest way to figure this out.
ftp://ftp.genome.jp/pub/kegg/release/current/
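Here's a rough Python stand-in for the grep step. It assumes the mirror
has already been downloaded and unpacked, and it won't look inside
.tar.gz files. The function name is made up for this sketch. From the
shell, `grep -rl R00345 .` does the same job.

```python
import os


def files_mentioning(root, needle):
    """Yield paths of files under `root` that contain `needle` (like `grep -rl`)."""
    target = needle.encode()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Binary mode avoids decoding errors; the file is scanned line by line.
            with open(path, "rb") as fh:
                if any(target in line for line in fh):
                    yield path
```

Something like `list(files_mentioning("ftp.genome.jp", "R00345"))` would
then list the files to open, assuming wget put the mirror in a directory
of that name (its default for -m).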
Anyway, I'll be using XML::Simple.
http://search.cpan.org/dist/XML-Simple/lib/XML/Simple.pm
It will probably turn out that the BioPerl project already has all of
this written. Oops. :-)
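XML::Simple is Perl. As a rough equivalent, here's a Python sketch that
uses the standard library's xml.etree.ElementTree to pull the reactions
out of a KGML file like the excerpt above. It only reads the reaction,
substrate, and product elements shown in that excerpt (SNIPPET is a
trimmed copy of it), and everything else in the file is ignored.

```python
import xml.etree.ElementTree as ET


def reactions_from_kgml(xml_text):
    """Return (name, type, substrates, products) for each <reaction> element."""
    root = ET.fromstring(xml_text)
    found = []
    for rxn in root.iter("reaction"):
        substrates = [s.get("name") for s in rxn.findall("substrate")]
        products = [p.get("name") for p in rxn.findall("product")]
        found.append((rxn.get("name"), rxn.get("type"), substrates, products))
    return found


# Trimmed copy of the ko00720 excerpt (entry/graphics omitted).
SNIPPET = """<pathway name="path:ko00720" org="ko" number="00720">
  <reaction name="rn:R00345" type="irreversible">
    <substrate name="cpd:C00011"/>
    <substrate name="cpd:C00074"/>
    <product name="cpd:C00036"/>
  </reaction>
</pathway>"""

for name, kind, subs, prods in reactions_from_kgml(SNIPPET):
    # "rn:R00345" -> "R00345", the ID to look up in the reaction data
    print(name.split(":", 1)[1], kind, " + ".join(subs), "->", " + ".join(prods))
# → R00345 irreversible cpd:C00011 + cpd:C00074 -> cpd:C00036
```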
I searched for R00345 on the main KEGG page. That took me here:
http://www.genome.ad.jp/dbget-bin/www_bget?rn+R00345
At the bottom of that page there is a link that says "All DBs" which
leads to a page that shows all the sets that have R00345 listed.
http://www.genome.ad.jp/dbget-bin/get_linkdb?reaction+R00345
I haven't poked around the XML very much. Is there a "Chemical
Reaction" set? If there is, I suppose R00345 would be in there.
-Cory
I see. The different datasets have it by filename, so that's sort of
useful, and lets me cut down on search time. Thanks. I should have
figured that out on my own. Okay, back to work.