Wow: AlphaFold .ipynb and Google Colab

Edward K. Ream

Sep 13, 2021, 10:16:22 AM
to leo-editor
AlphaFold (see this Nature article) solves one of the grand challenges in computational science. Full sources for AlphaFold are available here.

The AlphaFold sources contain AlphaFold.ipynb, a Jupyter notebook. The FAQ in the notebook contains a reference to Google Colab. OMG! Take a look at the Colab FAQ!

"Colab allows anybody to write and execute arbitrary python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing free access to computing resources including GPUs."

!!!!!

Edward

Alexey Tikhonov

Sep 14, 2021, 7:02:59 AM
to leo-editor
Wow! Thanks for sharing!

On Monday, September 13, 2021 at 21:16:22 UTC+7, Edward K. Ream wrote:

Edward K. Ream

Sep 14, 2021, 9:07:34 AM
to leo-editor
On Tue, Sep 14, 2021 at 6:03 AM Alexey Tikhonov <tickli...@gmail.com> wrote:
Wow! Thanks for sharing!

You're welcome. I've converted the AlphaFold sources (almost all Python) to a study outline. I'll make it available once I finish editing it.

Edward

Edward K. Ream

Sep 14, 2021, 11:20:51 AM
to leo-editor
On Tuesday, September 14, 2021 at 8:07:34 AM UTC-5 Edward K. Ream wrote:

> I've converted the AlphaFold sources (almost all Python) to a study outline. I'll make it available once I finish editing it.

Still true, but the "raw" Python code is pretty much impenetrable. Heh, I had to reread the Nature article to verify that yes, neural networks are involved. The article says so right at the beginning, but I have not yet discovered where the neural networks make their appearance in the code!

In effect, the supplementary data for the article contains the code's theory of operation:

Not sure if this .pdf file is freely available. I can send it to anyone who is interested.

The supplementary .pdf is not easy reading. Probably one needs multiple PhDs to understand it.

Edward

Edward K. Ream

Sep 14, 2021, 11:25:37 AM
to leo-editor
On Tuesday, September 14, 2021 at 10:20:51 AM UTC-5 Edward K. Ream wrote:

The supplementary .pdf is not easy reading. Probably one needs multiple PhDs to understand it.

Here are the supplementary videos that we mortals can enjoy.

Supplementary Video 1

Video of the intermediate structure trajectory of the CASP14 target T1024 (LmrP). A two-domain target (408 residues). Both domains are folded early, while their packing is adjusted for a longer time.

Supplementary Video 2

Video of the intermediate structure trajectory of the CASP14 target T1044 (RNA polymerase of crAss-like phage). A large protein (2180 residues), with multiple domains. Some domains are folded quickly, while others take a considerable amount of time to fold.

Supplementary Video 3

Video of the intermediate structure trajectory of the CASP14 target T1064 (Orf8). A very difficult single-domain target (106 residues) that takes the entire depth of the network to fold.

Supplementary Video 4

Video of the intermediate structure trajectory of the CASP14 target T1091. A multi-domain target (863 residues). Individual domains’ structure is determined early, while the domain packing evolves throughout the network. The network is exploring unphysical configurations throughout the process, resulting in long ‘strings’ in the visualization.

Edward

jkn

Sep 14, 2021, 2:05:03 PM
to leo-editor
The link to the PDF worked for me, thanks

    J^n

Edward K. Ream

Sep 14, 2021, 4:21:02 PM
to leo-editor
On Tue, Sep 14, 2021 at 1:05 PM jkn <jkn...@nicorp.f9.co.uk> wrote:

The link to the PDF worked for me, thanks

You're welcome :-) Good luck chasing the white rabbit...

Edward

David Szent-Györgyi

Sep 14, 2021, 10:26:47 PM
to leo-editor
My day job is technical support for basic research in life science, and abuts drug discovery, which is targeted work. Below is a note that I sent to my colleagues at the end of 2020, when last winter's COVID outbreaks were at a terrible high. 

--- note begins ---
This is really interesting work in life science and work in drug discovery, driven by computation:

From <https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology>:
. . . In the meantime, we’re also looking into how protein structure predictions could contribute to our understanding of specific diseases with a small number of specialist groups, for example by helping to identify proteins that have malfunctioned and to reason about how they interact. These insights could enable more precise work on drug development, complementing existing experimental methods to find promising treatments faster.

We’ve also seen signs that protein structure prediction could be useful in future pandemic response efforts, as one of many tools developed by the scientific community. Earlier this year, we predicted several protein structures of the SARS-CoV-2 virus, including ORF3a, whose structures were previously unknown. At CASP14, we predicted the structure of another coronavirus protein, ORF8. Impressively quick work by experimentalists has now confirmed the structures of both ORF3a and ORF8. Despite their challenging nature and having very few related sequences, we achieved a high degree of accuracy on both of our predictions when compared to their experimentally determined structures.


An associated video:
<https://www.youtube.com/watch?v=W7wJDJ56c88>

--- note ends ---

David Szent-Györgyi

Sep 14, 2021, 11:12:32 PM
to leo-editor
AlphaFold is an extraordinary advance in the speed of the development of knowledge. Compare the rapidity of the understanding of the structure of COV-2 with the decades of labor required in the 1950s and 1960s for determination of the structure of myosin, the protein that is the largest constituent of skeletal muscle, as described in this article on the investigation of the molecular motor of muscle.

(No, I'm not a biochemist; my parents were - they spent their working lives on the regulation of muscle contraction). 

Edward K. Ream

Sep 15, 2021, 7:41:02 AM
to leo-editor
On Tue, Sep 14, 2021 at 10:12 PM David Szent-Györgyi <das...@gmail.com> wrote:
AlphaFold is an extraordinary advance in the speed of the development of knowledge. Compare the rapidity of the understanding of the structure of COV-2 with the decades of labor required in the 1950s and 1960s for determination of the structure of myosin, the protein that is the largest constituent of skeletal muscle, as described in this article on the investigation of the molecular motor of muscle

Great to know your scientific pedigree!

Edward

Edward K. Ream

Sep 15, 2021, 7:41:39 AM
to leo-editor
On Tue, Sep 14, 2021 at 9:26 PM David Szent-Györgyi <das...@gmail.com> wrote:
My day job is technical support for basic research in life science, and abuts drug discovery, which is targeted work. Below is a note that I sent to my colleagues at the end of 2020, when last winter's COVID outbreaks were at a terrible high. 

Thanks for the personal note!

Edward

Edward K. Ream

Sep 15, 2021, 7:44:42 AM
to leo-editor
Namely the Nobelist Albert Szent-Györgyi.

Edward

David Szent-Györgyi

Sep 16, 2021, 6:36:24 PM
to leo-editor
On Wednesday, September 15, 2021 at 7:44:42 AM UTC-4 Edward K. Ream wrote:
Namely the Nobelist Albert Szent-Györgyi.

The National Library of Medicine of the National Institutes of Health has a section of its Web site devoted to him.

While I did not know Albert adult-to-adult, I grew up well aware of his work and the regard in which other scientists held him. My mother and father started their careers in basic research in his lab, and we all lived in the same town on Cape Cod, so his family and mine spent a lot of time together. 

His English was first-rate, an accomplishment since English and Hungarian are not related, and learning either is difficult for the person whose language is the other. I was fortunate to hear him give a lecture aimed at the generalist. 

Albert is worthy of note because receiving the 1937 Nobel in Physiology or Medicine did not lead him to cease conducting pioneering research. The above-linked article on the investigation of the molecular motor of muscle mentions groundbreaking work that his lab did in isolation during World War II. That work continued after the war, and moved with him to the United States. The loss of a wife and daughter to cancer led him to shift his focus to that disease when he was in his late seventies; he continued that work until shortly before his death at 93. 

His birth and upbringing in Habsburg Hungary, a country that had yet to leave behind prescientific structures and roots, along with his interest in basic research, spurred in him an interest in the lag in society's progression from a prescientific to a scientific basis. That, combined with his experience as a public figure during a turbulent period in Hungary, meant that he was privately and publicly involved in political matters. The enmity of the German Nazis and their Hungarian allies drove him into hiding. Later on, risk to his life earned through his opposition to the coming Communist order in Hungary led him to move to the United States. There, he continued to involve himself in politics - he was one of the eminent scientists who contacted the Kennedy Administration to educate it on the nature of nuclear weapons and the threat that they posed.

When he reached seventy, when most retire, he wrote an essay that drew on all that, "Lost in the Twentieth Century", which opens the 1963 Annual Review of Biochemistry. In 1971, in his late seventies, he wrote another essay, "Looking Back", which is primarily about science but places it in the context of life and conditions in the US. Anyone who shares Edward's broad perspective on science, technology, and society should look them up. 

Edward K. Ream

Sep 17, 2021, 9:52:54 AM
to leo-editor
On Thu, Sep 16, 2021 at 5:36 PM David Szent-Györgyi <das...@gmail.com> wrote:

The National Library of Medicine of the National Institutes of Health has a section of its Web site devoted to him.
...
he wrote an essay... "Lost in the Twentieth Century"

"We are living in the transition from prescientific to the scientific thinking, hence the 'tumult'."

I couldn't agree more. The "demon haunted world" does not go quietly.

"I also became interested in vegetable respiration, being convinced that there is no basic difference between man and the grass he mows".

He has a way with words :-) The rest of the paper is poignant. It's hard to believe that the 20th century existed as it did. The history of those times is more scary than any horror movie.
 
he wrote another essay, "Looking Back"

A lovely essay. The description of what it takes to write well may inspire anyone to work more and worry less about so-called "talent".

Many thanks for sharing these links.

Edward