Helping with COVID-19 Sequencing Biohackathon


v

Mar 26, 2020, 6:20:57 PM
to virtual biohackathon COVID-19 2020
Hey Everyone!

Pjotr and Roman looped me in to the discussion on https://github.com/virtual-biohackathons/covid-19-bh20/wiki/PublicSequenceResource, and since containers, web tools and APIs, and general scientific programming are near and dear to me, I'd like to offer to help.

I actually developed a Django application (freegenes) at the end of last year that serves genomic data, and while it would need some tuning up, I think it could be a solid starting base for the web interface that you have in mind. I strongly advocate for development in Python (Django is a Python framework) because it's hugely important that others in the scientific community can contribute. There isn't a good way to put comments on GitHub wikis, so I'd like to discuss some points here:

Galaxy: I don't use it, but doesn't Galaxy provide an interface that might be a good start? Is there a reason to roll a new tool (what doesn't Galaxy do)?

Sequencer: "One recurring idea is to create an uploader where raw data from a sequencer (long reads and short reads) is loaded onto a backend and mapped using traditional tools as well as the variation graph/pangenome tools."
Is this part of the job of the interface? I'd want to ask whether the sequencers in question have API endpoints that would allow for this.

Visualizer: Next a visualization is generated of the viral strain in comparison with data we already have in the database.
Is this a database deployed by this web interface, or some other one?

The way I see this working is a bit different. Here is how I'd think about the design:

 1. The sequencer needs an API to push / notify of new data to parse
 2. Some kind of trigger from the sequencer needs to ping an endpoint to start running a job (e.g., could these workflows be handled with snakemake?) For example, for Singularity Hub, a commit to GitHub (the trigger) pings the server to launch a container build (a separate instance) that then sends the container image to storage, and notifies Singularity Hub. We'd want something similar to this, but with a trigger coming from an authenticated sequencer, and instead of a container builder, a launch of some cloud pipeline (snakemake, nextflow, take your pick!)
 3. The job, on success, puts some result files in object storage, and pings this interface to add an entry for new data
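
The three steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not a proposed implementation: the shared secret, the `Interface` class, `run_job`, `handle_trigger`, and the storage key layout are all hypothetical names, and in-memory objects stand in for the real sequencer, the cloud pipeline, and object storage.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

# Assumption: the sequencer and the server share a secret key.
SEQUENCER_SECRET = b"shared-secret"


def sign(payload: bytes, secret: bytes = SEQUENCER_SECRET) -> str:
    """HMAC-SHA256 signature the sequencer attaches to its notification."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


@dataclass
class Interface:
    """Stand-in for the web interface: it only records metadata entries."""
    entries: list = field(default_factory=list)

    def register(self, metadata: dict) -> None:
        self.entries.append(metadata)


def run_job(run_id: str, storage: dict) -> dict:
    """Stand-in for the cloud pipeline (e.g. a snakemake or nextflow launch).

    Writes a placeholder result into `storage` (a dict playing the role of
    object storage) and returns the metadata describing it.
    """
    key = f"results/{run_id}/variants.vcf"
    storage[key] = b"placeholder result"
    return {"run_id": run_id, "storage_key": key}


def handle_trigger(payload: bytes, signature: str, storage: dict,
                   interface: Interface) -> bool:
    """Verify the trigger (step 2), run the job, register the result (step 3)."""
    expected = sign(payload)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False  # reject triggers that are not from the authenticated sequencer
    metadata = run_job(payload.decode(), storage)
    interface.register(metadata)
    return True
```

Calling `handle_trigger(b"run-001", sign(b"run-001"), {}, Interface())` accepts the trigger and adds one metadata entry; a trigger with a wrong signature is rejected before any job is launched.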

The interface serving the data should do little except receive the final metadata, and serve an API that redirects to storage URLs for download (preferably with signed URLs unless you want some malicious user to be able to charge you up the wazoo). The workflows should be modular (this really comes down to being packaged in containers) and deployed with an orchestration tool that matches the sequencer API (e.g., if we use Python, we want snakemake). The metadata served by the interface should be structured (ontology, this is where you come in).
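
To illustrate the signed URL idea, here is a minimal sketch using only the Python standard library. It is a simplified stand-in and not the signing scheme of any particular cloud provider (in practice you would use the storage provider's own signed URL feature). The key, the `signed_url` and `is_valid` helpers, and the example host are all hypothetical, and the sketch assumes the base URL has no path component of its own.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: a secret known only to the server that issues and checks links.
SIGNING_KEY = b"server-side-secret"


def signed_url(base_url: str, key: str, expires_at: int,
               secret: bytes = SIGNING_KEY) -> str:
    """Build a download URL that carries an expiry time and an HMAC signature."""
    message = f"{key}:{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"{base_url}/{key}?{query}"


def is_valid(url: str, now: int, secret: bytes = SIGNING_KEY) -> bool:
    """Check that the URL is unexpired and was signed for exactly this object."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    key = parsed.path.lstrip("/")
    message = f"{key}:{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return now <= expires_at and hmac.compare_digest(
        expected.encode(), signature.encode()
    )
```

A link created with `signed_url("https://storage.example.org", "results/run-001/variants.vcf", expires_at=1000)` validates until time 1000 and fails if the object path is altered, so a link cannot be reused indefinitely or pointed at other objects.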

Let me know your thoughts! I'm really hopeful that I can help with this - I can whip out these Django interfaces very quickly, and I've done quite a bit of development work for snakemake, and have used Google Cloud a ton too.

Best,

Vanessa

Björn Grüning

Mar 27, 2020, 4:12:13 PM
to virtual biohackathon COVID-19 2020
Hi Vanessa,

 > Galaxy: I don't use it, but doesn't Galaxy provide an interface that might be a good start? Is there a reason to roll a new tool (what doesn't galaxy do?)

If you get an answer, please let us know; we would like to improve Galaxy.
We are also planning to add a third section about evolutionary analysis in the coming days, so please check https://covid19.galaxyproject.org regularly.
If you have any questions, let us know.

Cheers,
Bjoern

Scott Cain

Apr 5, 2020, 11:01:39 PM
to virtual biohackathon COVID-19 2020
With regard to visualization, I created the browser at http://covid19.jbrowse.org/ which is also available as a Docker container at https://hub.docker.com/r/gmod/sars-cov-2-jbrowse and in GitHub at https://hub.docker.com/r/gmod/sars-cov-2-jbrowse. I would be very interested in helping to get any sequence features displayed on this browser.

Also, people in my group created a pastebin for SARS-CoV-2 sequences at http://covbrowser.org/ which might be of interest to people in this group. Just let me know what I can do!

Scott

Alex Gener

Apr 6, 2020, 1:58:27 AM
to virtual biohackathon COVID-19 2020
Regarding "Sequencer," I don't think that'd be great. For example, there might be a bug in the interaction between moving data and the sequencing run, which could cause the run to crash. The way around that now is to finish a run, get the data, and then do stuff with the data.

Bonface Munyoki

Apr 6, 2020, 2:51:12 PM
to virtual biohackathon COVID-19 2020
Hi Scott, check out: https://github.com/graph-genome/Schematize/issues/7. It would be nice if you helped out with the pangenome browser. Check out #pangenome_browser on Slack to get started if you are interested.