Deliverable: a public sequence resource (new thread)

Pjotr Prins

Mar 24, 2020, 9:10:15 AM

to virtual biohackathon COVID-19 2020

# Deliverable: a public sequence resource

One recurring idea is to create an uploader where raw data from a
sequencer (long reads and short reads) is loaded onto a backend and
mapped using traditional tools as well as the variation
graph/pangenome tools. Next, a visualization of the viral strain is
generated in comparison with data we already have in the database.
Furthermore, the phenotypes and metadata that we have can be presented
at the same time, to show how this viral strain relates to other
strains, geo info, clinical info, treatment info: anything that we
have and that can be linked out. Obviously the uploaded data becomes
part of the whole.

The justification for such an uploader is easy. Currently there is no
system that handles ontologies well, and there is no system that
allows for on-the-fly analysis of raw data.

Mind, this is a pretty large project! But if we split it into small
parts, where each group owns subsections, we should be able to put it
together and make a working prototype. Once the full application works
we can improve it after the BioHackathon and encourage data providers
to add their material. As a BioHackathon we can get a high-impact paper
out of such a project, though that is not the primary goal.

We can discuss subtasks here and ask for group coordinators for each
subtask to work out what needs to be done. Subtasks we have identified:

1. Uploader with authentication, uploading fastq or BAM, add known (clinical) phenotypes
2. Create workflow for traditional analysis
3. Create workflow for vgtools
4. Run workflows in cloud/HPC
5. Store results in persistent storage
6. Define and query linked data (wikidata)
7. Create visualisation
8. Create output website

Does that sound reasonable? Other tasks may be:

9. Deploy graph store, database, IPFS
10. Deploy cloud/HPC workflow runner
11. Deploy web interfaces
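
For subtask 1, the uploader will probably want a sanity check on incoming files before any workflow is started. Below is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequences, no gzip, no BAM). The function name `parse_fastq` is hypothetical and not part of any agreed design:

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from four-line FASTQ records.

    Raises ValueError on the first malformed record. Assumption: each
    record is exactly four lines and sequences are not line-wrapped.
    """
    # Drop line endings and skip blank lines (e.g. a trailing newline).
    buf = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(buf) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")

    for i in range(0, len(buf), 4):
        header, seq, sep, qual = buf[i:i + 4]
        record_no = i // 4 + 1
        if not header.startswith("@"):
            raise ValueError(f"record {record_no}: header must start with '@'")
        if not sep.startswith("+"):
            raise ValueError(f"record {record_no}: third line must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {record_no}: sequence and quality lengths differ")
        # Strip the leading '@' from the header before returning it.
        yield header[1:], seq, qual
```

Rejecting malformed uploads at this stage keeps broken files away from the mapping workflows in subtasks 2 and 3.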

The Galaxy team has already put some things in place and we may be
able to collaborate on this. Galaxy team, wdyt?

The organisers.

On Tue, Mar 24, 2020 at 08:07:58AM -0500, Pjotr Prins wrote:
> # Choosing topics for the BioHackathon
>
> Dear all,
>
> From today we should bring up topics for the virtual BioHackathon on
> COVID-19. The current list can be found at
>
> https://github.com/virtual-biohackathons/covid-19-bh20/wiki
>
> Feel free to create a topic. If you want to join an existing one: add
> your name to the relevant wiki page. Perhaps add a description if that
> is missing.
>
> It is important that
>
> 1. Topics are centered around deliverable(s). A deliverable does not
> have to be a completed product - but it should be stated what a group
> wants to achieve.
> 2. Deliverables have to target things around COVID-19. That is a pretty
> wide description!
>
> We think it is important to deliver something that works for the
> biomedical community and humanity in general. We have access to cloud
> computing and possibly HPC. We can host databases and graph
> stores. This means we can bring something that works the coming year!
>
> We will start a separate thread for a project idea on a raw sequence data
> uploader as a public service. If someone has a similar idea (smaller or larger)
> please create a new thread on this mailing list. We have seen some interesting
> ideas already on data sharing, metadata, policies, text mining etc. Start
> splitting them into tasks. The discussion here is not to dictate what we need
> to do, but to come up with ideas and see if we can realise them!
>
> The organizers
>
> --
> You received this message because you are subscribed to the Google Groups "virtual biohackathon COVID-19 2020" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to virtual-biohacka...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/virtual-biohackathon/20200324130758.tahftcw6wst7lubv%40thebird.nl.
>

Pjotr Prins

Mar 24, 2020, 10:50:06 AM

to Pjotr Prins, virtual biohackathon COVID-19 2020

I have hosted the description at

https://github.com/virtual-biohackathons/covid-19-bh20/wiki/PublicSequenceResource

The following tasks need a coordinator from one of the groups:

1. Uploader with authentication, uploading fastq or BAM, add known (clinical) phenotypes

Requires a Python/JavaScript champion!

6. Define and query linked data (wikidata)

Requires a linked data champion!

7. Create visualisation

Requires a JavaScript champion!

8. Create output website

I guess this is tied in with 1.

Ping me if you want to help out with coordination. We also need an ontology
champion to get the metadata going.
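
To give the linked data task (6) a concrete starting point, here is a rough sketch of how a label lookup against the public Wikidata SPARQL endpoint could be prepared. It only builds the query and the request URL and does not contact the service. The helper names `build_label_query` and `build_request_url` are hypothetical, and looking items up by English label is an assumption rather than a decided approach:

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(label: str, lang: str = "en", limit: int = 10) -> str:
    """Return a SPARQL query that finds Wikidata items with an exact label."""
    # Escape backslashes and double quotes so the label is a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE {\n"
        f'  ?item rdfs:label "{escaped}"@{lang} .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    print(build_request_url(build_label_query("SARS-CoV-2")))
```

The resulting URL can be fetched with any HTTP client. Which items and properties we actually query is for the linked data group to work out.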


José María Fernández

Mar 24, 2020, 1:24:21 PM

to Pjotr Prins, virtual biohackathon COVID-19 2020

Hi, Pjotr,
I just learned today about the Phenopackets standard (http://phenopackets.org/). It could be used to bundle all the metadata a contributor is willing (and allowed) to share through the uploader.
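
As a rough illustration of the "willing (and allowed) to share" part, the uploader could filter the metadata before bundling it. This is only a sketch: the function `build_metadata_bundle` and all field names are hypothetical and do not follow the actual Phenopackets schema, which would need to be checked against its specification.

```python
import json
from typing import Any, Dict, Iterable


def build_metadata_bundle(
    sample_id: str,
    metadata: Dict[str, Any],
    shareable: Iterable[str],
) -> str:
    """Serialize only the metadata fields the contributor agreed to share."""
    allowed = set(shareable)
    # Fields that are not explicitly allowed are dropped, never uploaded.
    shared = {key: value for key, value in metadata.items() if key in allowed}
    return json.dumps({"sample_id": sample_id, "metadata": shared},
                      indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only; field names are assumptions.
    metadata = {
        "collection_location": "Barcelona",
        "patient_age": 54,
        "phenotypes": [{"id": "HP:0001945", "label": "Fever"}],  # example HPO term
    }
    print(build_metadata_bundle("sample-001", metadata,
                                shareable=["collection_location", "phenotypes"]))
```

Using an allow-list rather than a block-list means a newly added field stays private until someone decides it may be shared.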

Best,
José María

--

"There is no reason why anybody would want a computer in their home" -
	Ken Olson, founder of DEC 1977
"640K ought to be enough for anybody" - Bill Gates, 1981 
"Nobody will ever outgrow a 20Mb hard drive." - ???

"Premature optimization is the root of all evil." - Donald Knuth
"Los ordenadores son inútiles. Sólo pueden darte respuestas" - Pablo Ruíz Picasso

José María Fernández González
Senior Research Scientist
e-mail: jose.m.f...@bsc.es
INB Node, Life Sciences Department
Torre Girona Building, 1st floor, Barcelona Supercomputing Center
C/. Jordi Girona, 31
Zip Code: 08034				Barcelona (Spain)
Phone: (+34) 934117074

Björn Grüning

Mar 27, 2020, 4:15:29 PM

to virtual biohackathon COVID-19 2020

Hi Pjotr,

> The Galaxy team already has put some things in place and we may be
> able to collaborate on this. Galaxy team, wdyt?

Sure, anything you need? We have automatically updated sequence retrievals and collect accession numbers for public datasets, which we then process.

Anything else that you need?

You can find more information here: https://covid19.galaxyproject.org

Ciao,

Bjoern
