# Deliverable: a public sequence resource
One recurring idea is to create an uploader where raw data from a
sequencer (long reads and short reads) is loaded onto a backend and
mapped using traditional tools as well as the variation
graph/pangenome tools. Next, a visualization is generated of the viral
strain in comparison with data we already have in the database.
Furthermore, the phenotypes and metadata we have can be presented at
the same time, to show how this viral strain relates to other strains,
geographic info, clinical info, treatment info: anything that we have
and that can be linked out. The uploaded data then becomes part of
the whole.
The justification for such an uploader is easy: currently there is no
system that handles ontologies well, and no system that allows for
on-the-fly analysis of raw data.
Mind, this is a pretty large project! But if we split it into small
parts where each group owns subsections we should be able to put it
together and make a working prototype. Once the full application works
we can improve it after the BioHackathon and encourage data providers
to add their material. As a BioHackathon we can get a high impact paper
out of such a project, though that is not the primary goal.
We can discuss subtasks here and ask for a group coordinator for each
subtask to work out what needs to be done. Subtasks we have identified
so far:
1. Uploader with authentication, uploading fastq or BAM, add known (clinical) phenotypes
2. Create workflow for traditional analysis
3. Create workflow for vgtools
4. Run workflows in cloud/HPC
5. Store results in persistent storage
6. Define and query linked data (wikidata)
7. Create visualisation
8. Create output website
Does that sound reasonable? Other tasks may be:
9. Deploy graph store, database, IPFS
10. Deploy cloud/HPC workflow runner
11. Deploy web interfaces
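
As a starting point for subtask 1, here is a minimal Python sketch of the kind of check an uploader might run before accepting reads and their phenotype metadata. It is only an illustration under stated assumptions: the names `FastqRecord`, `Submission`, `parse_fastq` and `make_submission` are hypothetical, the metadata keys are placeholders, and it handles plain four-line FASTQ text only (no BAM, no compression, no authentication).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class FastqRecord:
    name: str
    sequence: str
    quality: str


@dataclass
class Submission:
    """Hypothetical container pairing uploaded reads with their metadata."""
    records: List[FastqRecord]
    metadata: Dict[str, str] = field(default_factory=dict)


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from simple four-line FASTQ text.

    Raises ValueError on malformed input. Blank lines are skipped, so
    zero-length reads are not supported in this sketch.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"record {i // 4}: header must start with '@'")
        if not sep.startswith("+"):
            raise ValueError(f"record {i // 4}: separator must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {i // 4}: sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


def make_submission(text: str, metadata: Dict[str, str]) -> Submission:
    """Validate the reads and attach the (placeholder) metadata to them."""
    records = list(parse_fastq(text))
    if not records:
        raise ValueError("no reads found in upload")
    return Submission(records, dict(metadata))


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nFFFF\n"
    submission = make_submission(example, {"collection_location": "unknown"})
    print(len(submission.records), "reads accepted")
```

A real uploader would stream files instead of holding them in memory and would hand validated submissions on to the workflow subtasks; this only shows where the validation and metadata attachment could sit.
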
The Galaxy team has already put some things in place and we may be
able to collaborate on this. Galaxy team, what do you think?
The organisers.
On Tue, Mar 24, 2020 at 08:07:58AM -0500, Pjotr Prins wrote:
> # Choosing topics for the BioHackathon
>
> Dear all,
>
> From today we should bring up topics for the virtual BioHackathon on
> COVID-19. The current list can be found at
>
> https://github.com/virtual-biohackathons/covid-19-bh20/wiki
>
> Feel free to create a topic. If you want to join an existing one: add
> your name to the relevant wiki page. Perhaps add a description if that
> is missing.
>
> It is important that
>
> 1. Topics are centered around deliverable(s). A deliverable does not
> have to be a completed product - but it should be stated what a group
> wants to achieve.
> 2. Deliverables have to target things around COVID-19. That is a pretty
> wide description!
>
> We think it is important to deliver something that works for the
> biomedical community and humanity in general. We have access to cloud
> computing and possibly HPC. We can host databases and graph
> stores. This means we can bring something that works the coming year!
>
> We will start a separate thread for a project idea on a raw sequence data
> uploader as a public service. If someone has a similar idea (smaller or larger)
> please create a new thread on this mailing list. We have seen some interesting
> ideas already on data sharing, metadata, policies, text mining etc. Start
> splitting them into tasks. The discussion here is not to dictate what we need
> to do, but to come up with ideas and see if we can realise them!
>
> The organizers
>