Biomarkers and website


Pjotr Prins

Mar 26, 2020, 8:19:32 PM
to virtual biohackathon COVID-19 2020
Hi all,

As part of this biohackathon we are going to create a universal
sequence uploader - we'll allow for both sequence and raw data. This
sequence will be compared to existing sequences in our dataset with
stunning visualisations (Josiah, right!).

We also have an opportunity to ask for clinical data that comes with
the sequence. That clinical data may serve as predictors for the
disease when we have enough information. We can think of simple things
like temperature development, heart rate, blood pressure, but also
more advanced inputs such as results of blood tests, cytokines at
certain time points etc. Obviously not everyone has that data, but by
creating an input form we can give people ideas.

I think this falls in the biomarker section and connects
website developers on one end and metadata/ontology developers on
the other. Who here would like to coordinate such an initiative?
Vanessasaurus here is coordinating web development, who will take on
biomarkers? And I am sure we can get someone from ontologies to help
out with coordination too (Thomas? Mark?).

I note someone added Phenopackets, maybe a good point to explain what
that is here and how it can serve this effort.

Pj.

PS we also need someone to develop a suitable license for uploaded
(clinical and viral) data. Who knows about this stuff?

v

Mar 26, 2020, 8:24:10 PM
to Pjotr Prins, virtual biohackathon COVID-19 2020
For my 0.02 - the workflows and interaction with sequencer APIs are first priority for development. Until there is a result thing (in some object storage) and a workflow that knows how to finish and ping some endpoint for metadata, it wouldn't be easy to design some web interface (plus database, API, etc). Who on this list can comment on the sequencing machines and APIs? Is this programmatic connection absolutely essential, or is there a "manual upload" approach?

--
You received this message because you are subscribed to the Google Groups "virtual biohackathon COVID-19 2020" group.
To unsubscribe from this group and stop receiving emails from it, send an email to virtual-biohacka...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/virtual-biohackathon/20200327001930.6wij7xtlr3ccjthw%40thebird.nl.

Pjotr Prins

Mar 26, 2020, 8:39:37 PM
to v, Pjotr Prins, virtual biohackathon COVID-19 2020
On Thu, Mar 26, 2020 at 06:23:43PM -0600, v wrote:
> For my 0.02 - the workflows and interaction with sequencer APIs are
> first priority for development. Until there is a result thing (in some
> object storage) and a workflow that knows how to finish and ping some
> endpoint for metadata, it wouldn't be easy to design some web interface
> (plus database, API, etc). Who on this list can comment on the
> sequencing machines and APIs? Is this programmatic connection
> absolutely essential, or is there a "manual upload" approach?

Unless I misunderstand you, I don't think we are going to connect to
sequencing machines directly. People will make their data available by
uploading onto our system. Next we run standard pipelines (probably in
the Cloud) and we get results back.

So the web interface will have to cater for uploading data from the
virus - either fasta, fastq or BAM formats. Next we feed those files
into the pipeline(s).
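
A minimal sketch of how the upload step could tell the three formats apart before handing a file to a pipeline. It is only an illustration: `sniff_format` is a hypothetical helper, not part of any existing tool, and it inspects only the first bytes of an upload. It assumes BAM arrives BGZF-compressed (gzip-compatible, with a payload starting with the magic bytes `BAM\1`), that FASTA starts with `>` and FASTQ with `@`.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"
BAM_MAGIC = b"BAM\x01"


def sniff_format(head: bytes) -> str:
    """Guess 'fasta', 'fastq', 'bam' or 'unknown' from the first bytes of an upload."""
    if head[:2] == GZIP_MAGIC:
        try:
            # wbits=31 makes zlib expect a gzip header; only the first
            # 64 decompressed bytes are needed to inspect the payload.
            head = zlib.decompressobj(wbits=31).decompress(head, 64)
        except zlib.error:
            return "unknown"
        if head[:4] == BAM_MAGIC:
            return "bam"
    text = head.lstrip()
    if text.startswith(b">"):
        return "fasta"
    # Caveat: SAM header lines also start with '@'; a real validator
    # would have to look further than the first character.
    if text.startswith(b"@"):
        return "fastq"
    return "unknown"
```

Gzipped FASTA or FASTQ falls through to the text checks, so the same function covers compressed uploads. Anything it cannot classify should probably be rejected rather than passed on to a pipeline.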

Clinical data and biomarkers are the next step. To track a sample with
its metadata we can return an identifier ID after the sequence is
uploaded. Using this ID we can create/generate input forms that people
can fill in related to the sample. Once we have enough data the
machine learning/statistics people may come up with predictors.
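
One way the identifier could tie a sample to its clinical data, as a rough sketch only. Everything here is an assumption made for illustration: the names `make_sample_id` and `ClinicalForm` are hypothetical, and deriving the ID from a hash of the uploaded bytes is just one option (a random UUID would work as well).

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


def make_sample_id(sequence_data: bytes) -> str:
    """Derive a short, stable identifier from the uploaded bytes (assumed scheme)."""
    return hashlib.sha256(sequence_data).hexdigest()[:16]


@dataclass
class ClinicalForm:
    """Clinical observations entered for one uploaded sample."""
    sample_id: str
    # Each observation: (name, value, unit, day since first measurement)
    observations: List[Tuple[str, float, str, int]] = field(default_factory=list)

    def add(self, name: str, value: float, unit: str, day: int) -> None:
        self.observations.append((name, value, unit, day))


# Usage: the ID returned after upload is what the input form is keyed on.
sid = make_sample_id(b">sample\nACGT\n")
form = ClinicalForm(sample_id=sid)
form.add("temperature", 38.5, "C", 0)
form.add("heart_rate", 92, "bpm", 0)
```

With a content hash, uploading the same file twice gives the same ID, which avoids duplicate records but also means two patients with identical sequences would collide; that trade-off would need to be decided.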

After the pipelines complete, we'll have some data to display for that
ID. The data will be public, so the presentation will be visible to
everyone.

I don't think we can handle non-public data.

I think it is time for some of the experts to come up with ideas :)


v

Mar 26, 2020, 8:53:44 PM
to Pjotr Prins, virtual biohackathon COVID-19 2020
Gotcha! I just read too much into this first bit:

[attached image: image.png]

I read "loaded onto a backend" to imply that the sequencer was interacting directly with the tool. It's a *much* easier implementation to have an authenticated interface where a user can log in with some kind of OAuth2, upload data, and then submit it to run via some workflow. Actually, freegenes already serves a similar API and I created a command-line client wrapper for it. It looks like there are workflows linked, so I can definitely start hacking on this soon! (but not today, I'm pooped).

Fields, Christopher J

Mar 26, 2020, 9:19:37 PM
to v, Pjotr Prins, virtual biohackathon COVID-19 2020

Maybe standardize on a common format (or formats) as input? Pretty much every sequencer I’ve worked with (Illumina, Oxford, PacBio) produces standard FASTQ in some way, and conversion to FASTA is a very fast single step that many tools can perform.

Otherwise you start getting into the hairy mess of platform-specific formats (FAST5, HDF, PacBio-formatted BAM, BCL). 
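
To illustrate how small the FASTQ to FASTA step is, here is a minimal sketch in plain Python. It assumes the common four-line FASTQ layout (header, sequence, `+` line, qualities) with no line wrapping; `fastq_to_fasta` is a hypothetical name, and in practice an established tool would be the better choice.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines for four-line FASTQ records, dropping the qualities."""
    # Skip blank lines and strip line endings before grouping.
    records = (line.rstrip("\r\n") for line in lines if line.strip())
    # Pulling from the same iterator four times groups lines into records;
    # an incomplete trailing record is silently ignored.
    for header, seq, plus, _qual in zip(records, records, records, records):
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield ">" + header[1:]
        yield seq


# Usage with an in-memory record:
fastq = ["@read1 sample\n", "ACGT\n", "+\n", "IIII\n"]
fasta_lines = list(fastq_to_fasta(fastq))
```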

chris


v

Mar 26, 2020, 9:23:18 PM
to Fields, Christopher J, Pjotr Prins, virtual biohackathon COVID-19 2020
The platform might be agnostic to the upload type (other than validating some header for some subset of types) but where we’d need to match file types is with the requested workflows for processing said file types. It’s probably easiest to start with a basic, dummy workflow that accepts one simple, standard file type, get the upload -> run -> finish working well, and then add more types.
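
A rough sketch of that starting point, under stated assumptions: one registered file type, one dummy workflow, and a job that moves through upload -> run -> finish. All names (`Job`, `State`, `WORKFLOWS`, `run_job`, `count_fasta_records`) are made up for illustration, and the "workflow" is just a function call rather than a real pipeline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class State(Enum):
    UPLOADED = "uploaded"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class Job:
    file_type: str
    payload: str
    state: State = State.UPLOADED
    result: Optional[dict] = None


def count_fasta_records(payload: str) -> dict:
    """Dummy workflow: count FASTA header lines in the payload."""
    n = sum(1 for line in payload.splitlines() if line.startswith(">"))
    return {"records": n}


# File types are matched to workflows here; adding a type means adding an entry.
WORKFLOWS: Dict[str, Callable[[str], dict]] = {"fasta": count_fasta_records}


def run_job(job: Job) -> Job:
    """Run the workflow registered for the job's file type, tracking its state."""
    workflow = WORKFLOWS.get(job.file_type)
    if workflow is None:
        job.state = State.FAILED
        job.result = {"error": f"no workflow for file type {job.file_type!r}"}
        return job
    job.state = State.RUNNING
    try:
        job.result = workflow(job.payload)
        job.state = State.FINISHED
    except Exception as exc:
        job.state = State.FAILED
        job.result = {"error": str(exc)}
    return job
```

Keeping the type-to-workflow mapping in one table means the upload side can stay agnostic, and new file types only need a new entry once the basic loop works.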

v

Mar 27, 2020, 3:59:37 AM
to Fields, Christopher J, Pjotr Prins, virtual biohackathon COVID-19 2020
Can anyone comment on what is missing from Galaxy?


It wouldn't make sense to implement a new platform if there is already something out there that runs similar workflows and can handle authentication, sharing data, etc.

Venkata P. Satagopam

Mar 30, 2020, 6:26:21 PM
to Pjotr Prins, virtual biohackathon COVID-19 2020, Danielle WELTER
Pjotr, we can easily set up a REDCap instance with the WHO ISARIC COVID-19 Case Record Forms (CRF) to collect the clinical data and link it to the associated sequence, RT-PCR, serology, etc. data.

@Dani, would you like to help with metadata/ontology development?

Best,
Venkata





Dr. Venkata Satagopam
Bioinformatics Core
Luxembourg Centre For Systems Biomedicine (LCSB)
University of Luxembourg
Campus Belval, House of Biomedicine II
6, avenue du Swing
L-4367 Belvaux

T +352-466-644-6421
F +352-466-644-36421
venkata....@uni.lu  or sata...@gmail.com
http://lcsb.uni.lu

Danielle WELTER

Mar 31, 2020, 3:11:43 AM
to Venkata P. Satagopam, Pjotr Prins, virtual biohackathon COVID-19 2020
Hi,

I would be interested in helping with the metadata and ontology side.

Thanks
Danielle

Pjotr Prins

Mar 31, 2020, 7:50:25 AM
to Danielle WELTER, Venkata P. Satagopam, Pjotr Prins, virtual biohackathon COVID-19 2020
On Tue, Mar 31, 2020 at 07:11:41AM +0000, Danielle WELTER wrote:
> Hi,
>
> I would be interested in helping with the metadata and ontology side.
>
> Thanks
>
> Danielle

Excellent. Just keep track of communications and add your name to the
group page when you want.

Pj.