Case database(s)


Hilmar Lapp

Mar 20, 2020, 4:17:12 PM
to virtual-bi...@googlegroups.com
Is anyone aware of even a single database of cases that is FAIR? The only one I can find is for China (ironic, perhaps).

There are plenty of nice online viz apps, such as this one from Johns Hopkins:

The data for the above are here:

The terms of use say “This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes.” Assuming the data aren’t made up in some creative way, i.e., are facts of nature, in a jurisdiction where sweat of the brow does not count for IP eligibility, how can these be legally copyrighted?

Wikipedia editors have produced some nice charts, but seem to be pulling the data by hand each day from each source?

Perhaps I’m looking in the wrong places, but if the hypothesis is true (and I believe it to be true) that unfree, unusable, and unavailable data are one of the major hindrances to the advancement of science, then it looks like institutions across the board are repeating the same old behaviors, which therefore must have the same results, namely needlessly slowing down the advancement of science so that they can hoard the credit.

But, perhaps I’m just not looking in the right places, and the data are all there for anyone who has an idea what to do with them?

 -hilmar

-- 
Hilmar Lapp -:- genome.duke.edu -:- lappland.io


Hilmar Lapp

Mar 20, 2020, 4:30:49 PM
to virtual-bi...@googlegroups.com
I posted a comment, in case anyone wants to chime in:

  -hilmar

--
You received this message because you are subscribed to the Google Groups "virtual biohackathon COVID-19 2020" group.
To unsubscribe from this group and stop receiving emails from it, send an email to virtual-biohacka...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/virtual-biohackathon/859978D5-7576-401F-9B52-E489FE5B0E66%40duke.edu.

Pjotr Prins

Mar 20, 2020, 5:32:05 PM
to Hilmar Lapp, virtual-bi...@googlegroups.com
On Fri, Mar 20, 2020 at 08:17:06PM +0000, Hilmar Lapp wrote:
> Is anyone aware of even only as much as FAIR database(s) of cases? The
> only one I can find is for China (ironic, perhaps).

Thanks Hilmar!

I am already asking institutes to give us raw sequence data of the
virus to put in the public domain (with attribution). We are not
interested in post-processed sequences - we have enough of those.

What we want to achieve is that when someone sequences a sample
somewhere they should be able to upload the data and instantly see how
it compares to other variants/strains and what is known about nearest
neighbours in terms of variants, phenotypes, treatments etc.

I want to ask everyone here to ask for the same. Find
bioinformaticians you know, explain what we do, and have them ask the
clinical people. There is absolutely no reason not to share viral
sequencing data.

To find working tests, protein-molecule predictions and candidate
treatments we need the *raw* data.

If it contains human sequence we can filter it out for them. We'll
make that part of the uploader.

Pj.

Pjotr Prins

Mar 20, 2020, 5:42:20 PM
to Pjotr Prins, Hilmar Lapp, virtual-bi...@googlegroups.com
I am also getting feedback that people are not aware of GISAID
restrictions. They think it is OK to upload.

Hilmar Lapp

Mar 20, 2020, 6:02:24 PM
to Pjotr Prins, virtual-bi...@googlegroups.com
I think that’s part of the issue here. Yes, we can use guerrilla tactics and try to secure raw data through our networks enough to accomplish whatever goals we have.

However, I guess I want to submit the same here that I’ve been trying to explain to friends etc. This isn’t about us. Or at least, we can choose not to make it about us. As regards data, by and large everyone is following the same patterns, behaviors, and impulses that they always have, at both an individual and an institutional level (that misguided, shaky-ground “terms of reuse” assertion being a prime example). Sure, we can try to loosen some restrictions temporarily, only for everyone to go back to business as usual once the pandemic is over. But maybe there’s an opportunity here to force a sea change.

Perhaps I need some exercise.

 -hilmar

Hilmar Lapp

Mar 21, 2020, 3:41:23 PM
to virtual-bi...@googlegroups.com
Found two resources that look much like what I was looking for:

- https://covidtracking.com includes a raw data spreadsheet and a CSV/JSON API. Also GraphQL. Nice!! (Can’t be that hard to get from there to a triple store and SPARQL?)
- https://covidbase.com/ is a nice collection of efforts of various kinds

The latter doesn’t list this hackathon yet; perhaps it should?
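
On the triple store question: a minimal sketch of how CSV rows could be turned into N-Triples that a triple store can load. The column names (state, date, positive), the example.org namespace, and the sample values are all made up for illustration; they are assumptions, not the actual covidtracking.com schema.

```python
import csv
import io

# Hypothetical namespace; replace with a real vocabulary before publishing.
BASE = "http://example.org/covid/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def rows_to_ntriples(csv_text):
    """Turn each CSV row into three N-Triples lines, one subject per (state, date).

    Assumes columns named state, date (YYYY-MM-DD) and positive (integer).
    Values are not escaped, so this only works for simple alphanumeric input.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        state, date = row["state"], row["date"]
        positive = int(row["positive"])
        subject = f"<{BASE}report/{state}/{date}>"
        triples.append(f'{subject} <{BASE}state> "{state}" .')
        triples.append(f'{subject} <{BASE}date> "{date}"^^<{XSD}date> .')
        triples.append(f'{subject} <{BASE}positive> "{positive}"^^<{XSD}integer> .')
    return triples


# Invented sample rows, not real case counts.
SAMPLE = "state,date,positive\nXX,2020-03-20,10\nYY,2020-03-20,20\n"

if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE):
        print(line)
```

The resulting lines could then be bulk-loaded into whichever store is at hand and queried with SPARQL; the harder part is likely agreeing on the vocabulary rather than the conversion itself.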

  -hilmar


Rutger Vos

Mar 23, 2020, 6:58:24 AM
to Hilmar Lapp, virtual-bi...@googlegroups.com
Hilmar,

you are of course absolutely right, as usual. I'm surprised by how hard it is to come by the data and how badly formatted they are. I guess what should be publicly available, without intervening points & clicks and logins, is:
  1. raw sequencing data, which will be in a variety of forms because people are sequencing on a bunch of different platforms. I'm seeing quite a lot of nanopore, for example.
  2. assembled genomes as FASTA, with relatively uniform headers. These can be had from GISAID, but only after logging in and clicking around.
  3. case metadata. These can be had from GISAID as PDF, but only after logging in and clicking around.
I am in favour of guerrilla tactics to address this.
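
To make "relatively uniform headers" in item 2 concrete, here is a rough sketch of rewriting FASTA headers into one fixed shape. The "id|country|date" layout and the metadata dictionary are assumptions invented for this example; they are not a GISAID, ENA, or Nextstrain convention.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def uniform_header(raw_header, metadata):
    """Build 'id|country|date' from the first header token and a metadata lookup.

    The field order is a hypothetical convention; missing values become 'unknown'.
    """
    seq_id = raw_header.split()[0]
    info = metadata.get(seq_id, {})
    return "|".join([seq_id, info.get("country", "unknown"), info.get("date", "unknown")])


def normalize(text, metadata, width=60):
    """Rewrite every record with a uniform header and fixed-width sequence lines."""
    out = []
    for header, seq in read_fasta(text):
        out.append(">" + uniform_header(header, metadata))
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy records, not real sequences.
    example = ">seq1 some free-text description\nACGT\nAC\n>seq2\nGG\n"
    meta = {"seq1": {"country": "NL", "date": "2020-03-01"}}
    print(normalize(example, meta), end="")
```

Keeping the case metadata in a separate lookup, rather than parsing it out of free-text headers, is a deliberate choice here: it avoids guessing at each submitter's header format.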

Rutger



--

Kind regards,

Dr. Rutger A. Vos
Researcher / Bioinformatician

+31717519600 - +31627085806
Darwinweg 2, 2333 CR Leiden
Postbus 9517, 2300 RA Leiden

David Yu Yuan

Mar 23, 2020, 8:43:27 AM
to Rutger Vos, Hilmar Lapp, virtual-bi...@googlegroups.com
Can ENA be a starting point for the prototypes to be built? It seems to have the required data in good shape. It is a public repository run by a public organisation.


Please check it out. If it is not good, we can rule it out quickly at least.


Best regards,

David Yuan

Fields, Christopher J

Mar 23, 2020, 9:29:10 AM
to David Yu Yuan, Rutger Vos, Hilmar Lapp, virtual-bi...@googlegroups.com


Hilmar Lapp

Mar 23, 2020, 10:26:11 AM
to Rutger A. Vos, virtual-bi...@googlegroups.com
I would add georeferenced (to at least county or, say, 25km square) confirmed case and putative cause data. Right now these seem to be mostly scraped from a collection of local newspapers.
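
As a rough illustration of the "25km square" idea, the sketch below bins latitude/longitude points into approximately square grid cells and counts cases per cell. The flat approximation of 111.32 km per degree of latitude and the row-based longitude scaling are simplifying assumptions; a real pipeline would more likely use a proper projected grid or county polygons.

```python
import math
from collections import Counter

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_cell(lat, lon, cell_km=25.0):
    """Return (row, col) indices of the roughly cell_km-sized square containing a point."""
    row = math.floor(lat * KM_PER_DEG_LAT / cell_km)
    # A degree of longitude shrinks with latitude. Use the latitude of the row's
    # lower index edge so every point in a row shares the same column width.
    row_lat = row * cell_km / KM_PER_DEG_LAT
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(row_lat))
    col = math.floor(lon * km_per_deg_lon / cell_km)
    return row, col


def count_by_cell(points, cell_km=25.0):
    """Count how many (lat, lon) points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_km) for lat, lon in points)


if __name__ == "__main__":
    # Made-up coordinates, not real case locations.
    points = [(0.0, 0.0), (0.1, 0.1), (0.3, 0.0)]
    for cell, n in sorted(count_by_cell(points).items()):
        print(cell, n)
```

Publishing only the cell index and a count, rather than exact coordinates, would also keep individual cases from being pinpointed.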

  -hilmar

Daniel Mietchen

Mar 23, 2020, 11:18:49 AM
to virtual biohackathon COVID-19 2020
Hi Hilmar,
yes, regarding the Wikipedia workflows, there is a related discussion at https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_COVID-19#Where_should_the_data_live%3F .
In short, the data are curated multiple times (per Wikipedia language), with little coordination, and with a strong manual component. The curation could in principle be shared via hosting the data on Wikidata (though it is not ideal for time series) or Wikimedia Commons, but there are no workflows in place to do this automatically, and even if there were, they would have to be designed such that whoever maintains the technical backend is not a bottleneck when it comes to enabling updates.
Would be great if we could work out something that addresses these issues, ideally both for the concrete use case and for later ones.
Daniel

Rutger Vos

Mar 23, 2020, 12:00:47 PM
to Daniel Mietchen, virtual biohackathon COVID-19 2020
By the way, if we want to be useful, then one way would be to make these data available in a way that is compatible with what Nextstrain are doing. Their workflow is very informative and they do have a fair amount of case metadata (including localities), but they can't redistribute the sequence data because they scrape it from GISAID.


Hilmar Lapp

Mar 23, 2020, 12:33:00 PM
to Rutger A. Vos, Daniel Mietchen, virtual biohackathon COVID-19 2020
Agreed. Is someone from the Nextstrain collaboration going to participate?

  -hilmar

Rutger Vos

Mar 23, 2020, 12:52:08 PM
to Hilmar Lapp, Daniel Mietchen, virtual biohackathon COVID-19 2020
They gave me push access because I've been doing some chores for them but I suppose we can let them know so that some insiders show up as well. 