A Biotech Use-Case for Neo4j to be Featured at StampedeCon 2015

Tim Williamson

May 9, 2015, 9:20:17 AM
to neo4j-...@googlegroups.com
My group uses Neo4j extensively to manage genetic ancestry data in an active biotech pipeline. One of my colleagues, Jason Clark, will be giving an invited talk about this work at StampedeCon 2015. I thought this might be of interest to the broader community of people using Neo4j for biotech use-cases.

http://stampedecon.com/

Managing Genetic Ancestry at Scale with Neo4j and Kafka

The global Monsanto R&D pipeline produces millions of new plant populations every year, each of which contributes to a dataset of genetic ancestry spanning several decades. Historically, the constraints of modeling and processing this data within an RDBMS have made drawing inferences from this dataset complex and, at large scale, computationally infeasible. Fortunately, the genetic history of any plant population forms a naturally occurring directed acyclic graph, a property that has allowed us to utilize graph theory to re-imagine how ancestral lineage data is modeled, stored, and queried.

In this talk we present our solutions to these problems, as realized using a graph-based approach within Neo4j. We will discuss what we have learned from using Neo4j in a production setting that includes transactional and high-throughput computation, including how we transitioned from recursive JOIN queries to using Cypher and the Neo4j traversal framework to take full advantage of index-free adjacency. We will also discuss our approach to polyglot persistence, in which a distributed commit log, Apache Kafka, feeds our graph store from sources of live transactional data. Finally, we will touch upon how we are using these technologies to annotate our genetic ancestry dataset with molecular genomics data in order to build a pipeline-scale genotype imputation platform with core algorithms built using Apache Spark.
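
For anyone who wants a feel for the kind of query the abstract is describing, here is a minimal sketch in plain Python of an "all ancestors" lookup over a pedigree stored as a directed acyclic graph. It is only an illustration and not the code from the talk: the population names, the PARENTS mapping, and the ancestors function are all made up for this example. The point is that each step follows a node's parent links directly, which is roughly what index-free adjacency gives you, instead of re-joining a table once per generation.

```python
from collections import deque

# Hypothetical toy pedigree (not real data): each population maps to its direct parents.
PARENTS = {
    "P5": ["P3", "P4"],
    "P4": ["P1", "P2"],
    "P3": ["P1"],
    "P2": [],
    "P1": [],
}


def ancestors(population, parents=PARENTS):
    """Return the set of all ancestors of `population`, found breadth-first."""
    seen = set()
    queue = deque(parents.get(population, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            # In a DAG a shared ancestor can be reached by several paths;
            # visit it only once.
            continue
        seen.add(node)
        queue.extend(parents.get(node, []))
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("P5")))
```

In Cypher the same idea would normally be written as a variable-length path pattern rather than a hand-rolled loop; the labels and relationship types would depend on the actual data model, which the abstract does not describe.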