Advice: Neo4j or Titan

wen yu

May 6, 2014, 4:30:16 PM
to bio4j...@googlegroups.com

Hi, I'm new to graph databases, but I'm really fascinated by what Bio4j is about and what it can do for my app. My requirements include:

1. Able to create new nodes that can be linked to existing protein nodes in bio4j
2. Easy to update bio4j distribution without breaking the links to my own nodes. If my nodes and bio4j are linked in a graph, how do I update the bio4j when a new distribution comes out?
3. Able to work with data in HBase tables within the same Java program, i.e. I need to query HBase for something and look it up in Neo4j/Titan for more information
4. Need a text indexing engine to support wild-card search
5. Would be nice to run range queries on an edge property, e.g. fetch neighboring nodes that have link-score > 0.5.

Should I go with Neo4j or Titan as the DB backend for Bio4j?

Thanks!
Wen Yu

Pablo Pareja Tobes

May 7, 2014, 3:55:45 AM
to bio4j...@googlegroups.com
Hi Wen,

Great to hear about your interest in Bio4j. 

> 1. Able to create new nodes that can be linked to existing protein nodes in bio4j
No problem with that, you can create any new nodes you want and link them to any of the existing nodes in Bio4j.

> 2. Easy to update bio4j distribution without breaking the links to my own nodes. If my nodes and bio4j are linked in a graph, how do I update the bio4j when a new distribution comes out?
To do that, you should re-import your data on top of the new Bio4j distribution. If the code that imports your data into Bio4j is reasonably well organized, this shouldn't pose a problem.

> 3. Able to work with data in HBase tables within the same Java program, i.e. I need to query HBase for something and look it up in Neo4j/Titan for more information
That's the job of your own program; I don't see any reason why you couldn't do that.

> 4. Need a text indexing engine to support wild-card search
If you need to, you can create your own indices on whichever Bio4j properties you are interested in.

> 5. Would be nice to do range query on the edge property. e.g. fetch neighboring nodes that have link-score > 0.5.
You can create your own range indices for the data you add to the Bio4j database.
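
To make the idea of a range index concrete, here is a minimal in-memory sketch in plain Java. It does not use the Titan, Neo4j, or Bio4j APIs; the class `EdgeRangeSketch` and its methods are hypothetical names chosen for illustration, and a sorted map stands in for whatever index the backend would actually maintain on the link-score edge property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a range index over the edges of a single node.
class EdgeRangeSketch {
    // Sorted map from link-score to the neighbor ids reached by edges with that score.
    private final NavigableMap<Double, List<String>> byScore = new TreeMap<>();

    // Register an edge to a neighbor together with its link-score property.
    void addEdge(String neighborId, double linkScore) {
        byScore.computeIfAbsent(linkScore, s -> new ArrayList<>()).add(neighborId);
    }

    // Return neighbors whose link-score is strictly greater than the threshold,
    // in ascending score order. tailMap(threshold, false) excludes the threshold itself.
    List<String> neighborsAbove(double threshold) {
        List<String> result = new ArrayList<>();
        for (List<String> ids : byScore.tailMap(threshold, false).values()) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        EdgeRangeSketch edges = new EdgeRangeSketch();
        edges.addEdge("proteinA", 0.9);
        edges.addEdge("proteinB", 0.5);
        edges.addEdge("proteinC", 0.2);
        System.out.println(edges.neighborsAbove(0.5)); // prints [proteinA]
    }
}
```

The sorted structure is what lets the query skip every edge at or below the threshold instead of scanning all of them; a real backend index on the edge property would serve the same purpose.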

> Should I go with Neo4j or Titan as the DB backend for Bio4j?
I would recommend going for Titan if you're interested in the whole of Bio4j.

Cheers,

Pablo


--
Pablo Pareja Tobes

wen yu

May 7, 2014, 7:28:47 PM
to bio4j...@googlegroups.com
Wonderful! I'll start with Titan as you suggested.

For question #2, do you mean that I have to clear out the graph, load the new Bio4j release, and re-import my own data on top of it? My own data will be generated by the users incrementally, so it may not be trivial to recreate. Can I do a "diff" and update only the Bio4j nodes/edges that changed? UniProt or RefSeq might move to a new build, but it should only contain minor changes.

Thanks!
Wen

Pablo Pareja Tobes

May 20, 2014, 3:46:22 AM
to bio4j...@googlegroups.com
Hi Wen,

Maybe the best approach would be:

1. keep your data
2. "cut" the edges linking to the old version of Bio4j and temporarily store them somehow
3. remove old version of Bio4j
4. import new version of Bio4j
5. reattach your data to the corresponding nodes of the new version
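
The steps above can be sketched roughly as follows. This is plain Java over in-memory collections, not the Bio4j or Titan API. It assumes that each cut edge is saved together with a stable identifier of the Bio4j node it pointed to (a UniProt accession is used here purely as an example), since internal node ids are unlikely to survive a re-import. The names `ReattachSketch`, `Link`, and `reattach` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ReattachSketch {
    /** One user-to-Bio4j link, saved in step 2 by stable key rather than internal node id. */
    static final class Link {
        final String userNodeId;  // id of the user's own node (kept in step 1)
        final String proteinKey;  // assumed stable key, e.g. a UniProt accession
        final String label;       // edge label to restore in step 5

        Link(String userNodeId, String proteinKey, String label) {
            this.userNodeId = userNodeId;
            this.proteinKey = proteinKey;
            this.label = label;
        }
    }

    /**
     * Step 5: re-create every saved link whose target key exists in the new release.
     * Links whose target has disappeared are returned so they can be reviewed by hand.
     */
    static List<Link> reattach(List<Link> saved, Set<String> newReleaseKeys, List<Link> liveEdges) {
        List<Link> orphaned = new ArrayList<>();
        for (Link link : saved) {
            if (newReleaseKeys.contains(link.proteinKey)) {
                liveEdges.add(link);   // target still exists: restore the edge
            } else {
                orphaned.add(link);    // target removed or renamed in the new release
            }
        }
        return orphaned;
    }

    public static void main(String[] args) {
        // Step 2: links cut from the old release and stored temporarily.
        List<Link> saved = List.of(
                new Link("note-1", "P12345", "annotates"),
                new Link("note-2", "Q99999", "annotates"));
        // Steps 3-4: old release removed, new release imported; these keys are present in it.
        Set<String> newRelease = Set.of("P12345");

        List<Link> live = new ArrayList<>();
        List<Link> orphaned = reattach(saved, newRelease, live);
        System.out.println("reattached: " + live.size() + ", orphaned: " + orphaned.size());
    }
}
```

Collecting the orphaned links separately matters for incrementally generated user data: entries that were merged or deleted between releases surface explicitly instead of being dropped silently.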

Cheers,

Pablo