Advice: Neo4j or Titan

wen yu


May 6, 2014, 4:30:16 PM

to bio4j...@googlegroups.com

Hi, I'm new to graph DBs, but I'm really fascinated by what Bio4j is about and what it can do for my app. My requirements include:

1. Able to create new nodes that can be linked to existing protein nodes in bio4j

2. Easy to update bio4j distribution without breaking the links to my own nodes. If my nodes and bio4j are linked in a graph, how do I update the bio4j when a new distribution comes out?

3. Able to work with data in HBase tables within the same Java program, i.e. I need to query HBase for something and then look it up in Neo4j/Titan for more information

4. Need a text indexing engine to support wild-card search

5. Would be nice to do range query on the edge property. e.g. fetch neighboring nodes that have link-score > 0.5.

Should I go with Neo4j or Titan as the DB backend for Bio4J?

Thanks!

Wen Yu

Pablo Pareja Tobes


May 7, 2014, 3:55:45 AM

to bio4j...@googlegroups.com

Hi Wen,

Great to hear about your interest in Bio4j.

> 1. Able to create new nodes that can be linked to existing protein nodes in bio4j

No problem with that, you can create any new nodes you want and link them to any of the existing nodes in Bio4j.
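As a minimal sketch of linking your own nodes to existing protein nodes, the Java below uses plain in-memory collections rather than the Titan or Neo4j API. It assumes Bio4j protein nodes can be found through an index on their UniProt accession; `MiniGraph`, `addProtein`, `addLinkedNode` and the `"accession"` key are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical in-memory stand-in for a graph holding Bio4j proteins plus your own nodes. */
class MiniGraph {
    /** A directed, labelled edge between two node ids. */
    record Link(long from, String label, long to) {}

    private long nextId = 0;
    private final Map<Long, Map<String, Object>> nodes = new HashMap<>();
    // Assumed index on the protein accession property, as a real database would provide.
    private final Map<String, Long> proteinByAccession = new HashMap<>();
    private final List<Link> links = new ArrayList<>();

    /** Stands in for a protein node that already exists after the Bio4j import. */
    long addProtein(String accession) {
        long id = nextId++;
        Map<String, Object> props = new HashMap<>();
        props.put("accession", accession);
        nodes.put(id, props);
        proteinByAccession.put(accession, id);
        return id;
    }

    /** Creates one of your own nodes and links it to an existing protein found by accession. */
    long addLinkedNode(Map<String, Object> properties, String label, String accession) {
        Long protein = proteinByAccession.get(accession);
        if (protein == null) {
            throw new NoSuchElementException("no protein with accession " + accession);
        }
        long id = nextId++;
        nodes.put(id, new HashMap<>(properties));
        links.add(new Link(id, label, protein));
        return id;
    }

    /** Outgoing edges of a node. */
    List<Link> linksFrom(long node) {
        List<Link> out = new ArrayList<>();
        for (Link l : links) {
            if (l.from() == node) {
                out.add(l);
            }
        }
        return out;
    }
}
```

In a real deployment the lookup and the edge creation would go through the database's own index and transaction APIs; the point is only that the link is found via a stable identifier rather than a hard-coded internal node id.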

> 2. Easy to update bio4j distribution without breaking the links to my own nodes. If my nodes and bio4j are linked in a graph, how do I update the bio4j when a new distribution comes out?

To do that, you should import your data on top of the new Bio4j distribution. If your code for importing your data into Bio4j is well organized, this shouldn't pose a problem.

> 3. Able to work with data in HBase tables within the same Java program, i.e. I need to query HBase for something and then look it up in Neo4j/Titan for more information

That's up to your own program; I don't see any reason why you couldn't do that.

> 4. Need a text indexing engine to support wild-card search

If you need to, you can create your own indices on whichever Bio4j properties you are interested in.
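One way to picture a wildcard search over an indexed property is sketched below. It is a hypothetical in-memory stand-in, not Bio4j's or Titan's indexing API: `WildcardIndex` maps property values to node ids, `*` matches any run of characters and `?` matches exactly one, and only keys sharing the literal prefix before the first wildcard are scanned.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical in-memory stand-in for a text index over one node property. */
class WildcardIndex {
    // Property value -> ids of the nodes carrying it, sorted so prefix ranges are contiguous.
    private final TreeMap<String, Set<Long>> byValue = new TreeMap<>();

    void add(String value, long nodeId) {
        byValue.computeIfAbsent(value, k -> new TreeSet<>()).add(nodeId);
    }

    /** Translates '*' (any run of characters) and '?' (exactly one) into a regex; all else is literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns the ids of all nodes whose value matches the wildcard pattern. */
    Set<Long> search(String wildcard) {
        // The literal prefix before the first wildcard narrows the scan to a contiguous key range.
        int cut = 0;
        while (cut < wildcard.length() && wildcard.charAt(cut) != '*' && wildcard.charAt(cut) != '?') {
            cut++;
        }
        String prefix = wildcard.substring(0, cut);
        Pattern pattern = toRegex(wildcard);
        Set<Long> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Long>> e : byValue.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: no later key can share the prefix
            }
            if (pattern.matcher(e.getKey()).matches()) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }
}
```

A dedicated full-text engine would normally replace this scan; note that a pattern with a leading `*` still has to visit every key here.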

> 5. Would be nice to do range query on the edge property. e.g. fetch neighboring nodes that have link-score > 0.5.

You can create your own range indices for the data you add to the Bio4j database.
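The range query from question 5 can be sketched with a sorted map per source node, so that "link-score > 0.5" becomes a single `tailMap` call. This assumes edges carry a numeric score property; `LinkScoreIndex` and its methods are hypothetical, and a real graph database would maintain such an index itself rather than in application code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory range index over an edge property such as "link-score". */
class LinkScoreIndex {
    // Source node id -> (score -> target node ids); TreeMap keeps scores ordered for range scans.
    private final Map<Long, NavigableMap<Double, Set<Long>>> bySource = new HashMap<>();

    void addEdge(long source, long target, double score) {
        bySource.computeIfAbsent(source, k -> new TreeMap<>())
                .computeIfAbsent(score, k -> new TreeSet<>())
                .add(target);
    }

    /** Neighbours of source reached by an edge whose score is strictly greater than min. */
    Set<Long> neighboursAbove(long source, double min) {
        Set<Long> result = new TreeSet<>();
        NavigableMap<Double, Set<Long>> scores = bySource.get(source);
        if (scores == null) {
            return result;
        }
        // tailMap(min, false) returns only entries with score > min, without scanning lower ones.
        for (Set<Long> targets : scores.tailMap(min, false).values()) {
            result.addAll(targets);
        }
        return result;
    }
}
```

Keeping the scores sorted per source node means the cost of the query grows with the number of matching edges rather than with the node's full degree.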

> Should I go with Neo4j or Titan as the DB backend for Bio4J?

I would recommend going for Titan if you're interested in the whole of Bio4j.

Cheers,

Pablo

--
You received this message because you are subscribed to the Google Groups "bio4j-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bio4j-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Pablo Pareja Tobes

LinkedIn http://www.linkedin.com/in/pabloparejatobes

Twitter http://www.twitter.com/pablopareja

http://about.me/pablopareja

http://www.ohnosequences.com

wen yu


May 7, 2014, 7:28:47 PM

to bio4j...@googlegroups.com

Wonderful! I'll start with Titan, as you suggested.

For question #2, do you mean that I would have to clear out the graph, load the new Bio4j release and re-import my own data on top of it? My own data will be generated by users incrementally, so it may not be trivial to recreate. Can I do a "diff" and update only the Bio4j nodes/edges that changed? UniProt or RefSeq might move to a new build, but that should contain only minor changes.

Thanks!

Wen

Pablo Pareja Tobes


May 20, 2014, 3:46:22 AM

to bio4j...@googlegroups.com

Hi Wen,

The best approach might be:

1. keep your data

2. "cut" the edges linking to the old version of Bio4j and temporarily store them somehow

3. remove the old version of Bio4j

4. import the new version of Bio4j

5. reattach your data to the corresponding nodes of the new version
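The five steps above can be sketched as follows, assuming every Bio4j node your data links to has a stable identifier (here a UniProt accession) that survives between releases. Only steps 2 and 5 are shown; `ReleaseSwap`, `Edge`, `DetachedLink`, `cut` and `reattach` are hypothetical names, and plain collections stand in for the graph database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of steps 2 and 5: cut user-to-Bio4j edges, then reattach them. */
class ReleaseSwap {
    /** An edge from one of your nodes to a Bio4j node, by internal node ids. */
    record Edge(long userNode, String label, long bio4jNode) {}

    /** A "cut" edge: the Bio4j end is remembered by its stable accession instead of its id. */
    record DetachedLink(long userNode, String label, String accession) {}

    /** Step 2: translate old internal ids to accessions while the old release still exists. */
    static List<DetachedLink> cut(List<Edge> edges, Map<Long, String> oldIdToAccession) {
        List<DetachedLink> stored = new ArrayList<>();
        for (Edge e : edges) {
            String accession = oldIdToAccession.get(e.bio4jNode());
            if (accession == null) {
                throw new IllegalStateException("no accession for node " + e.bio4jNode());
            }
            stored.add(new DetachedLink(e.userNode(), e.label(), accession));
        }
        return stored;
    }

    /**
     * Step 5: recreate each stored link against the new release's accession-to-id mapping.
     * Links whose accession is missing from the new release are appended to orphans.
     */
    static List<Edge> reattach(List<DetachedLink> stored, Map<String, Long> newAccessionToId,
                               List<DetachedLink> orphans) {
        List<Edge> edges = new ArrayList<>();
        for (DetachedLink link : stored) {
            Long target = newAccessionToId.get(link.accession());
            if (target == null) {
                orphans.add(link);
            } else {
                edges.add(new Edge(link.userNode(), link.label(), target));
            }
        }
        return edges;
    }
}
```

The old-id-to-accession mapping has to be captured before step 3 deletes the old release, and any links returned as orphans (for example, for an entry that was deleted or merged upstream) need a manual decision.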

Cheers,

Pablo
