Just Neo4j or combined with a document store?

Wesley Hall

Feb 15, 2012, 10:20:52 AM
to Neo4j
Hi folks,

I am just in the process of making some initial technology decisions
on an early stage project. In general terms we have a requirement for
fairly large, free-text documents, and a sophisticated index that
maintains complex relationships between these documents.

Consider it like an academic paper database, with references between
documents as well as other relationships.

My initial thought is to spike using Neo4j as the repository for
relationships between entries, with vertices holding a reference to an
entry in a document store like MongoDB. I have some experience with
Mongo but am fairly new to Neo4j. My initial impression is that
such an approach would use each technology according to its
particular strength. There is, however, some overhead involved in
maintaining and managing two different storage engines.

A co-worker has asked me whether Mongo would be needed, and perhaps it
makes sense to store the text as properties on the Neo4j vertex, and
stick to a single persistence mechanism. It does not *seem* to me that
this is a 'proper' use for Neo4j, but it is also very possible I have
the complete wrong impression... :)

Does anyone have any experience with this kind of use-case? Any
recommendations on whether to split storage engines in the way
described or to stick with a single one? Is it reasonable to store
large documents in vertex properties in the way described? Thoughts
appreciated.

Regards

Wesley Hall
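
For illustration, the split design described above (graph vertices that hold only a reference into a separate document store) can be sketched in a few lines. This is a minimal, language-neutral sketch: plain Python dicts stand in for both stores, and every name in it (document_store, add_paper, cite, cited_texts, the CITES relationship type) is hypothetical. No real Neo4j or MongoDB API is used.

```python
# Hypothetical stand-ins: one dict plays the document store, and plain
# dicts/tuples play graph vertices and relationships.
document_store = {}   # doc_id -> full text (the document-store side)
nodes = {}            # node_id -> properties (the graph side)
relationships = []    # (from_node, rel_type, to_node)


def add_paper(node_id, title, text):
    """Store the full text externally; keep only a reference on the vertex."""
    doc_id = f"doc-{node_id}"
    document_store[doc_id] = text
    nodes[node_id] = {"title": title, "doc_id": doc_id}


def cite(from_node, to_node):
    """Record a citation as a relationship between two vertices."""
    relationships.append((from_node, "CITES", to_node))


def cited_texts(node_id):
    """Follow CITES relationships, then resolve each vertex's doc_id."""
    return [
        document_store[nodes[dst]["doc_id"]]
        for src, rel, dst in relationships
        if src == node_id and rel == "CITES"
    ]


add_paper(1, "Graph storage", "Full text of paper one ...")
add_paper(2, "Document stores", "Full text of paper two ...")
cite(1, 2)
print(cited_texts(1))
```

The cost mentioned in the question shows up in add_paper: every write touches two stores, so the two can drift apart if one of the writes fails.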

Marko Rodriguez

Feb 15, 2012, 10:23:17 AM
to ne...@googlegroups.com
Hi,

> A co-worker has asked me whether Mongo would be needed, and perhaps it
> makes sense to store the text as properties on the Neo4j vertex, and
> stick to a single persistence mechanism. It does not *seem* to me that
> this is a 'proper' use for Neo4j, but it is also very possible I have
> the complete wrong impression... :)

I don't know how big your text is, but I have used Neo4j properties to store individual comments in a discussion thread, so text about the size of an email.

> Does anyone have any experience with this kind of use-case? Any
> recommendations on whether to split storage engines in the way
> described or to stick with a single one? Is it reasonable to store
> large documents in vertex properties in the way described? Thoughts
> appreciated.

I would NOT split storage engines. Try and keep your system as simple as possible.

HTH,
Marko.

http://markorodriguez.com

James Thornton

Feb 15, 2012, 10:47:18 AM
to ne...@googlegroups.com
Are these documents PDFs?

If so, you could look at using something like IndexTank (http://indextank.com/) to index them, store the attributes and relationships in Neo4j, and then link to the actual documents stored in the file system.

- James

Peter Neubauer

Feb 15, 2012, 11:05:57 AM
to ne...@googlegroups.com
Yes,
and if you wanna contribute, an IndexTank Indexprovider to do this
transactionally would be a very good addition.

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.6 released                 - dzone.com/6S4K
The Neo4j Heroku Challenge   - http://neo4j-challenge.herokuapp.com/

Axel Morgner

Feb 15, 2012, 1:47:49 PM
to ne...@googlegroups.com
Hi Wesley,

Neo4j isn't optimized for large binary or String properties (even though it supports byte[] as a property data type). Maybe you could store data chunks in a kind of tree structure (we haven't tried that yet).
In structr [1], we copy binary files to/from the filesystem with Apache Commons IO's IOUtils, keeping only the path on disk in a String property on the file node. Works pretty well.

Greetings
Axel

[1] f.e. see https://github.com/structr/structr/blob/develop/structr/structr-core/...
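
The path-in-a-property approach described in this post can be sketched as follows. This is a rough sketch under stated assumptions, not structr's actual code: a plain dict stands in for the file node, shutil.copyfile stands in for the IOUtils copy, and storage_dir, store_file and read_file are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical storage location for the binary files (a temp dir here).
storage_dir = Path(tempfile.mkdtemp())


def store_file(node, source_path):
    """Copy the file into storage and keep only its path on the node."""
    target = storage_dir / Path(source_path).name
    shutil.copyfile(source_path, target)
    node["path"] = str(target)  # a short String property, not the content


def read_file(node):
    """Resolve the node's path property back to the file content."""
    return Path(node["path"]).read_bytes()


# Create a sample source file, then register it on a stand-in file node.
source = Path(tempfile.mkdtemp()) / "paper.txt"
source.write_bytes(b"large document body")

file_node = {"name": "paper.txt"}
store_file(file_node, source)
print(file_node["path"])
```

The node stays small no matter how large the file is, at the price of keeping the graph and the filesystem consistent by hand.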

Niels Hoogeveen

Feb 15, 2012, 9:01:17 PM
to Neo4j
In my CMS, I store html text, zipped as byte[] properties. So far, I
have not encountered performance issues with this approach.
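
A minimal sketch of this compressed-property idea, assuming gzip as the compression format (the post only says "zipped") and using a plain dict as a stand-in for the node. pack_html and unpack_html are hypothetical helper names, and the Python bytes value plays the role of the byte[] property.

```python
import gzip


def pack_html(html):
    """Compress HTML text into bytes suitable for a byte[] property."""
    return gzip.compress(html.encode("utf-8"))


def unpack_html(blob):
    """Restore the original HTML text from the compressed bytes."""
    return gzip.decompress(blob).decode("utf-8")


# Stand-in for a node; a real node would hold the bytes as a property.
page_node = {}
html = "<p>" + "Hello, graph. " * 200 + "</p>"
page_node["body"] = pack_html(html)

print(len(html.encode("utf-8")), "bytes raw,",
      len(page_node["body"]), "bytes compressed")
```

One trade-off of this approach: the compressed bytes are opaque to a full-text index, so the text would have to be indexed separately if it needs to be searchable.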

On Feb 15, 7:47 pm, Axel Morgner <a...@morgner.de> wrote:
> Hi Wesley,
>
> Neo4j isn't optimized for large binary or String properties (even if it supports byte[] as a property data type). Maybe you could store data chunks in a kind of tree structure (we didn't try that yet).
> In structr [1], we copy binary files to/from the filesystem with Apache Common's IOUtils, keeping only the path on disk in a String property at the file node. Works pretty well.
>
> Greetings
> Axel
>
> [1] f.e. see https://github.com/structr/structr/blob/develop/structr/structr-core/...

James Thornton

Feb 15, 2012, 9:05:54 PM
to ne...@googlegroups.com
> In my CMS, I store html text, zipped as byte[] properties. So far, I
> have not encountered performance issues with this approach.

Neo4j caches properties in memory when they are accessed so won't the large files consume your cache unnecessarily?  

- James

Michael Hunger

Feb 15, 2012, 9:59:04 PM
to ne...@googlegroups.com
Only when they are accessed. String and array properties that are not inlined are loaded only on first access.

Michael

ghjunior

Feb 15, 2012, 11:17:15 PM
to Neo4j
Would you guys consider article-sized text, like blog posts, etc., OK
candidates for node property values?

Axel Morgner

Feb 16, 2012, 3:05:31 AM
to ne...@googlegroups.com

> Would you guys consider article size text like blog posts, etc ok
> candidates for node property values?
>
Yes, that should be ok. That way, you can put it into an index and
search it, too.

Axel
