Trying to Get a Better Understanding of Neo4j Usage

Buddy Lindsey, Jr.

Apr 26, 2012, 2:10:59 PM

to ne...@googlegroups.com

I am still very, very new to Neo4j. If you read my blog post about the
book Seven Databases in Seven Weeks (http://bit.ly/IHwqpR), which covers
Neo4j, you can see how much of a stretch non-relational databases are
for me right now. However, I have been trudging along because a
project I have wanted to do for a few years looks like a great fit
for a graph database.

So I would like to describe my project a bit and ask whether it is a
"good" fit for Neo4j and, based on my current limited understanding
(I am still learning), how to "solve" a couple of the blockers my
brain has put in front of me.

The application I have been designing is a language analysis
application for Japanese to help people, like me, who are learning
Japanese.

My first goal, as a small user story: a user goes to "the site" and
enters a Japanese sentence. The sentence then goes to the back end,
which starts breaking it down like so:

1) Break the sentence up into its kanji and kana and create new
relationships between the sentence and each specific kanji and kana.
2) Analyze the sentence looking for words (a unique issue, since there
are no spaces in Japanese sentences) and create new relationships
between the words and the sentence.
3) (Later on) Find conjugations of words and create some relationship
between the root conjugation type, the word, and the sentence.
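To make step 1 concrete, here is a minimal Python sketch that does not touch any database. It classifies each character by its Unicode name and emits one sentence-to-character edge per kanji or kana. The relationship name CONTAINS, the property names, and the function names are made up for illustration; they are assumptions, not anything Neo4j prescribes.

```python
import unicodedata

def script_of(ch):
    """Classify a character as kanji, hiragana, katakana, or other."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    return "other"

def break_down(sentence):
    """Return one (sentence, CONTAINS, char, properties) edge per kanji or kana."""
    edges = []
    for position, ch in enumerate(sentence):
        kind = script_of(ch)
        if kind != "other":  # skip punctuation, digits, Latin letters, etc.
            edges.append((sentence, "CONTAINS", ch,
                          {"position": position, "script": kind}))
    return edges

edges = break_down("山火事です。")
for _, rel, ch, props in edges:
    print(rel, ch, props)
```

Each tuple would become one relationship in the graph, with the character node created once and shared by every sentence that uses it.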

One of the goals is that, as more sentences are entered into "the
site", you can follow paths from the breakdown of your current sentence
to new sentences, to find similarities and alternate examples.
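A rough way to picture "following paths to new sentences" is a two-hop walk: sentence to character, then character to every other sentence that contains it. The sketch below does this with plain Python dictionaries standing in for the graph; the sample sentences and all names are invented for illustration.

```python
from collections import defaultdict

PUNCTUATION = set("。、！？")

# Invented sample data: three short sentences.
sentences = ["山火事です。", "火を消す。", "山に登る。"]

# Index: character -> sentences containing it (the "character node" side).
char_to_sentences = defaultdict(set)
for s in sentences:
    for ch in set(s) - PUNCTUATION:
        char_to_sentences[ch].add(s)

def related(sentence):
    """Map each other sentence to the characters it shares with `sentence`."""
    hits = defaultdict(set)
    for ch in set(sentence) - PUNCTUATION:
        for other in char_to_sentences.get(ch, ()):
            if other != sentence:
                hits[other].add(ch)
    return dict(hits)

for other, shared in related("山火事です。").items():
    print(other, "shares", sorted(shared))
```

In a graph database the same walk is a traversal of depth two, and the shared characters are simply the middle nodes of each path.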

I have done some initial work on this using an RDBMS, but the results
were not... favorable.

So at this point I am sure my first question can be answered, which
leads me to my mental blocks about Neo4j itself and not knowing what to
search for to find the answers.

If I take the approach of getting each character in a sentence and
finding it in the database, I am going to have to search the database
anywhere from 5 to 60 times per sentence, at least. That is a lot of
querying, and I am not sure whether Neo4j can handle that at a
reasonable speed, or whether it can query for a single result.

My thought for a solution was to use something like Redis, storing
each kanji as a key whose value is the specific node in Neo4j, thus
saving time querying Neo4j. Is this a good approach, and can Neo4j
grab a single node based on its "number"? Is that what getting a node
by index is?

Finally, are there any examples of taking a basic Neo4j database and
seeing how it grabs the data and "prettily" displays it to the user in
a web page or an application? My thought is that I have to associate
some kind of word or phrase with each relationship type in order to
display the result to the user in an easy-to-understand way. Is that
right?

Sorry this is so long. I have a lot of thoughts all at once and am not
familiar enough yet with Neo4j to know what terms to search for to get
some of these answers in a form I understand.

Peter Neubauer

Apr 26, 2012, 2:40:33 PM

to ne...@googlegroups.com

Buddy,
there is actually a very similar project going on in Germany right
now, using the Google n-grams dataset on Neo4j, see
http://www.rene-pickhardt.de/paul-wagner-and-till-speicher-won-state-competition-jugend-forscht-hessen-and-best-project-award-using-neo4j/
Cool stuff by Paul Wagner and Till Speicher. Might be useful to look
at?

Cheers,

/peter neubauer

G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j

Buddy Lindsey, Jr.

Apr 26, 2012, 10:28:08 PM

to ne...@googlegroups.com

That is cool to go through, kind of. It hasn't really answered my
questions, as a lot of it is over my head at the moment, but I have
bookmarked it for later.

Buddy Lindsey, Jr.

Apr 28, 2012, 7:57:24 PM

to ne...@googlegroups.com

I was curious if anyone else wanted to take a whack at trying to answer
my 2 questions.

Thanks,
Buddy

Duane Nickull

Apr 28, 2012, 11:44:16 PM

to ne...@googlegroups.com

I have some thoughts but I am a newbie as you are. Inline.
******************

COO and Director
Uberity Technology Corporation
"LiveCycle ES and Mobile Specialists"
http://www.uberity.com
@uberity @duanechaos

On 12-04-28 4:57 PM, "Buddy Lindsey, Jr." <perc...@gmail.com> wrote:

>I was curious if anyone else wanted to take a whack at trying to answer
>my 2 questions.

>>>> 1) Break up all the kanji and kana and create a new relationships with
>>>> the sentence and the specific kanji and kana.

DN: Since Kanji can use combinations of characters overlaid on each other,
I think defining the relationships is probably a key issue here. In the
UML metamodel, there are constructs such as Aggregation and Composition
relationships. Take Forest Fire as an example. This is composed of two
words, "Yama" and "Hi (h~ee)". One could create two nodes for these, then
create a special aggregate (not composite, since each can exist without the
whole) relationship that allows you to create the Kanji symbol Yama-hi.
There may be some composite relationships too. My Japanese is not
advanced enough to know.

The Yama Kanji character is 山 (one node), and the other for fire is 火. As
you are probably well aware, the latter refers to a fire that is under
control, so the relationship would not be as simple as creating an overlay
aggregating the two words. Much like the bizarre counting systems
(IMHO, YMMV), the Japanese have several types of Fire: fires that are under
control, out of control, fires of a certain colour, type, etc., so this gets
very confusing. The problem is that you also have to model the contexts,
and the Japanese language is full of them. In the context of an official sign
warning people, they would probably use the literal phrase "Mountain fire
thing", which could be written as 山火事. The fact that A + B means C must be
added is a unique situation that I am not sure how to handle. Neo4j is
probably your best bet though.

Now consider the wider set of contexts and it gets ugly really quickly.
Was the fire something they want you to see, something they want you to be
cautious of (注意 or "Caution"), or an observation?

I think this project would require a mix of Ontology majors, native
Japanese speakers, Neo4j gurus and context majors. My sincerest wishes to
you in solving this.

>>>> 2) Analyze the sentence looking for words, a unique issue since there
>>>> are no spaces in Japanese sentences, and creating new relationships
>>>> with the words and the sentence.
>>>> later on) find conjugations of words and create some relationship
>>>> between the root conjugation type, the word, and the sentence.

DN: This is probably even more problematic, since many sentences, even
when speaking the exact same words, rely on the context in which the
exchange happens. Consider two people meeting, as in this exchange:

Person A: Hajimemashite, <Person A> desu. Douzo yoroshiku.
Person B: Hajimemashite, <Person B> desu. Douzo yoroshiku.

What Person A is really saying by Hajimemashite is "Hello, I am pleased to
make your acquaintance", but this can only be said the first time. Saying
it again has a different meaning (again depending on context). It could
end up being an insult like "you were so unmemorable I didn't even
remember you". And the Person B response means "Yes - please be kind to me"
but again is steeped in context.

This would literally involve having nodes and relationships that let you
trace the complete context between the two participants before the
analysis, since the very meaning is different. Neo4j would have to know
the complete context before creating a hypothesis graph. What is correct
for one context might be wrong for a different one.

Sorry to discourage you, but it is a difficult problem you are trying to
solve. Having said that, Frigging brilliant!!!! If you do this, a lot of
Computational Intelligence folks will be looking your way. I will hoist
chilled fermented vegetable beverages with my friends in your honour!

Duane Nickull

Peter Neubauer

Apr 30, 2012, 4:50:47 AM

to ne...@googlegroups.com

Buddy,
so, Neo4j was actually inspired when we looked at how WordNet
structures data in Concepts, Synonyms, etc. From that
perspective I can understand your mental model :)

When it comes to pretty printing, you can try the Neo4j Webadmin (just
download and start the Neo4j server; a full-circle example on Heroku is
http://vimeo.com/33032604), or just head over to console.neo4j.org for
some basic tinkering.

For trying out some stuff, I think starting with an index like Redis
is a good idea, and from there you can then do traversals, see
http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html or
http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html for
inspiration.

As for the neo4j IDs, these are not an index but the actual "primary
keys" of the neo4j data elements and as such very fast to grab. These
are what you put into redis as values referring to neo4j nodes.
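A minimal sketch of that lookup table, with a plain Python dict standing in for Redis so it runs on its own (with a real Redis client the two operations would be a SET and a GET on string keys). The node ids 17 and 42 are invented, and the URL layout assumes a local Neo4j server exposing nodes under /db/data/node/<id> on port 7474; adjust it to whatever your setup actually uses.

```python
# Stand-in for Redis: kanji -> Neo4j node id.
kanji_to_node_id = {}

def remember(kanji, node_id):
    """Record which Neo4j node represents this kanji."""
    kanji_to_node_id[kanji] = node_id

def node_url(kanji, base="http://localhost:7474/db/data"):
    """Return the REST URL of the kanji's node, or None if it is not cached."""
    node_id = kanji_to_node_id.get(kanji)
    if node_id is None:
        return None  # cache miss: fall back to an index lookup in Neo4j
    return f"{base}/node/{node_id}"

remember("山", 17)  # invented ids, for illustration only
remember("火", 42)

print(node_url("山"))
print(node_url("水"))
```

One caveat worth keeping in mind: Neo4j can reuse the id of a deleted node, so a table like this should be updated whenever nodes are removed.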

Hope that makes sense, feel free to ask more if you have any questions!

Cheers,

/peter neubauer


Duane Nickull

Apr 30, 2012, 9:03:58 PM

to ne...@googlegroups.com

Peter:

This is astounding. Has anyone ever done any serious ontology development
with it? I suggested it to the Ontolog Forum but my post digressed into
YARW (Yet Another Religious War). I see huge potential for someone to
actually lay out something like WordNet and map SUMO (Adam Pease's work) in
seconds. From there, domain-specific ontologies could be easily generated
while maintaining lineage to the FOL merged ontology.

Duane


James Thornton

Apr 30, 2012, 9:40:38 PM

to ne...@googlegroups.com

On Monday, April 30, 2012 8:03:58 PM UTC-5, Überity wrote:

> Peter:
>
> This is astounding. Has anyone ever done any serious ontology development
> with it? I suggested it to the Ontolog Forum but my post digressed into
> YARW (Yet Another Religious War). I see huge potential for someone to
> actually lay out something like WordNet and map SUMO (Adam Pease's work) in
> seconds. From there, domain-specific ontologies could be easily generated
> while maintaining lineage to the FOL merged ontology.

I have a Python program that loads WordNet into Neo4j -- it uses Bulbs (http://bulbflow.com) and is called Wordgraph.

- James

Michael Hunger

May 14, 2012, 3:20:44 PM

to ne...@googlegroups.com

Buddy,

answers inline.

On 26.04.2012 at 20:10, Buddy Lindsey, Jr. wrote:
>
> So at this point I am sure my first question can be answered. Which
> leads me to my mental blocks of neo4j itself and not knowing what to
> search for to find the answers.

>
> Based on what I am wanting to do if I take an approach of getting each
> character in a sentence and finding it in the database I am going to
> have to have to search the database anywhere from 5 to 60 times at
> least per sentence. That is a lot of querying, and to try to do this
> at a reasonable speed I am not sure how neo4j would handle doing that,
> or if it can query for a single result.
>
> My thought for a solution to that was use something like redis and
> store the kanji as a key, and the value is the specific node in neo4j
> thus saving time querying neo4j. Is this a good approach and can neo4j
> grab a single node based on its "number"? Is that what getting a node
> by index is?

It depends on how you identify the characters and words you're looking for. Index lookup is one way; building up a tree that leads to the interesting nodes (as leaves), using some sensible branch selection, would be another.
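One way to picture the tree approach is a character trie whose branches are the characters of known words and whose leaves carry node ids. The sketch below is plain Python with invented words and ids. It uses greedy longest match as the branch selection, which is only one possible choice and will mis-segment some sentences.

```python
def build_trie(words):
    """Build a nested-dict trie; `words` maps each word to its node id."""
    root = {}
    for word, node_id in words.items():
        branch = root
        for ch in word:
            branch = branch.setdefault(ch, {})
        branch["$id"] = node_id  # "$id" cannot clash with a one-character key
    return root

def longest_match(trie, text, start=0):
    """Return (word, node_id) for the longest known word at `start`, or None."""
    branch = trie
    best = None
    for end in range(start, len(text)):
        branch = branch.get(text[end])
        if branch is None:
            break
        if "$id" in branch:
            best = (text[start:end + 1], branch["$id"])
    return best

# Invented dictionary: word -> hypothetical node id.
trie = build_trie({"山": 1, "山火事": 2, "火": 3, "火事": 4})
print(longest_match(trie, "山火事です"))
print(longest_match(trie, "火です"))
```

Walking the trie costs one step per character, so a whole sentence can be matched against the dictionary without issuing a separate query per candidate word.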

And often-used characters (regardless of how they were retrieved initially) will probably be held in an in-memory cache (LRU map) anyway, so you can access them immediately (the nodes will still be connected to the DB and reflect all the changes that happened underneath).
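To illustrate the effect of such a cache (this is an application-level analogy, not how Neo4j implements its own caching), here is a small Python sketch using the standard library's functools.lru_cache. The function fetch_node is a hypothetical stand-in for a real lookup.

```python
from functools import lru_cache

database_reads = []  # which characters actually went to the pretend database

def fetch_node(ch):
    """Hypothetical stand-in for an index lookup in the database."""
    database_reads.append(ch)
    return {"char": ch}

@lru_cache(maxsize=4096)
def get_node(ch):
    """Return the node for ch, reading the database only on a cache miss."""
    return fetch_node(ch)

sentence = "すもももももももものうち"
nodes = [get_node(ch) for ch in sentence]
print(len(sentence), "lookups,", len(database_reads), "database reads")
```

Only the first occurrence of each distinct character reaches the database; every repeat is answered from memory, which is why 5 to 60 lookups per sentence is less alarming than it sounds.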

> Finally, are there any examples of taking a basic neo4j database and
> seeing how it grabs the data and "prettily" displays it to the user in
> a web page or an application? My thought on this is I have to set some
> kind of word or phrase that is associated with a relationship type to
> display the result to the user in a easy to understand way, is that
> right?

This is about rendering the result of your query or traversal. Usually you also return the intermediate nodes and relationships (as a path) from your queries or traversals. You can then use this information to highlight the relevant nodes in the visualization of the graph.

This is for instance used in the neo4j console, see this example (highlight in red): http://tinyurl.com/c59osy7

Of course this can also be built up and also visualized incrementally.
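As a small illustration of turning a returned path into something readable, the Python sketch below maps relationship types to display phrases. The path format (alternating node, relationship type, node), the relationship names, and the phrases are all assumptions made up for this example.

```python
# Hypothetical display phrases per relationship type.
PHRASES = {
    "CONTAINS": "contains",
    "ROOT_OF": "is the root form of",
}

def describe(path):
    """Render a path given as [node, REL_TYPE, node, REL_TYPE, node, ...]."""
    parts = []
    for i in range(1, len(path), 2):
        rel = path[i]
        # Fall back to a lowercased relationship type if no phrase is defined.
        phrase = PHRASES.get(rel, rel.lower().replace("_", " "))
        parts.append(f"{path[i - 1]} {phrase} {path[i + 1]}")
    return "; ".join(parts)

path = ["山火事です。", "CONTAINS", "火", "APPEARS_IN", "火を消す。"]
print(describe(path))
```

The same per-step information (which node, which relationship) is what a graph visualization would use to decide which nodes and edges to highlight.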

Cheers

Michael
