I have some thoughts but I am a newbie as you are. Inline.
******************
COO and Director
Uberity Technology Corporation
"LiveCycle ES and Mobile Specialists"
http://www.uberity.com
@uberity @duanechaos
On 12-04-28 4:57 PM, "Buddy Lindsey, Jr." <perc...@gmail.com> wrote:
>I was curious if anyone else wanted to take a whack at trying to answer
>my 2 questions.
>>>>
>>>> 1) Break up all the kanji and kana and create new relationships
>>>> between the sentence and the specific kanji and kana.
DN: Since Kanji can use combinations of characters overlaid on each other,
I think defining the relationships is probably a key issue here. In the
UML metamodel, there are constructs such as Aggregation and Composition
relationships. Take Forest Fire as an example. This is composed of two
words, "Yama" and "Hi" (pronounced "hee"). One could create two nodes for
these, then create a special aggregate relationship (not composite, since
each can exist without the whole) that allows you to create the Kanji
symbol Yama-hi. There may be some composite relationships too. My Japanese
is not advanced enough to know.
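
As a minimal sketch of the aggregate idea, here it is in plain Python
rather than Neo4J. The Graph class, the AGGREGATES relationship name and
the "position" property are all made up for illustration; they are not
part of any Neo4J API. The point is only that removing the whole leaves
the parts in place, which is what separates an aggregate from a composite.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny in-memory property graph: nodes by id, edges as (src, type, dst, props)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst, **props):
        self.edges.append((src, rel_type, dst, props))

    def parts_of(self, whole):
        """Return the parts of a compound in written order."""
        found = [(props["position"], dst)
                 for src, rel, dst, props in self.edges
                 if src == whole and rel == "AGGREGATES"]
        return [dst for _, dst in sorted(found)]

    def remove_node(self, node_id):
        """Delete a node and its edges; aggregated parts are left alone."""
        del self.nodes[node_id]
        self.edges = [e for e in self.edges if node_id not in (e[0], e[2])]


g = Graph()
g.add_node("yama")      # one node per part
g.add_node("hi")
g.add_node("yama-hi")   # the whole, built from the two parts
g.relate("yama-hi", "AGGREGATES", "yama", position=0)
g.relate("yama-hi", "AGGREGATES", "hi", position=1)

parts_before = g.parts_of("yama-hi")
g.remove_node("yama-hi")          # aggregate: the parts survive the whole
remaining = sorted(g.nodes)
print(parts_before, remaining)
```

A composite relationship would differ only in remove_node, which would
also have to delete the parts.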
The Yama Kanji character is 山 (one node), and the one for fire is 火. As
you are probably well aware, the latter refers to a fire that is under
control, so the relationship would not be as simple as creating an overlay
aggregating the two words. Much like the bizarre counting systems
(IMHO, YMMV), the Japanese have several types of fire: fires that are
under control, out of control, fires of a certain colour, type, etc., so
this gets very confusing. The problem is that you also have to model the
contexts, and the Japanese language is full of them. In the context of an
official sign warning people, they would probably use the literal phrase
"Mountain fire thing", which could be written as 山火事. The fact that
A + B means C must be added is an unusual situation that I am not sure how
to handle. Neo4J is probably your best bet though.
Now consider the wider set of contexts and it gets ugly really quickly.
Was the fire something they want you to see, something they want you to
be cautious of (注意, or "Caution"), or an observation?
I think this project would require a mix of Ontology majors, native
Japanese speakers, Neo4J gurus and context majors. My sincerest wishes to
you in solving this.
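
The purely mechanical part of question 1, breaking a sentence into kanji
and kana and linking each character back to the sentence, might look
something like the sketch below. It classifies characters by their Unicode
names using Python's standard unicodedata module. The CONTAINS relationship
name, the property names and the sample sentence are assumptions for
illustration, and the name-prefix check is only a heuristic (it ignores
things like the iteration mark 々).

```python
import unicodedata


def script_of(ch):
    """Classify one character by its Unicode name (heuristic, not exhaustive)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    return "other"


def character_links(sentence):
    """Yield (sentence, 'CONTAINS', character, properties), one per kanji or kana."""
    for position, ch in enumerate(sentence):
        script = script_of(ch)
        if script != "other":
            yield (sentence, "CONTAINS", ch,
                   {"position": position, "script": script})


# Illustrative sentence built from the words mentioned above.
links = list(character_links("山火事に注意"))
for _, rel, ch, props in links:
    print(rel, ch, props)
```

Each tuple could then become one relationship in the graph, with the
position kept as a property so the sentence can be rebuilt in order.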
>>>> 2) Analyze the sentence looking for words, a unique issue since there
>>>> are no spaces in Japanese sentences, and creating new relationships
>>>> with the words and the sentence.
>>>> later on) find conjugations of words and create some relationship
>>>> between the root conjugation type, the word, and the sentence.
DN: This is probably even more problematic since many sentences, even
ones using the exact same words, rely on the context in which the exchange
happens. Consider this exchange between two people meeting:
Person A: Hajimemashite, <Person A> desu. Douzo yoroshiku.
Person B: Hajimemashite, <Person B> desu. Douzo yoroshiku.
What person A is really saying by Hajimemashite is "Hello, I am pleased to
make your acquaintance", but this can only be said the first time. Saying
it again has a different meaning (again depending on context). It could
end up being an insult like "you were so un-memorable I didn't even
remember you". And the Person B response means "Yes - please be kind to
me", but again it is steeped in context.
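
To make the first-meeting dependence concrete, here is a small sketch in
which the reading of a phrase is looked up by the phrase together with a
context. The Context fields and the READINGS table are hypothetical, and a
single have_met_before flag is a gross simplification of what a real
context model would need. The glosses just paraphrase the explanation
above.

```python
from typing import NamedTuple


class Context(NamedTuple):
    """Hypothetical context features; a real model would need many more."""
    speaker: str
    listener: str
    have_met_before: bool


# Readings keyed on (phrase, have_met_before).
READINGS = {
    ("hajimemashite", False): "greeting: pleased to make your acquaintance",
    ("hajimemashite", True): "possible insult: I did not remember you",
}


def interpret(phrase, context):
    """Return a reading for the phrase in this context, or None if unknown."""
    return READINGS.get((phrase.lower(), context.have_met_before))


first = Context("A", "B", have_met_before=False)
again = Context("A", "B", have_met_before=True)
print(interpret("Hajimemashite", first))
print(interpret("Hajimemashite", again))
```

The same words give two different readings, so the context has to be part
of the key rather than something bolted on afterwards.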
This would involve having nodes and relationships that can trace the
complete context between the two participants before the analysis, since
the very meaning differs with it. Neo4J would have to know the complete
context before creating a hypothesis graph. What is correct for one
context might be wrong for another.
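
Setting context aside, the word-finding step in question 2 (no spaces
between words) is often approached with a dictionary. Below is a minimal
sketch of greedy longest-match segmentation. The lexicon, the sample
sentence and the HAS_WORD relationship name are assumptions for
illustration. Greedy matching can pick the wrong split when words overlap,
so treat it as a starting point only.

```python
def segment(sentence, lexicon):
    """Greedy longest-match segmentation.

    At each position, take the longest lexicon entry that matches; if none
    matches, fall back to a single-character token so the scan always advances.
    """
    longest = max(map(len, lexicon), default=1)
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(longest, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical lexicon; a real one would come from a Japanese dictionary.
LEXICON = {"山火事", "山", "火", "に", "注意"}
sentence = "山火事に注意"
words = segment(sentence, LEXICON)

# One relationship per word, with its order in the sentence as a property.
word_links = [(sentence, "HAS_WORD", w, {"order": n}) for n, w in enumerate(words)]
print(words)
```

Conjugations would need a further step that maps each surface form back to
a root, which this sketch does not attempt.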
Sorry to discourage you, but it is a difficult problem you are trying to
solve. Having said that, frigging brilliant!!!! If you do this, a lot of
Computational Intelligence folks will be looking your way. I will hoist
chilled fermented vegetable beverages with my friends in your honour!
Duane Nickull