Translation of English to Gremlin


Marco Zanchi

Jul 18, 2020, 8:08:33 AM
to Gremlin-users
Hi all,

I have recently been given access to the GPT-3 NLP model from OpenAI. That thing is really amazing. Their current model has been exposed to a lot of content from the internet, but sadly not much Gremlin. So although it can automatically generate JavaScript, and translate relatively well from English to SQL, it still cannot translate English to Gremlin well.

This could be a big game changer. If we had a translation tool from English to a graph database query language, it would open up so many possibilities for applications that the mind boggles.

I am particularly interested in this because I am trying to create a tool called The Mersheen that serves as a cognitive assistant. But translating English to Gremlin is useful to everyone, so I decided to make the dataset itself open source. Anyone can train their own model on it, or simply allow OpenAI to incorporate it into their own GPT-3 model (they seem to be good guys, not an evil corporation, and for now are being very careful about who they give access to, so that it is not abused by evil-doers).

Anyway, I created a repo here: https://github.com/Mersheen/gremlin-gpt. If you can, please contribute.

Thanks!

Marco

HadoopMarc

Jul 19, 2020, 7:30:32 AM
to Gremlin-users
Hi Marco,

Interesting idea, at least from a scientific point of view. If I understand correctly, it would be more precise to say that the translation would be from a tiny subset of English to Gremlin in a particular schema. Do you have good resources that show how good and usable the translation from English to JavaScript is, and what the training set looks like? How does the output hold up against a linter?

Best wishes,   Marc

On Saturday, July 18, 2020 at 14:08:33 UTC+2, Marco Zanchi wrote:

Marco Zanchi

Jul 19, 2020, 9:19:05 AM
to gremli...@googlegroups.com
Hi Marc,

GPT-3 has been trained on a huge dataset captured from the internet. It has an intrinsic understanding of entities and relationships, without any specific context. It's a statistical model, so the output is not guaranteed to be correct, but the expectation is that as it consumes more data it gets stronger, to the point that it might eventually be comparable to human expertise. If you search Twitter for GPT-3 you will find many interesting examples of people creating automatic code generators that are quite fascinating.

I believe that if we fine-tune the model with enough examples from many schemas, it might have a better chance of inferring which Gremlin structure corresponds to a statement. If I prime the model with the following examples:

""" 

What is the full name of Alice?
g.V().hasLabel(<object>).has(<identifier>, 'Alice').values(<result>);
result="full name"
<end>
"""
Who is the boss of Alice?
g.V().hasLabel(<object>).has(<identifier>, 'Alice').in(<relationship>).values('name');
relationship="is the boss"
<end>
"""
Who is the manager of Bob?
g.V().hasLabel(<object>).has(<identifier>, 'Bob').in(<relationship>).values('name');
relationship="is the manager"
<end>
"""
What is the birthday of Alice?
g.V().hasLabel(<object>).has(<identifier>, 'Alice').values(<result>);
result="birthday"
<end>
"""
What is the current salary of Charlie?
g.V().hasLabel(<object>).has(<identifier>, 'Charlie').values(<result>);
result="current salary"
<end>

 """


Then when I ask the question:

What is the first name of Mrs. Jackson?


It replies with:

g.V().hasLabel(<object>).has(<identifier>, 'Mrs. Jackson').values(<result>); result="first name"


So even with a tiny dataset it can already start understanding patterns. If we could feed it more complex examples, the model would become much stronger.
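
A minimal sketch of how a priming prompt like the one above could be assembled, and how a reply could be split back into the Gremlin template and its slot bindings. Python is used purely for illustration. The names `EXAMPLES`, `build_prompt` and `parse_reply` are hypothetical, only two of the examples are repeated here, and no call to the model is shown because the thread does not say how it was accessed.

```python
import re

# Hypothetical few-shot examples: (question, Gremlin template, slot binding).
EXAMPLES = [
    ("What is the full name of Alice?",
     "g.V().hasLabel(<object>).has(<identifier>, 'Alice').values(<result>);",
     'result="full name"'),
    ("Who is the boss of Alice?",
     "g.V().hasLabel(<object>).has(<identifier>, 'Alice')"
     ".in(<relationship>).values('name');",
     'relationship="is the boss"'),
]

SEPARATOR = '"""'  # line that opens every example in the prompt
END = "<end>"      # marker that closes every example


def build_prompt(examples, question):
    """Lay out the examples as in the post, then append the new question."""
    parts = []
    for q, template, binding in examples:
        parts.append("\n".join([SEPARATOR, q, template, binding, END]))
    parts.append("\n".join([SEPARATOR, question]))
    return "\n".join(parts) + "\n"


def parse_reply(reply):
    """Split a reply into (template, bindings), ignoring text after <end>."""
    text = reply.split(END)[0]
    # Bindings look like: name="some value"
    bindings = dict(re.findall(r'(\w+)="([^"]*)"', text))
    # Whatever is left once the bindings are removed is the template.
    template = re.sub(r'\s*\w+="[^"]*"', "", text).strip()
    return template, bindings


if __name__ == "__main__":
    print(build_prompt(EXAMPLES, "What is the first name of Mrs. Jackson?"))
    reply = ("g.V().hasLabel(<object>).has(<identifier>, 'Mrs. Jackson')"
             '.values(<result>); result="first name"')
    print(parse_reply(reply))
```

Keeping the template and the bindings separate means the placeholders such as `<object>` and `<identifier>` can later be filled in from a particular schema, which is a step this sketch does not attempt.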


I have also experimented with a sort of discourse representation structure, and with some form of semantic understanding. The results were quite promising too. With the prompt:
Q: What is the birthday of Alice?
A: Alice is a person. Alice has a birthday. What is the birthday of Alice?
Q: What is the current salary of Robert?
A: Robert is a person. Robert has a salary. Of all salaries, Robert has a current salary. What is the current salary of Robert?
Q: Who is the boss of Charlie?
A: Charlie is a person. Charlie has a boss. Charlie's boss is a person. What is the name of Charlie's boss?
Q: Who is the director of Pulp Fiction?
A: Pulp Fiction is a movie. Pulp Fiction has a director. Pulp Fiction's director is a person. What is the name of the director of Pulp Fiction?
Q: What are the tasks of Douglas?
A: Douglas is a person. Douglas has tasks assigned. Douglas' assigned tasks have titles. What are the titles of Douglas' tasks?
Q: How many people work on team Yankee?
A: Yankee is a team. The team Yankee has people working on it. What is the number of people working on team Yankee?

Then when I ask questions it generates the answers by itself:
Q: What are the sizes of all teams in department Delta?
A: Delta is a department. The department Delta has teams in it. All teams in department Delta have sizes. What are the sizes of all teams in department Delta?
Q: What is the most important task I need to work on?
A: I am a person. I have tasks assigned. I need to work on tasks. What is the most important task I need to work on?
Q: Who is my next appointment with?
A: I am a person. I have appointments scheduled. I have an appointment scheduled with someone. Who is my next appointment with?
Q: What are Josephine's main skills?
A: Josephine is a person. Josephine has skills. What are Josephine's main skills?
Q: What is the meaning of life?
A: Life is a concept. Life has a meaning. What is the meaning of life?
Q: What is the last thing I talked to with Mary?
A: I am a person. I talked to someone. I talked to someone last. What is the last thing I talked to with Mary?

Not 100%, but quite impressive, imho.
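
A rough sketch of how the Q/A prompt above could be put together and how a completion could be cut back to a single answer. This is Python for illustration only. `PAIRS`, `build_qa_prompt`, `first_answer` and `split_statements` are hypothetical names, and the sentence splitting is deliberately naive: it would break on abbreviations such as "Mrs.".

```python
import re

# Hypothetical subset of the Q/A pairs used to prime the model.
PAIRS = [
    ("What is the birthday of Alice?",
     "Alice is a person. Alice has a birthday. "
     "What is the birthday of Alice?"),
    ("Who is the boss of Charlie?",
     "Charlie is a person. Charlie has a boss. Charlie's boss is a person. "
     "What is the name of Charlie's boss?"),
]


def build_qa_prompt(pairs, question):
    """Emit alternating Q:/A: lines and leave the last A: open for the model."""
    lines = []
    for q, a in pairs:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


def first_answer(completion):
    """Keep only the text before the next 'Q:' line, if the model runs on."""
    return completion.split("\nQ:")[0].strip()


def split_statements(answer):
    """Separate the declarative statements from the restated question."""
    # Naive split after '.' or '?'; assumes no abbreviations in the answer.
    sentences = [s.strip() for s in re.split(r"(?<=[.?])\s+", answer) if s.strip()]
    statements = [s for s in sentences if s.endswith(".")]
    questions = [s for s in sentences if s.endswith("?")]
    return statements, questions


if __name__ == "__main__":
    print(build_qa_prompt(PAIRS, "What are Josephine's main skills?"))
    completion = (" Josephine is a person. Josephine has skills. "
                  "What are Josephine's main skills?\nQ: What is the meaning of life?")
    print(split_statements(first_answer(completion)))
```

The statements ("X is a person", "X has a boss") are the part that could plausibly be mapped to vertex labels and edges, but that mapping is not shown here.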

--
You received this message because you are subscribed to a topic in the Google Groups "Gremlin-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gremlin-users/u2kLgspmwXU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/3c0d3a28-fb7c-4950-9e39-685c5ecb89e2n%40googlegroups.com.

Joshua Shinavier

Jul 20, 2020, 2:21:46 PM
to gremli...@googlegroups.com
I would also be interested in seeing some examples of English to JavaScript. It seems to me that Gremlin only makes sense as a target for translation from English if the mapping is relatively direct (perhaps requiring the user to learn a controlled dialect of English). This sort of eyes-free graph programming would have applications for the Gremlin power user.

Otherwise, what advantages would Gremlin offer over a more Watson-like approach, in which multiple strategies compete to match a user's input against a knowledge base, without a human-readable query in the middle?





Marco Zanchi

Jul 20, 2020, 9:37:31 PM
to gremli...@googlegroups.com
Well, Watson is proprietary, from IBM. I don't think they will release their code base as open source any time soon.

More importantly though, the advances in NLP are phenomenal, and will only get better. So if we have a dataset to fine-tune a model to translate English into something that can be better structured as a query, it can be quite powerful in the near future.

I started experimenting with a controlled English first, but the newer NLP models are so good that they don't need that limitation. I actually am almost convinced that we can train the model to transform an English sentence into a discourse representation structure, just by providing examples of the grammar.
