Turning Natural Language Text into a SPARQL Query for Wikidata with GPT3


Marco Neumann

Jul 30, 2022, 7:33:50 AM
to ontolo...@googlegroups.com
I think this is a rather nice application of GPT-3. You may still call it just a toy, but it lets you transform natural-language text into a SPARQL query to access Wikidata content and play with it.


Enjoy,
Marco
--


---
Marco Neumann


John F Sowa

Jul 31, 2022, 12:47:58 AM
to ontolo...@googlegroups.com
Marco,
 
I agree that the GPT-3 technology, which is capable of doing translations from one language to another, can be used to translate a natural language to a formal language, such as SPARQL, or SQL, or Common Logic, or many other kinds of notations.
 
That is indeed useful.  But as I said, it must be supported by some symbolic checks on its accuracy.  For a query language, the most important check must be an echo, which translates the target language (SPARQL, SQL, CL...) back to some NL sentence or paragraph.  Then before it runs the query or the command, it must ask one simple question:  "Is this what you mean?"
 
Since the formal language is precisely defined, it is possible to translate it to a formally defined "Controlled Natural Language" (CNL) by purely symbolic methods.  That kind of translation is trivial.  Any good programmer who has taken a course on compiler design can do that.  It's not AI.  It's just good old fashioned computer science (GOFCS).
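A minimal sketch of the kind of symbolic echo described above, assuming the generated query has already been parsed into a small hypothetical structure (`Triple`, `SelectQuery`) that covers only SELECT queries over basic triple patterns. The `LABELS` table is a stand-in for a real label lookup; `wdt:P31` (instance of), `wd:Q5` (human) and `wdt:P569` (date of birth) are Wikidata identifiers, but everything else here is illustrative, not an existing tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

@dataclass(frozen=True)
class SelectQuery:
    variables: Tuple[str, ...]
    patterns: Tuple[Triple, ...]
    limit: Optional[int] = None

# Hypothetical label table; a real echo would look labels up (e.g. rdfs:label).
LABELS = {
    "wdt:P31": "is an instance of",
    "wdt:P569": "has the date of birth",
    "wd:Q5": "human",
}

def term(t: str) -> str:
    """Variables stay as they are; known entities become quoted labels."""
    if t.startswith("?"):
        return t
    return f'"{LABELS[t]}"' if t in LABELS else t

def verbalize(q: SelectQuery) -> str:
    """One fixed English template per construct: no learning, no guessing."""
    clauses = [
        f"{term(p.subject)} {LABELS.get(p.predicate, 'has ' + p.predicate)} {term(p.obj)}"
        for p in q.patterns
    ]
    text = f"Find {' and '.join(q.variables)} such that {' and '.join(clauses)}."
    if q.limit is not None:
        text += f" Return at most {q.limit} results."
    return text

def confirm(q: SelectQuery, ask=input) -> bool:
    """The echo: show the paraphrase and proceed only on an explicit yes."""
    reply = ask(verbalize(q) + "\nIs this what you mean? [y/n] ")
    return reply.strip().lower().startswith("y")

query = SelectQuery(
    variables=("?person", "?birth"),
    patterns=(
        Triple("?person", "wdt:P31", "wd:Q5"),
        Triple("?person", "wdt:P569", "?birth"),
    ),
    limit=10,
)
print(verbalize(query))
# -> Find ?person and ?birth such that ?person is an instance of "human" and ?person has the date of birth ?birth. Return at most 10 results.
```

Even this toy shows where the work lies: each construct needs its own template, so OPTIONAL, FILTER, property paths and aggregates would each require further rules before the echo covers real queries.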
 
Bottom line:  A neural-net without a symbolic check is just a toy or a dangerous temptation for a disaster.  In order to make sure that the symbolic thing is safe, secure, and accurate, you must have a symbolic component that includes a warning or an echo to the human user about what it is about to perpetrate.
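One small symbolic component of this kind is a guard that warns before executing generated text that would modify data. The sketch below assumes a plain keyword check for the SPARQL 1.1 Update operations; it blanks out IRIs, string literals and comments first, but it is still a crude lexer, and a real guard should rely on a full SPARQL parser.

```python
import re

# Operation keywords of SPARQL 1.1 Update; a read-only SELECT needs none of them.
UPDATE_KEYWORDS = ("INSERT", "DELETE", "LOAD", "CLEAR", "CREATE", "DROP", "COPY", "MOVE", "ADD")

# Crude lexing: blank out IRIs, string literals and comments first, so that a label
# such as "Drop table" or an IRI ending in /clear is not mistaken for an operation.
_NOISE = re.compile(r'<[^<>\s]*>|"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\'|#[^\n]*')

# A keyword counts only as a whole word that is not part of ?var, $var or a prefixed name.
_KEYWORD = re.compile(
    r"(?<![\w:?$])(" + "|".join(UPDATE_KEYWORDS) + r")(?![\w:])",
    re.IGNORECASE,
)

def update_operations(sparql: str) -> list:
    """Return the sorted, de-duplicated update keywords found in the text."""
    cleaned = _NOISE.sub(" ", sparql)
    return sorted({m.group(1).upper() for m in _KEYWORD.finditer(cleaned)})

def check_before_running(sparql: str) -> str:
    """Warn instead of executing if the generated text would modify data."""
    ops = update_operations(sparql)
    if ops:
        return "WARNING: this would modify data (" + ", ".join(ops) + "); not executed."
    return "OK: read-only query."

print(check_before_running("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"))
print(check_before_running("DROP GRAPH <http://example.org/g>"))
```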
 
And by the way, the single most important course in the comp. sci. curriculum is about writing a translator from one formal language to another.  If you can do that, you can do the symbolic work that makes those NN toys do something useful.
 
John
 

From: "Marco Neumann" <marco....@gmail.com>
Sent: Saturday, July 30, 2022 7:34 AM
To: ontolo...@googlegroups.com
Subject: [ontolog-forum] Turning Natural Language Text into a SPARQL Query for Wikidata with GPT3
--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info/
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontolog-foru...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontolog-forum/CABWJn4SRMrfNJaJZgnRfceYUcYexhJo%3Dx9gFfiH%2B%3D%2B59hbp5nQ%40mail.gmail.com.
 

Marco Neumann

Jul 31, 2022, 3:49:30 AM
to ontolo...@googlegroups.com
I am more than happy to take your advice, John. For now I just call it SPARQL whac-a-mole. It helps me formulate quick queries for Wikidata. Of course, it still needs expert supervision and improves very much with it.

I remember a demo by Denny Vrandecic at SemTech 2012 in San Francisco who built a similar Voice-to-Text pipeline for SPARQL and wikidata based on work by Basil Ell and Thomas Tanon.

Since you mention a curriculum, John, it would be a good idea to compile your expertise and that of others in the forum into a formal curriculum. I am thinking mostly in terms of a taxonomy to capture the most important building blocks. I would think much of the content could be linked to existing literature and other readily available resources. We could call it The Ontolog Curriculum. This would be a great resource for future researchers and interested parties.

Marco






Alex Shkotin

Jul 31, 2022, 5:02:05 AM
to ontolo...@googlegroups.com, Kaarel Kaljurand, Norbert E. Fuchs
John,

I absolutely agree with your "symbolic check", but this process (a.k.a. verbalizing) is not as trivial as you wrote ("That kind of translation is trivial"); otherwise every CNL2FL translator would also have an FL2CNL feature. Is there a CL2CNL component?
I know of one project [1] that partially verbalizes OWL 2 into a CNL (ACE).
And I plan to check whether it is suitable for the ontologies of the OBO [2] project, since CNL is the interlingua for all formal ontology languages.
It would be great if KK and NEF say something :-)

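Alex

[1] http://attempto.ifi.uzh.ch/site/docs/verbalizing_owl_in_controlled_english.html
[2] https://obofoundry.org/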
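To make "partially verbalize OWL 2 to ACE" concrete, here is a toy sketch under stated assumptions: axioms are encoded as hypothetical Python tuples named after the OWL 2 functional-syntax constructs `SubClassOf`, `DisjointClasses` (restricted here to exactly two classes) and `ObjectSomeValuesFrom`, and each shape maps to one ACE-style sentence. This is not the Attempto OWL verbalizer; every axiom shape without a rule is rejected, which is exactly where the "partial" comes from.

```python
def indefinite(noun: str) -> str:
    """Attach 'a' or 'an' using a naive first-letter test."""
    return ("an " if noun[0] in "aeiou" else "a ") + noun

def verbalize_axiom(axiom) -> str:
    """Map a few OWL 2 axiom shapes to ACE-style sentences; reject everything else."""
    kind, sub, sup = axiom
    if kind == "DisjointClasses":
        return f"No {sub.lower()} is {indefinite(sup.lower())}."
    if kind == "SubClassOf" and isinstance(sup, str):
        return f"Every {sub.lower()} is {indefinite(sup.lower())}."
    if kind == "SubClassOf" and sup[0] == "ObjectSomeValuesFrom":
        _, prop, filler = sup
        return f"Every {sub.lower()} {prop} {indefinite(filler.lower())}."
    raise ValueError(f"no verbalization rule for {axiom!r}")

ontology = [
    ("SubClassOf", "Man", "Human"),
    ("DisjointClasses", "Man", "Woman"),
    ("SubClassOf", "Man", ("ObjectSomeValuesFrom", "likes", "Animal")),
]
for ax in ontology:
    print(verbalize_axiom(ax))
# -> Every man is a human.
# -> No man is a woman.
# -> Every man likes an animal.
```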



On Sun, Jul 31, 2022 at 07:47, John F Sowa <so...@bestweb.net> wrote:

Alex Shkotin

Jul 31, 2022, 5:11:14 AM
to ontolo...@googlegroups.com
Marco,

As far as I know our great OntoForum+OntoSummit community, the idea of a curriculum is better suited to IAOA [1].

Alex


On Sun, Jul 31, 2022 at 10:49, Marco Neumann <marco....@gmail.com> wrote:

Marco Neumann

Jul 31, 2022, 7:10:13 AM
to ontolo...@googlegroups.com
Thanks, Alex, but it looks like IAOA hasn't done anything on a curriculum yet. I will add them to my watch list.

Marco





alex.shkotin

Jul 31, 2022, 10:55:29 AM
to ontolog-forum
By the way, it's a pity that the great DOL does not have any CNL in its stack: http://rest.hets.eu/

Alex

On Sunday, July 31, 2022 at 12:02:05 UTC+3, alex.shkotin wrote:

alex.shkotin

Jul 31, 2022, 11:46:17 AM
to ontolog-forum
Marco,

If the temperature is set to 1, you may get different code from time to time; that may be interesting.
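As a conceptual sketch of why the temperature setting matters (not OpenAI's implementation): a temperature T divides the model's scores for candidate next tokens before a softmax, so a low T concentrates probability on the top candidate and a high T spreads it out. The logits below are made-up numbers, and treating T = 0 as greedy argmax is an assumption of this sketch.

```python
import math
import random

def softmax(logits, temperature):
    """Temperature-scaled softmax; lower temperature sharpens, higher flattens."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick the index of the next token; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate next tokens of a generated query.
logits = [2.0, 1.5, 0.5]
rng = random.Random(0)
print({sample_token(logits, 0, rng) for _ in range(20)})    # greedy: always index 0
print({sample_token(logits, 1.0, rng) for _ in range(20)})  # sampled: usually several indices
```

With T = 1 the same prompt can therefore yield different SPARQL on different runs, which is why an echo or other symbolic check is still useful per run.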

Alex 
On Saturday, July 30, 2022 at 14:33:50 UTC+3, marco.neumann wrote:

Elisa Kendall

Jul 31, 2022, 1:57:59 PM
to ontolo...@googlegroups.com
Alex,

So, by CNL do you mean something like the constrained English that is represented in an informative part of SBVR?  The challenge with respect to adding controlled natural language to DOL is that we have focused our efforts on other standards (i.e., UML, CL, OWL) to the degree possible. The controlled English in SBVR is the only case I'm aware of where some sort of controlled natural language for FOL and/or some other fragment of logic has been standardized, and even in SBVR it is considered informative.  Mark Linehan and I created a mapping from the SBVR controlled English to OWL years ago, but the mapping was not included in the standard, though some of the OWL we generated was added to SBVR.

If you have references for an international standard for controlled natural language for a logic or modeling language covered by DOL, I would be happy to pass that along to the revision task force.

Best regards,

Elisa

From: ontolo...@googlegroups.com <ontolo...@googlegroups.com> on behalf of alex.shkotin <alex.s...@gmail.com>
Sent: Sunday, July 31, 2022 7:55 AM
To: ontolog-forum <ontolo...@googlegroups.com>
Subject: Re: [ontolog-forum] Turning Natural Language Text into a SPARQL Query for Wikidata with GPT3
 

Adrian Walker

Jul 31, 2022, 3:18:57 PM
to ontolog-forum
Hi Elisa & All,

The following presentation is about an unconventional approach to practical question answering.


There's an online system that implements the approach.  

                     https://www.executable-english.com.

Friendly comments welcome.   Thanks,  -- Adrian

Ravi Sharma

Jul 31, 2022, 3:41:11 PM
to ontolo...@googlegroups.com
Elisa
I was happy to be a part of your OMG SBVR efforts, especially in calendar and time as well as in ODM for UML2 and it was later also called XMI and BPMN2.
I had not kept up with the CNL connection that you shared here.
Regards,
Thanks.
Ravi
(Dr. Ravi Sharma, Ph.D. USA)
Chair, Ontology Summit 2022
Senior Enterprise Architect
Particle and Space Physicist
Elk Grove CA


On Sun, Jul 31, 2022 at 10:57 AM Elisa Kendall <eken...@thematix.com> wrote:

Elisa Kendall

Jul 31, 2022, 4:20:42 PM
to ontolo...@googlegroups.com
Hi Adrian,

Thanks for your slides. The most recent update to SBVR was in 2019, so I suspect it has changed somewhat from the version you reference. They have a considerable user base and a number of tools support it now, which has helped with streamlining and improving usability. Some of the tools, such as Trisotech's tool, support multiple natural languages and vocabularies expressed using SBVR, and they've been quite active in healthcare, among other markets. 

Having said this, I just checked, and the structured English annex is still informative, not a normative part of the standard (only 20 pages of the 300+ page specification), so I am not sure that the DOL revision task force will map to it. If your approach to executable English has been standardized by ISO or another standards development organization, please let me know and I'll pass that along so that they can consider it in their next review pass.

Best,

Elisa

From: ontolo...@googlegroups.com <ontolo...@googlegroups.com> on behalf of Adrian Walker <adrian...@gmail.com>
Sent: Sunday, July 31, 2022 12:18 PM

Adrian Walker

Jul 31, 2022, 6:41:10 PM
to ontolog-forum, Edward Barkmeyer
Hi Elisa & List,

Many thanks for your comments.

I'm guessing that the Executable English engine is the only one available that treats rules declaratively,  in the sense that authors only need to specify "what is", rather than painstakingly programming** rule firing sequences.  This design could be an advantage for business SMEs wishing to prototype specifications in a tight loop without assistance from programmers.

Does anyone know of moves to write standards that take such a highly declarative approach into account?

Best regards,  -- Adrian

** An example of issues in programming procedural rules:  The rules are processed by the Jena RETE engine in forward chaining in the following manner. First, the queues of pending inserts/deletes are processed until they are empty. Then, when there is nothing more to inject, a non-monotonic rule is processed from the conflict set. During the processing of non-monotonic rule, if a triple of the head is already present in the context, it is removed -- line 171 in RETEConflictSet.java
I am unable to understand the reason for the removal. Can anyone help me on this? Won't this removal affect other rules in the conflict set?

John F Sowa

Jul 31, 2022, 11:55:25 PM
to ontolo...@googlegroups.com
Elisa and Alex,
 
For an overview of CNLs of various kinds, see the slides "Controlled Natural Languages for Semantic Systems": http://jfsowa.com/talks/cnl4ss.pdf
 
 
Re standards for CNLs:  The most widely used CNLs are specialized for particular applications.   If you have ever tried to make a plane reservation by telephone, you have encountered a system that expects you to use a version of controlled English.  They are fairly good for simple reservations, but if you have any complications, you have to shout "AGENT."
 
Elisa> If you have references for an international standard for controlled natural language for a logic or modeling language covered by DOL, I would be happy to pass that along to the revision task force.
 
There is a lot of R & D necessary to develop a general-purpose CNL system that is easy to implement, easy to use, easy to extend, and easy to learn by users with no previous training and by developers with minimal knowledge of the technology under the covers.
 
Alex>  By the way, it's a pity great DOL does not have any CNL in it's stack http://rest.hets.eu/
 
It's fairly easy to implement a simple CNL for a special purpose.  But it's nontrivial to implement a good general purpose system for creating versions for an open-ended variety of applications.
 
People don't need a user manual to talk with other people.  And they don't want to read a user manual before they can talk to the computer.
 
John

John F Sowa

Aug 1, 2022, 12:09:57 AM
to ontolo...@googlegroups.com
Marco,
 
If you have a solid understanding of the target language (SPARQL in this case), you can do the symbolic checks yourself.
 
But if the target language is complex or there's a lot of it, even an expert can miss little details. Remember the programmer's song:  "Yesterday, all my data was here to stay..."
 
John

Adrian Walker

Aug 1, 2022, 1:23:57 AM
to ontolog-forum
Hi All,

John Sowa:  It's fairly easy to implement a simple CNL for a special purpose.  But it's nontrivial to implement a good general purpose system for creating versions for an open-ended variety of applications.

Adrian Walker:  Yes, if you require the language to be controlled, you are stuck with creating and maintaining an open ended collection of grammar and dictionary versions for all human languages and jargons -- a moving target.  And you have to be smart about how the system should respond to input it cannot parse. 

However, if you leave the language mainly open, you can write whatever you like and also avoid the grammar and dictionary maintenance problems.  That's counter-intuitive, but it's the approach taken in the online Executable English system [1].  You can see it working with many examples, and you can also add and run your own examples.

Apologies to folks who have seen this discussion before, but it took many years to persuade some heavyweight "thought leaders" that the system is not a CNL.

Adrian Walker
Executable English LLC  
San Jose, CA, USA
USA 860-830-2085 (California time)





Alex Shkotin

Aug 1, 2022, 4:39:10 AM
to ontolo...@googlegroups.com, Norbert E. Fuchs, Kaarel Kaljurand, Tobias Kuhn
Elisa,

For me, the benchmark for many years has been the Attempto project [1] with its ACE language. I am sure it will need some small enhancements (for example, the possibility to point to a particular vocabulary, like "mw:notdeterministic"), but I am strongly convinced that the great DOL should have a CNL in its long list of formal languages in use.
And if I am not mistaken, some years ago ACE was something like a standard for the EC.
Please ask Prof. Norbert E. Fuchs, Kaarel Kaljurand, and others for more.
They published a nice overview of 50+ English-based CNLs some years ago.
The idea of an ISO standard for controlled English sounds great!

Best regards,

Alex


On Sun, Jul 31, 2022 at 20:57, Elisa Kendall <eken...@thematix.com> wrote:

Alex Shkotin

Aug 1, 2022, 5:08:32 AM
to Norbert E. Fuchs, ontolo...@googlegroups.com, Kaarel Kaljurand
Colleagues,

My question to the ACE team was to evaluate this point by JFS: "Since the formal language is precisely defined, it is possible to translate it to a formally defined "Controlled Natural Language" (CNL) by purely symbolic methods.  That kind of translation is trivial."
They really have this kind of experience.
And by the way, one way to add ACE to DOL is to create ACE2CASL and CASL2ACE components.
Let me point this out once more: the OBO Foundry has many huge ontologies, and one way for experts to check them is to convert them to CNL.
In fact, every OWL axiom should have a reference to the book or article its formalization comes from.

Alex

On Sun, Jul 31, 2022 at 14:59, Norbert E. Fuchs <fu...@ifi.uzh.ch> wrote:
> It would be great if KK and NEF say something :-)

Alex

Here are NEF's comments.

1. As you wrote, Attempto Controlled English (ACE) offers the bidirectional translation "subset of ACE <-> OWL2". Since ACE is more powerful than OWL, only a subset of it can be translated into OWL.

2. As an answer to John Sowa's mail: The ACE parser APE generates not only a translation of ACE texts in the first-order logic language DRS (ACE –> DRS), but also an independently generated paraphrase in ACE, i.e. DRS –> ACE.  In some cases the paraphrase is identical to the input. In other cases the paraphrase uses syntactically different, but semantically identical ACE constructs. The parser and the paraphraser together constitute a bidirectional translation ACE <–> ACE.
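The parser/paraphraser pair in point 2 suggests a simple round-trip check: parse the input, paraphrase the resulting logical form, and confirm that the paraphrase parses back to the same meaning even when its wording differs. The sketch below assumes a hypothetical two-pattern fragment and a tuple as the "logical form"; it only mimics the ACE -> DRS -> ACE idea and has nothing to do with APE's actual grammar or DRS format.

```python
import re

# Hypothetical toy fragment: two surface forms, one logical form ("every", X, Y).
_FORMS = (
    re.compile(r"Every (\w+) is an? (\w+)\.$"),
    re.compile(r"If there is an? (\w+) then the \1 is an? (\w+)\.$"),
)

def parse(sentence: str) -> tuple:
    """Sentence -> logical form (stands in for ACE -> DRS)."""
    for form in _FORMS:
        m = form.match(sentence)
        if m:
            return ("every", m.group(1), m.group(2))
    raise ValueError(f"outside the toy fragment: {sentence!r}")

def paraphrase(meaning: tuple) -> str:
    """Logical form -> one canonical sentence (stands in for DRS -> ACE)."""
    _, x, y = meaning
    article = "an" if y[0] in "aeiou" else "a"
    return f"Every {x} is {article} {y}."

def round_trip(sentence: str) -> tuple:
    """Return the paraphrase and whether it parses back to the same meaning."""
    meaning = parse(sentence)
    echo = paraphrase(meaning)
    return echo, parse(echo) == meaning

print(round_trip("If there is a man then the man is a human."))
# -> ('Every man is a human.', True)
```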

Regards.

   --- nef




Marco Neumann

Aug 1, 2022, 5:28:13 AM
to ontolo...@googlegroups.com
John, yes, of course, that's a good point: data does change dynamically, especially on the web, and I suspect updates to the LLM will again be expensive.

Text-to-SPARQL query work (SPARQL semantic parsing/KGQA) is an active field of research, and Ricardo Usbeck made me aware of a few recent results on evaluating these systems:

Modern Baselines for SPARQL Semantic Parsing
Debayan Banerjee, Pranav Ajit Nair, Jivat Neet Kaur, Ricardo Usbeck, Chris Biemann
2022
https://dl.acm.org/doi/pdf/10.1145/3477495.3531841
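A common way to score text-to-SPARQL systems is to run both the generated query and a reference query and compare their answer sets with precision, recall and F1. The sketch below is a generic version of that idea, not the protocol of the paper cited above; the answer identifiers are placeholders, and scoring two empty answer sets as 1.0 is an assumption.

```python
def answer_f1(gold: set, predicted: set) -> float:
    """F1 between the answers of a reference query and a generated one."""
    if not gold and not predicted:
        return 1.0            # both empty: treated here as a perfect match
    overlap = len(gold & predicted)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def macro_f1(pairs) -> float:
    """Average per-question F1 over (gold, predicted) answer-set pairs."""
    scores = [answer_f1(g, p) for g, p in pairs]
    return sum(scores) / len(scores)

# Placeholder answer identifiers for two hypothetical questions.
pairs = [
    ({"a", "b", "c", "d"}, {"a", "b", "x"}),   # partly right: P = 2/3, R = 1/2
    ({"e"}, {"e"}),                            # exactly right
]
print(round(macro_f1(pairs), 4))
# -> 0.7857
```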


Marco







Alex Shkotin

Aug 1, 2022, 5:47:43 AM
to ontolo...@googlegroups.com
As http://rest.hets.eu/ does not work, here is another link: http://hets.eu/


On Sun, Jul 31, 2022 at 17:55, alex.shkotin <alex.s...@gmail.com> wrote:

Alex Shkotin

Aug 1, 2022, 6:10:26 AM
to ontolog-forum, Norbert E. Fuchs
Colleagues,

This letter from Prof. Fuchs was bounced by Google Groups, so I am forwarding it myself.

Alex

---------- Forwarded message ---------
From: Norbert E. Fuchs <fu...@ifi.uzh.ch>
Date: Sun, Jul 31, 2022 at 14:59
Subject: Re: [ontolog-forum] Turning Natural Language Text into a SPARQL Query for Wikidata with GPT3
To: Alex Shkotin <alex.s...@gmail.com>
Cc: <ontolo...@googlegroups.com>, Kaarel Kaljurand <kalj...@gmail.com>


