
Accelerating Knowledge Graph and Ontology Engineering with Large Language Models


Pascal Hitzler

Nov 14, 2024, 1:45:38 PM
to ontolog-forum

Given the currently ongoing ISWC2024 conference and all the discussions around this neurosymbolic topic, here is a link to our position paper (with Cogan Shimizu) on it: https://kastle-lab.github.io/assets/publications/2024-LLMs4KGOE.pdf

The developments are really exciting!

Pascal.

-- 
Pascal Hitzler
Lloyd T. Smith Creativity in Engineering Chair
Director, Center for AI and Data Science CAIDS
Director, Inst. for Digital Agriculture and Adv. Analyt. ID3A
Kansas State University   http://www.pascal-hitzler.de
http://www.daselab.org    http://www.semantic-web-journal.net
http://k-state.edu/ID3A   https://neurosymbolic-ai-journal.com

John F Sowa

Nov 16, 2024, 9:19:13 PM
to ontolo...@googlegroups.com, CG
Pascal,

I just read your paper (cited below).   I agree that LLM technology is good for finding important and valuable information.  But as you know, there are serious issues about evaluating that information to avoid irrelevant, erroneous, or even hallucinogenic data.  I didn't see much attention devoted to evaluation and testing.

As I often mention, our old VivoMind company was doing large volumes of high-speed  knowledge extraction, analysis, evaluation, and processing over 20 years ago.  For a description of that system with some examples of large applications, see https://jfsowa.com/tallks/cogmem.pdf .  The systems described there are just a small sample of the applications, since our customers do not want their data or methods publicized.

I also noticed that you are using OWL for the ontology.  We use a high-speed version of Prolog, which is much richer, more powerful, and faster than OWL; OWL implements only a tiny subset of the logic that Tim Berners-Lee had proposed for the Semantic Web.

Some of our customers were among the sponsors of the IKRIS project, funded from 2004 to 2006, to support a much larger and more powerful version of what Tim BL had proposed.  For an overview of IKRIS with links to some of the original publications, see https://jfsowa.com/ikl .

The IKL technology does not replace LLM, but it is valuable for evaluating the results generated by LLM, detecting errors and avoiding irrelevant, erroneous, or even hallucinogenic data.  When processing high volumes of data at high speed, human checking is not possible.  High quality computer checking is necessary to eliminate 99% or more of the bad or even dangerous data.
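To make the "computer checking" idea concrete, here is a minimal, purely illustrative sketch; the predicates, type table, and rules below are invented for illustration and are not VivoMind's or IKL's actual machinery. LLM-extracted triples are screened by simple domain/range rules, and only the uncertain residue is routed to humans:

```python
# Hypothetical sketch: rule-based screening of LLM-extracted triples.
# All predicates and the type table are invented for illustration.

RULES = {  # predicate -> (required subject type, required object type)
    "employs": ("Organization", "Person"),
    "locatedIn": ("Organization", "Place"),
}

TYPES = {"Acme": "Organization", "Bob": "Person", "Paris": "Place"}

def check(triple, types):
    """Classify a triple as accept / reject / uncertain (needs a human)."""
    s, p, o = triple
    if p not in RULES:
        return "uncertain"          # no rule covers it: route to a human
    dom, rng = RULES[p]
    ok = types.get(s) == dom and types.get(o) == rng
    return "accept" if ok else "reject"

print(check(("Acme", "employs", "Bob"), TYPES))    # accept
print(check(("Bob", "employs", "Acme"), TYPES))    # reject
print(check(("Acme", "founded", "Bob"), TYPES))    # uncertain
```

A production system would use a far richer logic, but the division of labor is the point: the machine accepts or rejects the bulk, and people see only what the rules cannot decide.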

Human checking would only be required for the tiny percentage of data for which the computational methods are uncertain.  For a more recent talk, see https://jfsowa.com/talks/eswc.pdf .

John

 


From: "'Pascal Hitzler' via ontolog-forum" <ontolo...@googlegroups.com>

Mike Bergman

Nov 16, 2024, 10:10:53 PM
to ontolo...@googlegroups.com

Hi John,

Why do you consistently do this? Someone (Hitzler) presents a link that may be of interest (or not) to others, and then you proceed to dismiss the points in the paper? I truly do not understand why you persist in denigrating or dismissing the points made. Further, these attacks are often accompanied by your own self-references to your own posts or for-profit endeavors.

From my perspective, I would prefer that you engage on substantive discussions or refrain from denigration and self-promotion. Why do you continue to do this?

You know you have been banned from the Peirce list for these tendencies. Just let this stuff ride. If you have a positive point to offer, I'm all ears and would like to hear it. But I really dislike your tendency to pedantically dismiss points of view with which you may disagree.

(I would prefer it to be the) Best, Mike

--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontolog-foru...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/ontolog-forum/9eb9de13e607433db37275710e71d00d%4097de68b4a5654ca0933eba435b9eb3ae.
-- 
__________________________________________

Michael K. Bergman
319.621.5225
http://mkbergman.com
http://www.linkedin.com/in/mkbergman
__________________________________________ 

John F Sowa

Nov 17, 2024, 9:38:56 PM
to ontolo...@googlegroups.com
Mike,

I did not dismiss Pascal's article.  I did what I hope other people would do about any or all articles or notes that I write:  point out issues that require revisions and corrections.   And I love citations.  If somebody finds any of my mistakes, I would love to get as many examples and citations as they can find.

I've known Pascal for years, and I have seen a great deal of his work that I believe is quite good.  I have cited some of his work in talks and slides that I have presented.  But the entire field of publications about GPT and related systems is riddled with claims about 80 or 90% accuracy.
 
Unfortunately, many of the applications are to subjects where traditional programming technology demands 100% accuracy (or at least 99.99% accuracy with fallback methods for catching and correcting errors).  Banking and tax collection are typical examples.  Airlines and space flights are another.  Nobody would fly in a plane that is controlled by GPT.

But I keep emphasizing a critical issue about LLMs:  They are good at finding hypotheses that may or may not be correct.  The technical term is abduction.  When an expert uses LLMs to make guesses or to find previous research, that expert does the evaluations by means of deductions and related methods.

But widespread use of the technology by people who do not have detailed knowledge of the subject can lead to disasters.  Many people have been burned by trusting the technology.  I cited a great many references that I hope would be beneficial to Pascal,  his coauthor, and other colleagues.

John
 


From: "Mike Bergman" <mi...@mkbergman.com>
Sent: 11/16/24 10:11 PM
To: ontolo...@googlegroups.com
Subject: Re: [ontolog-forum] Accelerating Knowledge Graph and Ontology Engineering with Large Language Models

Hi John,

Why do you consistently do this? Someone (Hitzler) presents a link that may be of interest (or not) to others, and then you proceed to dismiss the points in the paper? I truly do not understand why you persist in denigrating or dismissing the points made. Further, these attacks are often accompanied by your own self-references to your own posts or for-profit endeavors.

From my perspective, I would prefer that you engage on substantive discussions or refrain from denigration and self-promotion. Why do you continue to do this?

(I would prefer it to be the) Best, Mike

Alex Shkotin

Nov 18, 2024, 5:44:00 AM
to ontolo...@googlegroups.com

John,


Abduction is not the only kind of knowledge processing genAI can do.

Look at this example [1], where I talked with Gemini in Russian but asked it to translate one part into English.

My point is not the translation itself (though for me even that is not abduction); my point is the knowledge processing done by Gemini.


Alex

[1] https://gemini.google.com/app/a59f16a9a7044419

THE TEXT BELOW IS GEMINI'S PRODUCTION

English Translation of the Problem and Solution

Problem: Let's work with only natural numbers. One number is 35 more than another number. A third number is such that, if it is subtracted from the first number and added to the second number, the first and second numbers become equal. What is the third number?

Solution: Let's denote:

  • First number: A

  • Second number: B

  • Third number: C

Based on the problem, we can form the following equations:

  • A = B + 35 (since the first number is 35 more than the second)

  • A - C = B + C (if the third number is subtracted from the first and added to the second, they become equal)

Substituting the first equation into the second:

  • (B + 35) - C = B + C

Simplifying the equation:

  • 35 - C = C

  • 2C = 35

  • C = 35 / 2 = 17.5

However, we are only considering natural numbers. 17.5 is not a natural number.

Conclusion: There is no natural number that satisfies the given conditions.

Why is this the case? Imagine two numbers on a number line. If one number is 35 more than the other, to make them equal, we need to "shift" the difference (35) equally between them. In this case, that would be 35/2 = 17.5. But since we are limited to natural numbers, we cannot "split" this difference evenly.

Answer: There is no such natural number.

Additional Notes: If we were considering all rational numbers (fractions), the answer would be 17.5. However, since the problem is restricted to natural numbers, there is no exact solution.

It's possible that there might be a typo or error in the original problem statement. Please double-check the given information.
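Gemini's algebra can also be checked mechanically. The brute-force sketch below searches a finite range of naturals and confirms that no (A, B, C) satisfies both equations (the range bound of 100 is arbitrary; the algebra shows no bound would help, since 2C = 35 has no integer solution):

```python
# Brute-force check: A = B + 35 and A - C = B + C over naturals 1..99.
solutions = [
    (b + 35, b, c)
    for b in range(1, 100)
    for c in range(1, 100)
    if (b + 35) - c == b + c
]
print(solutions)   # [] : no natural C works, as the algebra predicts
```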



Mon, Nov 18, 2024 at 05:38, John F Sowa <so...@bestweb.net>:

Mike Bergman

Nov 18, 2024, 11:56:44 AM
to ontolo...@googlegroups.com, John F Sowa

John,

As best as I can tell, your strawman points in this response and your first email are completely orthogonal to the subject of the Shimizu and Hitzler paper: namely, the use of LLMs to aid the engineering of knowledge graphs. There is no discussion of the use of LLMs for the applications you cite, nor are there any claims as to LLM accuracies. Further, there are no claims about removing humans from the loop when employing LLMs. Your "great many references" in fact were only four of your own links.

The paper itself provides this summation, with which I completely agree and which is counter to your inaccurate assertions: "While at this point in time, their [LLM] reliability in terms of accuracy of content in their responses remains problematic, it is quite apparent and widely reported that working with an LLM can save significant time and effort provided there is a (human) topic expert available as a check on factual accuracy." The accurate assessment of the paper is that LLMs can semi-automatically speed ontology development and engineering using a modular design and human interaction.

Did you even read this paper?

As an ontology developer myself, I agree with Pascal that the emerging use of LLMs for ontology engineering is indeed "really exciting!" In my view, your snarky comments are misleading and totally miss the point.

Mike

Ray Martin

Nov 18, 2024, 4:37:18 PM
to ontolo...@googlegroups.com, John F Sowa
I have found this to be so much the case that I barely follow the ontolog forum any longer.  So tiring: one fellow always tooting his horn and putting everyone else down. So tiresome! And then he comes across as though he does not know he is annoying. He cannot be that oblivious.

Sent from my iPhone

On Nov 18, 2024, at 11:56 AM, Mike Bergman <mi...@mkbergman.com> wrote:



John F Sowa

Nov 18, 2024, 7:19:56 PM
to ontolo...@googlegroups.com
Ray and Mike,

I am not tooting my own horn, and I am not asking anyone to trust or believe anything I say.  I emphasize the high value of LLM-based technology for what it does, but I also emphasize the DANGER of using it without proper evaluation.

Re the article by Shimizu and Hitzler:  Yes, I did see that they expect humans to do the evaluation.   That is essential for small-scale applications for which knowledgeable and careful humans can do the checking.  But human checking of computer systems is highly unreliable.  Most people don't have the skills, even highly skilled people make mistakes, and there are very good hybrid methods that can do the evaluation far faster than any human or group of humans.

Please note that I do have high praise for systems that implement methods for doing the evaluation.  Among them are systems by Wolfram and Kingsley Idehen.  Wolfram uses LLM methods for translating English to the formal notation of his system.   The system is based on precise, powerful, and secure mathematical methods, and the LLMs support a user-friendly front end.

Kingsley devoted years to developing his technology, and he adopted LLMs to support a safe, secure, and easy-to-use interface.  Please note that Kingsley and I may emphasize different aspects, but we have always come to an agreement after one or two email exchanges.  Just look at our emails.

As for my slides, DO NOT TRUST ME.  Read the citations by the authors that I cite.  They go into much more detail about very important issues.  I don't care whether or not you use any technology that my colleagues and I have developed.  But I believe that you should consider the experts I cite.  Except for my coauthor, Arun Majumdar, I don't have any vested interest in any of the ideas or tools that other experts have discovered or invented.

And Mike, I never snark.  I sincerely believe that every warning I post is essential for helping people evaluate ongoing developments.  There are many people at Google and other companies who are becoming disillusioned with the hype that surrounds technology that they themselves have been developing.

Fundamental principle:  It's impossible to understand the potential for any technology if you don't recognize its limitations.  In this regard the AGI proponents are the worst offenders.  Put the blame where it belongs.

John

PS:  I will now do some tooting of my own horn.   You can ignore everything below.

1. My book Conceptual Structures, which went on sale at the 1983 IJCAI, was the first publication in AI that introduced the word ontology.  Before that, AI researchers used the word epistemology for the study of knowledge.  Today, you won't find that word used in AI or computer science.

2. In 1987, I taught a graduate-level course in the Stanford computer science department.  It could also be taken for credit by students in the linguistics department.  At the end of each course, students could fill out questionnaires for evaluating courses and instructors.  Following is the description of the course in the Stanford catalog and the results in comparison to other courses:  https://jfsowa.com/pubs/su309a.pdf .  Note the comparison of my ratings to the average for the computer science department.

3.  In the following article, I praise some methods and strongly criticize others:  Fads and Fallacies about Logic, https://jfsowa.com/pubs/fflogic.pdf .  It was published in a journal for which Jim Hendler was the editor.  He knew that I was critical of OWL, and he thought that he would hate my article.  But he was pleasantly surprised, and he said that he liked it and agreed with it.  He admitted that OWL was far weaker than the proposal by Tim Berners-Lee.

4. I wrote a review article that cites a large number of projects in AI and related areas of computer science from the 1980s, the Semantic Web, and later.  I give short summaries of each with references to the original articles and follow-ons.  I emphasize the importance of my citations to the original sources.  You don't have to believe anything I say.  Just read the original articles. 
 


From: "Ray Martin" <marsa...@gmail.com>

John F Sowa

Nov 19, 2024, 5:21:02 PM
to ontolo...@googlegroups.com
Alex,

I agree that there are many hybrid systems that combine LLMs with more traditional kinds of processing.

Google developed the LLM technology for machine translation (MT), and that is its best and most reliable application.  Since the source and target languages are just one step away, the error rate is quite low.

However, the LLM algorithms cannot be used without further processing even for MT in cases where absolute accuracy is essential.  Critical issues are translations at the UN and EU.  International treaties must use exact translations of tiny details.

For many applications of LLM technology, hybrid methods are used, which can make errors in certain kinds of combinations.  Your example of arithmetic is one such special case.  Another example involves problems on exams, where almost every textbook uses very similar methods.  As a result, the computer gets a score of 98% on an exam.  That's good enough for an A.

But that 2% error, on a design of an airplane or a rocket, could cause a disaster.     A score of 98% can be fatal. 

Another serious problem is that the larger the volume of text that is processed, the more likely it is that some weird or unusual application somewhere might insert some strange little item.  That one item might then link to some other strange application.  After a few steps, you can get a serious error.  Hallucinations are extremely bad, but they are obviously wrong.  Those tiny little things often go unnoticed, and they can be more dangerous than a hallucination.
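One form such hybrid checking can take: re-derive any arithmetic claim in the generated text with exact computation, rather than trusting the generated digits. The sketch below is illustrative only (the regex covers just simple "a op b = c" claims):

```python
# Minimal sketch of a hybrid check: re-evaluate any arithmetic an LLM
# emits with exact computation instead of trusting the generated digits.
import re
from fractions import Fraction

def verify_arithmetic(text):
    """Return claims of the form 'a op b = c' with an exact verdict."""
    results = []
    for a, op, b, c in re.findall(r"(\d+)\s*([+\-*/])\s*(\d+)\s*=\s*(\d+\.?\d*)", text):
        a, b, c = Fraction(a), Fraction(b), Fraction(c)
        actual = {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]
        results.append((f"{a} {op} {b}", actual == c))
    return results

print(verify_arithmetic("so 35 / 2 = 17 and 12 + 30 = 42"))
# -> [('35 / 2', False), ('12 + 30', True)]
```

Real systems (Wolfram's among them, as described above) delegate such computations to an exact engine; the point of the sketch is only that the check is cheap and deterministic, while the generation is not.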

John
 


From: "Alex Shkotin" <alex.s...@gmail.com>

mco...@boninc.com

Nov 22, 2024, 9:49:07 PM
to ontolo...@googlegroups.com

Thank you for the link.  Much appreciated.  Recently, I'm experimenting with augmenting GPT4.0 with the Medical Subject Headings RDF.

 

My Coyne

 

 


Mike Bergman

Nov 27, 2024, 10:13:14 PM
to ontolo...@googlegroups.com

Hi Cogan and Pascal,

The potential for LLMs to enhance ontologies and knowledge graphs is, as you say, "really exciting". The fact you have done this work means you understand much about the status of this space. I have one direct and three implied questions about your findings:

  1. I like the coverage of ontology engineering tasks. Do some of these areas stand out as more ready for LLMs than others?

The next three are not the direct basis of your paper but given the scope of research may be something for which you have opinions:

  1. What exposure did you have to how ontologies/knowledge graphs may themselves be a source to fine-train LLMs (I'm thinking for domain-specific tasks)?
  2. Are LLMs being used for mapping between KGs?
  3. What are your comments about modularity architectures? We have the ecosystem model of the OBO biomedical ontologies (also others such as OMG or GLAM); with KBpedia, we have a 'typology' approach; or other modularity models?

These are broader scope questions that may take much time to better understand. The questions likely also warrant multiple threads if they are to be pursued to any meaningful degree. But it would be wonderful to hear your thoughts.

Best, Mike

Pascal Hitzler

Nov 29, 2024, 3:29:41 PM
to ontolo...@googlegroups.com, Shimizu, Cogan Matthew
On 11/22/2024 11:01 PM, Mike Bergman wrote:
> Hi Cogan and Pascal,
>
> The potential for LLMs to enhance ontologies and knowledge graphs is, as
> you say, "really exciting". The fact you have done this work means you
> understand much about the status of this space. I have one direct and
> three implied questions about your findings:
>
> 1. I like the coverage of ontology engineering tasks. Do some of these
> areas stand out as more ready for LLMs than others?

Well, the results in https://arxiv.org/abs/2404.10329 make me really
excited about complex ontology alignment (provided you have modularity).
To me, it was rather surprising that it worked so well - but of course
follow-up investigations are needed. (And for simple alignments, LLMs
seem to be doing as well as other state-of-the-art methods.) Things
may be less clear for niche subjects ...

I'm also rather optimistic about making inroads in LLM-assisted ontology
building as well as population, given our preliminary work, but of course
it remains to be seen how much work can really be saved if the goal is
to obtain quality ontologies. And niche subjects may still be a problem.

For entity disambiguation/co-reference resolution I think the road ahead
is a bit longer. It will likely require ontologies built from (good!)
micropatterns, with these micropatterns endowed with some additional
info helpful for disambiguation (once we know what that would be). And
there's again the niche subjects problem...

> The next three are not the direct basis of your paper but given the
> scope of research may be something for which you have opinions:
>
> 1. What exposure did you have to how ontologies/knowledge graphs may
> themselves be a source to fine-train LLMs (I'm thinking for domain-
> specific tasks)?

I think that's kind-of orthogonal to the work discussed in the position
paper. In the end, any useful data can be helpful for fine-tuning.
(Well-done) ontologies/knowledge graphs should provide an advantage
simply because it's data that's generally easier to re-use for a new
purpose (in this case, fine-tuning).

> 2. Are LLMs being used for mapping between KGs?

Well, that's the ontology alignment theme above for the schema
(underlying ontology), and the disambiguation as mentioned above once you
have the schema mapped. If you're looking to establish new (named)
relationships, then I believe what's known as "link prediction" is also
relevant, and there's a lot of work on that, including deep learning (not
only LLMs, I believe). In the end, realistic data+schema mapping will
need all of it; the splitting into different aspects is partially just a
good vehicle to think more systematically about this, and to structure
research.

> 3. What are your comments about modularity architectures? We have the
> ecosystem model of the OBO <https://obofoundry.org/> biomedical
> ontologies (also others such as OMG <https://www.omg.org/> or GLAM
> <https://github.com/ncarboni/awesome-GLAM-semweb>); with KBpedia, we
> have a 'typology <https://kbpedia.org/docs/30-typologies/>' approach
> <https://kbpedia.org/docs/30-typologies/>; or other modularity models?

I'm just not too familiar with these, so should better not venture out
of my comfort zone here :)

Best Regards,

Pascal.

> These are broader scope questions that may take much time to better
> understand. The questions likely also warrant multiple threads if they
> are to be pursued to any meaningful degree. But it would be wonderful to
> hear your thoughts.
>
> Best, Mike
>
> On 11/20/2024 6:51 AM, mco...@boninc.com wrote:
>>
>> Thank you for the link.  Much appreciated for such paper.    Recently,
>> I’m experimenting to augment GPT4.0 with Medical Subject Headings RDF.
>>
>> My Coyne
>>
>> *From: *'Pascal Hitzler' via ontolog-forum <ontolog-
>> fo...@googlegroups.com>
>> *Date: *Thursday, November 14, 2024 at 1:45 PM
>> *To: *ontolog-forum <ontolo...@googlegroups.com>
>> *Subject: *[ontolog-forum] Accelerating Knowledge Graph and Ontology
>> Engineering with Large Language Models
>>
>> Given the currently ongoing ISWC2024 conference and all the discussions
>> around this neurosymbolic topic: Link to our (with Cogan Shimizu)
>> position paper on this: https://kastle-lab.github.io/assets/publications/2024-LLMs4KGOE.pdf
>>
>> The developments are really exciting!
>>
>> Pascal.
>>
>> --
>> Pascal Hitzler
>> Lloyd T. Smith Creativity in Engineering Chair
>> Director, Center for AI and Data Science CAIDS
>> Director, Inst. for Digital Agriculture and Adv. Analyt. ID3A
>> Kansas State University   http://www.pascal-hitzler.de
>> http://www.daselab.org http://www.semantic-web-journal.net
>> http://k-state.edu/ID3A https://neurosymbolic-ai-journal.com
>>

Mike Bergman

Dec 3, 2024, 11:28:15 AM
to ontolo...@googlegroups.com, Shimizu, Cogan Matthew
Hi Pascal,

Thanks for your thoughtful responses to these questions. Please see below.
OK; thanks.
>
>> The next three are not the direct basis of your paper but given the
>> scope of research may be something for which you have opinions:
>>
>>  1. What exposure did you have to how ontologies/knowledge graphs may
>>     themselves be a source to fine-train LLMs (I'm thinking for domain-
>>     specific tasks)?
>
> I think that's kind-of orthogonal to the work discussed in the
> position paper. In the end, any useful data can be helpful for
> fine-tuning. (Well-done) ontologies/knowledge graphs should provide an
> advantage simply because it's data that's generally easier to re-use
> for a new purpose (in this case, fine-tuning).

Yes, I knew the question was somewhat orthogonal, but it is also the
case that 'KG-enhanced LLMs' are also mentioned as a complement to
'LLM-augmented KGs' (your focus) in many of the state-of-the-art review
papers. I did some poking in my own references and here are some that
address the 'KG-enhanced LLMs' topic; I only provide examples and quotes
from sources that everyone can access themselves. The Pan et al. article
marked with an asterisk (*) is cited in your own paper:

Dai, X., Hua, Y., Wu, T., Sheng, Y., & Qi, G. (2024). Counter-intuitive:
Large Language Models Can Better Understand Knowledge Graphs Than We
Thought (arXiv:2402.11541). arXiv. http://arxiv.org/abs/2402.11541

"Contrary to our initial expectations, our analysis revealed that LLMs
effectively handle messy, noisy, and linearized KG knowledge,
outperforming methods that employ well-designed natural language (NL)
textual prompts."

Kau, A., He, X., Nambissan, A., Astudillo, A., Yin, H., & Aryani, A.
(2024). Combining Knowledge Graphs and Large Language Models
(arXiv:2407.06564). arXiv. http://arxiv.org/abs/2407.06564

"This work collected 28 papers outlining methods for KG-powered LLMs,
LLM-based KGs, and LLM-KG hybrid approaches. . . . One of the
significant strengths we have identified is the performance improvement
brought by the utilisation of KGs and LLMs in a joint fashion,
especially in the knowledge-driven domain. Models combining KGs and LLMs
typically display a better semantic understanding of knowledge, thus
enabling them to perform tasks like entity typing better."

Pan, J. Z., Razniewski, S., Kalo, J.-C., Singhania, S., Chen, J.,
Dietze, S., Jabeen, H., Omeliyanenko, J., Zhang, W., Lissandrini, M.,
Biswas, R., de Melo, G., Bonifati, A., Vakaj, E., Dragoni, M., & Graux,
D. (2023, August 11). Large Language Models and Knowledge Graphs:
Opportunities and Challenges. arXiv.Org. https://arxiv.org/abs/2308.06374v1

"Firstly, KGs can be used as training data for LLMs. Secondly, triples
in KGs can be used for prompt construction. Last but not least, KGs can
be used as external knowledge in retrieval augmented language models."

* Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., & Wu, X. (2023, June
14). Unifying Large Language Models and Knowledge Graphs: A Roadmap.
arXiv.Org. https://arxiv.org/abs/2306.08302v2

"In this article, we present a forward-looking roadmap for unifying both
LLMs and KGs, to leverage their respective strengths and overcome the
limitations of each approach, for various downstream tasks."

[Section 4 is especially helpful.]

Here is another less-useful reference:

Li, X., Zhao, R., Chia, Y. K., Ding, B., Bing, L., Joty, S., & Poria, S.
(2023, May 22). Chain of Knowledge: A Framework for Grounding Large
Language Models with Structured Knowledge Bases. arXiv.Org.
https://arxiv.org/abs/2305.13269v1

>
>>  2. Are LLMs being used for mapping between KGs?
>
> Well that's the ontology alignment theme above for the schema
> (underlying ontology) and the disambiguation as mentioned above once
> you have the schema mapped. If you're looking to establish new (named)
> relationships then I believe what's known as "link prediction" is also
> relevant, and there's a lot of work on that including deep learning
> (not only LLMs I believe). In the end, realistic data+schema mapping
> will need all of it, the splitting into different aspects is partially
> just a good vehicle to think more systematically about this, and to
> structure research.
OK; thanks.
>
>>  3. What are your comments about modularity architectures? We have the
>>     ecosystem model of the OBO <https://obofoundry.org/> biomedical
>>     ontologies (also others such as OMG <https://www.omg.org/> or GLAM
>>     <https://github.com/ncarboni/awesome-GLAM-semweb>); with KBpedia, we
>>     have a 'typology <https://kbpedia.org/docs/30-typologies/>' approach
>>     <https://kbpedia.org/docs/30-typologies/>; or other modularity
>> models?
>
> I'm just not too familiar with these, so should better not venture out
> of my comfort zone here :)

I've gone back and looked more closely at what you and Cogan have
described in other papers as the MOMo methodology. As I basically
understand the approach, each sub-module is a more-or-less
self-contained construct of classes, relations and attributes that are
potentially reusable across multiple ontologies. These can be either
like domain slices (health v economics as an example, what we in our own
work have called 'typologies') or activity slices (trajectories for
goods movements as an example). Is that basically correct? Is there a
link that might show a diagram of multiple modules combined into an
overall ontology?

Given that one of your key findings is the usefulness of modularity,
this seems to be an important design aspect when combining LLMs and KGs.

Thanks!

Best, Mike

Pascal Hitzler

Dec 3, 2024, 5:08:47 PM
to ontolo...@googlegroups.com, Mike Bergman, Shimizu, Cogan Matthew
It seems to me that the 'KG-enhanced LLMs' theme is mostly driven by RAG
type of approaches, right now, but that stimulates the general idea that
other combination approaches may also be promising. I'm not so familiar
with that corner - from the references you give, it seems that you know
more about it than I do :)
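For concreteness, a minimal sketch of the RAG pattern over KG-derived statements: retrieve the facts most relevant to a question and prepend them to the prompt. The facts are invented, and the word-overlap scoring is a crude stand-in for a real vector index:

```python
# Minimal retrieval-augmented prompting sketch over invented KG facts.
facts = [
    "ex:AgeRecord records the age of an enslaved person at a point in time.",
    "ex:OwnershipChangeEvent links a traded good to its previous and new owner.",
    "ex:Trajectory is an ordered sequence of spatiotemporal fixes.",
]

def retrieve(question, k=2):
    # Naive word-overlap scoring; a real system would use an embedding index.
    q = set(question.lower().split())
    return sorted(facts, key=lambda f: -len(q & set(f.lower().split())))[:k]

question = "Which class records the age of a person?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
print(prompt)
```

The LLM then answers from the retrieved context rather than from its parametric memory alone, which is what grounds "KG-enhanced LLMs" in this style of approach.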

On this:

>>> 3. What are your comments about modularity architectures? We have the
>>> ecosystem model of the OBO <https://obofoundry.org/> biomedical
>>> ontologies (also others such as OMG <https://www.omg.org/> or GLAM
>>> <https://github.com/ncarboni/awesome-GLAM-semweb>); with KBpedia, we
>>> have a 'typology <https://kbpedia.org/docs/30-typologies/>' approach
>>> <https://kbpedia.org/docs/30-typologies/>; or other modularity
>>> models?
>>
>> I'm just not too familiar with these, so should better not venture out
>> of my comfort zone here :)
>
> I've gone back and looked more closely at what you and Cogan have
> described in other papers as the MOMo methodology. As I basically
> understand the approach, each sub-module is a more-or-less
> self-contained construct of classes, relations and attributes that are
> potentially reusable across multiple ontologies.

Yes, although re-use doesn't have to be entirely verbatim. I believe that
most re-uses may need adjustments.

> These can be either
> like domain slices (health v economics as an example, what we in our own
> work have called 'typologies') or activity slices (trajectories for
> goods movements as an example).

And they can also be much more concrete. E.g. in

https://github.com/kastle-lab/ontology-modules-for-supply-chain-tracing/blob/master/modules-documentation.pdf

we have something like an "ownership change event", in

https://docs.enslaved.org/ontology/v2/Enslaved_Documentation_V2_0-2.pdf

we have something like an "AgeRecord".
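To make the module idea concrete, here is a rough sketch of an "AgeRecord"-like module as plain triples, with invented names (see the linked documentation for the actual modeling). The check at the end illustrates the sense in which a module is self-contained:

```python
# Hypothetical "AgeRecord"-style module as (subject, predicate, object)
# triples; names are invented for illustration, not taken from Enslaved.org.
module = [
    ("ex:AgeRecord",   "rdf:type",    "owl:Class"),
    ("ex:Agent",       "rdf:type",    "owl:Class"),
    ("ex:hasAgeValue", "rdf:type",    "owl:DatatypeProperty"),
    ("ex:hasAgeValue", "rdfs:domain", "ex:AgeRecord"),
    ("ex:recordedFor", "rdf:type",    "owl:ObjectProperty"),
    ("ex:recordedFor", "rdfs:domain", "ex:AgeRecord"),
    ("ex:recordedFor", "rdfs:range",  "ex:Agent"),
]

def declared_classes(triples):
    return {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}

def referenced_classes(triples):
    return {o for s, p, o in triples if p in ("rdfs:domain", "rdfs:range")}

# A module is (roughly) self-contained if every class its properties
# point at is declared inside the module itself.
missing = referenced_classes(module) - declared_classes(module)
print(missing)  # -> set()
```

Combining modules into an overall ontology then amounts to merging such triple sets and aligning the few classes that bridge them.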


> Is that basically correct? Is there a
> link that might show a diagram of multiple modules combined into an
> overall ontology?

Yes see both of the documents above, towards the end (pages 32+33 for
enslaved, page 28 for the other one).

> Given that one of your key findings is the usefulness of modularity,
> this seems to be an important design aspect when combining LLMs and
> KGs.

That's what we conjecture :)

Pascal.
Kansas State University http://www.pascal-hitzler.de

Chris Mungall

Dec 3, 2024, 5:46:49 PM
to ontolo...@googlegroups.com
Hi Mike,

You asked about the OBO model. OBO ontologies are typically medium to large terminological classifications, with moderate axiomatization and light KG-like properties. We performed an evaluation of the use of RAG to enhance 10 OBO ontologies, focusing on four different term-creation tasks: adding subsumption edges (analogous to what is performed by an OWL reasoner), adding existential restrictions (common in many biomedical ontologies, similar to edges in a KG), adding logical definitions (OWL equivalence axioms), and generating textual definitions. For the last task we recruited biocurators to evaluate the generated text against manually authored text.
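The first of these tasks can be illustrated with a toy stand-in: materializing implied subclass edges as the transitive closure of asserted ones (class names invented; a real OWL reasoner of course handles far richer axioms):

```python
# Toy stand-in for the subsumption-edge task: materialize implied
# subclass edges via transitive closure. Class names are invented.
asserted = {
    ("neuron", "cell"),
    ("cell", "anatomical_entity"),
    ("anatomical_entity", "material_entity"),
}

def transitive_closure(edges):
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Edges implied but not asserted -- the ones a reasoner (or here, an
# LLM proposing subsumptions) would be asked to supply.
inferred = transitive_closure(asserted) - asserted
print(sorted(inferred))
```

An LLM-based approach is judged by how well its proposed edges match this kind of gold standard, which is what makes the subsumption task convenient for evaluation.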

The results are described in:  Toro, S., Anagnostopoulos, A.V., Bello, S.M. et al. Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI). J Biomed Semant 15, 19 (2024). https://doi.org/10.1186/s13326-024-00320-3

The results indicate that the RAG approach is accurate enough that we feel it would be useful to integrate into ODEs (analogous to how Copilot is integrated into IDEs). The approach is able to integrate additional unstructured knowledge from GitHub issues and the literature.

However, just as with codegen, there are reasons to be cautious. Evaluators ranked AI-generated definitions as being slightly below human-generated definitions. However, when we take the expertise/confidence of the evaluator into consideration the gap widens. In other words, more experienced domain experts were better able to detect being "gaslit" by the AI (many of the evaluators used this language).

See Figure 3 in the paper.

Pascal Hitzler

Dec 3, 2024, 5:49:59 PM
to ontolo...@googlegroups.com, Chris Mungall, Shimizu, Cogan Matthew, Sanaz Saki Norouzi

Thanks, Chris - this is really interesting work I wasn't yet aware of.

Pascal.

Mike Bergman

Dec 3, 2024, 6:43:31 PM
to ontolo...@googlegroups.com, Chris Mungall, Shimizu, Cogan Matthew, Sanaz Saki Norouzi

+1, including your cautions.

Best, Mike

Pascal Hitzler

Feb 6, 2025, 10:23:54 PM
to ontolog-forum, Shimizu, Cogan Matthew
FYI The published version of the paper is now available at https://www.sciencedirect.com/science/article/pii/S1570826825000022

Pascal.

Pascal Hitzler
communication from mobile
voice recognition may distort spelling

Sankalp Srivastava

Mar 3, 2025, 11:31:13 PM
to ontolog-forum
Thanks for sharing your research here, Pascal. I found your approach of constructing modules prior to constructing the entire ontology-matching schema useful. Before I devise an entire ontology-matching approach for Indian "legal-tech" applications, I considered it best to learn more about this, since my own approach to constructing named entities using a version of semantic search is quite similar - I have described it in part here: https://github.com/sankalpsrv/Draft-PrivacyPolicyAnalyser. Your paper was an inspiring read, as it introduces concepts such as Ontology Design Patterns quite clearly.

I am using semantic search to programmatically look through pre-defined triple definitions and then populate them (mostly) after constructing (something like) a module.

Although not a computer science student, I wanted to devise such an approach given the development opportunities in the interlinkage between the Semantic Web and LLMs.