The Science of Detecting LLM-Generated Text


Alex Shkotin

Apr 10, 2024, 3:54:13 AM
to ontolog-forum
Thanks to Mike Peters we know from Wikipedia that there are 715 sciences.
But now, following the rule of the noosphere: if something exists, then there is somebody who studies it; we have a new science coming:
"Conclusion
The detection of LLM-generated text is an expanding and dynamic field, with numerous newly developed techniques emerging continuously. This survey provides a precise categorization and in-depth examination of existing approaches to help the research community comprehend the strengths and limitations of each method. Despite the rapid advancements in LLM-generated text detection, significant challenges still must be addressed. Further progress in this field will require developing innovative solutions to overcome these challenges."

Alex


Mike Peters

Apr 10, 2024, 3:14:27 PM
to ontolog-forum
Hi All

As background, the Google Sheet Alex referred to is a dump from a database I built as part of PIPI. This dump is shared to enable input and comments from others. It contains the data from 3 of the 35+ tables in that database.
 
1. Primitives: These are similar to OBO > COB and/or 
2. Data sources:  These are where the primitives were found
3. Sciences: These were imported directly from the Wikipedia article "Index of branches of Science".

Notes:
The science table could be linked to a DBpedia endpoint, using some of Kingsley's tooling.
The intended purpose is to anchor each primitive in a science or sciences.
Like OBO > COB, PIPI will use each primitive to anchor ontologies.
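One way the DBpedia idea could look in practice, as a minimal sketch: the public endpoint URL (`https://dbpedia.org/sparql`) is real, but the category `dbc:Branches_of_science` and the query shape are my assumptions about how the Wikipedia "Index of branches of science" maps onto DBpedia categories, not part of the PIPI design.

```python
# Sketch: build a SPARQL request against the public DBpedia endpoint to
# list branches of science. The category dbc:Branches_of_science is an
# assumed mapping of the Wikipedia index onto DBpedia, for illustration.
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>

SELECT ?branch ?label WHERE {
  ?branch skos:broader dbc:Branches_of_science ;
          rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""

def build_request_url(endpoint: str = DBPEDIA_ENDPOINT,
                      query: str = QUERY) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    params = urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"
```

The URL returned by `build_request_url()` can then be fetched with `urllib.request.urlopen` and parsed as JSON to populate the science table.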

Happy to have input from others.

Mike Peters
Ajabbi
https://www.blog.ajabbi.com
New Zealand

John F Sowa

Apr 10, 2024, 9:12:31 PM
to ontolo...@googlegroups.com
The source of any info, no matter who or what generates it, should be indicated.  LLMs are sometimes accurate, and sometimes hallucinatory.  That is also true of people.

There are certain former public officials whose output is 98.44% lies.  Any LLMs that use their input -- even by accident -- are almost as bad.

John