FWD: university examiners fail to spot ChatGPT answers in real-world test


John F Sowa

Jun 29, 2024, 4:13:21 PM
to ontolo...@googlegroups.com
More evidence that ChatGPT can cheat on exams.  When a huge amount of text, including the latest research publications, is available, it can find more answers more quickly than unaided humans.  Some people call that intelligence.  Others call it cheating.
  
Major problem:  ChatGPT does not state its sources.  Therefore, it can just as quickly find erroneous or deliberately falsified information -- and the humans who need to use that information have no basis for evaluating its reliability.  

John
______________________________________

University examiners fail to spot ChatGPT answers in real-world test
ChatGPT-written exam submissions for a psychology degree mostly went undetected and tended to get better marks than real students’ work

James Davenport

Jun 29, 2024, 5:57:08 PM
to ontolo...@googlegroups.com
Notice that ChatGPT was somewhat better than the students, by about 0.5 s.d. - certainly non-trivial, but not earth shattering. 
James


On 29 Jun 2024, at 22:13, John F Sowa <so...@bestweb.net> wrote:




 

--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info/
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontolog-foru...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontolog-forum/27c4da521d8b4e16a1644b2a4f02ca73%40bestweb.net.

Alex Shkotin

Jun 30, 2024, 5:47:36 AM
to ontolo...@googlegroups.com
The point is: what kind of exams do we need for GenAI, to find out what it is good for?

Suppose we need an AI with very good knowledge of just arithmetic (the theory of natural numbers), but in full power, maybe up to the ability to prove complex theorems.
Can we train GenAI for this? I think not :-)

Do we have this kind of exam for GenAI: prove this or that statement using this or that premise?
But we do remember IBM's Project Debater.

Alex

On Sun, Jun 30, 2024 at 00:57, 'James Davenport' via ontolog-forum <ontolo...@googlegroups.com> wrote:

John F Sowa

Jul 2, 2024, 2:12:56 PM
to ontolo...@googlegroups.com
Alex,

For mathematics, Wolfram showed an excellent way to use LLMs:   Support a front-end in English (or other natural languages) to his Mathematica system.

Users with any level of mathematical background, from students to advanced researchers, can state their problems in ordinary language (supplemented with any mathematical terms or notations they may know).

Then the LLMs translate NLs to and from the notations for any of the huge number of mathematical theories that Mathematica supports.  Mathematica supports far more powerful reasoning methods than anything that OpenGPT or other tools can find on the WWW.

But note that Wolfram does not claim that his Mathematica system uses AI.  For doing mathematics, GOFOL (Good Old Fashioned First Order Logic) plus traditional mathematical methods are far more powerful than LLMs that use whatever they can find on the WWW.

Furthermore, Mathematica can run on a good laptop.  You don't need an immense supercomputer to outperform any mathematical reasoning that OpenGPT (or other such systems) can do.

John
 


From: "Alex Shkotin" <alex.s...@gmail.com>

The point is what kind of exams we need for GenAI to get that it is good for what?

Consider we need some AI with very good knowledge just Arithmetic - theory of natural numbers but in full power may be up to ability to prove complex theorems.
Can we train GenAI for this?  I think no :-)

Alex

Alex Shkotin

Jul 3, 2024, 6:10:11 AM
to ontolo...@googlegroups.com

John,


Exactly! And why only math?

Have a look at this usage of Wolfram Alpha for Mechanics https://www.wolframalpha.com/input?i=diameter+5mm+and+force+2N%2C+what+is+the+mechanical+stress
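
For reference, the quantity that query asks for is the normal stress, sigma = F / A. A minimal sketch, assuming the 5 mm diameter describes a solid circular cross-section loaded by an axial 2 N force (the query itself does not say so); the function name is illustrative:

```python
import math


def axial_stress_pa(force_n: float, diameter_m: float) -> float:
    """Normal stress sigma = F / A for a solid circular cross-section."""
    area_m2 = math.pi * (diameter_m / 2) ** 2
    return force_n / area_m2


# Query values: force 2 N, diameter 5 mm (converted to metres).
sigma = axial_stress_pa(force_n=2.0, diameter_m=5e-3)
print(f"{sigma:.0f} Pa = {sigma / 1e3:.1f} kPa")
```

With these inputs the result is roughly 102 kPa; a different reading of the query (for example a shear load or a hollow section) would need a different formula.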


And even more: for students they have step-by-step solutions for a particular task but it is paywalled: https://www.wolframalpha.com/input?i=prove+2*2%3D4
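
For comparison, a proof assistant checks a statement like this mechanically. A minimal sketch in Lean 4, used here only as an illustration (the thread itself names Isabelle and Coq, not Lean); the numerals are read as natural numbers:

```lean
-- Both sides reduce to the same numeral, so reflexivity closes the goal.
example : 2 * 2 = 4 := rfl
```

The proof is accepted only if the checker can confirm it, which is the kind of guarantee discussed in this thread.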


But the Wolfram Language is not GOFOL at all. It is just another programming language, I must admit.


Unfortunately, for me their use of NL is rudimentary [1]  https://www.wolframalpha.com/input?i=What+is+the+maximum+charge+of+an+atomic+nucleus+obtained+experimentally%3F

And what is important: they have it https://www.wolframalpha.com/input?i=oganeson

And meta.ai is just not an expert [2]. But this is a problem with these agents: they know the answer but respond wrongly!!!

Claude.ai is fine [3].


Let's check another NL-example: In an isosceles triangle, the angle at the base is 30 degrees. What is the angle at the vertex of the triangle?

WAlpha: [4] - BAD, claude.ai [5] - FANTASTIC.
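
The expected answer follows from the angle sum of a triangle: the two base angles are equal, so the vertex angle is 180 - 2 * 30 = 120 degrees. A small check, with an illustrative function name:

```python
def apex_angle_deg(base_angle_deg: float) -> float:
    """Vertex angle of an isosceles triangle, given one of its two equal base angles."""
    if not 0 < base_angle_deg < 90:
        raise ValueError("a base angle of an isosceles triangle must be between 0 and 90 degrees")
    # Angles of a triangle sum to 180 degrees; the two base angles are equal.
    return 180.0 - 2.0 * base_angle_deg


print(apex_angle_deg(30))  # 120.0
```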


🐺α is a great project, but the usage of NL is rudimentary! 

They should use the operational syntax of English [0] and frameworks for knowledge concentration [-1].


Alex


[-1] https://www.researchgate.net/publication/374265191_Theory_framework_-_knowledge_hub_message_1

https://www.researchgate.net/publication/380576198_Specific_tasks_of_Ugraphia_on_a_particular_structure_formulations_solutions_placement_in_the_framework 

[0] https://www.researchgate.net/publication/366216531_English_is_a_HOL_language_message_1X 

[1]-[5] (inline images, not preserved)


On Tue, Jul 2, 2024 at 21:12, John F Sowa <so...@bestweb.net> wrote:

Alex Shkotin

Jul 3, 2024, 6:46:16 AM
to ontolo...@googlegroups.com
IN ADDITION:

Look how claude.ai works with "diameter 5mm and force 2N, what is the mechanical stress" using language of 🐺α:




On Wed, Jul 3, 2024 at 13:09, Alex Shkotin <alex.s...@gmail.com> wrote:

John F Sowa

Jul 3, 2024, 3:33:30 PM
to ontolo...@googlegroups.com
Alex,

My Summary:   The Wolfram diagrams and explanations are beautiful, elegant, and understandable by anybody who took the usual high-school courses in math and science.   And most importantly, they're guaranteed to be correct.  For people who learned (and remember) that material, Wolfram is excellent.

As the Google and OpenGPT researchers have emphasized, LLMs cannot do any kind of reasoning.  What they do is search for human reasoning in the vast volumes of text on the WWW.  They often do a good job of finding reasoning patterns for one-step problems and using the patterns they find.   But they cannot guarantee correctness.

Sometimes, Claude and other systems can find multi-step proofs.  But when they combine multiple patterns for more complex problems, they often make mistakes.  And the probability of correctness decreases exponentially with the number of steps.
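
One way to read the claim about exponential decay: if each step is correct independently with the same probability p, an n-step chain is correct with probability p^n. Both the independence and the value p = 0.95 below are illustrative assumptions, not measurements:

```python
def chain_success_probability(per_step: float, steps: int) -> float:
    """Probability that all steps are correct, assuming independent steps of equal reliability."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    return per_step ** steps


# Hypothetical per-step reliability of 95%.
for n in (1, 5, 10, 20, 50):
    print(n, round(chain_success_probability(0.95, n), 3))
```

Under these assumptions a 10-step chain is right only about 60% of the time, even though each single step is right 95% of the time.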

Most important:  In mathematics, guaranteed correctness is essential.  Anything that uses LLMs is probabilistic.  Wolfram guarantees correctness for any number of steps (no limit on the number).  I trust Wolfram.  I have ZERO trust in Claude or any other LLM-based reasoner.  

For any reasoning system that does not use precise deduction to check a proof, there is a clear classification:  a GUESS.   For a person who understands the limitations, an educated guess (AKA abduction) can be useful.   Mike Bergman, for example, understands the issues.  He knows how to evaluate the guesses (abductions).   But most users don't have enough background, skepticism, or patience to do the testing and verification.

Danger with Claude and other LLM-based systems that do not use DEDUCTION to evaluate their answers:  People might believe them.  You never want to fly in an airplane or drive a car that was designed by people who use Claude to do their reasoning.

John
_______________________________________

Alex Shkotin

Jul 4, 2024, 6:06:53 AM
to ontolo...@googlegroups.com

John,


We don't know the whole stack of technologies used by Huge Language Models (aka LLMs). Have a look, for example, at https://www.constitutional.ai/

The LLM is just the most hyped part.

People all around the globe are trying to use it. Note: not TRUST, just USE.

Evaluating GenAI output is where formalization and algorithms come first.

Consider the task from the statics area of mechanics that I cite here, whose solution I have in mind to formalize.

The advantage of a formal solution is that we have an algorithm to check its correctness, i.e. verify it.

So: I have a task, I get a solution, and I verify it by computer.

The task is [1].

The solution from claude.ai is this [2]. If I have a formalized one, I just need an algorithm to check it.

But to create this algorithm we need a formal theory of the application domain.

A formal theory framework may be a good way to keep our theoretical knowledge.

This [0] is an example of ugraph theory formalization.

I am on the way to formalizing Statics a little.

How to check the claude.ai solution? That is the question.


Alex


[0] (PDF) Theory framework - knowledge hub message #1 

[1]

A weightless beam is held in a horizontal position by a hinged-fixed support at point A and a vertical rod BC.

At point D, a concentrated force F = 30 kN is applied to the beam at an angle of 50° down to the right.

Dimensions: AB=0.6m, BD=0.4m.

Calculate the reaction forces of the supports acting on the beam.

[2]

claude:

To solve this problem, we'll need to analyze the forces acting on the beam and use the principles of static equilibrium. Let's approach this step-by-step:


1) First, let's define our coordinate system:

   - Horizontal axis: positive to the right

   - Vertical axis: positive upwards


2) Now, let's identify the unknown forces:

   - At point A (hinged support): Vertical reaction Ay and horizontal reaction Ax

   - At point C (vertical rod): Vertical reaction Cy (no horizontal component as it's a vertical rod)


3) Let's break down the given force F into its horizontal and vertical components:

   Fx = F * cos(50°) = 30 * cos(50°) = 19.28 kN (to the right)

   Fy = F * sin(50°) = 30 * sin(50°) = 22.98 kN (downward)


4) Now we can set up our equilibrium equations:


   a) Sum of forces in x-direction = 0

      Ax + 19.28 = 0

      Ax = -19.28 kN


   b) Sum of forces in y-direction = 0

      Ay + Cy - 22.98 = 0

      Ay + Cy = 22.98 kN  (Equation 1)


   c) Sum of moments about point A = 0

      (Cy * 0.6) - (22.98 * 0.4) - (19.28 * 0.4 * tan(50°)) = 0

      0.6Cy - 9.192 - 9.192 = 0

      0.6Cy = 18.384

      Cy = 30.64 kN


5) Now we can solve for Ay using Equation 1:

   Ay + 30.64 = 22.98

   Ay = -7.66 kN


Therefore, the reaction forces are:

- At point A: Ax = -19.28 kN (to the left), Ay = -7.66 kN (downward)

- At point C: Cy = 30.64 kN (upward)


Would you like me to explain any part of this solution in more detail?
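
The static-equilibrium conditions used in [2] can be checked numerically by substituting the reported reactions back into the three balance equations. This is a minimal sketch, not a formal verification. The geometry is an assumption, since the figure is not reproduced here: A at x = 0, the rod attached to the beam at B (x = 0.6 m), D beyond B (x = 1.0 m), and the 50 degree angle measured from the beam axis. All names are illustrative.

```python
import math

# Assumed geometry (the figure is not reproduced in the thread):
#   A at x = 0, rod BC attached to the beam at B (x = 0.6 m),
#   load applied at D beyond B (x = 0.6 + 0.4 = 1.0 m),
#   50 degree angle measured from the beam axis.
X_B = 0.6
X_D = 1.0
F_KN = 30.0
ANGLE = math.radians(50.0)

FX = F_KN * math.cos(ANGLE)    # horizontal load component, positive to the right
FY = -F_KN * math.sin(ANGLE)   # vertical load component, negative = downward


def residuals(ax, ay, rod, x_d=X_D):
    """Return (sum Fx, sum Fy, sum M about A); all three vanish in equilibrium."""
    sum_fx = ax + FX
    sum_fy = ay + rod + FY
    # Every force acts on the beam axis, so horizontal forces have no lever arm about A.
    sum_m = rod * X_B + FY * x_d
    return sum_fx, sum_fy, sum_m


def is_equilibrium(ax, ay, rod, x_d=X_D, tol=0.05):
    """True if all residuals are within tol (kN or kN*m)."""
    return all(abs(r) <= tol for r in residuals(ax, ay, rod, x_d))


# Values reported in [2]; Cy there is the rod force.
claimed = {"ax": -19.28, "ay": -7.66, "rod": 30.64}
print("claimed residuals:", [round(r, 3) for r in residuals(**claimed)])
print("claimed passes:", is_equilibrium(**claimed))

# Solving the same three equations directly, under the same assumptions.
rod = -FY * X_D / X_B
ay = -FY - rod
ax = -FX
print("recomputed:", round(ax, 2), round(ay, 2), round(rod, 2))
print("recomputed passes:", is_equilibrium(ax, ay, rod))
```

Under these assumptions the values in [2] satisfy both force balances but leave a moment residual of about -4.6 kN*m, so the check fails; placing D between A and B instead (x_d=0.2) also leaves a nonzero residual. A check like this covers only the equilibrium equations, not whether the task itself was read correctly.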


screenshot of beginning



On Wed, Jul 3, 2024 at 22:33, John F Sowa <so...@bestweb.net> wrote:

Philip Jackson

Jul 4, 2024, 9:08:15 AM
to ontolo...@googlegroups.com
Alex, a nit to pick:

Your example starts with the sentence:

"A weightless beam is held in a horizontal position by a hinged-fixed support at point A and a vertical rod BC."

In reality, on Earth, physical beams are not weightless. I suppose you mean that the actual weight is not important.

Or would your example work if the physical setup were weightless, in freefall orbit around Earth?

Phil


From: ontolo...@googlegroups.com <ontolo...@googlegroups.com> on behalf of Alex Shkotin <alex.s...@gmail.com>
Sent: Thursday, July 4, 2024 6:06 AM
To: ontolo...@googlegroups.com <ontolo...@googlegroups.com>
Subject: Re: [ontolog-forum] FWD: university examiners fail to spot ChatGPT answers in real-world test
 

Alex Shkotin

Jul 4, 2024, 12:31:53 PM
to ontolo...@googlegroups.com

Phil,


It is important for me to emphasize that this is a task from a specific textbook on strength of materials, i.e. I didn't invent it. And although tasks written for people have to be clarified in order to formalize them correctly, the term “weightless beam” is a physical term. And yes, that's how they put it to say that weight can be neglected.

Just as we discussed a little earlier, a material point is a solid whose dimensions can be neglected in some task. For example, the sizes of the Earth and the Sun when calculating the trajectory of the former around the latter.


Alex



On Thu, Jul 4, 2024 at 16:08, Philip Jackson <philipcj...@hotmail.com> wrote:

Philip Jackson

Jul 4, 2024, 1:32:13 PM
to ontolo...@googlegroups.com
Alex,

OK, thanks for the explanation.

Phil

Sent: Thursday, July 4, 2024 12:31 PM

John F Sowa

Jul 4, 2024, 6:39:29 PM
to ontolo...@googlegroups.com
The answers to your questions are very simple, and I believe obvious.

Alex:  I am on the way to formalizing Statics a little.  How to check the claude.ai solution? That is the question.

1. For statistics, there is more than a century of applications, analyses, and formalizations.  Instead of inventing YAFOS (Yet Another Formalization Of Statistics), I recommend a search through the mountain of books and articles about statistics and recommending one or more.

2. For Claude and other applications of LLMs, there is another mountain of books and articles about logic and ways of using logic in all branches of cognitive science (philosophy, psychology, linguistics, AI, neuroscience, and anthropology).  I recommend a search through that mountain and recommending one or more.

I don't believe that Ontolog Forum can or should attempt to repeat the huge amount of research that is available online.   It's much better to search and recommend publications that are freely available online.

John
 


From: "Alex Shkotin" <alex.s...@gmail.com>

Alex Shkotin

Jul 5, 2024, 5:15:18 AM
to ontolog-forum
Unfortunately the initial message was blocked. 

---------- Forwarded message ---------
From: Alex Shkotin <alex.s...@gmail.com>
Date: Fri, Jul 5, 2024 at 12:10
Subject: Re: [ontolog-forum] FWD: university examiners fail to spot ChatGPT answers in real-world test


John,


Unfortunately, even Euclidean geometry has not been formalized, although we have a dozen axiomatizations.

What can we say about physics? Just search on the Isabelle website to get:

https://isabelle.in.tum.de/website-Isabelle2009-1/projects.html

"Jacques Fleuriot has mechanized the ultrapower construction of the hyperreals from Nonstandard Analysis (NSA) in Isabelle. Concepts from NSA and geometry theorem proving have been combined and applied to the mechanization of Propositions from Newton's Principia for his PhD thesis. This includes the famous Proposition Kepleriana. The framework has also been used to mechanize real analysis using nonstandard techniques."


We are only at the beginning of theoretical knowledge formalization; at least we have an axiomatic theory for Lagrangian mechanics.

And formal ontologies are one kind of theoretical knowledge formalization we have.


If neither Isabelle nor Coq has a formal theory, I doubt we have one at all.


Alex



Alex Shkotin

Jul 5, 2024, 5:34:40 AM
to ontolo...@googlegroups.com

Phil,


You touched on an important topic that we encounter when we formalize a task formulated for an involved person. It may not be completely written out, because... it is assumed that the person will guess.

In this case, it should be understood that the points (the actual section of the wall and ceiling) are stationary. This should be clearly written out and then formalized.

When transferring a task to a computer, one has to make the implied explicit. By the way, this also applies to definitions.

Here we consider your first case: the entire system is in a uniform gravitational field, but the attachment points in it are motionless.


The idea of a possible world so loved by formal philosophers can be applied here to look at how to formalize it. 

For example in OWL2.


Alex



On Thu, Jul 4, 2024 at 20:32, Philip Jackson <philipcj...@hotmail.com> wrote:

John F Sowa

Jul 5, 2024, 2:05:00 PM
to ontolo...@googlegroups.com
Alex,

What I have been trying to say is very simple:  

1. The issues about formalization are, in one sense, simple:  state your subject matter in GOFOL (Good Old-fashioned First Order Logic).

2. But the issues about defining a universal formalization of everything are huge.  

3. There is nothing that Ontolog Forum can do that can solve point #2.

4. The best that Ontolog Forum can do is to find and recommend some texts about those issues.

John
 


From: "Alex Shkotin" <alex.s...@gmail.com>

Unfortunately the initial message was blocked. 


Alex Shkotin

Jul 6, 2024, 6:08:43 AM
to ontolo...@googlegroups.com

John,


Exactly! We exchange our ideas, problems, progress and references here.

And first of all: what kind of web-entry points do we have to serve us?


About #1. I prefer First Order LANGUAGE. As there are so many logics right now. And by the way, the FOL framework (as we discussed after Barwise) does not have numbers of any kind.


About #2. If I ever take on this project, I'll ask E. N. Zalta, the author of Principia Logico-Metaphysica, to help.


About #3. We need a group of 1000x Bourbaki power to formalize all science and technology. But we do it right now creating formal ontologies in projects like OBO Foundry and others.


About #4. Oh yes! For example, David Leal has pointed here to an important article by Tom Gruber issued 30 years ago [1] (Should we celebrate this?). Where we have:

"Other engineering ontologies that build on the EngMath family--for describing component structure, design tasks, discrete events, and specific analysis domains such as kinematics--are being developed."

My statics is just a subset, IMHO.


And just a little bit about web-entry points:

-Hets, the DOLiator http://rest.hets.eu/ --where it is possible to play with CL

-webprotege https://webprotege.stanford.edu/ --to play with OWL2 ontologies 

-https://www.wolframalpha.com/input?i=ugraph

-https://www.meta.ai/ 

-https://claude.ai/new 

-framework --framework for Ugraph theory 


Is there a web-entry point to check the Claude 3S solution for the statics problem? No.

It will 🦔


Alex


[1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=a7f7348d483eed62a0094d7441c87e6372fb0c40

https://tomgruber.org/writing/an-ontology-for-engineering-mathematics

https://web.archive.org/web/20061109050920/http://ksl-web.stanford.edu/knowledge-sharing/papers/engmath.html Thanks to Mike Peters 🙂


On Fri, Jul 5, 2024 at 21:04, John F Sowa <so...@bestweb.net> wrote: