Demonstração por IA?

29 views

Skip to first unread message

Joao Marcos

unread,

Jul 11, 2025, 8:56:27 AM7/11/25

to Lista acadêmica brasileira dos profissionais e estudantes da área de LOGICA

A mensagem abaixo pode ser de interesse para membros desta lista.

[]s, JM

---------- Forwarded message ---------
From: Lawrence Paulson
Subject: [isabelle] DeepSeek experiment

Hello all, you may be interested in a little experiment I tried using the
DeepSeek LLM yesterday. I asked whether it could translate a theorem header
from n-dimensional real vectors to abstract Euclidean spaces. My query:

> The following theorem header fixes variable "a" to type "real^'n". Changing
this type to "'a::euclidean_space" requires modifying the header, replacing $
and χ by their analogues for Euclidean spaces.
>
> proposition
> fixes a :: "real^'n"
> assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
> shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n else x
$i) ` (cbox a b) ∈ lmeasurable"
> and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x
$m + x$n else x$i) ` cbox a b)
> = measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed to be almost right but
clunky and referred to something non-standard. Pushing it a little further, I
got the following, which is far as I can tell is correct. (Isabelle accepts
it.)

> proposition
> fixes a :: "'a::euclidean_space"
> assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
> and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
> shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m then
(x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) ` (cbox a b)
∈ lmeasurable"
> and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis.
(if i = m then (x ∙ i) + (x ∙ n) else
(x ∙ i)) *⇩R i) ` cbox a b)
> = measure lebesgue (cbox a b)”

Everything was accompanied by quantities of plausible sounding narrative, e.g.
"this version is more abstract and elegant”. A sample:

> This version is more type-safe because:
> • No need to worry about index bounds
> • The basis vectors are first-class entities
> • The notation more clearly reflects the mathematical intent
> Note that in this formulation:
> • Basis is the canonical basis of the Euclidean space
> • ∙ is the inner product (giving the component in a particular basis
direction)
> • *⇩R is the scalar multiplication
> • The sum reconstructs the vector from its components
> Would you like me to explain any part of this formulation in more detail? Or
would you like assistance with proving this version of the theorem?

My impression is that it was grabbing material from a variety of places, in
particular the AFP entry Ordinary_Differential_Equations. But the word
"elegant" doesn't appear there and in fact appears in the entire distribution
+ AFP only 42 times.

I recall doing the precise task above in the past and needing the better part
of an hour to get it right. I didn't try to port the proof (which was long and
complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of
DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases
and summarise them. These tools may already be useful for tasks such as
identifying duplicate material.

Larry

Joao Marcos

unread,

Jul 11, 2025, 8:58:47 AM7/11/25

to Lista acadêmica brasileira dos profissionais e estudantes da área de LOGICA

Para contrabalançar a experiência positiva relatada na mensagem
anterior por um expert, acrescento dois causos pessoais:

(1) Neste semestre tive uma aluna que decidiu delegar para uma IA a
solução de exercícios relacionados a demonstrar ou refutar conjecturas
proposicionais, usando o sistema ProofWeb
(https://carol.dimap.ufrn.br/proofweb/) como interface para teorias em
Coq. Para cada conjectura recebida, a aluna apresentou tanto uma
"demonstração" (usando códigos que ela não é capaz de explicar) quanto
uma "refutação" --- sim, ela demonstrou *e* refutou cada um dos
exercícios! Olhando para o código apresentado pela discente, demo-nos
conta de que a maior parte das respostas apresentadas envolviam duas
táticas Coq: "abort" (isto é, "desisto") e "admit" (isto é, "assuma
como axioma"). Obviamente, as táticas usadas resultam
"bem-sucedidas"... Como tem sido amplamente verificado, usuários
despreparados costumam tirar bem menos proveito da IA do que
gostariam.

(2) Outro aluno da mesma turma decidiu consultar um oráculo artificial
para "saber um pouco mais sobre as ferramentas computacionais que
estávamos usando". Ou seja, ao invés de simplesmente nos perguntar a
respeito, em sala de aula ou no fórum, assistir aos vídeos
instrucionais que fizemos, e consultar os artigos que publicamos sobre
o assunto (por exemplo:
https://dimap.ufrn.br/~jmarcos/papers/JM/15-TM-Try.pdf), e ao invés de
simplesmente consultar o material do curso ou fazer uma consulta no
Google, o aluno optou por consultar uma IA qualquer, que respondeu com
fantasias e delírios, a partir das quais o aluno tirou suas próprias
conclusões e enviou seus "questionamentos" para o fórum da disciplina.
O pior: quando apontei ao aluno as incorreções flagrantes nas
alucinações que ele copiou-e-colou no fórum, ele se chateou comigo, e
insistiu que as "fontes" dele estavam corretas!

Que tipos de experiências recentes têm tido os colegas com o (mau) uso
destas ferramentas?

[]s, Joao Marcos