Whoops, this is embarrassing; I didn't look very closely at the code... yes, the $v processing block hardly checks anything.
When I implemented an MM verifier as a human, the approach I took was to carefully follow a reference implementation so as not to miss any details; clearly the LLM doesn't do that. I am sorry 🙏.
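To make it concrete, here is roughly the set of checks I'd expect around a $v statement, per my reading of the spec. This is just a sketch with made-up names, not mmverify.py's actual code:

    def declare_variables(symbols, active_vars, constants):
        """Handle the tokens between '$v' and '$.'.

        active_vars should be the set of variables active in the current
        block scope, so that redeclaring a variable after its block closes
        is still allowed, as the spec permits.
        """
        for sym in symbols:
            if sym in constants:
                raise ValueError(f"{sym} is already declared as a constant")
            if sym in active_vars:
                # this also catches a symbol repeated within one $v statement
                raise ValueError(f"variable {sym} is already active")
            active_vars.add(sym)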
I think it’d be great to unify the test suites. I liked how David Wheeler’s metamath test suite made it easy to test multiple verifiers.
I ran into a few ‘gaps’ in mmverify.py's adherence to the specs (and added a few tests to trigger them, some LLM-generated).
Incidentally, I gave Matthew House's list to GPT-5, and it found 4 issues that mmverify.py trips up on (and 1 false negative). I've uploaded the four .mm test files: https://github.com/zariuq/mmverify.py/tree/pure/adversarial_tests.
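For what it's worth, here is the kind of harness I have in mind for running the same adversarial files through several verifiers. It's only a sketch: the mmverify.py invocation, the nonzero-exit-on-failure assumption, and the "every file should be rejected" convention are my guesses, so adjust them to each verifier's real CLI:

    import pathlib
    import subprocess

    # Placeholder commands; substitute each verifier's real invocation.
    # This assumes a verifier exits nonzero on a bad database, which may
    # need a small wrapper for verifiers that only log errors.
    VERIFIERS = {
        "mmverify.py": ["python3", "mmverify.py", "{mm}"],
    }

    def check(cmd_template, mm_path):
        cmd = [arg.replace("{mm}", str(mm_path)) for arg in cmd_template]
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    for mm_file in sorted(pathlib.Path("adversarial_tests").glob("*.mm")):
        for name, cmd_template in VERIFIERS.items():
            rc = check(cmd_template, mm_file)
            # every file in this directory is meant to be rejected,
            # so exit status 0 means the verifier missed the error
            verdict = "MISSED" if rc == 0 else "rejected"
            print(f"{mm_file.name}: {name} {verdict}")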
This does get me thinking about how to verify the verifiers (aside from actually formally verifying them). Tests help to an extent, but I guess one ultimately falls back on "humans reading the code to judge whether it's really doing what we want"...
— Zar Goertzel