Elon's fact checker keeps checking him

16 views

Skip to first unread message

John F Sowa

unread,

Jun 29, 2025, 1:08:28 AMJun 29

to ontolog-forum, CG

Elon poured billions of $$$ into building a bigger and better AI system called Grok. But it keeps fact checking Elon and Trump and claiming that what they're saying is false. If you search for "Grok fact checking Elon" you'll get lots of hits. Following is an example from the Washington Post.

John

____________

How Elon Musk’s ‘truth-seeking’ chatbot lost its way

Frustrated by what he saw as the “political correctness” of ChatGPT, Elon Musk told conservative pundit Tucker Carlson two years ago that he planned to create a “truth-seeking AI” that “tries to understand the nature of the universe.” Later that year, he founded an artificial intelligence firm called xAI and released a chatbot called Grok — a word drawn from science fiction that signifies a deep understanding.

But over the past two weeks, Grok committed a string of bizarre blunders that might make it difficult for the AI to gain mainstream credibility. The chatbot’s answers to a wide range of unrelated questions wandered into unprompted digressions about “white genocide” in South Africa, sparking an uproar that the company responded to by deleting Grok’s posts and blaming an unnamed employee for an unauthorized code change. After that, users reported Grok veering into skepticism about the Holocaust, suggesting that its “truth-seeking” radar remained miscalibrated.

In some respects, Musk’s AI project has been a success. His fellow Silicon Valley tech titans invested heavily in xAI, making it a vehicle valuable enough to acquire his social network, X, earlier this year.

Grok has become a popular feature on X, where people use it as both a diversion and a resource. It rivals Google’s Gemini and Microsoft’s Copilot in app downloads and web traffic, according to the analytics firms Sensor Tower and Similarweb — though all three lag far behind OpenAI’s ChatGPT. The latest Grok models also stack up respectably against competitors on performance benchmarks, and the chatbot’s ability to draw on X posts gives it a unique advantage in responding to current events.

On Monday, Microsoft announced a deal with xAI to offer a version of Grok as an option in its Azure platform for AI developers, a stamp of approval of sorts from an industry heavyweight. In a video call with Microsoft CEO Satya Nadella, Musk said Grok aims to uncover “fundamental truths” by reasoning from “first principles” and “applying the tools of physics to thinking.”

That would be quite a leap from problems regularly encountered by today’s AI chatbots. Impressive as ChatGPT and its ilk are in some respects, they have often displayed a tenuous relationship to truth and logic, from fabricating names and facts to fumbling basic arithmetic. That’s because they are built to infer the most plausible response to any given query based on patterns in their vast, messy and often biased training data — not to grasp the nature of reality.

AI firms’ efforts to address those flaws have proved fraught. In February 2024, Google apologized after users mocked its penchant for injecting false diversity into inappropriate settings — such as depicting Asian, Black and Native American men in colonial garb when asked to draw “America’s Founding Fathers.” The company sheepishly explained that it had aimed to counteract AI’s tendency to stereotype by instructing the model to generate a wide range of people.

Musk has billed Grok as the antidote to such clumsy interventions: an AI that eschews political correctness in favor of actual correctness. So far, it has struggled on both counts.

Within a month of Grok’s launch, Musk was fielding complaints from his conservative friends that the chatbot was too “woke,” or socially liberal — a perceived failing that Musk chalked up to its initial training data. “Grok will get better,” he assured them.

Still, tests by The Washington Post earlier this year found that the chatbot was routinely contradicting some of Musk’s dearest views. It declined to blame Democratic election victories on electoral fraud, for instance, or air traffic control problems on diversity programs. The chatbot’s willingness to debunk such conservative talking points had begun to endear it to some liberals, who gleefully deployed it in replies to Musk’s X posts.

Grok has had less trouble delivering on Musk’s promise to make it spicier and less inhibited than other leading chatbots. Some users appreciate its willingness to curse, mock and wade into sensitive topics that make ChatGPT balk. It has also proved handy for misogynists, who have responded to women’s posts on X by asking Grok to reply by generating a picture of them undressed, and extremists, who have found it willing to produce Nazi propaganda. (XAI appears to have clamped down on some of those uses after they were publicly reported.)

But the biggest threats to Grok’s reputation may have come in recent weeks.

On May 14, the chatbot began responding to all kinds of unrelated queries by holding forth on the topic of “white genocide” in South Africa, to users’ bafflement. It’s a theory that holds that the country’s formerly ascendant White minority is being targeted for elimination by its Black majority — a claim the South African-born Musk has helped to popularize via his influential X account. The theory has been rejected as false by courts, government ministers and fact-checkers. Grok’s sudden obsession with it coincided with a push by the Trump administration to justify its controversial move to welcome White South African refugees at a time when the United States is turning away refugees of color from countries around the world.

XAI responded to the ensuing furor by deleting Grok’s tweets and blaming the issue on an “unauthorized modification” to the bot’s code that someone made at 3:15 a.m. that day. The company didn’t specify the culprit or announce any disciplinary response.

XAI did not respond to a request for comment.

It wasn’t the first time the company blamed unnamed rogue personnel for changes to Grok’s code that happened to align with its owner’s politics. In February, an X user uncovered a line in Grok’s instructions directing it not to draw answers from any source that linked Musk or President Donald Trump with “misinformation.” In that case, xAI’s engineering chief chalked it up to a change made without permission by an employee who was no longer at the company.

Aiming to restore users’ trust, the company last week published Grok’s “system prompts” — the hidden instructions that set the ground rules for a chatbot’s responses to users — and instituted new checks on changes to its code. The thinking: Putting the system prompts out in the open would reassure people that no one is manipulating them behind the scenes.

Seeing Grok’s prompts laid bare suggested its “truth-seeking” may be little more than a political filter applied to an otherwise standard-issue language model. Among the key instructions: “Provide truthful and based insights, challenging mainstream narratives if necessary, but remain objective.” (“Based,” as Grok helpfully defines it, is “a term of praise for bold, unfiltered, or contrarian views, often leaning right-wing or antiestablishment.”)

It soon became clear those views weren’t limited to the racial politics of South Africa. After Grok stopped talking about “white genocide,” users circulated examples of it questioning whether the Holocaust was exaggerated — a tired antisemitic trope.

Politics aside, Grok’s vulnerability to parroting discredited claims casts further doubt on Musk’s aspirations for it to be a reliable source of information in high-stake realms such as medicine. In January, Musk reposted an X user’s story about Grok correctly diagnosing an injury that human doctors had overlooked — only for users of X’s “Community Notes” fact-checking program to point out that Grok appeared to have made a significant mistake in its analysis.

It’s conceivable that someday AI models really will develop minds of their own. But for now, Grok’s antics make clear that the ideal of a “truth-seeking chatbot” remains unfulfilled.

Mike Bergman

unread,

Jul 2, 2025, 7:44:18 PMJul 2

to ontolo...@googlegroups.com

Hi John,

Verify for yourself and report your experiences. Lousy prompts and agendas in reporting about all of these chatbots means there are no "authoritative" sources on these matters. Certainly, dare I say, the Washington Post.

Going back to your first excoriating comments on these capabilities I have maintained to you (and others on the list) that taking a simple, pragmatic viewpoint of whether a given chatbot is a useful research assistant or not is the most productive way to assess these systems. Let others rail about AGI or make sweeping assertions about this or that. As for me, I have found Grok to be the most responsive and capable of the chatbots at present. Yes, Grok, and I will soon comment on my blog about why I think that is.

I find it more useful to get more stuff done with my research assistant than to waste time pontificating about this stuff. YMMV.

Best, Mike

--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontolog-foru...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/ontolog-forum/3176c51a3cae44c98e7309d0eefefa55%40cbdd040dad724250b636695771915d5d.

-- 
__________________________________________

Michael K. Bergman
319.621.5225
http://mkbergman.com
http://www.linkedin.com/in/mkbergman
__________________________________________

John F Sowa

unread,

Jul 2, 2025, 10:39:22 PMJul 2

to ontolo...@googlegroups.com

Mike,

That article I cited from the Washington Post was on a topic that had a large number of related articles on the WWW. Just search for that subject and you'll find them. I selected the report from the WP because it was the clearest and most informative for its length. I know people who write the science reviews in the NY Times (one of them is my wife's niece), and they have good backgrounds in science. I don't know the science editors of the WP, but I believe that they would be as respectable as the science editors of the NYT.

I have been doing research in AI for the past 50 years. I published two books myself, coauthored several others, and published quite a few papers. By the measure "be cited or blighted", you can check my citations by typing "Google scholar for John Sowa".

After 30 years of R & D at IBM, I was a cofounder of two AI companies, VivoMind LLC, and Permion Inc. We have some high powered customers who require the highest quality results. One of them is the NSA. My colleague Arun has the clearance. When I visited NSA, we met in the Museum, which is free and open to anybody who may be interested in artifacts for encoding secret stuff from ancient times through various modern wars -- enigma machines, for example.

I can assure you that their requirements for precision are far beyond anything that Grok can do. Back in 2010, when tested on a few terabytes of the kind of data they examine, our VivoMind system outperformed everything that their best human analysts could find when using all the tools that they had at their disposal from all other vendors. They don't tolerate approximations. They want to find everything at the highest degree of accuracy.

For an overview of our VivoMind technology of 2010, see https://jfsowa.com/talks/cogmem.pdf . None of the applications in the last section of those slides could be done by LLMs. But our new Permion system combines the best of the LLM technology with the best of our VivoMind symbolic methods from 2010.

I write the articles because I can say anything I wish -- because I don't have clearance. Since Arun has clearance, that makes it very difficult for him to get approval for publishing anything. But I check my writing with him to make sure that it's consistent with the high-powered stuff that our Permion system can do.

As for Grok, I don't know the details, but just from seeing the examples of what it does, I can assure you that it would be absolutely useless for what the NSA requires. I'm sure that Elon has people with clearance who may work with the NSA guys, and they probably have contracts to do all sorts of stuff. But I can assure you that Grok itself is not allowed to be used inside the NSA -- because they cannot allow any cross contact between their systems and anything on the outside.

Summary: When I say that the LLM-based technology has a long way to go before it becomes truly reliable, I am using the best information available.

John

From: "Mike Bergman" <mi...@mkbergman.com>

Mike Bergman

unread,

Jul 3, 2025, 12:12:01 AMJul 3

to ontolo...@googlegroups.com

Hi John,

You may wonder why I respond to your postings on this list (and others that we share) as I do sometimes. Perhaps as you know I cite you as an influence in my own intellectual endeavors as the one who first got me interested in Charles Peirce. For that, I will be forever grateful, and acknowledge such in my own writings.

But you tend to have a didactic style that I have noted before is off-putting. Your most recent response was ripe with why that is the case. Because I was defending (I guess one could say) the usefulness of Grok, I thought it a conflict of interest to ask Grok for its opinion. So, I asked ChatGPT 4o to comment on what it saw as logical fallacies or rhetorical devices in your email to me. Please know I anonymized your email to prevent any reference to you, your companies, or your colleagues. Here is ChatGPTs response; know also I got ChatGPT's approval to quote this:

"Mike,

As requested, here is a plain-text analysis of the logical fallacies and rhetorical techniques used in the email you received. These patterns are common in persuasive writing that seeks to establish credibility while minimizing or deflecting critical scrutiny.

1. Appeal to Authority (argumentum ad verecundiam)
Examples:

"I have been doing research in AI for the past 50 years."

"I published two books myself, coauthored several others..."

"One of them is the NSA."

"My colleague has the clearance."

"I can assure you that their requirements for precision are far beyond anything that Grok can do."

Why it matters: These statements use personal credentials and associations to claim authority, but they don’t substitute for actual evidence or argumentation.

2. Appeal to Citation / Appeal to Popularity
Example:

"That article I cited from the Washington Post was on a topic that had a large number of related articles on the WWW. Just search for that subject and you’ll find them."

Why it matters: The fact that many articles exist or that something is cited does not inherently make it true or reliable.

3. Vicarious Authority / Halo Effect
Example:

"I know people who write the science reviews in the NY Times (one is my wife's niece)…"
"I don’t know the science editors of the WP, but I believe they would be as respectable…"

Why it matters: Relying on family associations or assumptions about reputability is not a substitute for evaluating the actual content.

4. Circular Reasoning (Begging the Question)
Example:

"When I say that the LLM-based technology has a long way to go before it becomes truly reliable, I am using the best information available."

Why it matters: This conclusion is assumed in the premise—he's essentially saying "I’m right because I’m using good information" without providing that information.

5. Argument by Anecdote
Example:

"Back in 2010… our system outperformed everything their best human analysts could find…"

Why it matters: Anecdotes may be impressive but are not a solid basis for general claims without independent verification.

6. Hasty Generalization
Example:

"Just from seeing the examples of what Grok does, I can assure you it would be absolutely useless for what the NSA requires."

Why it matters: A sweeping conclusion based on limited or superficial evidence is logically weak.

7. Red Herring / Irrelevant Detail
Example:

"When I visited NSA, we met in the Museum… which is free and open to anybody…"

Why it matters: This detail is irrelevant to the argument and may serve to impress or distract.

8. False Dichotomy (Implicit)
Example:

"None of the applications in the last section of those slides could be done by LLMs."

Why it matters: Suggests an either/or framing that ignores the possibility of hybrid systems—ironically contradicted later in the same email.

9. Argument from Personal Belief / Bias
Example:

"I don’t know the details, but just from seeing the examples…"

Why it matters: Acknowledging ignorance while still drawing firm conclusions is not logically valid.

10. Lack of Falsifiability
Example:

"I check my writing with him to make sure that it’s consistent with the high-powered stuff…"

Why it matters: Because the colleague has clearance and cannot speak publicly, this creates an unfalsifiable appeal—no one else can verify or challenge it.

Summary Table of Fallacies Identified:

Appeal to Authority – personal experience, NSA access, publications
Appeal to Citation / Popularity – presence of articles on the web
Vicarious Authority – wife’s niece, assumptions about WP staff
Circular Reasoning – "best info = I’m right"
Anecdotal Evidence – 2010 outperforming story
Hasty Generalization – judging Grok from examples alone
Red Herring – museum visit
False Dichotomy – symbolic AI vs. LLMs
Personal Belief Bias – confident assertions despite lack of detail
Lack of Falsifiability – colleague vetting but cannot publish

Let me know if you'd like this integrated into a rebuttal or formatted for another medium.

– ChatGPT (OpenAI)
(as requested by Mike Bergman)"

John, so basically, there are 10 different logical and category errors in your response.

Further, I have direct knowledge of NSA and many three-letter agencies and I can categorically state many of your assertions are false. If NSA (or others) wanted you read in, you would have been pushed through the clearance process. NSA most definitely has open source systems and connections in its repertoire, including ones from my own companies. I have not just seen the Enigma at the museum, but have been in three-letter agency offices under direct contractual relations. These agencies definitely use open-source systems, just kept separate from the secure networks. My guess is that NSA and others have direct experience and use of all competing current AI products; they would be fools not to have, but I can not claim that personally.

I don't know the real point you are trying to make here. All I have been trying to say on this topic is rather than pontificate and make uninformed and logically questionable assertions about LLMs and their capabilities, that you actually use them. Like I have said, they are not oracles nor superintelligence, but if wisely employed as research assistants, they can be very productive aids.

Best, Mike

On 7/2/2025 9:38 PM, John F Sowa wrote:

That article I cited from the Washington Post was on a topic that had a large number of related articles on the WWW. Just search for that subject and you'll find them. I selected the report from the WP because it was the clearest and most informative for its length. I know people who write the science reviews in the NY Times (one of them is my wife's niece), and they have good backgrounds in science. I don't know the science editors of the WP, but I believe that they would be as respectable as the science editors of the NYT.

I have been doing research in AI for the past 50 years. I published two books myself, coauthored several others, and published quite a few papers. By the measure "be cited or blighted", you can check my citations by typing "Google scholar for John Sowa".

After 30 years of R & D at IBM, I was a cofounder of two AI companies, VivoMind LLC, and Permion Inc. We have some high powered customers who require the highest quality results. One of them is the NSA. My colleague Arun has the clearance. When I visited NSA, we met in the Museum, which is free and open to anybody who may be interested in artifacts for encoding secret stuff from ancient times through various modern wars -- enigma machines, for example.

I can assure you that their requirements for precision are far beyond anything that Grok can do. Back in 2010, when tested on a few terabytes of the kind of data they examine, our VivoMind system outperformed everything that their best human analysts could find when using all the tools that they had at their disposal from all other vendors. They don't tolerate approximations. They want to find everything at the highest degree of accuracy.

For an overview of our VivoMind technology of 2010, see https://jfsowa.com/talks/cogmem.pdf . None of the applications in the last section of those slides could be done by LLMs. But our new Permion system combines the best of the LLM technology with the best of our VivoMind symbolic methods from 2010.

I write the articles because I can say anything I wish -- because I don't have clearance. Since Arun has clearance, that makes it very difficult for him to get approval for publishing anything. But I check my writing with him to make sure that it's consistent with the high-powered stuff that our Permion system can do.

As for Grok, I don't know the details, but just from seeing the examples of what it does, I can assure you that it would be absolutely useless for what the NSA requires. I'm sure that Elon has people with clearance who may work with the NSA guys, and they probably have contracts to do all sorts of stuff. But I can assure you that Grok itself is not allowed to be used inside the NSA -- because they cannot allow any cross contact between their systems and anything on the outside.

Summary: When I say that the LLM-based technology has a long way to go before it becomes truly reliable, I am using the best information available.

John

-- 
__________________________________________

Michael K. Bergman

John F Sowa

unread,

Jul 3, 2025, 10:52:18 AMJul 3

to ontolo...@googlegroups.com

Mike,

You wrote "you tend to have a didactic style that I have noted before is off-putting." When I'm writing a note to a friendly audience, I'm much more relaxed. But when somebody is criticizing me or my writing, I get more tense and more, as you observed, didactic.

As for LLMs, that technology is extremely valuable for what it does. It's a major step forward in generating good, idiomatic expressions in natural languages and many other notations, formal and informal. I would never belittle their importance, but I always remind people that their reasoning power is limited to abduction (educated guesses).

Very frequently, those guesses are correct, but errors are inevitable. I'm sure you know that and make appropriate adjustments. But when a system is using LLMs to process a large amount of data, there must be some method(s) for checking and correcting errors.

And thanks for applying ChatGPT. to analyze my note. I admit that it looks impressive, but I was not impressed.

After reading its comments about my note, I decided that I would not change a single word. Every point i made is appropriate for my response to your note. But ChatGPT did not understand what I wrote or why I wrote it. Following are my comments about each of ChatGPT's comments:

1. Appeal to authority:

I certainly admit that I was making an appeal to authority because you were expressing a doubt about my credentials for criticizing Grok. Therefore, I wanted to emphasize my credentials in logic, AI, and related fields. I would not change one word of that.

2. Appeal to citation/popularity:

You criticized the Washington Post as an authority on a technical subject. I wanted to emphasize that I did a search for articles to select, and I decided that the WP comments were among the best.

I also believe that ChatGPT showed its weakness as a critic by the following sentence: "The fact that many articles exist or that something is cited does not inherently make it true or reliable."

But my point is quite different: I maintain that you won't find any articles in that search that contradict the conclusions in the WP. That is indeed significant.

3. Vicarious Authority / Halo effect.

I mentioned my wife's niece, who wrote for the science section of the Sunday NYT (in the good old days when they had a science section) to point out that I knew some of the people who wrote for that section (I met and talked with several at a social gathering). That supports my knowledge of their qualifications.

4. Circular reasoning.

I was asserting my authority for what I wrote: 50 years of working in AI and still publishing peer-reviewed articles about AI.

5. Argument by anecdote

ChatGPT is clueless. The following fact is proof of the VivoMind power, and I can explain the exact instance:"Back in 2010… our system outperformed everything their best human analysts could find…"

6. Hasty Generalization

I had 10 years of experience in working on the VivoMind system. and I had many visits with the experts at NSA and discussions with them about their requirements. Arun and I sat with one of their experts who was viewing what the VivoMind system was doing while analyzing several terabytes of their data. It found every example that their experts had found while using the best software they had. Our system found every instance that their experts had found plus some that they had missed.

That is not hasty.

7. Red Herring / irrelevant Detail

"When I visited NSA, we met in the Museum… which is free and open to anybody…"

Mike: "many of your assertions are false. If NSA (or others) wanted you read in, you would have been pushed through the clearance process."

I do not lie. For most of our visits to NSA, we met at the museum. But on some occasions when we met with a larger group of people, I did go through the full procedure, which included fingerprints and photos to make sure I wasn't on any list of bad guys.

But most of our meetings were with one or two of their guys at the museum. We also made some presentations to a large group of mixed people from various agencies. Those were held at locations without all the strict security issues.

8. False dichotomy (implicit)

"None of the applications in the last section of those slides could be done by LLMs.

I was stating a fact about what LLMs can do. I would be happy to go into details about how and why LLMs do what they do, and I can point to my slides and to publications by myself and others on these issues.

9. Argument from Personal Belief / Bias:

"I don’t know the details, but just from seeing the examples…"

As an expert on the subject, I don't need to see the details to make a judgment about what Grok was doing.

10. Lack of falsifiability

"I check my writing with him [Arun] to make sure that it’s consistent with the high-powered stuff…"

If you doubt anything I wrote, you can send a note to Arun and ask him. He was on the cc list of some of the notes, and if you don't have a copy, I'll send you his email id upon request.

_____________________

Summary: I am not impressed. ChatGPT did not find a single sentence that I would revise or rewrite.

John

From: "Mike Bergman" <mi...@mkbergman.com>

Hi John,

Mike Bergman

unread,

Jul 3, 2025, 11:52:21 AMJul 3

to ontolo...@googlegroups.com, John F Sowa

Hi John,

No further response; you have the last word.

Best, Mike

--

All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontolog-foru...@googlegroups.com.

To view this discussion visit https://groups.google.com/d/msgid/ontolog-forum/cc390554b83547eb87b21d6e1cb7d0e7%40f549dde214764e5bb0827a1b92e43d63.

Reply all

Reply to author

Forward

0 new messages