But over the past two weeks, Grok committed a string of bizarre blunders that might make it difficult for the AI to gain mainstream credibility. The chatbot’s answers to a wide range of unrelated questions wandered into unprompted digressions about “white genocide” in South Africa, sparking an uproar that the company responded to by deleting Grok’s posts and blaming an unnamed employee for an unauthorized code change. After that, users reported Grok veering into skepticism about the Holocaust, suggesting that its “truth-seeking” radar remained miscalibrated.
In some respects, Musk’s AI project has been a success. His fellow Silicon Valley tech titans invested heavily in xAI, making it a vehicle valuable enough to acquire his social network, X, earlier this year.
Grok has become a popular feature on X, where people use it as both a diversion and a resource. It rivals Google’s Gemini and Microsoft’s Copilot in app downloads and web traffic, according to the analytics firms Sensor Tower and Similarweb — though all three lag far behind OpenAI’s ChatGPT. The latest Grok models also stack up respectably against competitors on performance benchmarks, and the chatbot’s ability to draw on X posts gives it a unique advantage in responding to current events.
On Monday, Microsoft announced a deal with xAI to offer a version of Grok as an option in its Azure platform for AI developers, a stamp of approval of sorts from an industry heavyweight. In a video call with Microsoft CEO Satya Nadella, Musk said Grok aims to uncover “fundamental truths” by reasoning from “first principles” and “applying the tools of physics to thinking.”
That would be quite a leap from the problems regularly encountered by today’s AI chatbots. Impressive as ChatGPT and its ilk are in some respects, they have often displayed a tenuous relationship to truth and logic, from fabricating names and facts to fumbling basic arithmetic. That’s because they are built to infer the most plausible response to any given query based on patterns in their vast, messy and often biased training data — not to grasp the nature of reality.
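To make that description concrete, here is a toy sketch of "most plausible next word" prediction. Everything in it is invented for illustration (real chatbots use neural networks trained on vast corpora, not word counts), but it shows the key point: the system picks whatever continuation was most common in its training data, with no step that checks whether the output is true.

    from collections import Counter

    # "Training data": a tiny invented corpus standing in for the web.
    corpus = "the cat sat on the mat the cat ate the snack".split()

    # Learn which word follows each word, and how often.
    bigram_counts = Counter(zip(corpus, corpus[1:]))

    def next_word(prev):
        # Most frequent continuation wins -- plausible, not verified.
        followers = {b: n for (a, b), n in bigram_counts.items() if a == prev}
        return max(followers, key=followers.get)

    print(next_word("the"))  # -> "cat", simply because it appeared most often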
AI firms’ efforts to address those flaws have proved fraught. In February 2024, Google apologized after users mocked its penchant for injecting false diversity into inappropriate settings — such as depicting Asian, Black and Native American men in colonial garb when asked to draw “America’s Founding Fathers.” The company sheepishly explained that it had aimed to counteract AI’s tendency to stereotype by instructing the model to generate a wide range of people.
Musk has billed Grok as the antidote to such clumsy interventions: an AI that eschews political correctness in favor of actual correctness. So far, it has struggled on both counts.
Within a month of Grok’s launch, Musk was fielding complaints from his conservative friends that the chatbot was too “woke,” or socially liberal — a perceived failing that Musk chalked up to its initial training data. “Grok will get better,” he assured them.
Still, tests by The Washington Post earlier this year found that the chatbot was routinely contradicting some of Musk’s dearest views. It declined to blame Democratic election victories on electoral fraud, for instance, or air traffic control problems on diversity programs. The chatbot’s willingness to debunk such conservative talking points had begun to endear it to some liberals, who gleefully deployed it in replies to Musk’s X posts.
Grok has had less trouble delivering on Musk’s promise to make it spicier and less inhibited than other leading chatbots. Some users appreciate its willingness to curse, mock and wade into sensitive topics that make ChatGPT balk. It has also proved handy for misogynists, who have responded to women’s posts on X by asking Grok to generate pictures of them undressed, and extremists, who have found it willing to produce Nazi propaganda. (XAI appears to have clamped down on some of those uses after they were publicly reported.)
But the biggest threats to Grok’s reputation may have come in recent weeks.
On May 14, the chatbot began responding to all kinds of unrelated queries by holding forth on the topic of “white genocide” in South Africa, to users’ bafflement. It’s a theory that holds that the country’s formerly ascendant White minority is being targeted for elimination by its Black majority — a claim the South African-born Musk has helped to popularize via his influential X account. The theory has been rejected as false by courts, government ministers and fact-checkers. Grok’s sudden obsession with it coincided with a push by the Trump administration to justify its controversial move to welcome White South African refugees at a time when the United States is turning away refugees of color from countries around the world.
XAI responded to the ensuing furor by deleting Grok’s tweets and blaming the issue on an “unauthorized modification” to the bot’s code that someone made at 3:15 a.m. that day. The company didn’t specify the culprit or announce any disciplinary response.
XAI did not respond to a request for comment.
It wasn’t the first time the company blamed unnamed rogue personnel for changes to Grok’s code that happened to align with its owner’s politics. In February, an X user uncovered a line in Grok’s instructions directing it not to draw answers from any source that linked Musk or President Donald Trump with “misinformation.” In that case, xAI’s engineering chief chalked it up to a change made without permission by an employee who was no longer at the company.
Aiming to restore users’ trust, the company last week published Grok’s “system prompts” — the hidden instructions that set the ground rules for a chatbot’s responses to users — and instituted new checks on changes to its code. The thinking: Putting the system prompts out in the open would reassure people that no one is manipulating them behind the scenes.
Seeing Grok’s prompts laid bare suggested its “truth-seeking” may be little more than a political filter applied to an otherwise standard-issue language model. Among the key instructions: “Provide truthful and based insights, challenging mainstream narratives if necessary, but remain objective.” (“Based,” as Grok helpfully defines it, is “a term of praise for bold, unfiltered, or contrarian views, often leaning right-wing or antiestablishment.”)
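Mechanically, a system prompt is nothing exotic: it is a hidden first message prepended to every conversation before the user's words arrive. The sketch below shows the mechanism in the OpenAI-compatible chat style; the base URL and model name are assumptions made purely for illustration, and only the quoted instruction text comes from Grok's published prompts.

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.x.ai/v1",  # assumed OpenAI-compatible endpoint
        api_key="YOUR_XAI_KEY",          # placeholder credential
    )

    response = client.chat.completions.create(
        model="grok-3",  # hypothetical model name
        messages=[
            # Hidden ground rules the end user never sees:
            {"role": "system",
             "content": "Provide truthful and based insights, challenging "
                        "mainstream narratives if necessary, but remain objective."},
            # The user's actual question:
            {"role": "user", "content": "Summarize today's top news story."},
        ],
    )
    print(response.choices[0].message.content)

Because the model weighs that hidden message alongside every user query, a single clause like "challenging mainstream narratives" can tilt answers across all topics at once.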
It soon became clear those views weren’t limited to the racial politics of South Africa. After Grok stopped talking about “white genocide,” users circulated examples of it questioning whether the Holocaust was exaggerated — a tired antisemitic trope.
Politics aside, Grok’s vulnerability to parroting discredited claims casts further doubt on Musk’s aspirations for it to be a reliable source of information in high-stakes realms such as medicine. In January, Musk reposted an X user’s story about Grok correctly diagnosing an injury that human doctors had overlooked — only for users of X’s “Community Notes” fact-checking program to point out that Grok appeared to have made a significant mistake in its analysis.
It’s conceivable that someday AI models really will develop minds of their own. But for now, Grok’s antics make clear that the ideal of a “truth-seeking chatbot” remains unfulfilled.
Hi John,
Verify for yourself and report your experiences. Lousy prompts and agendas in reporting about all of these chatbots mean there are no "authoritative" sources on these matters. Certainly not, dare I say, the Washington Post.
Going back to your first excoriating comments on these capabilities, I have maintained to you (and others on the list) that taking a simple, pragmatic view of whether a given chatbot is a useful research assistant or not is the most productive way to assess these systems. Let others rail about AGI or make sweeping assertions about this or that. As for me, I have found Grok to be the most responsive and capable of the chatbots at present. Yes, Grok, and I will soon comment on my blog about why I think that is.
I find it more useful to get work done with my research assistant than to waste time pontificating about all this. YMMV.
Best, Mike
Hi John,
You may wonder why I sometimes respond to your postings on this list (and others that we share) as I do. As you perhaps know, I cite you as an influence on my own intellectual endeavors, as the one who first got me interested in Charles Peirce. For that I will be forever grateful, and I acknowledge as much in my own writings.
But you tend to have a didactic style that, as I have noted before, is off-putting. Your most recent response was rife with examples of why that is the case. Because I was defending (I guess one could say) the usefulness of Grok, I thought it a conflict of interest to ask Grok for its opinion. So I asked ChatGPT 4o to comment on what it saw as logical fallacies or rhetorical devices in your email to me. Please know that I anonymized your email to remove any reference to you, your companies, or your colleagues. Here is ChatGPT's response; know also that I got ChatGPT's approval to quote it:
"Mike,
As requested, here is a plain-text analysis of the logical fallacies and rhetorical techniques used in the email you received. These patterns are common in persuasive writing that seeks to establish credibility while minimizing or deflecting critical scrutiny.
1. Appeal to Authority (argumentum ad verecundiam)
Examples:
"I have been doing research in AI for the past 50 years."
"I published two books myself, coauthored several others..."
"One of them is the NSA."
"My colleague has the clearance."
"I can assure you that their requirements for precision are far beyond anything that Grok can do."
Why it matters: These statements use personal credentials and associations to claim authority, but they don’t substitute for actual evidence or argumentation.
2. Appeal to Citation / Appeal to Popularity
Example:
"That article I cited from the Washington Post was on a topic that had a large number of related articles on the WWW. Just search for that subject and you’ll find them."
Why it matters: The fact that many articles exist or that something is cited does not inherently make it true or reliable.
3. Vicarious Authority / Halo Effect
Example:
"I know people who write the science reviews in the NY Times (one is my wife's niece)…"
"I don’t know the science editors of the WP, but I believe they would be as respectable…"
Why it matters: Relying on family associations or assumptions about reputability is not a substitute for evaluating the actual content.
4. Circular Reasoning (Begging the Question)
Example:
"When I say that the LLM-based technology has a long way to go before it becomes truly reliable, I am using the best information available."
Why it matters: This conclusion is assumed in the premise—he's essentially saying "I’m right because I’m using good information" without providing that information.
5. Argument by Anecdote
Example:
"Back in 2010… our system outperformed everything their best human analysts could find…"
Why it matters: Anecdotes may be impressive but are not a solid basis for general claims without independent verification.
6. Hasty Generalization
Example:
"Just from seeing the examples of what Grok does, I can assure you it would be absolutely useless for what the NSA requires."
Why it matters: A sweeping conclusion based on limited or superficial evidence is logically weak.
7. Red Herring / Irrelevant Detail
Example:
"When I visited NSA, we met in the Museum… which is free and open to anybody…"
Why it matters: This detail is irrelevant to the argument and may serve to impress or distract.
8. False Dichotomy (Implicit)
Example:
"None of the applications in the last section of those slides could be done by LLMs."
Why it matters: Suggests an either/or framing that ignores the possibility of hybrid systems—ironically contradicted later in the same email.
9. Argument from Personal Belief / Bias
Example:
"I don’t know the details, but just from seeing the examples…"
Why it matters: Acknowledging ignorance while still drawing firm conclusions is not logically valid.
10. Lack of Falsifiability
Example:
"I check my writing with him to make sure that it’s consistent with the high-powered stuff…"
Why it matters: Because the colleague has clearance and cannot speak publicly, this creates an unfalsifiable appeal—no one else can verify or challenge it.
Summary Table of Fallacies Identified:
Appeal to Authority – personal experience, NSA access, publications
Appeal to Citation / Popularity – presence of articles on the web
Vicarious Authority – wife’s niece, assumptions about WP staff
Circular Reasoning – "best info = I’m right"
Anecdotal Evidence – 2010 outperforming story
Hasty Generalization – judging Grok from examples alone
Red Herring – museum visit
False Dichotomy – symbolic AI vs. LLMs
Personal Belief Bias – confident assertions despite lack of detail
Lack of Falsifiability – colleague vetting but cannot publish
Let me know if you'd like this integrated into a rebuttal or formatted for another medium.
– ChatGPT (OpenAI) (as requested by Mike Bergman)"
John, so basically, there are 10 different logical and category errors in your response.
Further, I have direct knowledge of NSA and many three-letter agencies, and I can categorically state that many of your assertions are false. If NSA (or others) wanted you read in, you would have been pushed through the clearance process. NSA most definitely has open-source systems and connections in its repertoire, including ones from my own companies. I have not just seen the Enigma at the museum; I have been in three-letter agency offices under direct contractual relations. These agencies definitely use open-source systems, just kept separate from the secure networks. My guess is that NSA and others have direct experience with, and use of, all competing current AI products; they would be fools not to, but I cannot claim that personally.
I don't know the real point you are trying to make here. All I have been trying to say on this topic is that, rather than pontificate and make uninformed and logically questionable assertions about LLMs and their capabilities, you should actually use them. As I have said, they are not oracles or superintelligences, but if wisely employed as research assistants, they can be very productive aids.
Best, Mike
That article I cited from the Washington Post was on a topic that had a large number of related articles on the WWW. Just search for that subject and you'll find them. I selected the report from the WP because it was the clearest and most informative for its length. I know people who write the science reviews in the NY Times (one of them is my wife's niece), and they have good backgrounds in science. I don't know the science editors of the WP, but I believe that they would be as respectable as the science editors of the NYT.
I have been doing research in AI for the past 50 years. I published two books myself, coauthored several others, and published quite a few papers. By the measure "be cited or blighted", you can check my citations by typing "Google scholar for John Sowa".
After 30 years of R&D at IBM, I was a cofounder of two AI companies, VivoMind LLC and Permion Inc. We have some high-powered customers who require the highest-quality results. One of them is the NSA. My colleague Arun has the clearance. When I visited NSA, we met in the Museum, which is free and open to anybody who may be interested in artifacts for encoding secret stuff from ancient times through various modern wars -- Enigma machines, for example.
I can assure you that their requirements for precision are far beyond anything that Grok can do. Back in 2010, when tested on a few terabytes of the kind of data they examine, our VivoMind system outperformed everything that their best human analysts could find when using all the tools that they had at their disposal from all other vendors. They don't tolerate approximations. They want to find everything at the highest degree of accuracy.
For an overview of our VivoMind technology of 2010, see https://jfsowa.com/talks/cogmem.pdf . None of the applications in the last section of those slides could be done by LLMs. But our new Permion system combines the best of the LLM technology with the best of our VivoMind symbolic methods from 2010.
I write the articles because I can say anything I wish -- because I don't have clearance. Since Arun has clearance, that makes it very difficult for him to get approval for publishing anything. But I check my writing with him to make sure that it's consistent with the high-powered stuff that our Permion system can do.
As for Grok, I don't know the details, but just from seeing the examples of what it does, I can assure you that it would be absolutely useless for what the NSA requires. I'm sure that Elon has people with clearance who may work with the NSA guys, and they probably have contracts to do all sorts of stuff. But I can assure you that Grok itself is not allowed to be used inside the NSA -- because they cannot allow any cross contact between their systems and anything on the outside.
Summary: When I say that the LLM-based technology has a long way to go before it becomes truly reliable, I am using the best information available.
John
Hi John,
No further response; you have the last word.
Best, Mike