This Week in Theory to Practice: Building My Trust in AI Outputs


By Liz DiLuzio

Last month, I worked with a globally recognized university to analyze results from a pre/post-test completed by participants in a year-long leadership program for business owners. The goal was to understand what had actually changed over the course of the year based on open-ended responses, and they wanted me to use AI to do it.


I hesitated. It wasn't that I doubted AI's ability; I just didn't trust it. I wasn't worried it would fail in an obvious way. I was worried it would produce something that looked clean, sounded right, and quietly drifted from the data.


For my first pass, I gave the model a single prompt and asked it to identify themes. I wanted to see what it would do without constraints. What came back confirmed the risk.


The output was polished and the themes plausible. It read like something that could move forward without much friction. But as I worked back through the responses, I could see where it had overstepped. Quotes had been fabricated. Patterns had been overstated. The analysis held together on the surface, but it didn’t hold up under scrutiny.


So I adjusted. I didn’t abandon the tool completely, but I did take control of how it engaged in the process by breaking the work into phases. I had it code the data in batches. I asked it to hold off on naming themes until it had read the full dataset. I required that every claim be tied back to a specific response, and that every quote could be verified exactly as written. The second pass was slower, more deliberate, and grounded in the data. It produced something I could stand behind.
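
To make that last requirement mechanical, a check along these lines is enough to catch a paraphrase being passed off as a quote. This is a minimal sketch in Python; the names and data shapes are illustrative assumptions, not the exact tooling I used.

```python
# Minimal sketch: flag any quoted excerpt that does not appear verbatim
# in the source responses. Names and data shapes are illustrative.
import unicodedata


def normalize(text: str) -> str:
    # Collapse whitespace and normalize unicode so the check fails only
    # on real wording differences, not spacing or encoding quirks.
    return unicodedata.normalize("NFKC", " ".join(text.split()))


def unverified_quotes(quotes: list[str], responses: list[str]) -> list[str]:
    # Return every quote that is not a verbatim substring of any response.
    sources = [normalize(r) for r in responses]
    return [q for q in quotes if not any(normalize(q) in s for s in sources)]


responses = ["The program helped me delegate more and trust my team."]
quotes = ["trust my team", "I learned to fully trust my team"]
print(unverified_quotes(quotes, responses))
# -> ['I learned to fully trust my team']  (a paraphrase, not a quote)
```

Anything flagged goes back to a human before it appears in the write-up.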


That distinction has stayed with me. Not because it says something about AI, but because it reinforces something more general: trust does not come from how polished an output looks. It comes from whether the work behind it can be seen, questioned, and verified.


3 Ideas from Me

1. Structure is what makes AI trustworthy.
AI is fast enough to collapse an entire qualitative analysis into a single step. It can read, code, theme, and summarize in one pass. The problem is that when those steps blur together, there is no way to tell where the analysis is grounded and where it is inferred.

Breaking the work into the phases of qualitative analysis changes that. For example, I chose to deliberately move from ingestion, to batch coding, to codebook development, to full coding, to quantification, and only then to interpretation. At each step, the output could be checked. That structure did more than improve rigor. It made it possible to trust the results, because every conclusion could be traced back to a visible step in the process.
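
To show the shape of that in code, here is a rough sketch of the phased flow. `ask_model` is a hypothetical stand-in for whichever chat or API interface you use, and the prompts are compressed for illustration (I've collapsed ingestion and batching to keep it short; the next idea shows the batching). The point is that each phase is a separate, saved, reviewable artifact.

```python
# Sketch of the phased flow: one narrow prompt per phase, each output
# written to disk for review before the next phase runs.
# ask_model() is a hypothetical stand-in for your chat/API call.
import json


def ask_model(prompt: str) -> str:
    raise NotImplementedError("swap in your model interface here")


def run_phases(responses: list[str]) -> dict[str, str]:
    text = "\n".join(responses)
    phases = [
        ("coding", "Assign a descriptive code to each response. "
                   "Do NOT name themes or interpret yet.\n\n" + text),
        ("codebook", "Consolidate the codes above into a codebook: "
                     "code name, definition, one example response."),
        ("quantification", "Apply the codebook to every response and "
                           "report each code's frequency."),
        ("interpretation", "Interpret the frequencies above. Every claim "
                           "must cite a code and its count."),
    ]
    artifacts: dict[str, str] = {}
    for name, prompt in phases:
        # Each phase sees only the prior artifacts plus its own task.
        context = json.dumps(artifacts, indent=2)
        artifacts[name] = ask_model(context + "\n\n" + prompt)
        with open(f"{name}.txt", "w") as f:  # saved so it can be audited
            f.write(artifacts[name])
    return artifacts
```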


2. Slowing AI down is what prevents hallucination.
Left unconstrained, AI will fill in gaps. It will connect patterns, smooth inconsistencies, and complete partial ideas. While this can be useful in other contexts, it's risky in qualitative analysis and, perhaps more importantly, it made me, as the user, uncomfortable.

Knowing this, I mitigated its tendencies not with a single instruction, but with the decision to slow the process down. Coding was done in batches of roughly 50 change units at a time. Themes were not named until all batches were complete. Interpretation was held until after quantification. At each stage, the prompt focused only on the task at hand, not the end result.

That pacing mattered. It reduced the opportunity for the model to “complete the story” before the data had been fully examined. In practice, hallucination was less about false facts and more about premature synthesis. Slowing the process is what kept that in check.
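
Here is what that pacing can look like as a loop with an explicit checkpoint between batches. `code_batch` is a hypothetical stand-in for a narrowly scoped coding prompt; the batch size of roughly 50 matches what I used, but nothing else is prescriptive.

```python
# Sketch of paced coding: ~50 change units per batch, with a human
# checkpoint between batches so nothing is synthesized prematurely.
# code_batch() is a hypothetical stand-in for a narrow coding prompt.
from itertools import islice
from typing import Iterable, Iterator


def batched(units: Iterable[str], size: int = 50) -> Iterator[list[str]]:
    # Yield successive lists of at most `size` change units.
    it = iter(units)
    while chunk := list(islice(it, size)):
        yield chunk


def code_batch(batch: list[str]) -> str:
    raise NotImplementedError("swap in your model call here")


def paced_coding(units: list[str]) -> list[str]:
    coded = []
    for n, batch in enumerate(batched(units), start=1):
        coded.append(code_batch(batch))
        # Checkpoint: review this batch's codes before the next one runs.
        input(f"Batch {n} coded ({len(batch)} units). Enter to continue...")
    return coded
```

The `input()` call is the whole point: the loop cannot race ahead of the analyst.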


3. Trust comes from traceability, not output quality.
A well-written summary can still be wrong. That became clear repeatedly. The analysis only became reliable once every claim could be traced back to something concrete: a coded unit, a frequency count, or a verbatim excerpt.

This is where the earlier steps paid off. Because coding, theming, and counting were done systematically, it was possible to question any conclusion and follow it back through the process. When something didn’t hold up, it could be corrected without redoing the entire analysis.
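
One lightweight way to enforce that is to refuse any claim that doesn't carry its own evidence trail. The sketch below uses illustrative data shapes, not a specific tool: a claim keeps its coded units, a frequency, and at least one verbatim excerpt, or it gets flagged.

```python
# Sketch: a claim must carry the coded units, a count, and a verbatim
# excerpt behind it, or it is flagged for correction. Data shapes are
# illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str                # e.g. "Most owners reported delegating more"
    code: str                     # the codebook entry it rests on
    unit_ids: list[int] = field(default_factory=list)   # coded response units
    excerpts: list[str] = field(default_factory=list)   # verbatim quotes

    @property
    def frequency(self) -> int:
        return len(self.unit_ids)

    def is_traceable(self) -> bool:
        # No coded units or no verbatim excerpt means it can't be audited.
        return bool(self.unit_ids) and bool(self.excerpts)


def audit(claims: list[Claim]) -> list[Claim]:
    # Return the claims that need to be corrected or cut.
    return [c for c in claims if not c.is_traceable()]
```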

That traceability is what builds trust in AI's role. Not that it produces polished outputs (we know we can count on it for that!), but that it participates in a process where each step can be verified.


2 Quotes from Others

“In God we trust. All others must bring data.”

   W. Edwards Deming


“The first principle is that you must not fool yourself—and you are the easiest person to fool.”

   Richard Feynman


1 Question for You

Where in your workflow could documenting your thinking or decisions make your work easier for others to trust? Where might you want that same visibility from someone you manage?

Interested in going deeper? 

I’ll be teaching two Skill Sprints this May on using AI in evaluation work.

One focuses on qualitative analysis, where we’ll practice the exact process described here to generate results you can stand behind. The other takes a broader approach, focusing on how to write better prompts and get more reliable outputs across common evaluation tasks. Early registration is now open. You can learn more and reserve your spot here.

Website
LinkedIn
Instagram
YouTube
WhatsApp

PO Box 728

New York, NY 10116

Know someone who would appreciate this? Feel free to pass it along.
