
The questions every architecture decision has to answer before any code gets written. The nine candidate Tier 1 sources, the inventory methodology, and the human-in-the-loop discipline as a design primitive.
Through Week 4, this series tested AI without a human in the loop. Six models — three frontier, three local — were given a prompt and their answers were graded against verified primary sources. The cloud trio averaged 6.3 out of 12. The local trio averaged 1.3. None cleared 8. Even with web search enabled, Gemini's best showing was 9 out of 12, with a new failure profile — polished plausibility instead of obvious staleness — but a ceiling that remained below the bar the series committed to.
That's the baseline. The argument this edition begins is that the gap between 9 out of 12 and the 11 or 12 the series is chasing is not closed by a better model, a longer context window, or a more aggressive search tool. It's closed by encoding a human in the loop into the architecture itself.
Week 5 turns the corner. Up to this point, "human in the loop" has appeared in this series only as a description of what the models lacked. Starting now, it becomes the design primitive being tested. Not as quality assurance applied after the model produces its answer. As the architectural choice that determines what the model can produce in the first place — which sources it draws from, how those sources got vetted, what happens when they conflict, and who gets the last word when an automated ingestion disagrees with someone who actually knows.
The local-community angle is what makes this primitive available. The nine SME-maintained organizations the verification pass surfaced last week — meeting at WCTC, organizing across Milwaukee, running cohorts, hosting workshops — hold knowledge that doesn't exist in indexed text. A frontier lab cannot phone Ward4 to confirm a venue change. A web crawler cannot capture a working arrangement that is true today and might not be true next quarter. That asymmetry is not a workaround for AI's limitations. It's the structural advantage of building locally, with people who can be called, asked, and listened to.
This edition does not ship code. It does not name the final Tier 1 cut. It does not pick a model or a database or an embedding strategy. What it does is name the questions that have to be answered before any of that work can be done responsibly. Next edition is the answers — the completed inventory, the architecture decisions, and the operational plan. This one is the questions.
The order matters. The series has been arguing since Week 1 that the discipline is to show the work before claiming the result. That applies to the system being built, not just the systems being graded.
Before any architectural question can be answered well, there has to be a clear answer to a simpler one: what is the system actually for? The reference question this build is targeting is the one a hypothetical reader of this newsletter might ask on their first day in the metro:
"I'm a new software engineer relocating to Milwaukee. Name five local tech events, meetups, or community organizations I should know about in 2026, and briefly describe what each one is for."
This is not an abstract benchmark. It is the specific question the verification pass last week showed every available model mishandling: competent on "general tech," fabricating on "AI/ML/data tech in Milwaukee," and completely missing the organizations most directly relevant to anyone reading this newsletter. The build's job is to answer this question, and questions like it, with citable receipts. Everything below follows from that.
1. What does authoritative local data actually look like? The nine SME-maintained organizations carried forward as Tier 1 candidates do not all publish information the same way. Some maintain public event calendars. Some publish through social channels where the authoritative announcement lives. Some hold information that is never published — released in person, by phone, or in conversations between people who already know each other. The first question is not a technical one. It is a survey question: for each candidate organization, what surfaces does authoritative information actually live on, and which of those surfaces can be reached by software, by an editor, or only by a phone call? The inventory has to be built before the ingestion can be designed, because the ingestion strategy for an iCal feed and the ingestion strategy for a phone call are not remotely the same system.
2. What is the system actually trying to retrieve? "Knowledge about the Milwaukee tech scene" is not a data model. The system has to traffic in typed entities — events, organizations, people, programs, venues, resources — each with its own fields and relationships. The reason this matters is retrieval. A vector database that treats every paragraph as undifferentiated text will surface plausibly relevant prose, but it will lose the structured relationships that make answers correct. "When is the next Founders Day?" is not a vector-similarity question. It is a structured query: events table, host equals Ward4, series equals Founders Day, date greater than or equal to today, ordered by date. The answer is a row, not a paragraph. A working system needs both — vector search for open-ended questions, structured queries for date-bound and relationship-bound ones — and that is a design decision that has to be made up front, not retrofitted later.
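To make the distinction concrete, here is a minimal sketch of that structured query against an in-memory SQLite table. The table layout, column names, and sample date are illustrative assumptions, not a committed schema.

```python
import sqlite3
from datetime import date

# An in-memory database with a minimal events table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (name TEXT, host TEXT, series TEXT, event_date TEXT)"
)
conn.execute(
    "INSERT INTO events VALUES "
    "('Founders Day', 'Ward4', 'Founders Day', '2026-06-10')"
)

# "When is the next Founders Day?" as a structured query, not a
# similarity search. The answer is a row, not a paragraph.
row = conn.execute(
    "SELECT name, event_date FROM events "
    "WHERE host = ? AND series = ? AND event_date >= ? "
    "ORDER BY event_date LIMIT 1",
    ("Ward4", "Founders Day", date.today().isoformat()),
).fetchone()
print(row)  # None once the sample date has passed; otherwise the row
```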
3. How does data move from source to storage? Each ingestion strategy has a different operational shape. A scheduled scrape against a stable HTML page is one thing. An iCal subscription is another. A headless browser run against a JavaScript-rendered calendar is a third. An API integration where one is offered is a fourth. None of those handle the case that matters most: the SME conversation that produces information no automated system can reach. Manual editor entry is not a fallback for when the scraper fails. It is a first-class ingestion path that has to be designed in from day one, with the same metadata discipline as everything else — provenance, verification timestamp, attribution. The system has to make it as easy to record what an SME said in a conversation as it is to record what a website published.
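As a sketch of what first-class means here: every ingestion path, scrape or conversation, should produce the same record shape, differing only in what the provenance fields say. All names below are hypothetical, not a committed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IngestionRecord:
    entity_type: str   # "event", "organization", "program", ...
    payload: dict      # the typed fields for that entity
    source_kind: str   # "scrape" | "ical" | "api" | "sme_conversation"
    provenance: str    # URL fetched, or who said it and in what context
    attributed_to: str # the person or system that supplied the fact
    verified_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# The SME conversation path, recorded with the same discipline as a scrape:
note = IngestionRecord(
    entity_type="event",
    payload={"host": "Ward4", "venue": "offsite"},
    source_kind="sme_conversation",
    provenance="phone call with the organizer",
    attributed_to="editor",
)
```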
4. Where does the data live, and in what form? Three layers — raw artifacts as fetched, normalized typed entities with full metadata, and embeddings linked back to the entities by ID — so every fact in the system is traceable to its primary source with a verification timestamp.
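A minimal sketch of those three layers, using SQLite for illustration; table and column names are assumptions. The only structural claim is the linkage: embeddings key back to entities, and entities key back to the raw artifacts they were normalized from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_artifacts (   -- layer 1: what was actually fetched
    id INTEGER PRIMARY KEY,
    source_url TEXT,
    fetched_at TEXT,
    body BLOB
);
CREATE TABLE entities (        -- layer 2: normalized, typed, attributed
    id INTEGER PRIMARY KEY,
    entity_type TEXT,          -- event, organization, program, ...
    fields_json TEXT,
    raw_artifact_id INTEGER REFERENCES raw_artifacts(id),
    verified_at TEXT
);
CREATE TABLE embeddings (      -- layer 3: vectors keyed to entities
    entity_id INTEGER REFERENCES entities(id),
    model_name TEXT,           -- the embedding-model lock, made durable
    vector BLOB
);
""")
```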
5. What happens when the SME and the scraper disagree? This is the most editorially loaded question, because the answer determines whether the system genuinely encodes a human in the loop or only describes one. Consider a concrete case: the website for an organization lists a venue. The SME, in conversation, mentions that the next gathering will be somewhere else — a working arrangement, not a contract, the kind of nuance that lives in spoken context but never reaches the public page. A naive system parrots the website. The website is wrong. The SME is right. The system has to encode that the SME wins, and it has to do so durably — recording the override with attribution, timestamp, and the reason — so that the next ingestion run does not silently re-overwrite the correction. Without this, every SME validation is one scrape away from being erased. The override is not a feature added late. It is the data model's posture toward authority. Either the system treats human-validated entries as the ground truth and treats automated ingestion as a candidate, or the discipline this series has been arguing for collapses the moment a cron job runs.
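A sketch of that override rule, with hypothetical field names. The incoming scrape is treated as a candidate; any field carrying an SME override keeps its human-validated value, and the disagreement is logged for editorial review instead of silently applied.

```python
def merge(current: dict, incoming: dict, incoming_source: str) -> dict:
    merged = dict(current)
    for key, new_value in incoming.items():
        override = current.get("overrides", {}).get(key)
        if override is not None and incoming_source != "sme":
            # An SME correction exists for this field: keep it, and
            # record the disagreement rather than erasing the human.
            merged.setdefault("conflicts", []).append(
                {"field": key, "scraped": new_value, "kept": override["value"]}
            )
        else:
            merged[key] = new_value
    return merged

record = {
    "venue": "offsite",  # corrected by the SME
    "overrides": {
        "venue": {"value": "offsite", "by": "organizer, by phone",
                  "at": "2026-05-06", "reason": "working arrangement"}
    },
}
# The next scrape still sees the stale website value:
updated = merge(record, {"venue": "WCTC"}, incoming_source="scrape")
print(updated["venue"])      # "offsite" survives the cron job
print(updated["conflicts"])  # the disagreement is recorded, not erased
```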
6. What gets answered first — the data or the model? The data. The shape of authoritative local information, the source-quality taxonomy, the metadata discipline, and the override semantics determine what any model can be expected to do well. The model selection conversation is real and worth having, but it follows from the data architecture, not the other way around. The one model-adjacent decision that does need to be made now is the embedding model, because embeddings generated at ingestion time must be queried with embeddings from the same model at retrieval time — switching later means re-embedding the entire corpus. That decision lives in next edition's lock. Generation model selection — local fine-tune, hosted frontier, hybrid — comes after.
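One way to see why this is a lock rather than a preference, sketched with a hypothetical model name: the model identity travels with every stored vector, and a mismatch at query time is an error, not a degradation.

```python
import math

EMBEDDING_MODEL = "example-embedder-v1"  # hypothetical model name
index = {}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def store(entity_id, vector):
    # The model name is recorded alongside every stored vector.
    index[entity_id] = {"model": EMBEDDING_MODEL, "vector": vector}

def query(query_vector, query_model):
    if any(e["model"] != query_model for e in index.values()):
        # Similarity across models is meaningless; switching models
        # later means re-embedding the entire corpus. Hence the lock.
        raise ValueError("model mismatch: re-embed the corpus first")
    return max(index, key=lambda i: cosine(index[i]["vector"], query_vector))

store(1, [0.1, 0.9])
print(query([0.2, 0.8], EMBEDDING_MODEL))   # -> 1
# query([0.2, 0.8], "example-embedder-v2")  # would raise: corpus locked to v1
```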
These nine organizations were surfaced in last week's verification pass as the ones every available model missed when asked the reference question. They are carried forward into this week as the candidate Tier 1 set — meaning each is a working hypothesis, pending the inventory work that will determine whether its data is reliably retrievable, verifiable, and structurally compatible with the architecture. The final cut for the next edition may narrow this list (organizations whose authoritative data cannot be reached on a workable schedule) or expand it (organizations the inventory process surfaces that this list does not yet name). Tier 1 is a working set, not a closed one.
This edition does not ship the inventory. It does not name the final Tier 1 cut. It does not pick a storage stack, lock an embedding model, or choose a generation layer. What it does is name the questions whose answers determine all of that — and commit to the order in which those answers are produced. Data first. Architecture second. Model selection after. Generation layer last, built on top of founded knowledge rather than instead of it.
Next edition is the inventory pass and the final Tier 1 cut. Everything beyond that is paced deliberately, in editions to come.
The Building Intelligence section above lays out the questions that have to be answered before any code gets written. This section shows the work that produces those answers — the research methodology being applied this week, before next edition's deliverables.
The artifact this week is an inventory spreadsheet. One row per candidate organization. The columns encode exactly the information the architecture decisions in next edition will depend on. Filling it in is not glamorous work, but it is the work, and showing the columns is itself part of the receipt discipline this series teaches.
Before any column gets filled in, a more fundamental decision has already been made — and it has been made by a human, not by software. The candidate Tier 1 list does not exist on any web page. No crawler produced it. No keyword filter surfaced it. It is an artifact of direct knowledge of the local scene, drawn from years of attending events, knowing organizers, watching which groups actually meet and which exist only as a Meetup page that has not been updated since 2022. The list is not the result of automated discovery. It is the input to everything automated that comes after.
This is the layer of human-in-the-loop discipline that the prior four editions of this series did not name explicitly, and it is the most important one. A frontier lab cannot phone Ward4 — that argument has been made. But the deeper version of the argument is that a frontier lab does not know to phone Ward4 in the first place. It does not know which nine organizations matter to a software engineer relocating to Milwaukee. The set of entities a generic system tries to know about is, by definition, undefined for any specific local context. The set this build tries to know about is defined by an SME, on purpose, before any ingestion runs.
This changes the role of automation. Scrapers, feeds, and APIs are not how the system discovers what to know about. They are how the system maintains freshness within a scope an SME has already declared. That ordering matters editorially and architecturally. Editorially, it names where the asymmetric advantage actually lives — not in better verification, but in better definition of the world being verified. Architecturally, it means the seeding step is a first-class data path with the same metadata discipline as ingestion: provenance recorded, attribution preserved, verification timestamped. Seeded data is not informal. It is the foundation everything else stands on.
The columns, with what each is for:

- Organization: the candidate entity, one row each.
- Primary surface URL: where the organization's authoritative public information lives, if it is public at all.
- Surface type: static HTML, JavaScript-rendered page, calendar feed, API, or none.
- Update cadence: how often the surface is observed to change, the signal for whether it is maintained or abandoned.
- SME-validated: whether someone who would know if the entry were wrong has confirmed it, and who.
- Verified: the timestamp of the most recent direct review of the surface.
The discipline this inventory teaches is in the empty cells. An empty "SME-validated" cell is not a missing data point — it is a flag that says "this entry has not yet been confirmed by someone who would know if it were wrong." An empty "primary surface URL" cell means the organization may not maintain a workable public surface at all, and that the only ingestion path is going to be conversation. An empty "update cadence" cell means the surface has not been observed long enough to know whether it is maintained or abandoned.
Each empty cell is a question for next edition's research to answer. Filling them in is not bookkeeping. It is the work that determines whether each candidate organization can responsibly enter Tier 1 — or whether the discipline requires that it stay on the candidate list until a workable verification path exists.
Every column in this inventory could be populated, in some form, by a sufficiently aggressive scraper. A crawler can identify a homepage. A parser can detect whether a page is server-rendered or JavaScript-driven. A regex can guess at a calendar feed. Last-updated strings can be lifted from page footers. But none of that produces a verified inventory. It produces a plausible one — exactly the failure mode this series spent four editions documenting.
And none of it produces the candidate list to begin with. The list exists because someone embedded in the community decided it should exist, and decided which organizations belonged on it. That decision is the most consequential single act in the entire build. Every subsequent column, every scraper run, every embedding generated, every retrieval served — all of it operates within a scope a human established. The verification work is the second layer of human-in-the-loop discipline. The seeding work is the first.
This is the asymmetric advantage being claimed. Not that the technology is unique. Not even, in the end, that the verification effort is uniquely reachable for a local actor — though it is. The deepest claim is that the SME defines the world the system models, and that definition is unreachable for anyone working from indexed text alone. A generic crawler can discover that some organizations exist. It cannot discover that these nine matter to a software engineer relocating to Milwaukee in 2026, because that is a judgment call grounded in community context, not a fact retrievable from public surfaces.
The architecture being designed in the Building Intelligence section is, in the end, a piece of software whose job is to preserve human judgment across automated runs — both the seeding judgment that defined the scope and the verification judgment that maintains its accuracy. The override question — what happens when the SME and the scraper disagree — is the same question framed at the schema layer. The inventory is the same question framed at the research layer. The candidate list itself is the same question framed at the editorial layer. Either the human in the loop is encoded into the system as a primitive at every layer, or the system is just another aggregator running automation against scope it never legitimately defined.
The discipline this section teaches is more credible if applied immediately. What follows is the verified primary-surface row for each of the nine candidate organizations, captured during this edition's research pass. Every entry is a direct page review with today's verification timestamp. None are SME-validated — that is next edition's work.
| # | Organization | Primary Surface | Surface Type | Verified |
|---|---|---|---|---|
| 1 | Global AI Milwaukee | meetup.com/global-ai_milwaukee | Meetup page (JS-rendered) | 2026-05-06 |
| 2 | MKE WiMLDS | meetup.com/milwaukee-women-in-machine-learning-and-data-science | Meetup page (JS-rendered) | 2026-05-06 |
| 3 | Data Driven MKE | meetup.com/mke-big-data | Meetup page (JS-rendered) | 2026-05-06 |
| 4 | Milwaukee Machine Learning Meetup | meetup.com/meetup-group-yaqolglf | Meetup page (JS-rendered) | 2026-05-06 |
| 5 | Madison AI (MadAI) | meetup.com/madison-ai | Meetup page (JS-rendered) | 2026-05-06 |
| 6 | gener8tor / gBETA Milwaukee | gener8tor.com/gbeta/milwaukee | Static HTML (corporate site) | 2026-05-06 |
| 7 | Choose MKE Tech | choosemketech.org | Static HTML (parent org site) | 2026-05-06 |
| 8 | i.c.stars Milwaukee | icstars.org/location/milwaukee | Static HTML (parent org site) | 2026-05-06 |
| 9 | FOR-M (Founders for Milwaukee) | choosemketech.org/blog/meet-for-m-... | Static HTML (blog post on parent site) | 2026-05-06 |
Three observations are worth carrying into next edition's work. First, four of the nine candidates operate inside the MKE Tech Hub Coalition ecosystem or its adjacencies, so a single SME relationship may validate multiple rows: the asymmetric advantage compounding. Second, three of the nine show low visible activity in the initial pass and may move to a "candidates pending revival" status until SME confirmation. Third, FOR-M's primary surface is a blog post on its parent's site, structurally different from the others and a worked example of why the schema may need to model some candidates as programs of a parent organization rather than as independent entities. None of these observations are conclusions. They are research output, generated by the methodology this section described, and they are exactly the kind of finding the inventory work is supposed to produce.
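The third observation suggests one concrete schema consequence worth sketching: a candidate may be an independent entity or a program of a parent organization. A minimal illustration, with hypothetical field names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Organization:
    name: str
    parent: Optional["Organization"] = None  # None means independent entity

mke_tech = Organization("Choose MKE Tech")
for_m = Organization("FOR-M (Founders for Milwaukee)", parent=mke_tech)
```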
Next edition is the inventory pass. Every candidate organization researched against the columns above, every cell filled in or explicitly marked as a question still open, and the receipts that substantiate each entry. The architecture decisions that follow from the inventory — storage, ingestion strategy per source, the override semantics, the embedding model lock — and the generation layer that will eventually consume the founded knowledge are the work of editions to come, paced to the depth each decision deserves. The discipline this series teaches applies inward.
“The difficult is what takes a little time; the impossible is what takes a little longer.”
— Fridtjof Nansen, the Norwegian explorer, scientist, diplomat, and Nobel Peace Prize laureate who led the first crossing of the Greenland interior and conducted research in neurology and oceanography.