LLMs are a mirror.
Of society in general (the weights; the background awareness).
Of the user specifically (the context; the questions the user brings to it).
They may distort things, but they’re fundamentally a mirror.
ChatGPT’s answers feel to me like being served up a personalized Axios article.
More formatting than substance.
Punchy and “simple” while obscuring nuance.
It presents the answer as an objective, simple truth, rather than as a nuanced observation offered as a point of departure for follow-up.
Your AI assistant isn't your friend. It's a double agent.
This comes from Bruce Schneier’s excellent AI and Trust.
AmpCode dives into the power of subagents that are spun up to do minor tasks.
But how can you cache the logic of what they discover?
Agents need a coactive substrate to write and read intermediate insights to cache them.
That would allow them to not have to recreate every insight from scratch.
The cache has to be visible to the user, so they can verify it's right and modify it themselves.
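As a rough sketch of the shape this could take (the `InsightCache` name and structure are my own illustration, not any particular agent framework's API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Insight:
    """One intermediate conclusion a subagent reached, kept visible to the user."""
    author: str          # "user" or the subagent's name
    claim: str           # the cached conclusion, in plain language
    evidence: list[str]  # pointers to what it was derived from
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class InsightCache:
    """A shared, inspectable substrate that both the user and agents can read and edit."""

    def __init__(self) -> None:
        self._insights: dict[str, Insight] = {}

    def write(self, key: str, insight: Insight) -> None:
        # Agents cache what they discovered; the user can overwrite it later.
        self._insights[key] = insight

    def read(self, key: str) -> Insight | None:
        return self._insights.get(key)

    def delete(self, key: str) -> None:
        # The user pruning a wrong insight is how the cache stays trustworthy.
        self._insights.pop(key, None)

    def dump(self) -> list[tuple[str, Insight]]:
        # Everything is visible, so the user can audit what the agents currently believe.
        return sorted(self._insights.items())
```

The point of the sketch is that the cache is a first-class, user-visible object, not hidden agent state.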
I thought this critique of MCP was interesting.
It argues that MCP and "everything just talks English" APIs give poor composition.
Vibe-coded bespoke normal code is easier to compose.
Use LLMs to write normal code, don’t use LLMs to replace code.
LLMs are pretty good at writing normal code, but their squishiness is a bad fit for replacing code.
LLMs can do very useful things that code never used to be good at.
Use them for that, not for what code is already good at.
This to me feels like another example of how the “infinite software” frame changes what is a good strategy.
Chat is a forgiving interface for quality.
If it gets the answer wrong or doesn’t do what you wanted, you can easily ask follow up questions.
One-shot UIs that have to give a perfect answer every time with no second tries are very hard to get to the bar of viability and then improve.
Compare Google Search and (spoken) Google Assistant interactions.
With Google Search, as long as the answer is in the top 10, it’s fine.
Formulating a query is fast, and text results are a broad channel, easy to skim through quickly.
Then, if users are consistently clicking on the third result, the system can twiddle its ranking to move it to the top.
This gives a clear gradient to hill climb.
With Google Assistant, formulating the query and getting the response over audio are a narrow channel.
Audio is hard to skim, so you get one answer, not several.
If the answer isn’t right, users give up.
I loved this socratic dialogue from Geoffrey Litt about why chat is not the final UI for LLMs.
Every form of UX around chatbots today has tried to grapple with the model's limitations.
Then the next model gets better and obviates the UX.
What are the downsides fundamental to chat itself?
Those will help discover the ideal UX for LLMs.
Chat is ephemeral and squishy.
That's what makes it good for starting open-ended tasks without it feeling like a burden.
That’s also what makes tasks as they continue feel like wading through molasses.
If chat isn’t a good UI for LLMs, why is it winning?
To me it's entirely based on the novelty of LLMs: everyone's just experimenting with them right now, starting open-ended tasks, which chat excels at.
It's as the task goes on longer and gets more structured that chat becomes more obviously a poor fit.
Chat is the poor man's open-ended UI.
LLMs are inherently open-ended, and chat is the most obvious and easiest way to grapple with that open-endedness.
But as we uncover other ways of doing it that lean more on the GUI, chat will look like just a feature, not a paradigm.
With LLM UX it feels like we have all the ingredients but we don't have the cake yet.
Chatbots seem like a local maximum to me.
Chatbots are not the ideal UX.
I think the ideal UX will include the ability to chat, but that will not be the primary interaction that everything orbits around.
A coactive relationship is one that empowers both participants.
“Apps that adapt to you” could be powerful.
Today apps are static.
Now LLMs and the tech around them allow apps to proactively adapt to what you want to do.
Over time you’ll not even reach for apps at all because everything you do will be adapted so much to your situation that the notion of an app fades away.
I want Living Apps.
Apps that are alive, that adapt themselves to me, pulling in context or code to make themselves do what I want to do.
I could tweak them to my heart's desire.
Apps for living.
They could also be called “coactive apps”.
Coapps are apps that adapt themselves to my needs.
Alternatively, call them tools. Living Tools, or Coactive Tools.
‘Tools’ doesn’t have the expectation of “these are just like apps of today.”
The ideal software in the era of infinite software is pre-assembled lego sets.
You get a full, useful thing out of the box, that an expert designed to be holistically useful.
But it’s made of legos, so you can replace any block… or add whole new things to it.
So creators can create both new lego blocks and new pre-assembled lego sets that fit together nicely and coherently and are useful right out of the box.
Chat is a special case of a coactive surface.
Both of you can add messages to the log, but only in alternation, and it's append-only.
A true coactive substrate is one that both the user and the AI can add to and edit.
How much you trust a suggestion is partially due to what context the system is drawing on.
Is it drawing on relevant facts about you?
Is it missing important ways that you differ from the general population?
Is it including irrelevant things that will distract it?
The quality of the context is a big determinant in how good the results are.
This is one of the reasons the ChatGPT dossier memory feels off to me.
You can’t inspect the context to say, “include this, not that”.
The chat is append only, not coactive.
You can imagine a UX where there’s a coactive context drawer at the top of the interaction.
Things in the interaction and that context drawer are all that is given to the LLM, nothing else.
When the drawer is collapsed it shows a short summary.
When you expand it you see trees of context that you’ve pulled in explicitly.
You can add in trees of context on demand easily.
E.g. “Include information about my nuclear family.”
The tree pulls in all of the sub-items that hang off of it.
You can choose to pull in something high up in the tree or low down.
You can delete any tree of context that’s not relevant.
There’s also a list of auto-included context that the system guesses is useful.
Those are included by default, but they can be explicitly promoted to the included items, just as if you'd added them yourself, or deleted.
The ranking function is how well the system predicts which trees of context to include: would the user accept or delete a given suggestion?
The user choosing to include or exclude bits of suggested context is an extremely powerful signal for that ranking function.
The user wouldn’t even realize that they’re training the system for themselves and others by gardening their context.
Only a small number of users would need to do it to help tune it for a whole population.
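A minimal sketch of the data model such a drawer might have (the `ContextTree` / `ContextDrawer` names and the accept/reject plumbing are my own assumptions, not a description of any shipping product):

```python
from dataclasses import dataclass, field


@dataclass
class ContextTree:
    """A tree of related context, e.g. 'my nuclear family', with sub-items hanging off it."""
    label: str
    content: str
    children: list["ContextTree"] = field(default_factory=list)

    def flatten(self) -> list[str]:
        # Pulling in a node pulls in every sub-item that hangs off of it.
        return [self.content] + [c for child in self.children for c in child.flatten()]


@dataclass
class ContextDrawer:
    """The coactive drawer at the top of the interaction: explicit plus suggested trees."""
    explicit: list[ContextTree] = field(default_factory=list)   # user pulled these in
    suggested: list[ContextTree] = field(default_factory=list)  # system guessed these
    rejected: list[str] = field(default_factory=list)           # labels the user deleted

    def accept(self, tree: ContextTree) -> None:
        # Promoting a suggestion is the positive half of the ranking signal.
        self.suggested = [t for t in self.suggested if t.label != tree.label]
        self.explicit.append(tree)

    def reject(self, label: str) -> None:
        # Deleting a suggestion is the negative half of the ranking signal.
        self.suggested = [t for t in self.suggested if t.label != label]
        self.rejected.append(label)

    def prompt_context(self) -> str:
        # Only what's in the drawer (plus the interaction itself) ever reaches the LLM.
        trees = self.explicit + self.suggested
        return "\n".join(item for tree in trees for item in tree.flatten())
```

The accept and reject calls are exactly the gardening signal described above: a small number of users exercising them could tune the auto-include predictions for a whole population.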
Formulating context as a memory makes it sound like it's for the LLM.
Ideally you’d organize it for yourself, which would be useful on its own.
As a bonus it makes the LLM significantly better at doing things for you.
A good executive business partner doesn't “remember” you, they know you.
Knowing requires distillation.
How can you get a good enough model of yourself for the LLM?
Imagine a productivity system that does all the grunt work for you.
One that gets better as you use it more, not because some PM added a feature or some fully automatic LLM-based insight kicked in.
But because it draws on the collective wisdom of everyone using it.
Dan Petrolito: I built something that changed my friend group's social fabric.
An extremely trivial script at the right moment can have a massive impact.
Situated software catalyzes cozy potential.
LLMs have lots of last-mile problems.
Largely due to the need to garden the right context and find the right coactive UI.
Claude Code feels like a ride-on mower.
At first you go “Whoa, this is so easy to use, I can do 10x more than I could before.”
But as you use it for more things you realize it’s too coarse a tool to do detail work.
An extremely useful tool, but it doesn't cover all of your tasks in the garden.
Claude Code feels like a “choose your own adventure” style of developing software.
Anthropic Artifacts’ new dashboard feels like it changes the game.
Which is funny, because it didn't change anything other than giving you one place to see your artifacts, treating them as first-class citizens.
Before, artifacts were always secondary to the chat.
Your artifacts were lost in the sea of text and other chats.
Feels like the shift to the News Feed in Facebook back in the day.
It didn't change the information in the system, just changed its visibility.
But that was like a figure-ground inversion.
Moved from K-selection to r-selection.
The Anthropic Artifacts dashboard gives you one place to see Darwinian evolution.
Everything you've ever seen before, you can tweak and remix.
I liked Grant Slatton’s summary of techniques for LLM memory.
Apps can't borrow your data for a specific task.
They need permanent, unlimited access.
This architectural flaw means using multiple tools requires trusting multiple strangers forever.
It’s easier to just give everything to Google.
The same origin paradigm makes aggregators inevitable.
But the same origin paradigm is not itself inevitable.
Most users won't want to vibecode their own software.
You can’t safely run something that was vibe coded by a stranger if it has access to sensitive data.
That sets a ceiling on vibecode's distribution in the current security model.
In the same origin model you trust the actor not the action.
Chris Joel pointed out to me that DNS origins are like Neal Stephenson’s burbclaves.
“Now a Burbclave, that's the place to live. A city-state with its own constitution, a border, laws, cops, everything.”
Having to trust an origin in an open-ended way is the fundamental problem that leads to aggregation.
That's why switching policies to be on data, not origins, is the key unlock.
Previously that was hard because data sloshed around everywhere, impossible to administer.
But Confidential Compute creates the possibility for a runtime that can be structurally trusted to follow policies on data even remotely.
Why can't we have open aggregators today?
An open aggregator would have the benefits of an aggregator for distribution, but without the overly centralized power dynamics.
In the same origin paradigm you have to trust the origin with all of the data.
That’s possible to do with a single entity: “Do I trust Google with this data now and into the future?”
But it’s very hard to make that decision for a swarm: “Do I trust any one of millions of unknown entities in the swarm who might get this data now or in the future?”
Especially when any single bad actor in the swarm can leak the data somewhere else.
The single actor vs open-ended set of actors is a nearly infinite difference.
Swarm and aggregator dynamics are incompatible today.
You can't trust a swarm as one entity, and the same origin model requires you to trust the entities with your data.
But if you change the laws of physics you could get it to work.
The same origin model doesn't grapple with the fact that any data shared with an origin is an open-ended trust delegation.
GDPR tries to fix the same origin model.
But it does so ham-fistedly, because there are no good solutions to the same origin paradigm given its fundamental (lack of) design.
The same origin model doesn’t permit good UXes for privacy or security.
Will vibe coding produce individual rat’s nests of code, or large emergent edifices of collectively useful code?
The more you tailor a spreadsheet to your use case, the more it becomes a personal rat's nest, taking you farther from collaboration with others.
Contrast that with, say, Google Search.
The more people that use Google Search, the broader the data of queries and clicks to give clear signals, the better it gets for everyone.
Or the more people that use Wikipedia, the better it gets for everyone.
Whether vibe coding is convex or concave to collaboration comes down to the substrate vibe-coded things are distributed in.
In Notion, even if you plow a ton of effort into it, it still doesn't help anyone else.
It's a close-ended system.
Your data doesn’t help anyone else other than your direct collaborators.
You can't add Turing-complete functionality that's missing.
How can you flip the model to be open-ended by default?
Put another way: we need a substrate for vibe coding that is concave to collaboration.
What would a Notion-like fabric look like that’s concave to collaboration?
Open-ended systems are powerful in a way hard to demo quickly.
Imagine a system where the value of the security model is that it allows significant open-endedness.
That benefit would take time to show up for a given user.
When collaborating with a small group of people you know personally and trust, you don't need a different security model to trust them.
The security model would first help when you execute code written by a stranger.
The web was similarly underwhelming on the first visit; it was only after a few link clicks to other domains that the power of its open-endedness revealed itself to you.
If a task was 1000x too hard and is now 10x easier, it's still 100x too hard!
I think people want a common-sense vision for optimistic human centered computing in the age of AI that is not:
1) Cynical engagement-maxing tech products of today.
2) Crypto.
3) E/Acc / Successionism.
People presume that if you’re optimistic about tech and are in the industry you just want centralization.
Or that if you’re optimistic about tech and don't want centralization then you must like crypto.
But it’s possible to be optimistic about tech and push for neither centralization nor crypto.
Such a third way is more important than ever before in the era of AI.
A HackerNews comment that stuck with me: “From the very beginning Facebook has been an AI wearing your friends as a skinsuit.”
When naming something novel, the slingshot maneuver can be helpful.
Name it based on what people know they want.
Then slingshot them to the thing they didn’t know they needed.
That latter part only becomes clear once they’ve used it.
Meet them where they are to take them to where they should go.
Useful when you have a thing that is superficially like other things, but fundamentally better in a novel way.
There's a GitHub project with simple little LLM-based "gremllms".
When you access a method, the LLM generates code JIT.
I think the mental model of gremlins fits well: small, not too powerful, but mischievous and a bunch of them together can make an impact.
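The core trick is roughly this shape; a toy sketch of the idea rather than the actual project's code, with `ask_llm` standing in for whatever model call you wire up:

```python
class Gremlin:
    """A toy object whose methods don't exist until you call them: the LLM writes them JIT."""

    def __init__(self, ask_llm):
        # ask_llm is a stand-in: any function that takes a prompt and returns Python source.
        self._ask_llm = ask_llm
        self._generated = {}

    def __getattr__(self, name):
        # Only called for attributes that don't already exist on the object.
        if name not in self._generated:
            source = self._ask_llm(
                f"Write a Python function named {name}(self, *args, **kwargs). "
                f"Return only the code."
            )
            namespace = {}
            exec(source, namespace)  # running model-written code: mischievous by design
            self._generated[name] = namespace[name]
        return lambda *args, **kwargs: self._generated[name](self, *args, **kwargs)
```

Small, not too powerful, and clearly mischievous: you are exec-ing code the model just wrote.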
The word “context” is a good one for “relevant data to give as background to the LLM.”
But in the deep philosophical sense, your real context is outside your control.
Your context is a gravity field.
The water you swim in.
If you've gone through the effort of doing high-quality programmatic thinking, LLMs can write infinite op-eds for you, on demand.
The backlog of Bits and Bobs feels exceptionally valuable to me in the age of AI.
Teaching forces you to abduct your intuition.
That’s why one of the best ways to understand something is to teach it.
A slippery slope is an example of an emergent phenomenon: a noisy signal with a consistent bias.
No individual step is that bad; each is obscured by noise.
The bias is in one direction: the gravity of incentives.
So the emergent global effect is clear and powerful.
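A toy simulation shows the dynamic (illustrative numbers only; the bias is a twentieth of the noise, yet the drift dominates over time):

```python
import random

random.seed(0)

# Each step is mostly noise with a small consistent bias: the gravity of incentives.
noise, bias, position = 1.0, 0.05, 0.0
for step in range(1, 1001):
    position += random.gauss(bias, noise)
    if step in (10, 100, 1000):
        # Any single step is invisible against the noise (sigma is 20x the bias),
        # but the drift compounds: the expected position after n steps is n * bias.
        print(f"after {step:4d} steps: {position:+.1f} (expected {step * bias:+.1f})")
```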
Bruce Schneier points out that LLMs will bring mass spying.
Before, we had mass surveillance, but a human sifting through the collected data happened rarely.
That limited the oversight to things that could be done at quantitative scale, or the most egregious tails of behavior.
A panopticon kind of game theoretic dynamic.
But LLMs give qualitative insights at quantitative scale.
Society now has the technology to get qualitative insights at scale from that surveillance.
What could possibly go wrong??
LLMs are infinitely patient, so "good enough" ACLs aren't good enough any more.
Before, your data was protected a bit by security through obscurity.
But LLMs are infinitely patient, able to sift through data that was accidentally left open.
Another outgrowth of the “qualitative insights at quantitative scale”.
Some examples of sycosocial relationships with LLMs:
People Are Being Involuntarily Committed, Jailed After Spiraling Into "ChatGPT Psychosis"
Examples in this Ezra Klein interview.
"New social networks are going to appear that will be LLMs creating a cozy web customized for us and our real friends, and their friends and so on
There will be a new social network built on mutual trust, all curated by machines of loving grace
Personal Cozyweb is inevitable"
I originally interpreted this tweet as “Use LLMs to make a psychosocial bubble of fake friends” which I think would be terrible for society.
But I think he meant it more as "make a garden of possibility for you and your friends," which could be good in some circumstances.
Bruce Schneier: The Age of Integrity.
Integrity is an incredibly important topic.
In Information Flow Control, confidentiality and integrity are the two concepts that flow through the graph.
Integrity is about a trusted chain of provenance.
Integrity will be an important concept to tackle prompt injection.
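As a minimal sketch of how those two properties flow through a graph (my own toy lattice, not any particular IFC system's API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Information Flow Control labels: confidentiality and integrity travel with the data."""
    confidentiality: frozenset  # secrecy tags: restrictions on where this data may flow
    integrity: frozenset        # endorsements: which trusted sources vouch for this data


def combine(a: Label, b: Label) -> Label:
    """Label for anything derived from both inputs."""
    return Label(
        # Confidentiality accumulates: the output is at least as restricted as each input.
        confidentiality=a.confidentiality | b.confidentiality,
        # Integrity attenuates: the output is only as trusted as the provenance both share.
        integrity=a.integrity & b.integrity,
    )
```

Confidentiality accumulates and integrity attenuates, which is exactly the hook you'd want against prompt injection: anything derived from an untrusted web page keeps its low integrity and shouldn't be allowed to drive privileged actions.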
Doug Shapiro: Trust is the new oil.
Trust will have to be rooted in in-person interactions that are known to be authentic.
Bruce Schneier's focus on integrity is important here.
Culture emerges.
Someone tries something that other people find viable and then the others reshare it, build on it, and remix it.
The things people like are what get built on, emergently.
For things like architectural styles, there were geographic pockets that helped give rise to distinctive styles in different parts of the country.
As everything gets more fluid and lower friction, we'll likely see fewer and fewer distinctive architectural styles.
Building for scale and building for viability are different.
Imagine working at a company where on day 1 of a product you can expect 50M users.
You have to think about every little detail before building it.
In that environment product discovery is about talking and planning.
Writing down plans so they can be critiqued and collaborated on is a critical step in the process.
But in most contexts, the 0-1 phase has very little usage.
Product discovery there is about experimenting and surfing through the problem domain fluidly.
It’s all about getting something concrete into contact with real people as quickly as possible to iterate.
In that environment, writing things down slows the discovery process significantly.
The demoware mindset is "does it work?"
The product mindset is "do I want to use it?"
Very different bars!
People who have a high need for novelty won't focus on polish.
Polish isn't novel.
Even if you build the right surfboard, will you catch the wave?
Hank Green on the Cacophonous Age: "You're not addicted to content, you're starving for Information."
In the Cacophonous Age, the new privilege is patience.
What kind of thinking are you not outsourcing?
How can you have a secure attachment to reality instead of trying to leave it?
To get it you need to sit with the discomfort of uncertainty.
The discomfort is the point.
Reframe the discomfort as excitement!
When you have your own authentic clarity, you stick with your intuition when everybody else would have given up.
Terms have an “inferred definition.”
That is, what a term means is what the majority thinks it means after the first time they've heard it.
People will bring their own preexisting priors to any given term, and that bias will determine what the term comes to mean, especially if it's a consistent bias many first-time hearers share.
Terms like “context engineering” are useful because they mean the thing that most experts hearing it for the first time would think it means.
“Inferred definition” is itself a term coined by Simon Willison.
Your taste creates an architecture for your thoughts.
If a CEO could direct a swarm of LLM clones of themselves, we should expect more volatility in company performance.
Founder led companies are more volatile.
If the founder says to go in a given direction, even if it’s to avoid an obstacle the employees can’t see, they go along with it.
But if the founder missteers, there’s no one and no thing to countersteer.
Founder-led companies have greater returns than average, but also higher likelihood of death.
Non-founder led companies are harder for the CEO to steer.
But even founder-led companies are hard to steer at scale.
Before, the swarm of employees trying to implement the CEO's vision imperfectly gave some insulation.
For good, when the CEO’s idea was disastrous.
For bad, when the company had inertia that counteracted a good idea.
But if every employee is just a clone with minimal principal-agent problem, it's like the Wreck-It Ralph 2 swarm of poorly rendered copies forming an unruly emergent leviathan.
Which does the organization prioritize: loyalty or competence?
The former is a zombie organization.
Alignment at all costs, even if it kills the host.
No adaptive capacity.
Top-down alignment comes at the cost of local adaptivity and the possibility of emergence.
Whiteboard scribblings after a meeting are often completely meaningless to anyone else.
But they’re extremely meaningful to the people who were there.
Having the experience of that meeting in that space gives you the key to understand it.
Novelty is risk.
It's noise.
Some novelty will turn out to be innovation.
Most will simply not work.
Invest your novelty budget on the things that are differentiators, but nowhere else.
The optimal strategy for a swarm and for an individual are different.
Multi-headed search vs single-headed search.
If you can only have a single search head on a problem domain, you want it to be the best it can be.
K-selected.
If you can have a multi-headed search on a problem domain, you want as many heads as you can get to flood-fill the problem.
r-selected.
The swarm needs breadth, the individual needs depth.
Locality breeds creativity.
It’s the pockets that allow interesting results to grow and then expand.
Some things are convex and some are concave.
Some tend towards a center point.
Auto-stabilizing.
Concave.
Some tend away from a center point.
Auto-destabilizing.
Convex.
Two systems that look superficially the same but are convex vs concave have infinitely different outcomes.
What determines if a system is convex or concave?
If it leads to convergent outcomes or tears itself apart via entropy and diffusion?
I think it’s whether the locally good behaviors lead to emergently good outcomes at the macro level.
Some things, the more useful they are over time, get cleaner.
In some cases the more useful they are, the more fractally complex they get.
Convex vs concave.
Useful things tend to snowball.
In proportion to:
1) the amount of people who find it useful.
2) how useful they find it compared to available alternatives.
As it gets bigger it gets more useful to more people because more people are exposed to it.
The option that everyone has a slight preference for slowly diffuses and smothers the other options.
The stronger the bias, the faster the diffusion.
As information flows with less friction, the dominant species creates a monoculture.
In emergent systems, all logic is decided at the local level, but the decisions have emergent global consequences.
You can't get a bird's eye view to coordinate, which is necessary for top-down convergence in a large system.
Many problems can't be framed this way, reaching a successful macro-level outcome without a bird's-eye view, but some subset can.
These are typically grown, budded off of other systems that are working.
But a true bird's eye perspective is an impossibility anyway.
The farther you get away from the details, the more fuzzy they become.
The constraint of "local information only" feels overly restrictive, but it's close to the real constraint anyway.
Feedback is generated when you take an action and the world reacts.
You can't simulate what the world will do without taking the action.
The world is a multi-headed environment of execution.
Your head is a single-headed environment.
Debugging tools only give their highest leverage if they're at the proper layer of abstraction.
An inode-level debugging tool for a userland storage system is not helpful.
Good debugging tools give you leverage on the features and bugs at their layer.
One reason Rust’s Borrow Checker is palatable is Rust’s amazing error messages.
"What's your dirty little secret?"
Related to the carrot problem.
Every team or product has one.
Data used to be sent on physical media, and moving the physical media had significant friction.
You’d print words in a book and then ship the book.
When data moved to a plane of pure bits, it got orders of magnitude faster.
We divorced information from atoms.
We saw much more aggregation much faster.
The same dynamic as before, just playing out orders of magnitude faster.
An important tactic: lashing yourself to the mast.
To make it so you can't do the things you fear you'll want to do.
Past you can constrain future you, to make sure your reptile mind doesn’t override your aspirational mind.
A game theory solution that's similar to playing chicken with yourself.
Sometimes what you think will be a shortcut is actually a detour.