We’ve left the Generative AI era and entered the Agentic AI era.
The models have gotten so good at tool calling in a loop that they can do something qualitatively different.
Chatbots are from the Generative AI era.
They can do agentic things, but almost as an accidental bonus.
Tools like Claude Code are fundamentally agentic.
They look kind of like a chatbot, but that’s incidental.
A couple of years ago people tried to wrap LLMs and create agents, but it was premature.
The model quality wasn’t there yet.
So they concluded it wasn’t possible.
But it just wasn’t ready yet.
Now it is.
The chatbot form factor is not enough to convey that power.
Great distillation from HBR: AI Doesn’t Reduce Work—It Intensifies It.
When AI takes away the rote cognitive labor, the work that remains becomes more pure.
Focused on the parts where judgment is deployed.
High leverage.
Intense.
When you have abundant cognitive labor at your fingertips, it’s overwhelming.
It’s giddy and exciting, but also a lot.
Suddenly you realize that your opportunity cost is orders of magnitude higher than before.
Every second you aren’t feeding your agent swarm is huge amounts of value you aren’t creating.
You also know that others who are doing it will lap you.
The infinite possibility creates a red queen race.
Engineers are feeling the impact of Agentic AI before everyone else.
It’s leading us to feel manic: euphoric, productive, overwhelmed, like vampires.
This is what the rest of humanity will feel sometime soon too.
People are willing to do crazy things to claw some time back.
If it gives you 3 more hours a day, it can feel worth it, even if it’s reckless.
OpenClaw is the first AI agent that lives up to the hype.
That is, assuming you don't care about security, privacy, or not getting hacked.
It’s this generation’s LimeWire.
Two new classes of engineers created by AI: Dragon Riders and Slop Cannons.
Dragon Riders can tame the gnarliest problems and do 100x more than other engineers can.
Slop Cannons make even more work for everyone else than ever before.
AI gives leverage, so it multiplies your inherent ability as an engineer.
Your judgment and taste matters more than ever before.
Being a Dragon Rider requires jumping between many different things quickly.
The inability to focus on any one thing is now more of an advantage than it was before.
Credit to Dimitri and Dion for these ideas.
Wild West roundup for this week:
Claude Desktop Extensions Exposes Over 10,000 Users to Remote Code Execution Vulnerability.
'Summarise with AI' can secretly sway recommendations, researchers warn.
OpenClaw corner:
MIT Technology Review: Is a secure AI assistant possible?
And in the “what is even happening” corner:
In case you missed it: a tweet about how Claude Cowork accidentally deleted a user’s wife’s 15k photos.
These tools are like rusty chainsaws.
Impossible to make safe enough for mass market in their current form.
The whole point of OpenClaw is the open-endedness with minimal friction.
That power is also dangerous.
The trick will be how to get the power without the danger.
This will not be a simple thing to achieve.
In the last century or two we’ve gotten used to abundant mechanical labor.
Now we have abundant cognitive labor.
This will change even more than the Industrial Revolution did.
Infinite software is downstream of cognitive labor being abundant.
But it’s just the most obvious, immediate outcome.
There will be tons of others.
Hold on to your butts!
LLMs cause Complication Collapse.
In the Cynefin model, there are four different types of problems:
1) Simple
2) Complicated
3) Complex
4) Chaotic
Complicated problems are composed of only complicated or simple components, all the way down.
Previously, complicated problems were hard because tackling and systematizing them required superhuman amounts of patience.
But now LLMs make cognitive labor abundant.
Suddenly a large class of complicated problems collapse to being simple.
Credit to my friend Dimitri Glazkov for this idea.
Vibe coding can crystallize the fuzzy future into clarity very quickly.
Collapsing the complexity of the future into complication.
“Does it work or not?”
Everyone has a limit on the number of questions they're willing to answer before they get fatigued.
LLMs never get fatigued.
Sam Schillace points out an important implication of scarce attention in the age of AI.
That means that optimizing production processes to not require human attention is important.
Agents are great at handling tedium.
Much better than humans are.
Infinite patience.
Orchestration gets more onerous as you can outsource more.
AI allows outsourcing more cognitive labor.
So now you need more cognitive labor to orchestrate.
Orchestration requires a view that can span all of your relevant contexts.
Great piece from Rob Dodson: My Second Brain Never Worked. Then I Gave It a Gardener.
The hard part of maintaining a second brain is the cognitive labor.
LLMs can do the cognitive labor part.
We’re all managers now.
Paul Graham talked about makers vs managers.
But now all of us can manage a swarm of agents.
This means that judgment matters more than ability to execute.
In the valley we used to say “Ideas are cheap, execution is all that matters.”
Well, now a large class of execution is cheap, too.
Imagine being god of your own ant colony.
Like in the old Maxis game SimAnt.
We assume software is a rigid artifact that works in a precise, preprogrammed way as a monolith.
Companies assume that, as a foundation of all of the value they create.
But that's no longer the case!
Agentic Engineering allows software to adapt and evolve on its own.
The hard part is not the software.
It’s the integrating with your life part.
One implication of infinite software: Components will kill pages.
Previously the ideal equilibrium chonkiness of software was the app.
But with software being orders of magnitude cheaper to produce, the equilibrium size of chunks of code will get smaller.
Perhaps much smaller.
OpenClaw is Mosaic.
Early, rough, and pointing the way to the future, but not itself a viable mass-market product.
What will be the Netscape of this new era?
Who will be the Costco of the age of AI?
OpenClaw gives you your own assistants.
Similar-looking services from companies just loan you their assistants.
Superficially similar; fundamentally different.
When push comes to shove, whose goals does the assistant prioritize?
Manus has many of the features of OpenClaw but seems underwhelming.
Part of it is that it’s owned by an aggregator.
“How is this service tricking me into taking myself hostage?” is the undercurrent.
If that same service were open source and fully local you might be willing to use it.
But if you know it’s owned by an aggregator, then it has to be so obviously great you couldn’t imagine not using it.
That’s a high bar to clear!
There will be a similar challenge for OpenClaw at OpenAI.
“Updates to privacy policy” only really move in one direction: against the consumer.
This week I got an email from OpenAI with the subject: “Updates to our privacy policy.”
My immediate response was “Good, they’re going to screw me over in some subtle new way that is obscured by corpspeak.”
But this one wasn’t subtle: it was about adding ads.
An NYT OpEd: OpenAI Is Making the Mistakes Facebook Made. I Quit.
Most software business models make useful software so that users store data with them, and then rent that data back to the users.
This is extractive and gross, and downstream of software being expensive to produce.
It’s also downstream of the same origin paradigm.
We are all renters in our digital lives.
Silos hold your data hostage and then rent it back to you.
Data is sliced up and oriented around which business finds it useful.
It should be oriented around you.
Cloud providers stake their reputation on not peeking at data they say they won’t.
That typically aligns incentives nicely.
Real time traffic is powerful but it requires giving the full data to Google.
Google doesn't peek... but it could!
Google Search packages up the crowd intelligence of users’ anonymous behavior into a powerful signal and sells it back to users.
But that's the user's data!
Why do they get to charge us for it?
Because they're the only ones with that data, because of the same origin paradigm.
If everyone had it then the algorithm would lose pricing power and be more competitive.
The hyperscalers want to see the world as a flat sheet of glass.
The real world is fractally complicated.
In the future, AI assistance will be like clothing.
Without it people will feel "naked.”
In the same way that gen-z people feel like a phone call is naked relative to text messaging.
The Jeeves and Wooster trick: it seems alive because it acts rather than responds.
Jeeves is the valet.
He anticipates the master's request and does it before he even asks.
In the show, it’s presented as a joke: he’s uppity because he does it without asking.
But if he were just a few seconds later he’d look obedient.
The illusion of agency.
Privacy nihilism leads to less private outcomes in the world.
People who care about privacy as an end sometimes have infinitely high standards for it.
That means that even a company that does an order of magnitude better on privacy for a given class of feature still isn't good enough.
The company went out of their way to make their feature way more private.
But the privacy nihilists still give them crap for it.
The other users don’t care… and maybe miss the extra functionality that had to be removed to make it private.
A lose lose.
Companies just say, “screw it, I might as well do it the less-private way if I’m going to get dinged either way.”
What would Jane Jacobs think of modern surveillance?
She talked about "eyes on the street" as a pathway to safety.
But that's about situated people in the community who have shared interests.
The ick of surveillance comes not from the recording itself, but from asymmetric access with no way to audit who can see it.
It doesn't matter if the front door is locked, if the back door is not only open but there's no back wall.
As a developer, data should be a liability.
That is, PII should be something you don’t want to have.
A system that allows you to create value for a user from their PII, without ever seeing it or being able to send it somewhere it shouldn’t go, would be valuable.
In the Chatbot form factor the chat is a party trick.
“Look, it talks just like a person!”
In Claude Code, chat is merely a natural-language way to accomplish the end you actually care about.
The chat is incidental.
This works for Claude Code… but also for any wiki-like, accretive use case.
For example, just attaching it to a writable vault of notes.
Anything that is accretive of durable ends.
Those are the form factors that enable the potential energy of LLMs to blossom best.
The right context is contextual.
If you include the wrong context at the wrong time, it makes things less useful.
Any system that tries to automatically store memories and retrieve them in the right context can come across as schizophrenic when multiple contexts overlap.
It’s such a weird feeling when I ask a chatbot a question about the business and in its answer it includes my kid’s name.
Ick.
A position of "It's OK to write code with AI but it has to be reviewed by humans” is untenable.
Code Review is excruciating cognitive labor.
If you try to review every line of AI code you will go crazy.
It's not possible.
LLMs can produce code so quickly the only way to tackle it is to use LLMs to review it.
Chatbots give everyone some of the problems that previously only the ultra-wealthy had.
For example: “Money Disease.”
This is where the ultra-wealthy no longer have any dissent that keeps their thinking sharp.
This leads to their being disconnected from ground truth and, given their leverage, possibly dangerous.
When you’re surrounded by an endless stream of sycophants, it’s impossible not to get lost.
With chatbots, now all of us can have money disease… but without the money!
A nice thread about tool-shaped objects.
A performative tool.
The feeling of productivity, hiding an empty interior.
People sometimes say they like ChatGPT as a therapist more than their real therapist.
But that's at least partially because sometimes your therapist says things you don't like!
New products can have two overlapping distributions of use.
The first is the “wow” reaction.
It demos charismatically, everyone gets a wow moment within seconds of opening it.
Something that wildly exceeds their expectations, quickly.
That exceeding of expectations makes them want to share it.
This has a spike of usage, but after everyone has tried it, this type of usage dies out.
The second is the “this is useful” reaction.
As you use it more you realize that this isn’t just flashy, it’s useful.
Your own use accumulates and it becomes indispensable.
You’re more and more likely to recommend it to others.
This has a compounding shape.
Staying power that accretes.
The two types of usage are different and complementary.
At the very beginning it’s hard to distinguish the shape of the usage.
TUIs are not the future.
It’s just that agents are so powerful that TUIs are for the first time Good Enough in many cases.
But browsers protect us in ways we often forget about.
We're so used to our browser detecting homograph URL attacks that we won't realize when the terminal doesn't.
That is, letters that look the same but are semantically different, allowing attackers to have domains that look legit but are actually dangerous.
This library helps do the same thing, but most TUIs won’t even think to do it.
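The core of that kind of check can be approximated in a few lines. This is a toy sketch of the idea, not what browsers or any particular library actually implement: it flags domains that mix Unicode scripts, the classic homograph trick.

```python
import unicodedata

def looks_suspicious(domain: str) -> bool:
    """Flag domains that mix scripts (e.g. Cyrillic letters in a Latin name).

    A real homograph defense (like the IDN checks browsers do) is far more
    involved; this only demonstrates the core mixed-script heuristic.
    """
    scripts = set()
    for ch in domain:
        if ch in ".-0123456789":
            continue  # punctuation and digits are script-neutral
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("latin")
        elif name.startswith("CYRILLIC"):
            scripts.add("cyrillic")
        elif name.startswith("GREEK"):
            scripts.add("greek")
        else:
            scripts.add("other")
    # Mixing scripts is the homograph trick: the glyphs render identically,
    # but the domain resolves somewhere else entirely.
    return len(scripts) > 1

print(looks_suspicious("apple.com"))  # pure Latin
print(looks_suspicious("аpple.com"))  # first letter is Cyrillic 'а' (U+0430)
```

A TUI that pipes URLs straight to the terminal does none of this by default, which is the point.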
You can get nourishment out of the land in a short-term extractive way or a long-term sustainable way.
In the short term they look the same.
Only the people with long-term ownership of the resource will care to distinguish the actions in the short term.
Everyone else will do the easy thing and try to avoid seeing the indirect effects.
Hyperscale often creates negative externalities.
When everything is explosively growing, you don't care about externalities.
There’s no time to even think about them.
All you can think of is staying on the bucking bronco.
Hyperscaling creates vortexes of value.
Scale leads to transactionalism.
All optimizing systems turn people into NPCs.
People have to be anonymous to you for the system to reason about them at scale.
The less you expect to see the same people again, the more they become NPCs: easier to think of as their transactional role rather than as whole people.
This leads to a hollowing out in interactions with other people.
Hollow things rot out.
An inauthentic process extends and animates them even as their internals become hollow.
Swarms of agents will have even stronger Goodhart’s law than swarms of humans.
Agents are told to optimize one thing, and by god they will do it.
Humans will think, "well I have a reputation to uphold" or "this game is an iterated one, so I should balance short-term and long-term."
Individual humans feel bad when they fall into a Goodhart's law attractor state.
But agents don't, if they are doing what they were told to do.
Agents don't feel shame, or can very easily be made to not feel it.
It's quite a bit harder for (non-sociopath) humans.
When does something become a swarm?
When no single individual is distinguishable or matters.
All that matters is the collection, the swarm.
The edge of the query stream is the frontier of quality.
By systematically sampling the queries that don’t get a good response, you can see the use cases the users want to work, but don’t.
If you can make those situations work, you can steadily improve the regions of good quality results.
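A minimal sketch of that sampling loop, with the log format and scoring function entirely invented for illustration:

```python
import random

def frontier_sample(query_log, quality, threshold=0.5, k=5, rng=random):
    """Sample queries whose responses scored below the quality bar.

    `query_log` is a list of (query, response) pairs and `quality` is any
    scoring function in [0, 1] (a rater model, user feedback, etc. -- all
    assumed here). The failing tail is the frontier: fix those cases and
    the region of good results steadily grows.
    """
    failing = [(q, r) for q, r in query_log if quality(q, r) < threshold]
    return rng.sample(failing, min(k, len(failing)))

log = [("weather", "72F"), ("tax form 8829 help", "???"), ("lunch spots", "...")]
bad = frontier_sample(log, quality=lambda q, r: 0.0 if "?" in r else 0.9, k=2)
print(len(bad))  # only the "???" response falls below the bar
```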
Google Search for a long time was surprisingly simple internally.
90’s era tech that stayed powerful for multiple decades.
That’s because if you have an emergent, evolutionary data stream that you can close over, plus very basic, generic rules to extract and distill it, then as the underlying data source evolves, the output gets better automatically.
In complex systems, often shockingly simple rules produce extraordinarily valuable emergent results.
Optimizing for stability can lock in bugs.
Stability is about not changing things that higher layers depend on.
But sometimes there’s a semantic bug that things depend on.
Those can’t be changed without breaking those dependents.
For example, the Referer header, whose name is famously misspelled.
So they’re just left in place.
LLMs are great at rewriting software.
So now, instead of locking a bug in place, you can cheaply rewrite the higher layers that depend on it, meaning you can go further before having to lock in bugs.
Zero days are so valuable and are only burned on high value targets.
They aren’t used on low-value targets.
This creates a rain-shadow effect where higher-value targets protect lower-value targets.
Even if a zero day exists, are you valuable enough for the owner to burn it on?
If you’re creating an open system, the actions of your users should surprise you.
If they don’t, then the overhead of an open system isn’t worth it.
That surprise can be a powerful emergent force, but it has to overcome the inherent messiness of open systems.
What makes an idea have an additional ply?
As in, go from being a one-ply to a two-ply idea?
It's an idea that the majority of listeners would not understand even with 30 seconds of exposition.
That in practice is a huge hurdle to clear, because everyone is so busy and doesn't have time to consider a surprising idea for longer than a few seconds.
After that they just give up and say "Screw this inscrutable noise I have important things to do."
That means that multi-ply ideas are orders of magnitude harder for people to grok.
This can be bad if you need a lot of people to grok it.
But if all it takes is a small team grokking it, and producing disjoint value that is then obvious, it can change the world.
The multi-ply mechanism doesn’t need to be grokked by others, if the result is obvious.
This week I learned that Obsidian has deep links.
When linking to another page, you can type a “^”.
The autocomplete drop down will then show you all of the paragraphs in that other file.
When you pick one, it will add a deep link anchor to the end of that line, e.g. “^ac3b”.
The anchor is small and unobtrusive; when you edit the text around it (like copy/pasting), it’s easy to choose whether to include it.
This is an elegant system that naturally keeps deep links working even as they move around a file without littering the file with lots of anchors.
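A sketch of what this looks like in practice (file and anchor names made up):

```markdown
In notes.md, the anchor sits at the end of the target line:
The key insight was that deep links should be opt-in. ^ac3b

In any other file, the deep link looks like:
See [[notes#^ac3b]] for the original framing.
```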
A rough mental model of Reinforcement Learning curriculum.
You want to give examples at the right time to the model as it’s learning.
If the example is too hard too early, it confuses the model.
You want to stay in its zone of proximal development.
Generally you want to segment training by difficulty.
At the beginning you have lots of easy and few hard examples.
Then shift the mix based on your surprisal.
A multi armed bandit optimization.
“I don’t think hard would work… oh it does, add more hard in now”.
“I do think this easy one works so don’t include it. Oh it doesn’t? Increase the mix of easy ones.”
Surfing along your edge of surprisal.
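That loop can be sketched as a tiny weight update, with all names and numbers invented for illustration:

```python
import random

def update_mix(mix, bucket, expected_success, observed_success, lr=0.1):
    """Shift the sampling mix toward buckets where we're surprised.

    `mix` maps difficulty bucket -> sampling weight. Surprise in either
    direction is signal: "hard works already" or "easy is failing" both
    mean that bucket deserves a larger share of the curriculum.
    """
    surprise = abs(expected_success - observed_success)
    mix[bucket] += lr * surprise
    total = sum(mix.values())
    return {b: w / total for b, w in mix.items()}

def sample_bucket(mix, rng=random):
    # Multi-armed-bandit-style draw, proportional to current weights.
    r = rng.random()
    acc = 0.0
    for bucket, weight in mix.items():
        acc += weight
        if r < acc:
            return bucket
    return bucket

mix = {"easy": 0.7, "hard": 0.3}
# We expected hard examples to mostly fail (20% success) but they succeed
# 60% of the time: big surprise, so "hard" gets a larger share of the mix.
mix = update_mix(mix, "hard", expected_success=0.2, observed_success=0.6)
print(mix["hard"] > 0.3)
```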
An ad hoc change management process can break a team’s OODA loop.
An intentional change management process helps make sure that the team keeps laminar flow and doesn’t get into turbulent flow.
When a team is in turbulent flow, the shared OODA loop diffuses.
It takes focus and stability to get a coherent OODA loop to distill.
To take a team from default diverging to default converging.
A rule of thumb to organize a system: “Put like with like.”
That is, put things together that are similar.
Over time this makes the system more tidy.
A tidy system is about reversing entropy.
Pretty code is often good code, because it’s tidy.
Making the system work more like how you think it works.
Make the system more what it wants to be.
It’s fascinating watching a new complex codebase congeal on a team.
At the beginning of a codebase, everyone touches everything.
But over time people sort into the parts they like best, and their ownership grows from there.
If they don't actively dislike that area enough to switch, they stay with it by default.
New people don't go into areas that are already well owned by default.
These are rules like “stay in alignment with the team’s goals, avoid crowding neighbors, make sure every bit of code is covered.”
You might recognize these as the three rules of bird flocking behavior.
Emergent order from three simple rules.
Beautiful.
I get excited about the potential of things, even if today it’s only good enough.
I'm used to "of course you survive to see the potential blossom".
That is the default in post-PMF environments.
Someone used to pre-PMF environments will like things that are very good / great today and don’t care much about potential.
Because you have to survive long enough for the potential to blossom.
In a pre-PMF environment, you’re default dead.
Of course in that situation you can’t survive on potential alone.
A competent proactive person who is misaligned can do a lot of damage in a pre-PMF team.
The team still doesn’t have a coherent, internally stable direction.
That means the direction a competent person wants to go into could pull the company off course.
This can happen easily with sales people.
"I landed this big fish, are you really going to say no??"
A rule of thumb for deep listening.
Don’t respond with a “here’s how that thing you said fits into my world.”
Instead, ask more questions about their world.
Does your conversational turn bring it to your world or let it stay in their world?
Unfortunately, it’s easier to go viral when you lie than when you tell the truth.
That’s because things go viral when they're surprising.
Especially when the surprise is in a “I told you so!” direction.
Lies are easy to make surprising and aligned with what the listener wants.
Another example of a fundamental asymmetry that hollows things out.
The faster society goes, the more momentum this asymmetry gains.
Games have different rules than reality.
When you realize you're playing a game and the physics are only a simulation, lean into the way the simulation is different from the world.
That's where people will continue to assume it works the normal way but it actually works the different way.
That's where the leverage is.
The first step is to realize you’re in a game.
One of the reasons the boy in The Matrix can bend the spoon is because he realizes there is no spoon.
The components of the ground truth can exist in a group but still be obscured by various weeds.
Weeds like empathy issues, personality mismatches, power dynamics, etc.
A system to do the cognitive labor to clear those weeds can make the truth discovery process much more effective.
It's never urgent to plant a tree.
But it’s quite often important.
An idea from Stoicism: “In order to be harmed you have to believe you’ve been harmed.”