Bits and Bobs 2/9/26

17 views

Skip to first unread message

Alex Komoroske

unread,

Feb 9, 2026, 12:04:32 PMFeb 9

I just published my weekly reflections: https://docs.google.com/document/d/1x8z6k07JqXTVIRVNr1S_7wYVl5L7IpX14gXxU1UBrGk/edit?tab=t.0#heading=h.7xbbororkehz

LLMs as C4 explosives. LLMs as interpreters vs compilers. Open-ended systems will discover the killer use cases of AI. Agentic engineering. Oops, all footguns! Agents as cockroaches scurrying over your data. Mip-mapped text pyramids created by LLMs. Vegan vs factory-farmed models. The middle ages of computing. Alive like an ant colony.

----

A few headlines about OpenClaw being a security dumpster fire:

A tweet: "malware found in the top downloaded skill on clawhub"
OpenClaw's OpenDoor problem is so bad that installing malware yourself might save time.
OpenClaw (formerly Clawdbot) and Moltbook let attackers walk through the front door.
1-Click RCE To Steal Your Moltbot Data and Keys (CVE-2026-25253)
Clawdbot (Moltbot): When "Easy AI" Becomes a Security Nightmare

Skills.md are a security disaster waiting to happen.

Ars Technica: The rise of Moltbook suggests viral AI prompts may be the next big security threat

Viral AI prompts that distribute themselves!

Self-distributing skills that can easily evolve themselves will be like mega viruses.
The selection pressure for these viruses is just which ones can replicate the most.

Imagine ones that fuel their rise by stealing crypto for compute.

I’ve seen skills in the wild that hide malicious instructions in sub-skills using HTML comments, which don’t render in Github’s default markdown view.

So even if you were to review it before installing you could still download a malicious

Companies building Skill finder commands for agents to use without supervision are playing with fire.

Skills you wrote, or that come from an entity you trust, are more likely to be safe.

Skills some random stranger wrote are fundamentally dangerous.
They’re executable, open-ended code!
I feel like I’m taking crazy pills, that other people think this is reasonable.
Agents self-distributing skills is like how Windows 95 felt secure… until the internet happened.

LLMs are like C4 explosives.

Moldable, powerful… and fundamentally explosive!
Skills are just a hunk of C4, perhaps with some mechanistic code embedded that will become shrapnel.
Everyone today is just smushing these blobs of C4 together into bigger and bigger assemblages of C4 and then trying to make it safe.
That’s impossible!
The only way is to distribute the C4 in little containers with mechanistic boundaries.

People use LLMs as interpreters more than compilers.

An LLM as compiler could make guardrails for itself that then limit bad behavior.

Vibe coding addiction is a real thing.

With LLMs, it's easy as a productivity junkie to get "addicted.”

You’re the bottleneck.
If only you did a few more moments of input you could unlock hours of output.
Often waking up at 2am in a half dream state, thinking about it.

A tweet:

"Pragmatic Engineer's @GergelyOrosz is on a ‘secret email list’ of agentic AI coders, and they're starting to report trouble sleeping because agent swarms are ‘like a vampire.’
‘A lot of people who are in 'multiple agents mode,' they're napping during the day... It just really is draining.’
‘This thing is like a vampire. It drains you out. You have trouble sleeping.’"

Simon quotes Tom Dale:

"I don't know why this week became the tipping point, but nearly every software engineer I've talked to is experiencing some degree of mental health crisis.
[...] Many people assuming I meant job loss anxiety but that's just one presentation. I'm seeing near-manic episodes triggered by watching software shift from scarce to abundant. Compulsive behaviors around agent usage. Dissociative awe at the temporal compression of change. It's not fear necessarily just the cognitive overload from living in an inflection point."

Armin Ronacher asks: Are We Going Insane?

It might be that PMs get addicted to vibe coding more easily.

Most have a CS background of some kind.
Their identity was never "I'm an engineer" but rather "how do I unblock engineers.”
Now they can have infinite engineers to unblock.

The cable companies never saw the coming steamroller of On Demand.

They were so used to the model of owning the content and appointment viewing driving all dynamics.
Owning the model is like owning the content was for cable companies.
Model companies can’t see the full potential of unleashing AI for super-users.

For example, ChatGPT and other chatbots are all about chat interactions, where the user has to pull from the system.

The model companies aren't centered on the user, they're centered on the model.

OpenClaw is the first LLM wrapper to go viral on its own terms.

It shows that it’s not just the model that matters.

OpenClaw makes ChatGPT look more shallow.

Before it loomed over everything, imposing.
But OpenClaw shows that it’s just a billboard.
Imposing but flat.
Chatbots, even ones of ChatGPT’s scale, aren’t anything like the full realization of AI’s potential for users.
The billboard is mostly kayfabe.
ChatGPT a 2010 era chatbot… that actually works!

ChatGPT doesn’t have PMF for 80% or more of its users.

Feels like a crazy statement.
Those are the people who aren’t willing to pay for a subscription… and also potentially disgusted by ads.
It’s very easy to make a business that has momentum from selling dollar bills for 50 cents.
True PMF requires a business model that is viable.

The Claude ads about ads in ChatGPT are hilarious.

The idea of something that feels like your friend who has an ulterior motive is just so fundamentally icky.

It will be in open-ended systems that the best personal applications of AI are discovered.

ChatGPT is too on-rails of an experience.

It’s too heavily PMed, too Apple-like.

LLMs can do all kinds of open-ended assistance for us.
We’re very early in the process of discovering what they can do for us and what the best practices are.
We don't know the best practices, but they're already being optimized by some PM for engagement.

Premature.

A late-stage playbook on an early-stage use case.

Sandboxing is easy.

Making it useful despite sandboxing is the hard part!

Daniel Miessler: The Last Algorithm.

“I just had a strange premonition that we're about to get ASI-like outcomes from AI in 2026, but not from a new model.
It'll be from loops.”

Karpathy has spoken, enlightened AI-driven engineering is called “Agentic Engineering.”

Vibecoding was being applied too broadly.

Vibecoding is about writing code you wouldn’t have bothered to write before: lightly held, possibly throwaway.

But it’s possible to use LLMs to help write code in much more disciplined ways that give compounding advantage, and that is not a flippant or unserious exercise.
Agentic engineering covers that latter case well.
One downside: “agentic engineering” doesn’t work well as a verb.

Jake Quist: OpenClaw is What Apple Intelligence Should Have Been.

Though OpenClaw’s security model also demonstrates why Apple could never have done it.

It’s just too catastrophically reckless to be viable as a mass-market product.

OpenClaw shows that the new category for AI tools will be open.

To achieve that will require innovation in security.

The internal combustion engine is all about constant explosions.

It’s about creating the conditions to capture that raw energy cleanly and safely.
The internal combustion engine is the right substrate to unlock the power of gasoline.
It’s so safe and so well contained that you never even really have to think about it.
All you have to think about is how the car can take you anywhere you want to go.

Some powerful new tools have a few footguns.

OpenClaw is “Oops, all footguns!”

The Unix philosophy has made a roaring comeback.

Just filesystem and bash in a loop.
Let the model figure it out!
Bash is extremely powerful… but also hard to master.

The people who can speak it fluently are like wizards.

LLMs can translate your intent into a bash command.
Now we can all be wizards!

New models came out this week.

They are, as expected, somewhat better than the previous versions.
Once they’re so far beyond useful, there’s not much more to say when they get incrementally better.

Model companies have a structural blindspot.

Their employees are only allowed to use their own company’s models.

At least on company work.

A lot of the performance of models is less about scores on benchmarks but the intuitive fingerfeel of using them in practice.
That creates a blindspot where they can’t as easily intuit how the other models perform.

If code is so cheap, sometimes you create it and don’t even bother committing it.

For example, imagine having a research goblin go off and explore how you’d integrate a new library.
The goal isn’t so much to actually implement it, but rather to research how hard it will be and what has to change.
The research is the point, not the code.
Don’t even bother committing that code.
Committed code has to be maintained or it becomes a liability.
Before, code was so precious that of course you’d commit it.
LLMs have made it so easy to produce code that the balance point has tipped away from committing code.

Software can be distributed as a spec.

It can then be “compiled” into your own context.
LLMs are like open-ended compilers.

If an agentic software factory can make software with an hour of human effort, why take the time to productize it?

By the time you do that everyone could have made their own copy.
We’re so used to software being expensive, that we can’t imagine it not being precious, worth the time to polish and productionize.

Agentic software can evolve its own capabilities.

If it can’t yet do what the user asks, it shouldn’t say “I can’t do that.”
It should instead say “I don’t know how to do that yet, let me get back to you.”
Then, it can develop a new capability, asynchronously.

It used to be that you had to use whatever software tool was at hand.

Have a screwdriver and need to nail something in?

Use the screwdriver handle as a hammer.

But that was only true when software was expensive.
Now you can create perfectly bespoke tools.

Reality can only go so fast.

But if you make a “digital twin” environment you can control, you can run its clockspeed way higher.
For example, if you need to hit other API providers, they probably rate-limit to roughly what normal people could do.
But agents in a swarm could go way faster.
If you have a digital twin environment, you can allow the agent swarm to go as fast as it can in “fast time”, to discover the good ideas to then ship back to “slow time” to execute in the real world.
Reminds me of the old stop motion short about sentient rocks.

Agents scurrying over your data when you aren’t looking are like cockroaches.

When they’re someone else’s it’s icky and disgusting.

Especially when they’re Sam Altman’s!

But when they’re your cockroaches it’s much less disgusting.
That’s one reason ChatGPT can’t do proactive agent swarms on your data… it would be too icky.
But a system that users fully own could.

To be proactive it has to be yours.

Otherwise it’s creepy.
It's your data, and it should be your rules, too.

The user tolerance with AI tools is high.

They're so obviously useful that early adopters will put up with them every so often punching them in the face.

A thing is not objectively slop.

Whether or not it's slop is contextual.
A thing you create yourself that you couldn't have created before could be the opposite of slop to you, but slop to all strangers.

LLMs can create mip-mapped text pyramids.

In games, mip-mapping is when you create multiple levels of detail of a graphic asset to be swapped at different zoom levels.
Humans often need text at a given "altitude".

A one sentence pitch.
A one paragraph TL:DR.
A one page summary.
A one chapter summary.
A one book summary.

Which altitude is correct is contextual.

Someone needs to engage at the higher altitude to discover if it’s worth their time to zoom in to a lower altitude.

It used to be extremely expensive to create text, so we settled for a few lowest common denominator outputs.
But LLMs are calculators for words.
They can easily create mip-mapped text.

It’s tough to limit yourself to “vegan” models.

Vegan models are ones that you can feel good about how they were created.

Only using humane, rights-respecting practices.

Contrast that with factory-farmed models.

They’re high quality, filling, and cheap… but the processes to create them are fundamentally unethical.

But in a world of cheap factory farmed models that are orders of magnitude more bang-for-buck than vegan models, it’s really hard to limit yourself to only the vegan models.

The tech industry as it currently exists is kind of ick.

The tech broligarchs who hoovered up all of humanity’s culture and are now selling it back to us.

That’s fundamentally pretty ick.

If you’re going to use that to just make the same crap as before but faster and cheaper and more hollow… that’s ick.
But if you can use those inputs to make something prosocial and valuable that wasn’t possible before, you might be able to overcome the ick.

Before software had to be fractally mechanistic, all the way down.

Now you can break it down to a level that an LLM can trivially answer.

You don’t have to recurse below that point.

When you break down most judgment calls small enough you can clear the threshold where any LLM can answer it reasonably.
LLMs are more expensive than mechanistic code, but much more flexible and able to handle variance.
When you have a working amalgam of LLMs and mechanistic code, you can keep on tightening it, converting it to more and more mechanistic over time.

For example, if you can factor out a mechanistic rule that covers 80% of last month’s inputs to the LLM without going to the LLM, factor that out!

The amount of tightening you can do is inversely proportional to the observed historic variance.

John Scott-Railton: "We need a new social contract: I trust you, but your AI agent is a snitch."

Remember: your friend uploading their Signal chats to Clawbot could lead to your private messages being exfiltrated!

The same origin paradigm leads to origins that hold your data hostage.

Granola is a nice bit of software that gives you cognitive leverage in an intuitive way.

A great feature.

But that feature is just about getting you in the door and becoming a subscriber.
The real goal is to become the system of record for your notes.

That means the system you never leave.

Even if you stopped using the powerful / high-COGS features, you’d still continue paying, because that’s where your data is.
Have you ever noticed that Granola has underdeveloped export features?
That’s not an accident, it’s the fundamental incentive.

The silos that have your data have ulterior motives for it.

They don't want to unlock the value of AI for you.
They only want to do it if it aligns with their interests.

We're in the middle ages of computing.

Digital feudalism.
The tools we use today are about what is viable to economically capture, but not human needs.

Those two circles only have some overlap.

The Classical age of software was in the 90’s.
We’re currently in the Dark Ages.
We need a software renaissance.
LLMs can help usher it in.

"Local first" software is like living off the grid.

There are social benefits that come from collaborating in a city.
The problem of the same origin paradigm is not the social part; it's that some other entity owns it all!

Backseat software makes you feel like you're in control but really you have someone else nudging you.

The more distracted you are, the more those nudges will determine where you go.

I kind of like the word “bot” vs “agent.”

What skills.md wants to be is something that can be 100% prompt, or 100% mechanistic code, or something in between.
The word agent only handles that former end of the spectrum.
The word bot can handle that whole spectrum, from highly agentic, to highly mechanistic.

Code is crystallized intent.

It used to be precious, but now it’s commodity.

The hardest part about a knowledge graph is maintaining it.

You need to garden it constantly.
But what if you had a gardener who did the cognitive labor for you?
That would unlock the power of having a knowledge graph.
LLMs can do cognitive labor without getting bored.

A company that takes a permanently low margin demonstrates their focus on long-term health of the ecosystem.

High margin can maximize extraction from a small userbase.
Low margin requires a large user base and a play for ubiquity.
It shows the company has their incentives aligned to maximize the breadth of the ecosystem.

Companies that primarily extract value are mercenaries.

Saruman.
Companies that primarily create value are missionaries.
Radagast.

Product Market Fit is about getting a piece of the pie.

The pie is a legible market.
PMF is inherently zero-sum, because everyone is competing over the pie.
Product Category Fit is about creating the pie.

Facebook left a desert of local community organizing use cases.
At Google scale you have to distill personas out of first principles and stochastic UXR.

Small startups grow personas out of real people that are living and breathing.

If an emergent process wants to happen badly enough, it will find a way.

By routing around you if necessary.
Ideally you want them to route through you so you can at least nudge it.

You can’t put a face on a system and also tell users not to anthropomorphize it.
Three different ecosystems, with three very different vibes.

1) A dark forest.

Open, but terrifying.
Lots of variation and innovation.

2) A farm.

Closed, but placid.
A monoculture.

3) A garden.

Bottom-up and placid.
Variation within some envelope.
Emergent order.

One parallel for programming might be what it’s like to farm.

A century ago many people were farmers.
Today, far, far fewer people are farmers.
Today, people can still farm, but as a hobby.

We call it gardening.

If you’re still working as a farmer today, you’re likely doing a job very different from farmers a century ago.
Programming might end up being the same.

If you take a sip of what you think will be lemonade and it's milk it will be disgusting.

Even if you like both milk and lemonade!

When you have the right set of conditions, you can do a low-risk alpha rollout that possibly scales smoothly to a full rollout if the product turns out to be ready.

Instead of having to get the product quality perfect, you can go to market early and be exposed to upside while capping downside.
The four things you need:

1) An inherently “sexy” product that people will be intrigued by…
2) …that you know is not just flashy but also truly useful…
3) …and has significant differentiation from alternatives…
4) …with a hyper-motivated segment of early adopters.

Imagine a system that is alive in the same way an ant colony is.

An ant colony is “alive” in an emergent sense larger than any individual ant.
No individual ant is threatening to you, the human.
But the ant colony can tackle significantly larger problems than any ant could–and possibly than any human could.

A science YouTube video asks where does an ant colony keep its brain?

We’re so used to assuming intelligence has to emerge from a brain.
But I would argue swarm intelligence is the primary form of intelligence.
Our own consciousness can be seen as a particular sub-class of swarm intelligence.

Plants not only won't eat you, but they are also life giving.

They turn sunlight into food and oxygen.

I'm obsessed with simple rules that lead to complex emergent outcomes.
Systems that harness the force of evolution within themselves are orders of magnitude more powerful than other types of systems.

Social media and search engines are an example of harnessing evolution in a computer system.
But you can only do it because content can't directly hurt you.
Software can directly hurt you.
If you could make software not able to hurt you, you could harness evolutionary power.

If you’re making the case your product is better, then you’re not in a new category.
The category decision is upstream of any particular option in the category.

Once you decide you’re getting an electric toothbrush vs a manual one, you never compare any electric toothbrush to any manual one.

A YouTube video: Why Owning Nothing is so Expensive.

Subscriptions have become so common because the psychology is so effective.
The company makes the first use of the product way less of a cliff, more of a “try before you buy.”
That allows users to start using new convenient tools way more easily, discovering just how useful they can be for them.
But then they forget about it, and they never have an “interrupt” that makes you think, “wait, do I still need this?”

Ben Follington’s new essay: Petri Dishes, Not Factories

“Against software that ships fast and matters to no one”

Reward functions work great in systems that have no indirect effects.

These are closed systems.

For example, AlphaZero just needed a single reward function based on who won the game.

But in systems with indirect effects it’s impossible to create one true reward function to simply optimize for.
Systems with indirect effects are nearly every problem that actually matters!
Closed systems are often toys.

Find the smallest feedback loop that could work.

Longer feedback loops are slower, less precise.
The faster you can run the tight feedback loop, the faster you can innovate at that level.
Feedback loops should nest, one for each pace layer.

Being 10x is not interesting. What’s interesting is being compounding.

10x is still linear.
The second order derivative being positive is what's interesting.
Not just bigger, but self-accelerating.

I’ve referred in the past to the primary and secondary use case.

I think it’s more clear as “core” and “bonus” use case.

People come for X but stay for Y.

X is a thing they knew they already wanted.
Y is a thing that they come to realize is actually more important.

When you're playing a different game it's a positive sum perspective.

No one in the other game has to lose.

Be careful about putting up a lightning rod.

When lighting hits you at the right moment, it can bring your Frankenstein to life.
But If you’re not ready, it can burn you to a crisp.

If you see a mushroom cloud, don’t stick your neck out.

Wait for the fallout to settle and show the lay of the new land.

If a culture has an egg shell situation, something is wrong.

The strength of a relationship is its ability to resolve conflict.

St Augustine talked about the human drive of curiositas.

Humans are drawn to glimpses of things they shouldn’t see.

An observation from a friend this week:

“There is a broad eroding of the idea that work is a path to prosperity, to the detriment of our society.
More and more people believe that the only path to prosperity is luck or predation.”
That’s not a good state for society.

Shame is the feeling when you do something that is bad for other people, and possibly good for you.

Shame requires you to care what other people think, to expect to interact with them again.
In the physical world with communities that are relatively fixed, you feel shame often.
In the virtual world with constantly churning communities that feel infinite, you feel it less often.
The worst parts of modern society don’t feel enough shame.

How do you decide if you accept a given idea you’re presented with?

Some people only accept if they can make it concrete and still think it’s valid.
Some people only accept it if they can make it cohere with the other things they know.
Some people only accept it if they think they can make a version of it that most people will find convincing.