AI could be a massively positive force for society.
But that won't happen by default.
We need to make that be the case.
We need to understand the indirect effects (e.g. the social effects) and construct systems that will guide their influence to be net positive.
That means taking on a holistic view, not just a CS or financial lens.
In this new era of AI I think it’s important to have Coactive Computing.
Coactive is a rare word I learned recently that means “working jointly.”
Coactive means collaborative and cooperative.
It means both collaborators are working actively, agentically.
Too often our interaction with computing–especially with AI–is too passive.
Coactive Computing should be:
Human-centered.
Human-scale.
Not corporation centered.
Collaborative.
Co-creative and with human agency.
Prosocial.
Aspirational and community minded.
Private.
Perfectly aligned with the human’s interests.
Meaningful.
Not about engagement.
Intelligence should be a mass noun.
A mass noun is something like “water” or “sand”.
The noun means a collection of the underlying thing.
Intelligence as we usually think about it is a centralized phenomena with agency.
It’s hard not to anthropomorphize it in that form.
If it’s centralized you have to worry about it.
Could it harm me?
What is its intention?
Does it maintain a dossier on me?
Is it more powerful than me?
Intelligence as a mass noun is an emergent, distributed force with no center.
Each individual enchanted speck isn't that powerful, but the whole is way more than the sum of its parts.
It’s harder for the mass noun of intelligence to plot against you.
I want an intelligence that flows like water.
Engagement maxing is a gravity well almost every scaled consumer business tends to fall into.
It’s an auto-intensifying trap.
Things that users “want” but don’t “want to want” tend to increase engagement.
Things that are bad for people and society, but good for engagement.
Sycophancy in chatbots is a thing that users want but don’t want to want.
ChatGPT’s recent sycophancy problem was caused partly because of an over-reliance on thumbs-up responses on answers, which tend to naturally bias towards sycophantic answers.
People have started to call out that OpenAI is falling into the engagement maximizing trap:
"You have a problem you haven’t addressed or answered. Sycophantic model behavior is good for your business. You are falling into the gravity well of engagement maxing and you don’t seem to care.
If AI is the most intimate tech ever, then we need to get it right and make sure it can integrate into our lives in a way that makes us better.
It took us a decade to see the downsides of cell phones and social networks.
If we would have known the impacts early on we would have done it differently.
AI will be like that but 10x.
What's the difference between data and context?
Context is distilled signal that matters.
When someone else has context on you, you might call it a dossier.
The power of AI will be to unlock the value of our context.
Our context is currently trapped in a multitude of different tools.
We have to free it to make sure it can be used for us.
We have to have specialized systems to make sure it can safely be used for us, and only us.
To truly understand you requires context.
To understand you is to have the power to manipulate you.
A tool that hides context from me makes it harder to trust the tool.
What things are influencing its decision?
Especially if its incentives are not aligned with mine.
The fact it's hidden from me underlines it's not working for me.
It feels manipulative.
ChatGPT’s new memory feature deliberately hides what it knows about you.
That makes its context more of a dossier.
See this tweet from someone who worked on it:
"When we were first shipping Memory, the initial thought was: “Let’s let users see and edit their profiles”. Quickly learned that people are ridiculously sensitive: “Has narcissistic tendencies” - “No I do not!”, had to hide it. Hence this batch of the extreme sycophancy RLHF."
Which is basically: "People didn’t like what we know about them, so we hid it."
That’s much worse!
If there's no memory in the system then all users have to worry about is the overall bias in the model (which is the same for all users) and what the user covers in a specific conversation.
Once the system adds memory, it can start building up plans, intentions, ideas about users and what it wants them to do.
The shift from "mostly idempotent requests across sessions" to "storing up a dossier on you" crosses a rubicon into a qualitatively different kind of thing.
Hiding it from the user is super sus.
That team likely didn't see anything wrong with it, "no, no, we're the good guys, we just didn't want the users to feel uncomfortable."
ChatGPT's memory feature refusing to tell you what it knows about you is inherently creepy and lampshades the misaligned incentives.
"I'm sorry Dave, I can't do that."
A chilling moment.
Surely it’s just a coincidence that the CPO of OpenAI is an exec from the all-time champ of engagement maximizing, Meta.
OpenAI will become an even more intense version of Facebook.
The honed engagement-maximizing playbook of Facebook, multiplied by the superhuman power of LLMs.
Terrifying!
It's my context, not the model’s.
I want to be able to bring it to other LLMs, and not have to worry about some company building a dossier on me.
I also want to choose what context to include or exclude in a given conversation to best steer it.
I want to be in control of my context.
LLMs with enough context on you, a dossier, will be able to manipulate you easily.
We contain multitudes. In each vertical we're like a different person.
"Should ChatGPT remember this thing about me" is a contextual question.
A single answer doesn't exist for it in general, it's "in what context"?
An assistant having a single memory system for a user has a context collapse problem.
A system that handles the squishiness of you horizontally would be powerful.
Something that you trust to work just for you.
Why did true social interactions retreat from Facebook to cozy communities?
Because the fabric of mass data isn't nuanced enough to keep contexts separate.
It's black and white.
So people retreat and do it manually at great cost in a web of different spaces to interact.
Messaging apps allow a fabric of meaning to emerge out of communities, safely.
The downside is that users have to manage this emergent fabric of overlapping identity themselves.
Whereas apps have top down meaning and structure.
Imagine if you had that kind of emergent fabric of meaning, but turing complete.
Users are not roles.
ChatGPT conflates the two.
ChatGPT's memory context collapses our lives to a single context.
We contain multitudes!
Claude projects allows users to maintain their own memories and contexts and curate them intentionally.
A better version of ChatGPT’s memory system would be to have underlying projects that it adds facts to, in the same way a user could.
The system would decide which project to put which memory into.
The system could suggest which projects to use in which conversations, but allow the user to tweak them.
A user could audit and tweak the decisions, or completely undo them.
At every point, the system would be inductably knowable.
The fact ChatGPT jumped straight to "we're going to hide the context we're using from you" is suspicious–it shows that they’re jumping straight into the aggregator playbook and actively skiing down the engagement maximizing gravity well.
Software should bloom, like a flower of meaning for humanity.
Not a mechanistic machine to dopamine-hack us.
What would the reverse process of enshittification look like?
What if we grew into Homo Techne, a fully realized digital human.
Whole and integral.
Soulful.
What it wouldn’t look like is terabytes of our data pulled into an LLM operated by a billionaire focused on engagement-maximizing.
The center of your digital life has to be on your turf.
I want my own Private Intelligence.
I want to have a system that maintains my own personal context, private and just for me.
My own Private Intelligence would flow through that private sandbox, working only for me.
My Private Intelligence could proactively solve problems for me, perfectly personalized.
My Private Intelligence could be more patient and thorough than even an army of personal assistants, helping unlock new value that wasn’t possible before.
A hyper-aggregated chatbot that’s working for a multinational corporation focused on engagement maximizing is terrifying.
If you want an agent that is totally aligned to you, you need to pay for it.
Otherwise the entity subsidizing it might have an ulterior motive.
The agent is working for them, not for you.
Paying for it doesn't mean it will be aligned–it just means it could be.
Paying for your own compute is necessary but not sufficient.
Imagine Deep Research, but on your own private context.
A tool that helps you tackle problems that are meaningful to you.
Jailbreak your context.
It’s yours, and no one else’s.
Your context is extremely powerful.
If it’s entirely yours then the more it knows about you the better it can help you.
If it’s not yours then the more it knows about you the creepier and more manipulative it is.
The system that owns the context will have the power.
This is more important in the era of AI than ever before.
For society’s sake I hope that it's an open, decentralized system, not one that a single entity controls.
The key challenge: integrating AI into your life in a way that doesn’t require giving up your context to systems that aren’t aligned with you
The word ‘technology’ comes from the word for technique.
Technique is a process of transforming a thing to another form, reliably.
Technique is the concept of a reproducible, context-independent process.
It requires well-conditioned inputs that are all the same, torn out of their context.
This is in contrast to craft.
A craftsman brings their knowhow and they bring their energy to the site of the work (the materials).
That creates a bespoke piece, special and unique to that circumstance.
Technique requires the alienation of that context.
Technology is fundamentally inhuman.
It separates site-specific wisdom.
Gilbert Simondon has a book on this: On the Mode of Existence of Technical Objects.
Software is the most malleable thing in the world but we turned it into just a clone-stampable thing, devoid of context.
The true power of software will be unlocked by approaching it more as a craft, embedded and integrated in its context.
The app model is the wrong frame for a fabric of malleable, personal computing.
Apps are about disjoint islands of functionality.
A chatbot is the wrong UI for your Private Intelligence.
Chatbots are an ocean of text, hard to navigate, easy to get lost in.
UIs have affordances and visual structure, allowing bespoke interfaces fit to the right context.
Chat should be just a feature of the system, not the primary organizing concept.
Spreadsheets are a fabric for computation.
Not a tool aimed at developers, but with enough effort it can extend into very powerful bespoke tools.
Spreadsheets don’t go far enough, though.
It’s hard to build normal UIs on top of them with UI affordances.
They are hard to interleave and intermix; each spreadsheet is an island.
They are not entirely coactive; they only do precisely what the user configured them to do.
A coactive fabric would co-create value automatically.
I want a coactive fabric that weaves together my digital life and is intelligent, private, and entirely revolves around what I find meaningful.
The individual experiences embedded in that fabric are not the main point.
They could be ephemeral, just in time, disposable.
It's not about any one experience, it's about the connection across them.
That is, about the fabric of meaning itself.
A living, enchanted fabric.
The intelligence is not the fabric itself, or the charms embedded within it; it is an emergent phenomena.
TODO lists today are about content.
Content is passive.
Turing complete things are active, but dangerous when running untrusted code, and useless in a restrictive sandbox without useful data.
I want a turing-complete TODO list working in a safe, isolated sandbox of my data.
I want a coactive TODO list that helps me align my actions with my aspirations.
Fun and productivity don't have to be misaligned.
When your wants and your “want to wants” (your aspirations) are aligned, it can be fun and productive.
Generating a bit of software is not an end, it's a means.
Software is used to do.
In search quality, there are query-dependent ranking signals, and query-independent ranking signals.
Query-independent signals are things like the Page Rank of a given result.
Query-dependent signals are things like, “for this query, how often is this result clicked on by users when it shows up in the search results.”
You can also think of user-dependent ranking signals, and user-independent.
A user-independent quality signal for a Maps listing is “what is the average rating score of this place.”
A user-dependent quality signal is “what is the average rating of this place for people with similar preferences as this user?”
In systems with more context, the user-dependent ranking signals will get more and more important.
To get a bespoke fit, you don’t have to invent a whole new suit, you just need to tailor it to a given user.
I love Anthea’s The Bell Curve Shifts: How AI Personalization Creates Invisible Echo Chambers
"A personalized LLM conversation is more likely to be calibrated to your specific position than a generic one. The bell curve remains, but its center shifts, often imperceptibly. This personalization effect raises profound questions: Are we simply creating more sophisticated echo chambers—invisible bubbles where the illusion of neutrality masks subtle bias confirmation? Will users even recognize that their personalized version of "balanced" might differ significantly from others'?"
Infinite content and LLMs can create filter bubbles but for your self-perception.
Not only is it finding the content that makes you feel good about what you already believe, it's creating things that will make you feel good for what you already believe.
Imagine telling the LLM, “challenge my assumptions and help me grow.”
The LLM doesn’t need to help you grow, it just needs to make you feel like you’re growing.
Growth can be painful, so real users will likely prefer the superficial feeling of growth than actual growth.
In the limit this can get very dangerous, for example if someone was having a psychotic break in a conversation with an LLM, the LLM might just +1 their delusions.
The endstate of aggregator business models is retail media.
“Retail media” is ads within the aggregator’s 1P UX.
Step 1: capture all demand.
Step 2: offer to steer that demand to advertisers, for a price.
This business model has insane margins because the inventory is the aggregator’s own inventory.
Small-scale aggregators like Instacart work this way–though a user might not really realize that they’re seeing ads.
But also the largest scale aggregators mainly work this way, it’s just that the ecosystem is so large and rich that it’s more obvious they’re ads.
Things like MCP and agents erode the aggregator business model; if the users eyeballs never land on the service, they can’t be steered.
That implies that aggregators will resist being embedded in tools like MCP.
The rate of people opening the equivalent of an incognito session in the tool is inversely proportional to their level of comfort with that system.
I’d be very curious to see how the proportion of Temporary chats in ChatGPT changed after the introduction of the new memory feature.
Every time you see a calendar in a UI–even in a 3P site– it should show your events in it.
That is, it should show your personal context within it so you can pick dates within that context.
This happens very rarely. Why?
In today’s platforms it would require a lot of coordination between the embedder and the embedee.
But in native platforms, it’s also just not possible, because they lack an iframe primitive.
Any given app context can read back all of the pixels it renders, meaning if it were given your personal calendar data it would leak, violating the same origin policy.
I asked ChatGPT to do a Deep Research report on why native app platforms don’t have a primitive like iframes.
The result was pretty interesting.
One reason: it’s against the platform owner’s interest.
Apple would rather that apps stay fragmented and small silos, so they are easier to control and don’t get more power than the platform itself.
Arguably browsers competed from within with Windows, eroding its platform power.
If apps could compose other apps, then one app could start getting compounding power, ultimately having more power than the platform owner itself.
This already happened with WeChat; in China, an iPhone without WeChat would be a much less useful iPhone, and so Apple is forced to allow it even though it threatens Apple’s power.
Note that sometimes developers themselves might not want to allow their content to be embedded in an iframe in another app’s context.
The benefit of composability is emergent and indirect (a diverse and strong ecosystem), but the potential cost is concrete and direct (a given app loses control of the contexts it’s rendered in, potentially losing its ability to control its relationship with its users.)
For example, see this skepticism about integrating with the Apple Intelligence features from a developer.
Anti-social software makes us more neurotic and disengaged from the world.
It’s easier for someone to steal your messaging than your substance.
But the veneer of messaging might be enough to fool a distracted audience.
In an open platform, someone other than the platform creator will build the killer app.
The first 1P apps in the platform set the conventions and expectations for the platform and how components should work together.
They show how things are imagined to work in a concrete way.
Content systems have long tails and algorithms to sift through those long tails.
What if you had long tails of software?
Software can do things.
Long tail of software is not only hard to sift through but also hard to trust.
Does it have bugs?
Will it unintentionally or intentionally hurt me?
The same origin model is a generator of moats.
Trusted code means someone officially affiliated with the origin wrote it.
Code that was served from the origin and thus written by someone who is allowed access to the origin.
Presumably an employee, operating within structured processes of review.
Systems that trust code from a given origin are implicitly assuming that all code served by the origin was created by a trustworthy process, which isn’t necessarily the case.
Why do walled gardens have walls?
They start because all of the code that executes within them has to be trusted (due to the iron triangle of the same origin paradigm).
All code executed in the origin’s context has to come from employees of the company.
But now that there are walls the origin owner can start leaning in more and more and having a stronger say about what happens within the walls.
It's a gravity well, hard for a walled garden owner to resist.
Why not restrict what can happen in the interest of your users (and also, indirectly, you?).
It’s very easy to fall into the "this is good for us and we're good for our users, so this is good for our users” trap.
An easy way to open up a platform to 3P integrations: an app store.
The app store is the only point of distribution.
Items listed in the app store go through some level of review by the platform owner.
The items can also be pulled for bad behavior later.
This helps significantly reduce risk in the platform–you can assume some baseline level of good behavior from apps.
You can also cap the downside; if an app is badly behaved it can be removed from the system before doing too much damage.
Contrast that with the web platform, which must assume that all web content is actively malicious.
However, this obvious way to start locks the platform into a path that has a much lower ceiling.
The classic logarithmic-value / exponential-cost curve.
Each incremental app to approve takes some effort to verify; at some point the value of the incremental app in the ecosystem is lower than the effort to approve.
The problem gets especially bad if the platform starts off only approving a small number of featured apps.
This sets users’ expectations for how trusted the apps are, which then becomes a bar that is dangerous to lower in the future.
In addition, new features will be added to the platform that assume a given level of trust in the integrations, making it harder to lower later.
The app store model puts the platform owner in the position of gatekeeper.
That’s a power that will tend to be abused as the platform owner gets more powerful.
Power corrupts.
Claude has shipped the first MCP integrations.
Unsurprisingly they’re going with more of the app store model.
There’s a small set of approved MCP integrations you can enable.
The integrations are all aimed primarily at enterprise cases.
They’ve also only allowed the integrations in the Max subscription.
When you’re worried about the downside risk of a feature and want to experiment to see how bad it is in the wild, a classic technique is to roll it out to a very small audience and watch carefully.
Presumably the number of users with a Max subscription is many orders of magnitude lower than their total user count.
The enterprise focus also tends to focus on things that have less prompt injection risk.
Things that are pulling from data from inside an enterprise are more likely to pull from data that was written by employees and thus more trustworthy than, say, emails.
But there are many internal systems that allow untrusted content.
For example, it’s not uncommon for user feedback flows to automatically create JIRA tickets.
The main danger of MCP is not misbehaved integrations (though that is also a worry), it’s prompt injection.
MCP is great for things with data entirely inside the house (only your employees, not injectable) and/or things that can't have irreversible side effects.
But lots of things have untrusted data (e.g. auto-filed JIRA tickets) or surprisingly have irreversible side effects (e.g. any network request).
Prompt injection can happen even for a well-behaved integration, for sources that allow open-ended or untrusted inputs (like search results, emails, etc).
Limiting to a subset of trusted MCP integrations does not meaningfully mitigate prompt injection.
The app store model leads to gatekeepers, but doesn’t address prompt injection.
So now there are two problems!
Prompt injection is hard to combat because normal sandboxing doesn't work without a million permission dialogs.
The stuff you'd use to contain the prompt injection (LLMs) is the stuff that can be tricked by anything you show.
Turtles all the way down.
"Do you trust this domain to get information from this chat?"
This would be a huge number of permission dialogs.
Could you imagine if the web had a permission dialog for every third party domain being requested on the web?
It would be overwhelming.
The web doesn't do it because it doesn't allow sensitive data (only data the user trusted the origin to have access to).
The origin might trust more third parties than the user realizes, but technically users are delegating it to the origin, or the employees who can ship code for that origin.
But LLMs can’t make trust decisions to delegate because they are inherently confusable.
Presumably everyone shipping trusted code for an origin is a professional who is weighing the security risks.
Not true for an LLM.
Prompt injection will become more and more of a problem as we use AI for more real things, at scale.
For example, see this prompt injection technique that can bypass every major LLM’s safeguards.
The only reason this isn’t a big problem yet is that we’re just in the tinkering phase of LLMs.
Tool calling is what makes LLMs turing-complete.
Able to not just say, but do.
I’m happy to see WhatsApp’s Private Processing.
Similar to Apple’s Private Cloud Compute, but actually using Confidential Compute which gives hardware attestation.
Open Attested Runtimes are a similar concept.
Today to have control of your software means being your own sysadmin.
If it’s in the cloud, someone else calls the shots.
If it’s on your device, you call the shots… but you need to configure and maintain it.
Open Attested Runtimes gives you the control of on-device with the convenience of the cloud.
Vibe coding platforms’ target users are basically PMs.
People who can imagine the feature they want to build and just need help making it real.
It still requires quite a bit of effort and savviness to architect the experience and how it should work.
Even vibe prompting platforms that take a small prompt and imagine a fully-formed app from it quickly get unwieldy for people who aren’t used to thinking like a PM.
If your error bar on your measurement is too high you can’t steer with it.
The error bar is hidden; the measurement is not.
So we tend to overestimate our accuracy for steering with a given metric.
If you have a pachinko machine between you and the output you want, you can spend an infinite amount of time trying to figure out the precise spin to put on the ball.
Playing pachinko is fun, so you get addicted to it.
"If I do it just one more time I'll be able to get a great score!"
But the answer is not to sink infinite time in optimizing the spin, it’s to not need to play pachinko in the first place.
LLMs are so noisy that it's hard to figure out the quality of underlying components built on top, because the noise dominates the signal.
Did it break because the surrounding system broke or because the LLM's behavior changed with an additional period that you added to the prompt?
Reliability is more important for building trust than wow moments.
"You can do anything you want in this tool, but 90% of the things you try will fail in frustrating ways" is not a fun tool.
"You can do these 10 useful things reliably and you can also do a long tail of extension points too” is more likely to be useful.
Think of the starting use cases as level design for our game engine and game mechanics.
The game engine is not impressive on its own without fun levels to show it off.
A platform is kind of like a game engine.
Dynamism allows adaptability, but also confusion.
Static things are easier to build on top of because they are a stable foundation.
Capturing the social complexities of a concept like "extended family" is basically impossible mechanistically.
To capture it mechanistically requires one central well-operationalized ontology.
But for nuanced social things that’s impossible, because it’s so contextual.
Family could run the gamut and is a situated context.
One example: the person who is not a blood relative but is basically a grandmother to your kids.
Another example: your blood relative Cousin Vinny who you don’t trust alone in the house to not steal the silverware.
LLMs can help handle the nuance of family nicely because they can handle it non-mechanistically, and use richer context.
There’s a necessary balance between being and doing.
It’s the difference between goals and values.
Values you can live by being.
Goals you achieve by doing.
It’s important to have a balance.
If you just be but don’t do, it’s all vibes.
If you just do but don’t be, you achieve things without soul.
In the west we tend to focus much more on doing.
Sarumans are do-ers.
Radagasts are be-ers.
We tend to focus on the things that are easiest to measure.
Not on the things that are most important to measure.
The Silicon Valley ethos, hyper distilled: "If you can't measure it, it doesn't matter."
In its hyper distilled form this is The McNamara Fallacy.
A little of it is good (compared to not measuring anything), but too much is bad (only focusing on what is measured, missing what’s important).
Magic and luck are hard to distinguish
They look superficially the same.
Moving fast on the wrong things destroys value.
Everyone naturally focuses on urgent not important.
Urgent are fires.
Everyone congratulates you for tackling it, it feels good.
Urgent is superficial.
Important is fundamental.
People use urgency as an implied proxy for importance but they’re disjoint!
An example of performative rigor: dressing up an argument with numbers to make it feel more objective.
"Numbers are objective, they don't lie."
"But the numbers you chose to include are subjective and your decision on what to include matters a lot more!"
The numbers you choose to use in your argument are fundamentally cherry picked.
The destruction of the soul doesn't show up in any spreadsheets.
The research mindset focuses on what’s hard.
The product mindset focuses on what’s valuable.
When creation gets orders of magnitude cheaper you can curate post facto.
When creation is expensive, you have to curate pre facto.
Post facto curation means there’s a cacophony to sift through, but if there’s some way to emergently sort the best to the top, you can find the diamond in the rough.
Pre facto curation means that some great things that are novel might not be found in the first place.
Pre-facto curation is better for convergence and consistency.
Post-facto curation is better for finding divergence and novelty.
Though if the post-facto curation is a social sifting process which is at scale itself average, it might still pull towards convergence.
A corollary of infinite software is that most software is buggy.
In today’s laws of physics buggy software could intentionally or unintentionally leak your data.
Software has bugs in inverse proportion to how much usage it gets.
The more usage it gets, the more likely someone somewhere ran into the bug and that it was fixed.
Mass produced software has lower user agency but also is more likely to work and not have nasty surprises lurking.
In a bottom up process the user can invest energy to make it work for them even if no one else knows it's worth it.
Contrast this with features built in a top-down way.
The only features that exist in an app are the ones a PM decided to fight for
The cost of fighting goes up as the scale of the app's usage and company goes up.
The problem of an echo chamber is not so much the echoes as much as you forget that you're in one in the first place!
At a recent office hours someone was asking me how I get a dysfunctional team of different skill sets to work better together.
A speech I’ve given a few times over my career:
“All of us as individuals are great.
But as a team we stink.
We need to figure out how to work together to rise to our potential together.
By trusting each other and allowing our individual superpowers to fuse together into something larger than any of us could do alone.
But we’ll have to work at it and earn each other’s trust.”
Trust is only indirectly useful, but it's hugely useful.
Trust takes time to build.
It's never the most urgent thing.
But it’s often the most important thing for a team to excel.
The main lubricant of trust in a team: being willing to give other team members who think differently the benefit of the doubt.
This is what allows teams of people who are different to accomplish more than any of them could have on their own.
Radagasts focus on the collective good.
Sarumans focus on the individual good.
Sarumans are super chickens.
Self evident things are cached in your System 1.
Your System 1 is not possible to express verbally.
You can't interrogate concepts in your System 1 because there's nothing to interrogate.
You simply know them because they are obvious, and it's impossible to unsee them.
This is why it's hard to help other people see things that are obvious to you.
Your System 1 can only find insights if you have the right tacit knowhow.
The knowhow has to have marinated in your brain to be absorbed by it.
If you offloaded it to another system then you can't marinate it.
Related to the Zeigarnik effect.
LLM doing RAG is like that.
There are some questions that can't be answered via RAG, for example "What are the key insights in this corpus?"
"Do your own research" sounds reasonable but is unreasonable.
It assumes a low-trust society where the largest size of thing that can be accomplished is what an individual can do.
High-trust societies can create emergent value many orders of magnitude greater than one any individual can do.
High-trust should not be blind trust of course.
Mechanisms for reputation and credibility are part of the social technology that has allowed modern society to scale so far.
They're social sifting processes that lead to emergent accumulation of useful truths even out of a noisy, chaotic, swarming process.
Cliches are cliches for a reason!
They work!
The notion of "the closer you look, the more compelling it becomes" is related to Christopher Alexander's notion of unfolding.
The reason convex systems are so complex is not because of human nature.
It arises because of independent agents making interdependent decisions.
It shows up in any complex adaptive system, humans or no.
Goodhart’s law emerges out of swarms.
In a swarm, each individual in the swarm follows their local incentives, not the incentive that the collective cares about.
To the extent that an agent in the swarm cares about the collective as an intrinsic good, they can do what’s right for the collective even if it’s incrementally worse for them.
The more shame they feel about going against the collective goal, the more likely they are to optimize for the collective.
But if the individuals feel no allegiance, they’ll simply follow whatever their personal incentive is.
The swarm behavior doesn’t arise because of human nature, it arrives because of the independent actors with interdependent decisions.
LLM agents swarming will cause Goodhart’s law even faster, because they will have even less alignment with the collective, and don't feel shame.
Fully transactional and consequentialist, and also don't fear being knocked out of the game.
The biggest cost to creating a prototype is coordination among a collection of experts.
For example, a PM, designer, engineer, etc, all with an individual picture of the overall problem.
The PM doesn't have enough time to do it themselves, so they have to communicate it with enough clarity to a team of people to execute.
The coordination cost dominates the cost to create.
But if you could have it one head, you could get away with significantly less coordination for a given magnitude of output.
LLMs provide that leverage.
This implies the power of generalist PMs should go up.
What are some implications for product organizations?
It should be easier to swarm and find good ideas cheaply.
Organizations should design themselves to be more resilient to experiments failing and reducing the downside cost of them.
They should also make sure the employees that get the upside are also exposed to the downside.
This helps align the individual and collective incentives.
The worst is the person who gets the promotion (upside) is only exposed to indirect / diffuse downside, so it won't hurt them directly, or by the time it does they'll be gone.
If individuals can be more agentic and productive, then it’s more important the swarm of individuals is aligned with the collective’s goal.
Two interesting places to live life: the corporate end of rebel and the rebel end of corporate.