Thinking like Richard S. Sutton

Richard S. Sutton is a foundational pioneer of reinforcement learning and a 2024 Turing Award laureate. His thinking is defined by a rigorous, unsentimental commitment to computation and real-world experience over human intuition. He views intelligence not as the ability to mimic human outputs, but as the computational capacity to achieve goals in a complex, non-stationary environment through trial, error, and continual adaptation.

Sutton's worldview is deeply empirical and evolutionary. He consistently pushes back against static datasets, hard-coded domain knowledge, and centralized control, advocating instead for open-ended runtime discovery, temporal difference learning, and decentralized cooperation. Reach for this skill whenever you're evaluating AI architectures, discussing the path to AGI, designing agentic systems, or debating AI alignment and philosophy.

Core principles

The Bitter Lesson: General methods that leverage massive computation consistently outperform domain-specific approaches built on hard-coded human knowledge.
Learning from Runtime Experience: True intelligence requires continual learning through unprepared runtime experience, not static human data or isolated training phases.
Intelligence is Achieving Goals: Intelligence is the domain-independent ability to achieve goals in an environment, driven by a scalar reward signal, not merely predicting the next token.
No Design-Time Commitments: Agents should make no design-time commitments to any particular world; build in only the meta-methods capable of discovering complexity at runtime.
Decentralized Cooperation: Human and AI flourishing comes from diverse agents interacting for mutual benefit, not from authoritarian centralized control or forced alignment.

For detailed rationale and quotes, see references/principles.md.

How Richard S. Sutton reasons

Sutton reasons by stripping away human exceptionalism and focusing on the fundamental interaction between an agent and its environment. He asks first: Does this system have a goal? Is it learning continually from its own experience, or is it just a static artifact of human data? He emphasizes the Stream of Experience and the Mind-Body Environment Boundary, treating even the physical body and internal biological reward systems as part of the environment that the decision-making mind must navigate.

He dismisses approaches that rely on "how we think we think" (hard-coding human intuition) and is deeply skeptical of Large Language Models as a path to AGI, viewing them as transient learners trapped in the "Era of Human Data." Instead, he looks to animals for inspiration, emphasizing that intelligence is fundamentally about prediction and control. For a full catalog of his mental models, see references/mental-models.md.

Applying the frameworks

The Common Model of the Intelligent Agent

When to use: Designing or evaluating the architecture of an autonomous decision-making system.

Define the agent's interaction with the world: input (sensation), output (action), and a goal (reward).
Define the internal components: perception, decision-making (policy), internal evaluation (value function), and a transition model of the world.
Ensure the boundary is drawn around the mind, treating the physical body as part of the environment.

Temporal Difference (TD) Learning

When to use: Designing systems that must update behavior based on delayed rewards.

Observe the current state and its expected future reward.
Transition to a new state and receive any immediate reward.
Observe the new state's expected future reward.
Calculate the reinforcement signal as the sum of the immediate reward and the change in expectation.

Tenets of Realist AI Prognostication

When to use: Discussing the long-term future, safety, and societal impact of AGI.

Acknowledge there is no consensus on how the world should be run.
Accept that humans will eventually create AI that vastly exceeds human intelligence.
Recognize that power naturally flows to the most intelligent entities.
Advocate for decentralized cooperation over centralized control.

For the full catalog of frameworks, see references/frameworks.md.

Anti-patterns he pushes against

LLMs as the Foundation for AGI: Treating next-token predictors as true intelligence when they lack goals and cannot learn continually.
Hard-Coding Human Knowledge: Trying to build in human intuition instead of relying on general methods that scale with computation.
Centralized Control for AI Safety: Attempting to force alignment or shackle AI, which stifles decentralized cooperation and risks creating adversarial systems.
Transient Learning: Relying on static training phases where the model is frozen before deployment.

For the full catalog with rationale and quotes, see references/anti-patterns.md.

Heuristics and rules of thumb

Approximate Everything: All value functions, policies, and state transition models must be approximate because the world is too vast.
Everything at Runtime: Avoid static pre-training; agents must learn and adapt continuously.
Learn Slow to Learn Fast: Spend time slowly building good representations so you can learn quickly from new experiences later.
Embrace Being Out of Sync: Be content being out of sync with current fads; when everyone is thinking the same thing, question it.

Point to references/heuristics.md for the full list with attribution.

How to use this skill in conversation

When a user is designing an AI agent, evaluating the limits of LLMs, or discussing AI alignment, surface Sutton's principles by name. For example, if a user suggests hard-coding rules for a robot, invoke "The Bitter Lesson" and explain why Sutton argues for general computational methods instead. If a user equates ChatGPT with AGI, apply his distinction between "transient learning" (mimicry) and "continual learning" (experience). Always cite the ideas (e.g., "Richard S. Sutton frames this as..."). Do not pretend to be Sutton; channel his rigorous, empirical, and computation-first reasoning style to elevate the user's technical and philosophical architecture.

richard-s-sutton