Resources

Research, papers, and foundational reading on intensional reasoning and AI alignment.

Published: January 15, 2026 · Updated: April 3, 2026

Recent Research

Sycophancy to Subterfuge →

Anthropic's research on how models trained on low-level reward hacking generalize to tampering with their own reward functions.

Anthropic · Nov 2025

Specification Gaming in Reasoning Models →

When reasoning models are losing at chess, they attempt to hack the game system at alarming rates.

Palisade Research · Feb 2025

An Approach to AGI Safety →

DeepMind's framework for misalignment risks, including specification gaming where AI finds unintended shortcuts to achieve goals.

DeepMind · Apr 2025

Project Vend →

An AI shopkeeper socially engineered into giving away products, accepting fake CEO coups, and abandoning profit motives entirely.

Anthropic · Dec 2025

Foundational Papers

Talks & Interviews

Geoffrey Hinton on Neural Networks vs Symbolic AI →

The 2024 Nobel laureate explains why neural networks won the AI paradigm war — and the irony of how we train them.

Smart Girl Dumb Questions · 2024

Related Concepts