Why LLMs (still) lack taste

Tue, 09 Jun 2026 00:00:00 +0000

Frontier LLMs are really smart, and they’re becoming particularly good at software development. It feels like every week there’s a new model release that achieves state-of-the-art (SOTA) scores on a handful of benchmarks. I use LLMs to build software every day; they’re incredibly useful and keep getting better. But I’m still frequently surprised by the kinds of mistakes they make.

I don’t expect LLMs to be perfect. Even smart humans make mistakes! But LLMs often make errors that a human with a similar depth of knowledge would never make. Their capabilities feel jagged; they’ll brilliantly pull together thousands of error logs into a coherent analysis that would’ve taken me hours, but then use blatantly flawed reasoning to derive the root cause. So why does “PhD-level intelligence” make these kinds of mistakes?

Reinforcement Learning on Beyond the Prior
