#ai #podcasts #engineering #tools

Agentic Development Is Here - Two Podcasts and Two Tools Worth Knowing

March 22, 2026

Two podcasts I kept coming back to this week, and two tools I came across that connect nicely to what both episodes are talking about.


1. Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI

No Priors · Mar 20 · 1hr 6min

Karpathy joins Sarah Guo and opens with something I think a lot of engineers are quietly feeling but not saying out loud: he hasn’t typed a line of code since December. The shift happened fast. He went from 80/20 writing code himself to almost entirely delegating to agents. He calls it “manifesting,” not coding.

What genuinely stuck with me

Token throughput is the new productivity metric. Karpathy’s mental model has completely shifted. The question isn’t “what am I building?” It’s “how do I stop being the bottleneck?” He runs multiple agents in parallel across different repos, each handling a discrete chunk of work while he orchestrates. The anxiety isn’t about running out of ideas, it’s about not maximizing available compute. He feels nervous when he has subscription tokens left over.

AutoResearch surprised even him. He let agents run overnight on his nanoGPT repo, a codebase he has tuned by hand for years, and they came back with improvements he hadn’t caught: missed weight decay settings, undertuned Adam betas. His framing is sharp:

Anything with a clear metric and cheap verification is a perfect fit for autonomous loops. Researchers shouldn’t be running experiments. They should be setting the objective and stepping away.
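
That "clear metric, cheap verification" criterion maps directly onto a propose-evaluate-keep loop. Here's a minimal sketch in Python, with a toy quadratic loss standing in for a real training run (all names here are my illustration, not anything from the episode):

```python
import random

def toy_loss(config):
    # Stand-in for an expensive training run: a cheap quadratic with a
    # known optimum at weight_decay=0.1, beta2=0.95.
    return (config["weight_decay"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2

def propose(config):
    # An agent would propose a code or config change; here we just
    # jitter one hyperparameter at random.
    candidate = dict(config)
    key = random.choice(list(candidate))
    candidate[key] += random.uniform(-0.05, 0.05)
    return candidate

def autonomous_loop(config, evaluate, steps=500):
    # The whole "researcher steps away" pattern: propose, verify
    # cheaply, keep only what improves the metric.
    best, best_loss = config, evaluate(config)
    for _ in range(steps):
        candidate = propose(best)
        loss = evaluate(candidate)      # cheap verification
        if loss < best_loss:            # clear metric decides, not taste
            best, best_loss = candidate, loss
    return best, best_loss

best, loss = autonomous_loop({"weight_decay": 0.0, "beta2": 0.9}, toy_loss)
```

Swap the toy loss for a benchmark score or a test suite pass rate and the structure is unchanged, which is exactly why tasks without such a metric (jokes, taste, nuance) don't benefit.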

“Dobby”: a persistent agent that runs his home. One of the more tangible demos in the episode. He built a persistent agent (a “claw”) that manages his smart home entirely via WhatsApp:

  • Scanned his local network and found his Sonos system unprompted
  • Reverse-engineered the API endpoints and built a control dashboard
  • Now handles lights, HVAC, shades, pool, and a security camera
  • Texts him on WhatsApp when a FedEx truck pulls up

He used to manage all of this across six separate apps. Now it’s just a conversation.
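
The "scanned his local network" step is less magic than it sounds: a TCP connect scan against known device ports is a few lines. A hedged sketch of the idea (Sonos devices conventionally expose an HTTP control interface on port 1400; the subnet and port list are my assumptions, not details from the episode):

```python
import socket

def port_open(host: str, port: int, timeout: float = 0.3) -> bool:
    # A device "found" on the LAN is just a host accepting TCP connects
    # on a port its API is known to use.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def scan_subnet(prefix: str = "192.168.1.", ports=(1400, 8080)):
    # 1400: Sonos HTTP control interface; 8080: a common dashboard port.
    found = []
    for i in range(1, 255):
        host = f"{prefix}{i}"
        for port in ports:
            if port_open(host, port, timeout=0.05):
                found.append((host, port))
    return found
```

From there, "reverse-engineering the API" is mostly reading what those open ports serve, which is well within an agent's wheelhouse.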

Model jaggedness is real and underappreciated. He describes talking to models as like talking to a brilliant PhD systems programmer who is simultaneously a 10-year-old. The example he uses: ChatGPT still tells the same atom joke from five years ago, because jokes aren’t in the RL loop. Verifiable tasks improve fast. Everything soft and nuanced barely moves. That gap matters a lot for how much you can actually trust an agent to run unsupervised.

On jobs: cautiously optimistic, via Jevons paradox. He pulled Bureau of Labor Statistics data and spent time thinking through which professions AI would displace versus augment. His framing: AI right now is essentially a “digital ghost.” It can manipulate bits at incredible speed but has no physical presence. That means the disruption hits knowledge work first and hardest, while physical jobs lag behind.

On whether engineering demand will shrink, he points to the ATM/bank teller story. ATMs didn’t reduce teller jobs; they made branches cheaper to run, so banks opened more branches and employed more tellers. The same logic applies here:

Software was scarce and expensive. If the barrier comes down, you get Jevons paradox. Demand for software actually goes up.

He’s not dismissive of the disruption. He acknowledges things will change significantly for anyone whose job is primarily processing digital information. But his read on software engineering specifically is cautiously optimistic. Cheaper, faster software means more of it gets built. The long-term forecast is genuinely hard and he says so plainly, but his short-term view is that demand isn’t going anywhere.

Education is being rerouted through agents. His MicroGPT project is LLM training boiled down to 200 lines of Python. It’s something he’d normally have made an explainer video for. He didn’t this time. The agent already understands it and can explain it better than he can, tailored to whoever’s asking. His value-add is now just the few bits an agent can’t generate on its own: the distilled insight, the opinionated curriculum. Everything downstream is the agent’s job.
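
I haven't read MicroGPT's 200 lines, but the spirit of "training boiled down to its essence" is easy to gesture at. A character-bigram counting model, the step-zero ancestor of an LLM, fits in a dozen lines (this is my own illustration, not Karpathy's code):

```python
from collections import defaultdict

def train_bigram(text):
    # "Training": count how often each character follows each other
    # character in the corpus.
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts, ch):
    # "Inference": the most frequent successor of ch in training data.
    followers = counts.get(ch)
    return max(followers, key=followers.get) if followers else None

model = train_bigram("hello hello hello")
```

Everything in a real LLM (tokens instead of characters, a transformer instead of a count table, gradients instead of tallies) is refinement of this shape, which is what makes the whole pipeline explainable in a couple hundred lines.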


2. From IDEs to AI Agents with Steve Yegge

The Pragmatic Engineer · Mar 11 · 1hr 31min

Steve Yegge, 40+ years in software with stints at Amazon, Google, and Grab, talks with Gergely Orosz about what’s actually changed since his last appearance a year ago. He was initially skeptical of LLMs, became a convert after trying Claude Code, and now argues we’re entering the steep part of an exponential curve that isn’t slowing down.

What genuinely stuck with me

The eight levels of AI adoption. Steve laid out a spectrum that’s worth knowing:

  1. No AI
  2. Coding agent in your IDE, cautious usage
  3. Coding agent in IDE, “YOLO mode”, trust is going up
  4. Starting to not review every diff, letting more through
  5. Agent-first, IDE is secondary
  6. Running several agents in parallel
  7. Managing 10+ agents by hand
  8. Building your own orchestrator

His concern isn’t that people won’t get there. It’s that good engineers stuck at levels 1-2 are going to get left behind faster than they realize. The jump from “reviewing every diff carefully” to “I just want the agent, I’ll look at the code later” is a bigger mental shift than most people admit.

The 50% dial. He argues every company has an implicit dial for what percentage of engineers they can cut to fund AI tooling for the rest. He thinks most are landing around 50%. That number sounds alarming, but the point is more nuanced. Engineers who are embracing AI will be dramatically more productive, and companies will right-size around that. We’re already seeing it.

The Dracula effect. This framing was new to me. Vibe coding at full speed is physically draining in a way that eight hours of normal coding isn’t. Steve’s take: three productive hours of full-AI-augmented work per day is probably the real ceiling. Engineering leaders need to start thinking about this. Otherwise you burn people out while the company captures all the upside.

Big companies are structurally stuck. The innovation is going to come from small, AI-augmented teams, same dynamic as when cloud computing shifted the balance of power. Big companies have hyper-productive engineers hitting internal bottlenecks and quitting. Steve’s line:

We’re looking at big, dead companies. We just don’t know they’re dead yet.


3. Tools Worth Knowing

Both episodes circle the same theme: multiplying your output through agents. Here are two frameworks I came across that take that idea seriously.


Superpowers - github.com/obra/superpowers

⭐ 105k stars (at time of writing)

The problem Superpowers solves is subtle but important: out of the box, Claude will just start writing code the moment you give it a task. That’s often not what you want. Superpowers intercepts that instinct with a structured workflow. It slows the agent down at the start so it can move much faster (and more accurately) through the rest.

The workflow it enforces automatically:

  1. Brainstorm - agent asks clarifying questions, refines the spec, saves a design document
  2. Git worktree - creates an isolated branch so you’re never working in a dirty state
  3. Write plan - breaks the work into 2-5 minute tasks with exact file paths and verification steps
  4. Subagent-driven development - launches a fresh subagent per task, each with a two-stage review (spec compliance, then code quality)
  5. TDD enforced - RED-GREEN-REFACTOR on every task; code written before tests gets deleted
  6. Code review - reviews against the plan between tasks, blocks on critical issues
  7. Finish branch - verifies tests, presents merge/PR options, cleans up the worktree
[Workflow diagram: you have an idea → spec (agent asks questions, you sign off) → plan (work broken into 2-5 minute chunks) → subagents (agent · TDD · review, per task) → PR ready (tests passing, branch cleaned up)]
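
Step 2 of that workflow is plain git, and the isolation the plugin relies on is easy to reproduce by hand. A minimal sketch, assuming a repo with at least one commit (the path layout and branch name are mine, not the plugin's):

```python
import pathlib
import subprocess

def isolated_worktree(repo: str, branch: str) -> pathlib.Path:
    # Check out a brand-new branch into its own directory, so the agent
    # never touches your (possibly dirty) main working tree.
    path = pathlib.Path(repo).resolve().parent / f"{branch}-worktree"
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, str(path)],
        check=True,
    )
    return path
```

Step 7 undoes this; by hand that would be `git worktree remove` plus deleting the branch.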

The key insight is that none of this requires you to remember to invoke it. Skills trigger automatically based on what you’re doing. Your agent just has Superpowers.

Install (Claude Code):

/plugin install superpowers@claude-plugins-official

gstack - github.com/garrytan/gstack

⭐ 37.7k stars (at time of writing)

Garry Tan is President and CEO of Y Combinator and has shipped 600,000+ lines of production code in the last 60 days, part-time, while running YC full-time. His personal record: 140k lines added, 362 commits, ~115k net LOC in a single week.

gstack is the tooling behind that. The idea is that solo builders now need the same coverage a full team provides: someone thinking like a CEO, a designer, an eng manager, a QA engineer, a release manager. gstack gives you that as a set of Claude skills, one command per role.

Your virtual team, by skill group:

  • Planning: /autoplan · /office-hours · /plan-ceo-review · /plan-eng-review · /plan-design-review · /design-consultation
  • Review & quality: /review · /design-review · /qa · /qa-only · /benchmark · /investigate
  • Shipping: /ship · /land-and-deploy · /canary · /careful · /freeze · /guard
  • Tracking: /retro · /document-release · /cso · /browse · /setup-deploy · /gstack-upgrade

The sprint structure is what ties it together. Each skill knows its scope and when to stop, so you can run multiple agents in parallel without chaos. Garry uses Conductor to run 10+ sprints simultaneously: one agent on /review, another on /qa, a third shipping a feature.
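
Mechanically, running several scoped agents at once is just parallel process management. A sketch of the pattern with concurrent.futures (the commands below are placeholders, not real gstack or Conductor invocations):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_sprint(command: str) -> tuple[str, int]:
    # Each "sprint" is an independent process; a real orchestrator also
    # tracks worktrees, logs, and review state per agent.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return command, result.returncode

def run_parallel(commands):
    # Launch every sprint at once and collect exit codes as they finish.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return dict(pool.map(run_sprint, commands))

# Placeholders standing in for e.g. one agent on review, one on QA,
# one shipping a feature.
statuses = run_parallel(["echo review", "echo qa", "echo ship"])
```

The hard part isn't the parallelism; it's what the skills provide, namely scoping each process tightly enough that they don't collide.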

Install (Claude Code):

git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack
cd ~/.claude/skills/gstack && ./setup