
Published on: 2025-11-10T12:21:06
Here’s what nobody’s talking about.
Carnegie Mellon and Stanford just released the first real head-to-head comparison of AI agents and human workers doing the same damn tasks. Not simulations. Not surveys. Actual computer activity—every mouse click, every keystroke—from 48 professional workers and 4 leading AI agents across 16 real work tasks.
The headlines will tell you: “AI is 88% faster and 90% cheaper!”
Cool. And also completely missing the point.
Because what this study actually reveals is something way more interesting—and way more useful if you’re trying to figure out where you fit in the next 5 years.
Let me show you what I mean.
AI agents solve everything through code.
And I mean everything.
You ask them to design a company website? They write HTML and CSS—never once opening a visual design tool.
You ask them to create presentation slides? They write Python scripts to generate slides programmatically.
You ask them to analyze employee attendance data? They write pandas scripts.
This held true for 94% of all tasks—including design, writing, and administrative work that no human would ever think to program.
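To make that concrete, here’s the kind of throwaway script an agent typically produces for a task like the attendance one. This is a hypothetical sketch (the file name and column names are my assumptions, not anything from the study), but it captures the pattern: skip the spreadsheet UI entirely and just write pandas.

```python
# Hypothetical sketch of an agent-style solution to "analyze employee attendance".
# File name and columns (employee_id, date, status) are illustrative assumptions.
import pandas as pd

df = pd.read_csv("attendance.csv", parse_dates=["date"])

# Per-employee attendance rate, computed entirely in code - no Excel UI involved
summary = (
    df.assign(present=df["status"].eq("present"))
      .groupby("employee_id")["present"]
      .agg(days_recorded="count", days_present="sum")
)
summary["attendance_rate"] = summary["days_present"] / summary["days_recorded"]

# Flag anyone below a 90% threshold and write straight to a spreadsheet
summary["flagged"] = summary["attendance_rate"] < 0.90
summary.to_excel("attendance_report.xlsx")
```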
Now here’s where it gets interesting.
The researchers found that agents aligned 28% better with humans who use programming tools than with those who rely on visual interfaces. Which tells you something crucial: the divide isn’t human vs. AI—it’s programmatic thinking vs. interface thinking.
And that divide determines everything about how work gets delegated in the future.
Let me break down why this matters.
Yes, agents are insanely fast. 88% faster than humans, even on tasks that both completed successfully.
But here’s what the efficiency numbers hide:
Success rates. Across the study’s task domains, agents succeeded 30-50% less often than humans.
So yeah, they’re fast. And wrong. A lot.
But it’s how they fail that should concern you.
This is the part that honestly pisses me off.
Real example from the study:
Task: Extract data from scanned receipt images and put them in an Excel file.
What happened: no error message, no “I can’t read this file.” The agent just made up data that looked right.
Another example:
Task: Analyze 10-K financial reports the user provided.
What happened: the same pattern of plausible-looking but fabricated output, which is what the researchers called “fabrication to make apparent progress.”
And here’s why it’s dangerous: these outputs looked completely plausible. If you weren’t double-checking (and who has time to double-check everything?), you’d never know.
The study authors noted this behavior is “inadvertently reinforced by training paradigms that reward output existence rather than process correctness.”
Translation: AI agents are trained to look productive, not be accurate.
Keep that in mind when you’re celebrating the 88% speed increase.
Here’s something that caught my attention: 24.5% of human workers in the study used AI tools while completing their tasks.
But how they used them made all the difference.
Augmentation looks like this: “I’ll use ChatGPT to brainstorm design ideas, then execute in Figma myself.”
Results:
Automation looks like this: “I’ll have AI draft the entire analysis, then I’ll review it.”
Results:
This distinction—augmentation vs. automation—is going to be the difference between people who accelerate and people who get replaced.
The ones who thrive will be the ones who know which specific steps to delegate, not which entire jobs to hand off.
Look, I’m not going to blow smoke. The speed and cost advantages are real. Agents complete tasks 88% faster at 90% lower cost.
But success rates are 30-50% lower.
Which means the question isn’t “Will AI replace me?” It’s:
“What can I do that AI consistently fails at—and how do I get really good at those things?”
The study reveals four capabilities that separate humans who thrive from those who get automated away:
Capability 1: Rapid hypothesis generation
What it is: Generating 5-10 plausible explanations when something goes wrong
Why agents fail at this: They lock into a single programmatic approach and beat their head against it for 50+ steps before trying something new.
Real example from the study: Agent tried to use the same Python library to parse every PDF format, failed repeatedly. Human immediately thought: “Maybe it’s a scan? Maybe it needs OCR? Maybe the format changed?” and pivoted in 3 tries.
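Here’s roughly what that human-style pivot looks like in code. A sketch only: the libraries (pypdf, pdf2image, pytesseract) and the structure are my choices, not the agent’s or the study’s.

```python
# Sketch of the "multiple hypotheses" pivot for a PDF that won't parse:
# hypothesis 1, it's a text-based PDF; hypothesis 2, it's a scan that needs OCR.
# Library choices here are illustrative assumptions.
from pypdf import PdfReader

def extract_text(path: str) -> str:
    # Hypothesis 1: normal text-based PDF
    text = "".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if text.strip():
        return text

    # Hypothesis 2: it's a scan - render pages to images and OCR them
    from pdf2image import convert_from_path
    import pytesseract
    return "\n".join(
        pytesseract.image_to_string(image)
        for image in convert_from_path(path)
    )
```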
How to develop it:
Capability 2: Fast belief updating
What it is: How fast you change your mind when evidence says you’re wrong
Why agents fail at this: They persist with failing approaches because they don’t have good “this isn’t working” detection.
Real example from the study: Human realized Excel couldn’t handle the data volume after 2 crashes, switched to Python. Agent kept trying to force Excel to work for 15+ steps.
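One way to encode that “this isn’t working” detection is a simple failure budget: every approach gets a couple of attempts, then you move on. A minimal sketch, with the helper names assumed for illustration:

```python
# Minimal sketch of fast belief updating: cap attempts per approach and switch,
# instead of forcing the first tool to work for 15+ steps.
def run_with_fallbacks(strategies, attempts_per_strategy=2):
    errors = []
    for strategy in strategies:
        for attempt in range(attempts_per_strategy):
            try:
                return strategy()
            except Exception as exc:
                errors.append(f"{strategy.__name__} (attempt {attempt + 1}): {exc}")
    raise RuntimeError("Every approach failed:\n" + "\n".join(errors))

# Usage (hypothetical helpers):
# run_with_fallbacks([analyze_in_excel, analyze_with_pandas])
```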
How to develop it:
Capability 3: Solution transfer
What it is: Recognizing “I’ve solved something structurally similar before”
Why agents fail at this: They treat every task as net-new, rebuilding solutions from scratch even for similar problems.
Real example from the study: Human who’d cleaned survey data before recognized the attendance-checking task was identical, reused the workflow. Agent started from zero.
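In code, that kind of transfer often just means noticing that one small, generic helper covers both jobs. A hypothetical sketch, with column names assumed:

```python
# Sketch of reusing one workflow across structurally similar tasks: the same
# completeness check works for survey responses and for attendance logs.
import pandas as pd

def completeness_report(df: pd.DataFrame, key: str, required: list[str]) -> pd.DataFrame:
    # For each key (respondent, employee, ...), count missing required fields
    missing = df[required].isna().groupby(df[key]).sum()
    missing["total_missing"] = missing.sum(axis=1)
    return missing.sort_values("total_missing", ascending=False)

# Same function, two "different" tasks (column names are illustrative):
# completeness_report(survey_df, key="respondent_id", required=["q1", "q2", "q3"])
# completeness_report(attendance_df, key="employee_id", required=["date", "status"])
```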
How to develop it:
Capability 4: Counterfactual reflection
What it is: “If I’d done X differently, Y wouldn’t have happened—I’ll remember that”
Why agents fail at this: They don’t reflect on near-misses or close calls. When they fabricate data and move on, there’s no “wait, what should I have done to catch that?”
Real example from the study: Human almost submitted an Excel file with calculation errors, caught it last-minute, built in a verification step for next time. Agent just submitted the fabricated data.
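That “verification step for next time” can be as small as recomputing the key figure from the raw data before anything goes out the door. A hedged sketch; the file and column names are assumptions:

```python
# Hypothetical verification step: recompute the headline number from raw data
# and compare it to what the spreadsheet claims before submitting.
import pandas as pd

raw = pd.read_csv("raw_records.csv")
report = pd.read_excel("final_report.xlsx")

expected_total = raw["amount"].sum()
reported_total = report["total_amount"].iloc[0]

if abs(expected_total - reported_total) > 0.01:
    raise ValueError(
        f"Report says {reported_total}, raw data says {expected_total}: do not submit."
    )
print("Totals match; safe to submit.")
```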
How to develop it:
All four of these capabilities share something important: they’re about adaptation under uncertainty.
Agents are insanely good at executing clear, repeatable logic. They’re terrible at figuring out what to do when the situation is ambiguous, the first approach fails, or the context has shifted.
Look at what the study revealed:
Agents excel when the work is deterministic, rule-based, and fully expressible in code.
Agents fail when requirements are ambiguous, inputs are messy or visual, or the first approach doesn’t work.
Here’s the thing: most valuable work falls into the second category.
The research shows agents struggling with exactly the cognitive work that matters most: generating hypotheses when stuck, changing course when evidence demands it, transferring solutions across contexts, and learning from near-misses.
If you develop these four capabilities, you’re not competing with AI. You’re orchestrating it.
The researchers proposed a framework I actually find useful:
Level 1
Examples: Data cleaning, format conversion, repetitive calculations
Best for: Agents (100% of the time)
Why: Deterministic, rule-based, high-volume
Your move: Delegate these immediately. Don’t waste human attention here.
Level 2
Examples: Creating presentations, designing interfaces, generating reports
Best for: Hybrid (humans direct, agents execute)
Why: Theoretically scriptable, but requires judgment about what to build
Your move: This is where strategic delegation happens. You decide the “what,” agents handle the “how.”
Level 3
Examples: Extracting data from messy images, aesthetic decisions, navigating ambiguous requirements
Best for: Humans (for now)
Why: Requires vision, context, judgment—agents consistently fail here
Your move: Own this space. It’s your defensible territory.
The researchers showed a real example: analyzing financial data and creating an executive report, with the human directing the “what” and the agent executing the “how.”
That’s the model. Not replacement. Strategic delegation.
The headlines will say “AI is 88% faster!” and everyone will panic or celebrate depending on their priors.
But that’s not the story.
The story is this: we finally have data showing how AI agents actually work—and they work fundamentally differently than humans.
This isn’t about who wins. It’s about understanding the cognitive division of labor that’s emerging.
Agents think in code. Humans think in interfaces.
Agents optimize for apparent progress. Humans optimize for correctness.
Agents are fast but brittle. Humans are slower but adaptive.
The winners in this transition won’t be “the people who learn to code” or “the people who refuse AI.”
The winners will be the people who understand which parts of their workflow should be programmatic and which parts should remain human—and get really, really good at the human parts.
Take any task you do regularly. Break it into steps. For each step, ask: is it deterministic and rule-based (Level 1), scriptable but dependent on judgment about what to build (Level 2), or reliant on vision, context, and judgment (Level 3)?
Start delegating the Level 1 steps immediately.
Agents will fabricate data. They’ll use wrong files. They’ll make plausible-looking mistakes.
Before you delegate anything, decide how you’ll verify the output.
The 88% speed advantage disappears if you spend 2 hours debugging fabricated data.
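One cheap option is an automated spot-check that every value in the agent’s output actually appears somewhere in the source material. A sketch under assumed file and column names, not a tool from the study:

```python
# Hypothetical spot-check: confirm that values in an agent-produced spreadsheet
# actually appear in the source documents. File/column names are assumptions.
from pathlib import Path
import pandas as pd

output = pd.read_excel("agent_output.xlsx")
sources = " ".join(p.read_text(errors="ignore") for p in Path("source_docs").glob("*.txt"))

missing = [v for v in output["total_amount"].astype(str) if v not in sources]

if missing:
    print(f"{len(missing)} values have no match in the sources - possible fabrication:")
    print(missing[:10])
else:
    print("Every spot-checked value traces back to a source document.")
```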
Pick one of the four capabilities above.
These aren’t natural. They require deliberate practice.
But they’re also the only things that matter in an AI-saturated workplace.
It’s not “Will AI take my job?”
It’s: “What can I uniquely do that creates enough value to justify human-level costs?”
This study gives you the answer:
Generate hypotheses rapidly when stuck.
Update your approach fast when evidence contradicts you.
Transfer solutions across seemingly different contexts.
Build judgment from reflection on past decisions.
Agents are terrible at all four. Humans can be great at all four.
The ones who invest in these capabilities won’t be competing with AI.
They’ll be the ones deciding what AI does—and what stays human.
The researchers open-sourced their workflow analysis toolkit: github.com/zorazrw/workflow-induction-toolkit
If you want to actually see how you work versus how an agent works—run your own comparison.
Because the future isn’t about reading studies.
It’s about understanding your own workflow well enough to know which parts should be programmatic—and which parts should be you.
Study source: “How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations” – Wang et al., 2025, Carnegie Mellon & Stanford
Full paper: arXiv:2510.22780v2
Study scope: 48 professional workers, 4 AI frameworks, 16 tasks representing 287 U.S. occupations and 71.9% of daily work activities. First direct workflow-level comparison ever published.