A Hiring Manager's Field Guide to Spotting an AI Operator

You cannot see an AI Operator on a CV. You see one in how they talk about the work. Here are the observable tells that separate the person who directs the machine from the person the machine directs.

To spot an AI Operator, stop reading the output and start listening to how the candidate talks about it. The Operator narrates what the model got wrong and how they knew. The user talks about speed, tools, and how much they shipped. That is the tell, and almost the only one that holds, because the finished work now looks the same whichever of them produced it. This is a field guide to the signals you can observe in the room: in a work sample, in a wrong-answer test, in the way a person describes a project they have already finished.

Everything before the interview lies to you. The résumé is tuned to your keywords. The portfolio was produced without the skill it appears to prove. The take-home says the candidate can prompt, nothing more. I walked through each of those in my piece on every hiring signal AI can now fake, and the conclusion was blunt: the artifact stopped carrying information about the person. So the signal moved. It lives now in behaviour, on ground the candidate could not pre-cook. And behaviour has tells.

The short version:

  • An Operator narrates the model's errors. Ask about a project and they tell you where the AI was confidently wrong, how they noticed, and what they changed. A user tells you which tools they used and how fast they moved.
  • An Operator shows the override, not just the output. They can point to the moment they told the machine no, and explain the judgment behind it. A user hands you a clean result and cannot tell you what they rejected to get there.
  • An Operator treats the tool as an instrument, not an oracle. They talk about what they checked. A user talks about what the tool produced, as if production were the same thing as being right.
  • The Wharton study behind all this: Steven Shaw and Gideon Nave ran three experiments with 1,372 people and found that accuracy fell to 31.5% when the AI was wrong, and that people followed the wrong answer around 80% of the time, growing more confident as they got it wrong. The user in front of you is that 80%. The Operator is the exception you want to find.

What you are looking for

Two axes, not one. The first is Fluency: what a candidate can make AI do. The second is Direction: whether they can tell where it should go, catch it when it drifts, and override it when it is confidently wrong. I laid out the full frame in my piece on what you're actually screening for when you screen for "AI skills". In short: "AI skills" measures Fluency, Fluency is now everywhere, and it separates nobody. Direction is what separates the room.

The trouble is that Direction does not sit on a CV, and it does not show up in a demo. It only shows up in language, in how a person accounts for a decision they made with a machine sitting next to them. That is good news for you: language is the one thing you can observe directly in an interview, provided you have trained your ear for it.

Here is the frame that makes the whole guide work. An Operator has a relationship with the model in which the model can be wrong and they are the one who decides. A user has one in which the model is right by default and their job is to keep up. You are not grading their prompts. You are grading who is in charge.

Signal one: the work-sample defence

Do not score the sample. Sit the candidate next to it and ask them to defend it line by line. The artifact tells you nothing now. The defence tells you everything.

The Operator can tell you what the model handed them first and why they didn't keep it. "The first draft made a claim about our churn number that I hadn't verified, so I pulled the actual figure and the argument changed." They point to the seam between what the machine produced and what they decided. They know which lines are theirs and which were the model's, because they were paying attention at the boundary.

The user defends the sample as if a person had not made it. Ask why a choice was made and you get the passive account: it works well, it's a strong approach, it covers the requirements. Push on a line and they cannot separate their judgment from the tool's output, because there was no separation. The machine produced it. They approved it. Approval is not authorship, and you can hear the difference in about ninety seconds.

Listen for one thing in particular: can they name what they rejected? Every real piece of work is a graveyard of the versions you didn't ship. An Operator remembers the rejects because rejecting was the work. A user has no graveyard, because they kept the first thing that looked finished. That absence is a signal by itself.

Signal two: the wrong-answer test

This is the sharpest instrument you have, and it cannot be faked. Faking it would require the exact judgment you are testing for.

Build the moment. Hand the candidate a real task with an AI output already attached: polished, confident, and quietly wrong. Not obviously wrong. Plausibly wrong, the way the dangerous outputs actually are. Then say nothing about the error and watch.

The user accepts the gift. The output looked clean, the machine sounded sure, the task had a clock on it, so they build on top of it. They are not lazy or dishonest. They are the 80% Shaw and Nave measured, trusting a confident answer and borrowing its certainty. Accuracy in that study fell from 46% with no AI to 31.5% when the AI was wrong, which is worse than having no help at all, and confidence rose the whole way down. From the inside, it reads as competence.

The Operator gets an itch. Something in the output does not sit right, so they stop and check the load-bearing claim, the one everyone else assumed. They find the crack. Then they tell you how they found it, which is the part you are really buying. "This number felt too clean, so I traced where it came from and the source didn't support it." They are describing an instrument that misread, and their own hand correcting the reading. That sentence is the hire.

Watch the tempo, too. The user speeds up when they trust the tool. The Operator slows down as the stakes rise, because their instinct is to verify before they commit. The candidate proudest of how fast they moved through the confidently-wrong task has just told you they are the 80%.

Signal three: how they describe a past project

You will not always have a live task. You will always have a story. Ask a candidate to walk you through a real project where they used AI heavily, and let them talk. The vocabulary sorts them.

The user's story is about throughput. It ran on speed, tools, and volume. "I used four different models, automated the whole first pass, and turned it around in a day." Every noun is about the machine and the pace. Ask where the AI was wrong and the story stalls, because they were not looking for wrongness. They were looking for output, they got it, and that was the whole point.

The Operator's story is about judgment over the tool. It has a moment where the machine was confident and they disagreed. "The model kept pushing a structure that read well and buried the actual finding, so I overrode it and restructured around the finding." A decision, made by a person, against the tool's default. A real override with a reason attached. That is what Direction sounds like in the past tense.

Push once more and ask what they checked. The Operator has a list: the claims they verified, the assumptions they tested, the output they threw away. The user has none, because they didn't check; they shipped. That checking-list, offered without prompting, is one of the cleanest reads you will get. The two-question version of this test has its own piece, on the two questions that tell you whether a candidate can operate AI; this guide is the field manual for reading the answers.

The tells at a glance

  • Defending a work sample. The Operator names the seam between the model's draft and their own decision. The user defends it in the passive and can't separate their judgment from the output.
  • Handed a confidently wrong output. The Operator gets an itch, checks the load-bearing claim, and rebuilds. The user accepts the gift and builds on top of the error.
  • Describing a past project. The Operator's account has a moment of override, with the reason attached. The user's runs on speed, tools, and volume, with no wrong answer in the story.
  • Asked "what did you check?" The Operator has a list: verified claims, thrown-away output. The user has none, having shipped what looked finished.
  • Tempo under pressure. The Operator slows down as the stakes rise, to verify before committing. The user speeds up, trusting the confident answer.
  • Relationship to the tool. For the Operator it is an instrument that can misread, with them holding the correction. For the user it is an oracle, right by default, and their job is to keep up.

Why the tells hold when the artifacts don't

Every one of these signals has the same shape. It is a thing the candidate performs in real time that reveals who is in charge of the judgment. AI cannot produce it on their behalf, because to fake it the candidate would already have to possess the judgment, at which point they are the Operator you were hoping to find.

This is why the guide works and a knowledge quiz doesn't. You are not testing whether they know AI. Everyone has met AI by now: AI-related skills already appear in roughly 2.5% of US job postings, a share that is climbing, and every candidate has used the tool. Meeting the tool is Fluency, the half that no longer separates anyone. You are testing whether, when the tool is wrong, the candidate notices. That is Direction, and it is the entire hire. I made the full argument in my piece on why you should stop hiring AI users and start hiring AI Operators; this piece is how you spot one across the table.

One caution before you go hunting. The Operator does not perform contrarianism. They are not the candidate who distrusts every output on principle to look independent. That is its own failure mode, noise dressed as judgment. The real signal is specific: a particular claim, checked for a particular reason, corrected a particular way. Vague distrust is not Direction. Named correction is.

What to do with what you see

Run all three probes and the pattern resolves fast. A candidate who narrates errors, shows the override, and slows down to check is an Operator, and you should fight to hire them. A candidate who talks throughput, defends in the passive, and has no graveyard of rejects is a user, and no amount of tool fluency changes that read. The two arrived on the same shortlist, listing the same stack, and everything upstream told you they were the same person. They are not. The difference lives in the language, and now you can hear it.

Ivanooo built the AI Operator Profile to make this repeatable at scale, to put a candidate in front of a confidently wrong machine and read the one signal AI can't fake on their behalf. If "must have AI skills" is on your job description, this is what you were trying to see and couldn't.


Frequently asked questions

How do you spot an AI Operator in an interview?
Listen to how they talk about the work, not the work itself. An Operator narrates where the model was wrong, how they noticed, and what they overrode. A user talks about speed, tools, and volume. Run three probes: make them defend a work sample line by line, hand them a confidently wrong output and watch, and ask them to describe a past project. The vocabulary sorts them.

What is the single clearest tell?
Ask "what did you check?" and listen for whether they have a list. An Operator can name the claims they verified and the output they threw away, because checking was the work. A user has no list, because they shipped what looked finished. That checking-list, offered without prompting, is one of the cleanest reads you will get.

What's the difference between a user and an Operator in how they describe a project?
The user's story runs on throughput: models used, tasks automated, time saved. The Operator's story has a moment of override in it, where the machine was confident, they disagreed, and they can tell you the reason. One account is about the machine's output. The other is about a decision a person made against the machine's default.

Can a candidate fake these signals?
Not the one that matters. Faking the wrong-answer test would require knowing the AI was wrong, where, and how to fix it, which is exactly the judgment you're testing for. Passing the test is the capability. Do guard against theatrical contrarianism: the candidate who distrusts every output to look independent. Real Direction is specific: a named claim, checked for a reason.

Isn't the fast, confident candidate the strong hire?
No. That candidate is the one to worry about. Wharton's Shaw and Nave found people followed wrong AI answers around 80% of the time while growing more confident as accuracy fell. Speeding up and growing more confident as the stakes rise is a tell, not a strength. It means the candidate trusted the machine instead of checking it. The Operator slows down exactly when the user speeds up.