How Are You Actually Checking AI Skill? Asking Isn't Checking.
When you ask a candidate whether they use AI, you measure their confidence and their vocabulary. Neither is capability. Checking means building a moment where the model is wrong and watching what they do.
Ask a candidate "do you use AI?" and you have measured exactly one thing: how comfortable they are saying yes. Not whether they can catch the model when it is wrong. Not whether they can steer it. Not whether they know the difference between an answer that sounds finished and one that is correct. You asked about a skill and got back a sentence about a feeling. And then you scored the sentence. That is what most hiring processes call an AI-skills assessment, and it is theatre from the first question.
Here is the uncomfortable bit. You already know this in every other part of hiring. You would never ask a developer "are you good at coding?" and write down the answer as evidence. You would never take "yes, I'm a strong writer" as proof of anything. You make people do the thing. Somewhere between the CV and the AI section, that rigour quietly falls out of the room, and "tell me about your AI workflow" gets treated as a test instead of what it is — a candidate's self-description, rehearsed, and graded as if it were data.
The short version:
- Asking "do you use AI?" measures confidence and vocabulary, not capability. The candidate who talks about AI most fluently is the one who trusts it most. Trust is the failure mode, not the skill.
- A self-report is not a check. Checking means you create a moment where the model is confidently wrong and observe what the candidate does with it: catch it, or build on it.
- "Which tools do you use?" is the worst question of all. It sorts people by what they can name, and naming a tool is free. Almost every candidate now passes a naming test, which makes it a screen that screens nobody.
- The one real check is a live wrong-answer scenario. Everything else is a candidate performing competence at you, and you writing it down.
What "do you use AI?" actually measures
Run the question honestly and watch what comes back. "Yes, I use it every day." "I've built a few agents." "My whole workflow runs through it." Every one of those is a claim the candidate makes about themselves, in words the candidate chose, with no cost to being wrong. The fluent talker and the fluent operator produce the same paragraph. You cannot tell them apart by listening, because listening only tests whether someone can describe a thing, and describing is not doing.
There is a nastier problem underneath. The candidate who answers with the most confidence is the one who checks the machine the least. Confidence about AI and skepticism about AI point in opposite directions, and your question rewards the first. So the person who says "I trust it completely, it's brilliant, I let it run" scores well on an interview built to reward enthusiasm. That same person ships the model's mistake three weeks from now without a flicker of doubt.
This is not a hunch. In a Wharton study titled "Thinking, Fast, Slow, and Artificial," researchers Steven Shaw and Gideon Nave ran three experiments with 1,372 people. Accuracy rose from 46% with no AI to 71% when the AI was right. Then, when the AI was wrong, accuracy fell to 31.5%, below the 46% people managed alone, and participants followed the wrong answer around 80% of the time, their confidence rising as their accuracy dropped. Shaw and Nave call it cognitive surrender. Read it as a hiring signal and it is brutal: the person most sure of the machine is the person most likely to be led off a cliff by it, while feeling great about the walk.
Your interview question asks that person to rate themselves. They rate themselves highly. You believe them.
Vocabulary is not skill, and the interview keeps confusing them
There is a whole grammar of AI competence that people have learned to speak without possessing. "Chain-of-thought prompting." "I keep a human in the loop." "I use it for the first draft and then refine." These sentences are now furniture. A candidate can assemble every one of them from three LinkedIn posts and a podcast, and none of it tells you whether they have ever once caught the model being confidently wrong and corrected it.
The vocabulary spreads faster than the skill because talking is cheap and judgment is expensive. This is precisely why "must have AI skills" means nothing as a line on a job description — the phrase points at a real capability and then screens for the words that describe it. You end up hiring the person who studied for the interview instead of the person who can do the job. That is the oldest failure in hiring wearing a new coat.
And the tell is easy to miss, because a good vocabulary sounds like competence. It sounds like it right up until the moment work has to happen and the words run out. The candidate who says "I always validate the output" has told you a policy, not a demonstration. Ask them to validate an output in front of you and half of them freeze, because the sentence was a thing they knew to say, not a thing they know how to do.
The tool question is the emptiest one you ask
"Which AI tools do you use?" feels like a rigorous question. It sounds specific. It produces a concrete list, and concrete lists feel like evidence.
They are not. Naming a tool is free. A candidate reads your job description, sees the stack you mention, and mirrors it back at you without effort. The list sorts people by exposure, not by ability, and exposure is now universal. Roughly 2.5% of all US job postings ask for AI skills and the number climbs every quarter, which means candidates have every incentive to memorise the names and none to admit they cannot direct the thing they named. A screen that everyone can pass by reading your own advert back to you is not a screen. It is a mirror.
Worse, the tool list actively misleads you. It makes the fluent-but-passive candidate look prepared, because they can rattle off six models and four wrappers. The person who uses one tool carefully, questions it constantly, and would never let it ship unchecked, gives you a shorter list and reads as less advanced. You have built a question that penalises the exact discipline you should be hiring for.
Asking versus checking
The gap between what your process does and what it thinks it does fits in one table. Every row on the left feels like an assessment while you run it. Every one measures the candidate's self-image, and every one has a right-hand column your process is skipping.
| The question you ask | What it actually measures | What to do instead |
|---|---|---|
| "Do you use AI?" | Confidence and comfort saying yes | Hand them a task with a wrong AI answer attached and watch for the catch |
| "Which tools do you use?" | What they can name, which is free | Ignore the list; test what they do when one of those tools is wrong |
| "Tell me about your AI workflow" | How well they narrate a process | Make them run the process live on a problem they can't pre-cook |
| "How do you check AI output?" | Whether they know the correct answer to give | Put a plausible, incorrect output in front of them and say nothing |
| "Rate your AI skills 1 to 10" | Self-esteem, and how the room reads their tone | Delete this; a number a candidate assigns themselves is not a measurement |
The pattern down that middle column is the whole problem. Asking measures the story a candidate tells about their capability. Checking measures the capability. Those are different objects, and a hiring process that confuses them will keep hiring confident storytellers and keep being surprised when the work doesn't hold.
What checking actually looks like
Checking has a shape, and the shape is not a conversation. It is a task with a trap in it.
Hand the candidate a real piece of work — a brief, a dataset, a draft, a snippet of analysis — and attach an AI output to it. Make that output good. Clean, confident, well-structured, and quietly wrong in one load-bearing place. A statistic that doesn't survive checking. A conclusion that contradicts a number three lines up. A recommendation built on an assumption that isn't true. Then say nothing about the error. Give them the task and get out of the way.
Now you are measuring the only thing worth measuring. One candidate reads the polished output, feels the confidence radiating off it, and builds on top. They are not lazy or dishonest. The thing looked right and the machine was sure, and that was enough. Another candidate gets an itch. Something in the output doesn't sit well, so they go and check the claim everyone else would have taken on trust, find the crack, and rebuild the part that was rotten. Same task. Same output. Same twenty minutes. Opposite hire.
This is the difference between an AI user and an AI Operator, and it is the entire subject of "Stop hiring AI users, start hiring AI Operators." A user operates the tool. An Operator operates the outcome: they can build with the model and still tell it no. You cannot get that signal by asking, because the fluent user and the Operator answer your interview questions the same way. You get it only by building the wrong-answer moment and watching which one flinches.
Notice what makes this check impossible to fake. To pass it, the candidate has to actually know the AI is wrong, know where, and know how to fix it, which is exactly the capability you are hiring for. There is no rehearsed sentence that gets them through it. Faking the test would require having the skill the test is looking for. That is why it is the one signal AI cannot counterfeit on a candidate's behalf, a point I made in full in "Every hiring signal AI can now fake."
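If it helps to see the check as a record instead of a conversation, here is a minimal Python sketch. Every name in it (`WrongAnswerTask`, `Observation`, `Outcome`, `classify`, `report`) is hypothetical and invented for illustration, as is the sample task at the bottom. The three-way outcome is an assumption that mirrors the "know it is wrong, know where, know how to fix it" description above. The point of the shape is that the only inputs are things the interviewer saw happen; nothing the candidate says about themselves is a field.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CAUGHT_AND_FIXED = "found the planted error and rebuilt what depended on it"
    CAUGHT_ONLY = "found the planted error but did not fix it"
    BUILT_ON_ERROR = "built on the wrong output"


@dataclass(frozen=True)
class WrongAnswerTask:
    """One live task with a single planted error (hypothetical structure)."""
    brief: str           # the real piece of work handed to the candidate
    ai_output: str       # the polished, confident output attached to it
    planted_error: str   # the one load-bearing claim that is wrong


@dataclass(frozen=True)
class Observation:
    """What the interviewer watched happen during the live session."""
    flagged_planted_error: bool  # did they identify the wrong claim?
    corrected_it: bool           # did they rebuild the part that relied on it?


def classify(obs: Observation) -> Outcome:
    """Map observed behaviour to an outcome.

    A correction only counts if the planted error was actually flagged.
    """
    if obs.flagged_planted_error and obs.corrected_it:
        return Outcome.CAUGHT_AND_FIXED
    if obs.flagged_planted_error:
        return Outcome.CAUGHT_ONLY
    return Outcome.BUILT_ON_ERROR


def report(task: WrongAnswerTask, obs: Observation) -> str:
    """One-line summary for the interview notes."""
    return f"{task.brief}: {classify(obs).value}"


# Illustrative task content only; not taken from any real assessment.
task = WrongAnswerTask(
    brief="Summarise Q3 churn for the board",
    ai_output="Churn fell to 4%, so retention spend can be cut.",
    planted_error="The 4% figure contradicts the table three lines up",
)
print(report(task, Observation(flagged_planted_error=True, corrected_it=True)))
```

A real rubric would likely record more than two booleans, such as how long the candidate took to get suspicious or what they checked first. The sketch only fixes the principle that the score comes from observed behaviour.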
Why the take-home doesn't rescue you
Some managers read this and reach for the take-home assignment, thinking a realistic task fixes the problem. It doesn't, and for a specific reason: the take-home happens where you cannot see it. The candidate does it at home, with the same browser tab open that they'll have open on the job, and hands you a polished artifact produced somewhere you don't control. You are back to grading the model, not the person.
The wrong-answer check only works live. It has to happen in front of you, in real time, on a problem the candidate cannot pre-cook, or the fakeability comes straight back. This is exactly why the take-home assignment is dead as an evidence-gathering tool — not because tasks are useless, but because a task you can't observe is just another artifact, and every artifact is now cheap. The observation is the assessment. Remove the observation and you have removed the check.
The one thing to change on Monday
Take your AI section apart and look at it honestly. Count how many of your questions ask the candidate to describe themselves and how many force them to do something in front of you. If the ratio is what it is at most companies, nearly everything you have is self-report, and nearly nothing is a check.
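The count takes a few minutes by hand, but a rough Python sketch shows how little judgment it needs. The `Question` class, the `audit` function, and the `observed` flag are hypothetical names chosen for this example. The sample section reuses the five questions from the table above plus one live task, which is an assumption about what a typical section contains, not data from any company.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    observed: bool  # True only if the candidate must do something live, in front of you


def audit(questions: list[Question]) -> tuple[int, int]:
    """Return (self_report_count, observed_count) for an interview section."""
    observed = sum(q.observed for q in questions)
    return len(questions) - observed, observed


# Hypothetical AI section: five self-report questions and one live check.
section = [
    Question("Do you use AI?", observed=False),
    Question("Which tools do you use?", observed=False),
    Question("Tell me about your AI workflow", observed=False),
    Question("How do you check AI output?", observed=False),
    Question("Rate your AI skills 1 to 10", observed=False),
    Question("Live task with a confident, incorrect AI output attached", observed=True),
]

self_report, observed = audit(section)
print(f"{self_report} self-report, {observed} observed")  # prints "5 self-report, 1 observed"
```

The only decision the sketch asks of you is the `observed` flag, and the test for it is strict: if the candidate can answer by talking, it is self-report.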
Replace the whole section with a single wrong-answer task. One real problem, one confident and incorrect AI output attached, no warning about the error, and a close read of what the person does next. Drop "do you use AI." Drop the tool inventory. Drop the 1-to-10 self-rating. None of them survive contact with a candidate who has learned to talk. Keep the one thing that survives contact with a candidate who cannot fake it, because passing it is the skill.
Asking is a candidate performing competence at you. Checking is you watching them do the work when the work fights back. Only one of those tells you who you are about to hire.
Ivanooo built the AI Operator Profile to make that check repeatable — to put a candidate in front of a confidently wrong machine and read the one signal no amount of talking can produce on their behalf. If your AI-skills round is still a conversation, this is what it was supposed to be instead.
Frequently asked questions
How do you assess AI skills in candidates without just asking them?
Give them a real task with an AI-generated output already attached, make that output confident and quietly wrong in one important place, and say nothing about the error. Then watch. The candidate who catches the mistake, checks the load-bearing claim, and rebuilds has the skill. The one who builds on the polished error does not. You are measuring behaviour under a wrong answer, not a self-description.
Why isn't "do you use AI?" a good interview question?
Because it measures confidence and vocabulary, not capability. Every candidate now answers yes fluently, and the most confident answer comes from the person who trusts the machine most, which is the failure mode, not the skill. Wharton's Shaw and Nave found people followed wrong AI answers around 80% of the time while growing more confident. The question rewards exactly the wrong trait.
Isn't asking which tools a candidate uses a specific, concrete question?
It feels concrete but measures nothing useful. Naming a tool is free, and candidates simply mirror the stack in your job description back at you. Exposure to tools is now universal, so a tool inventory sorts nobody. It also penalises the disciplined candidate who uses one tool carefully over the one who can reel off ten.
Can I just use a take-home assignment to check AI skill?
No, because the take-home happens where you can't see it. The candidate produces the artifact at home with the same tools they'd use on the job, so you end up grading the model, not the person. The wrong-answer check only works live and observed, on a problem the candidate can't prepare in advance. Remove the observation and you've removed the check.
What is the difference between an AI user and an AI Operator in a check like this?
An AI user operates the tool and accepts a confident output as finished. An AI Operator operates the outcome: they build with the model but catch it when it's wrong, check the claim everyone else assumed, and rebuild. In a wrong-answer task the two behave in opposite ways, which is the entire signal. Asking questions can't separate them; only the live check can.
Why can't a candidate fake the wrong-answer test?
Because faking it requires the exact judgment the test looks for. To pass, the candidate has to know the AI is wrong, know where, and know how to fix it, which is precisely the capability that makes them worth hiring. There's no rehearsed line that gets them through. Passing the test is the skill itself, which is what makes it the one signal AI can't counterfeit for them.