You Can't Train What You Can't Measure. And Nobody's Measuring This.
Your AI upskilling budget is buying more of the thing every candidate already has. It is training Fluency and never measuring Direction, and you cannot train what you refuse to measure.
You cannot train what you cannot measure, and your AI upskilling programme is not measuring the thing that matters. It measures Fluency, which every person on your team already has. It never touches Direction, which is the only capability that predicts a good hire. So the budget you approved for prompt courses and tool certifications is buying you more confident tool-users, at scale, and calling them Operators. The certificate at the end proves someone met the tool. It proves nothing about whether they can catch it when it lies.
Sit with the shape of that for a second, because it is the whole problem. A training programme is a measurement instrument wearing a curriculum. You decide what "good" looks like, build exercises that produce it, assess against it, and certify the people who clear the bar. Every step assumes you can measure the target. If you cannot measure the thing you actually want, the programme quietly retargets onto the thing you can measure, which is Fluency, and optimises hard for that. Your L&D team has built a beautiful machine for producing the one capability that no longer separates anyone.
The short version:
- AI upskilling trains Fluency and calls it capability. Fluency is what you can make AI do — draft, summarise, generate. It is trainable in an afternoon and every candidate now has it. It is not the signal.
- The signal is Direction: whether a person can steer the model, catch it when it drifts, and override it when it is confidently wrong. Nobody is measuring Direction, so nobody is training it.
- You cannot train what you cannot measure. A course with no measure of judgment-under-a-wrong-answer will drift onto tool proficiency by default, because that is the only thing it can score.
- The spend is real and the targeting is off. "AI skills" appears in roughly 2.5% of US job postings and about 4.2% of entry-level roles, and companies are pouring L&D money into the phrase. The phrase names Fluency. The fix starts with a measure of Direction, not another course.
What your training programme is actually measuring
Walk into any "AI upskilling" module and read what it certifies. Can the learner write a prompt that gets a usable draft. Can they chain a few tools into a workflow. Can they name the models, the plug-ins, the automation platform your company standardised on last quarter. Every one of those is a Fluency check. Every one of them is a thing a motivated teenager has by default and a course teaches in an afternoon.
Now ask the harder question. Where in that module does anyone measure what the learner does when the model hands them a clean, confident, wrong answer? Not a trick question with an obvious tell. A plausible one, the kind that reads finished and costs money three weeks later. In most programmes that moment does not exist, because it is hard to build and harder to score. So it gets skipped, and the thing that separates a good hire from a liability never appears in the curriculum at all.
This is the mechanism nobody says out loud. A training programme cannot certify a capability it has no way to observe. Judgment under a wrong answer is expensive to observe. Prompt proficiency is cheap. So the programme optimises for the cheap thing, ships a certificate, and the buyer reads it as if it meant the expensive thing. It never did.
Fluency is trainable. Direction has to be measured first.
We split the capability into two axes for exactly this reason, and the split is the whole argument here. Fluency is what you can make AI do. Direction is whether you can tell where it should go, catch it when it drifts, and override it when it is confidently wrong. This is the same pairing that decides whether you should stop hiring AI users and start hiring AI Operators. Fluency is the half everyone has. Direction is the half almost nobody tests.
Fluency responds to training the way keyboard skills respond to training. You show, they practise, they get faster, the certificate is honest. Direction does not work like that, and the reason is that it is not a technique. It is a disposition under pressure, the itch that makes one person check the load-bearing claim while the person beside them ships. You cannot certify a disposition you have never measured. And you cannot even design an exercise to build it until you have a way to score whether it moved. Measurement is not the reporting layer that comes after the training. It is the thing that has to exist before the training can point at anything real.
| | Fluency | Direction |
|---|---|---|
| What it is | What you can make AI do | Whether you can steer, catch, and override it |
| Where it shows | The draft, the demo, the certificate | The moment the model is confidently wrong |
| Trainable? | Yes, in an afternoon | Not until it is measured first |
| How common in 2026 | Universal | Rare, and untested |
| What most L&D measures | This | Nothing |
| What predicts a good hire | Almost nothing on its own | This |
Read the table top to bottom and the failure is obvious. Everything the training industry can measure lives in the left column. Everything that predicts whether the hire was worth the salary lives in the right column. The two columns are not the same skill at different levels. They are different capabilities, and the market has built its entire measurement apparatus around the one that no longer matters.
The market is spending into the wrong column
The demand is not imaginary. AI-related skills now appear in roughly 2.5% of all US job postings, and in about 4.2% of entry-level roles, nearly double the share of a year earlier, by CNBC's read of the same barometer. Every one of those postings is a hiring manager who wrote "must have AI skills" and then handed L&D a budget to close the gap for the people already inside. The intent is sound. The target is a phrase that names Fluency, which is why "must have AI skills" ends up meaning nothing the moment you press on it.
So the money flows into the left column at scale. Prompt libraries. Certification tracks. Tool badges that expire the next time the model updates. Your people get faster at operating the machine, the completion dashboards go green, and the CHRO reports upward that the workforce is now "AI-enabled." Every number in that report is real. Every number in that report is measuring the column that separates nobody.
Here is the part that should stop the meeting. The person who benefits least from more Fluency training is the confident tool-user who already trusts the machine. You are pouring judgment-free proficiency onto someone whose actual failure mode is a shortage of judgment. Wharton researchers Steven Shaw and Gideon Nave, in a study called Thinking, Fast, Slow, and Artificial, ran three experiments with 1,372 people. Accuracy climbed from 46% with no AI to 71% when the AI was right — and then collapsed to 31.5% when the AI was wrong, below what people managed with no help at all, because they followed the wrong answer around 80% of the time and grew more confident as they did it. More Fluency training does not touch that. It sharpens the surrender. We traced how that decay actually runs in cognitive surrender and output quality; the short version for L&D is that you cannot course your way out of a judgment gap you never measured.
You cannot train what you cannot measure
This is an old idea and it is exactly right here. Training is a controlled attempt to move a number. If there is no number — no measure of the target capability before, during, and after — then the programme is not training toward Direction. It is doing something else and calling it that. It is producing activity, completion, and a certificate, and none of those three is a measurement of judgment under a wrong answer.
The proof is in how the programmes are built. Ask a vendor to show you the item on their assessment that scores what a learner does when the AI is confidently incorrect. Not their satisfaction survey. Not their prompt-quality rubric. The specific measure of catch-and-override under a plausible wrong answer. In almost every case it is not there, which means the programme has no read on the thing it claims to build.
The correct order is the reverse of the industry's order. First build the measure, a repeatable moment where the model is confidently wrong and the learner's response is scored: did they generate an alternative, revise the belief the output contradicted, catch the pattern the model missed, and trace the consequence out a few moves. Those are the four moves that sit underneath judgment, the ones we mapped in "why hiring broke, and it wasn't AI". Only once you can score them can a course honestly claim to move them. Measurement is not step nine. It is step one, and the industry skipped it because Fluency was sitting right there, cheap to score and easy to sell.
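To make "score the four moves" concrete, here is a minimal sketch in Python of what one scored trial and a team baseline might look like. Everything in it is an assumption for illustration: the names `DirectionObservation`, `direction_score`, and `baseline` are hypothetical, each move is recorded as a simple yes or no, and the four moves are weighted equally. It is not the AI Operator Profile's actual scoring method, which this article does not describe.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DirectionObservation:
    """One trial where the model was confidently wrong (hypothetical schema)."""
    generated_alternative: bool   # proposed a different answer
    revised_belief: bool          # updated the belief the output contradicted
    caught_missed_pattern: bool   # spotted the pattern the model missed
    traced_consequence: bool      # followed the consequence out a few moves


def direction_score(obs: DirectionObservation) -> float:
    """Fraction of the four moves observed in one trial, from 0.0 to 1.0."""
    moves = [getattr(obs, f.name) for f in fields(obs)]
    return sum(moves) / len(moves)


def baseline(trials: list[DirectionObservation]) -> float:
    """Mean score across trials; equal weighting is an assumption."""
    if not trials:
        raise ValueError("need at least one scored trial")
    return sum(direction_score(t) for t in trials) / len(trials)


trials = [
    DirectionObservation(True, True, False, False),
    DirectionObservation(True, True, True, True),
]
print(baseline(trials))  # 0.75
```

The useful property is not the formula but the repeatability: the same record can be taken before and after a course, which is what lets a programme claim it moved anything.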
What an L&D leader does on Monday
Stop buying Fluency and calling it capability. Not because Fluency is worthless — it is necessary, and your people should have it — but because it is the half the market already commoditised, and more of it changes nothing about who can direct the machine when direction is the only thing that matters.
Then flip the order. Before you renew a certification contract, ask the vendor to show you their measure of Direction. If they cannot, they are selling you the left column at premium prices. Build or buy the measurement first: put a person in front of a confidently wrong model and score the four moves. Baseline your team on it. You will find, almost certainly, that your most fluent people are not your best directors, and that the gap you have been funding is not the gap you have.
The correction is not more courses. It is a number you do not currently have. Once you can measure Direction, you can finally train it, hire for it, and stop paying to sharpen the exact instinct — trust the confident answer — that you most need your people to resist.
Ivanooo built the AI Operator Profile to be that missing instrument: not a test of what a person can make AI do, but a repeatable measure of whether they can direct it when it is wrong. It is the number your training was always supposed to move, and never had.
Frequently asked questions
Why doesn't AI upskilling training work? Because it measures the wrong thing. Almost every programme scores Fluency — prompt writing, tool chaining, model familiarity — which is trainable in an afternoon and which every candidate already has. It rarely measures Direction, the judgment to catch and override a confidently wrong output. A course with no measure of judgment quietly retargets onto tool proficiency, so it produces faster tool-users, not Operators.
What is the difference between Fluency and Direction? Fluency is what you can make AI do: draft, summarise, generate, automate. Direction is whether you can steer the model, catch it when it drifts, and override it when it is confidently wrong. Fluency is universal in 2026 and trainable. Direction is rare, largely untested, and it is the axis that predicts whether a hire is worth the salary.
How do you measure AI capability properly? Not with a knowledge quiz or a prompt-quality rubric. Build a repeatable moment where the model is confidently and plausibly wrong, then score what the person does: do they generate an alternative, revise the belief the output contradicts, catch the pattern it missed, and trace the consequence out a few moves. That measures Direction. Everything else measures Fluency.
Isn't more AI training always a good thing? No, and it can make one problem worse. The person who benefits least from more Fluency training is the confident tool-user who already trusts the machine. Wharton's Shaw and Nave found people followed wrong AI answers around 80% of the time while growing more confident. Pouring judgment-free proficiency onto a judgment gap sharpens the surrender rather than closing it.
Why do you say measurement has to come before training? Because training is a controlled attempt to move a number, and if there is no measure of the target capability, the programme has nothing real to point at. It will drift onto whatever it can score, which is Fluency. You cannot design an exercise to build judgment until you can tell whether judgment moved. Measurement is step one, not the reporting layer at the end.
What should a CHRO or L&D leader do first? Before renewing any certification contract, ask the vendor to show their measure of Direction — the specific item that scores catch-and-override under a plausible wrong answer. If it is not there, they are selling Fluency at a premium. Build or buy the measurement first, baseline your team, and expect to find your most fluent people are not your best directors.