The AI Employee Trap

20 Jun

Augment everywhere. Agentise selectively. Own the judgement layer.

The industry is being sold ‘AI employees’ that cost orders of magnitude more than a chatbot and quietly take over the firm’s judgement. Augmentation pays back almost everywhere. Agents pay only on a narrow band of work, and most of what’s being sold sits well outside it. The skill that now separates the winners from the also-rans is telling the two apart.

A pitch crossed my feed last week: seven ‘AI employees’ for a real estate investment firm, plus an orchestrator running the lot around the clock and handing leadership a ‘real-time command centre’. It is a seductive picture. It is also, mostly, the wrong thing bought at the highest possible price. The reasons are worth your time, because the same logic decides where AI pays in your business and where it simply bills you.

EXECUTIVE SUMMARY

Two kinds of AI are landing in commercial real estate at once, and they have opposite economics. Augmentation (AI as a thinking partner, around twenty pounds a month) makes expensive people sharper and pays back almost absurdly. Agentic systems cost one or two orders of magnitude more to run, and only pay when the work is cheap to check and the human time is dear. Most of what is sold as ‘AI employees’ fails that test: it automates the judgement that should stay human and bolts itself onto a deal process that ought to be redesigned. The firms that pull ahead will sort their own work before they buy anything, and will own the layer where their expertise meets the machine. The biggest stack is the booby prize.

SEVEN EMPLOYEES AND A COMMAND CENTRE

The pitch gives a real estate investment firm seven ‘AI employees’, each with a remit:

- Sourcing scrapes broker inbound, offering memoranda and listing platforms, and ‘kills 40-50% of deals before a human looks’.
- Market monitors public and proprietary data and flags regulatory, economic and asset-level risk.
- Comps tracks and analyses sales and lease comparables around the clock.
- Underwriting builds and updates models inside the firm’s own Excel templates.
- Intelligence watches the portfolio for risks and opportunities in real time.
- IC generates institutional-grade investment memos, diligence summaries and deal playbooks.
- Asset management tracks post-close execution, from hundred-day plans to monthly KPIs.

Over the top sits an orchestrator running all seven at once, always on, handing leadership a single ‘real-time command centre’ across deals, assets and operations.

The people who built this know real estate. That is exactly why it deserves a serious read, and why the flaws are instructive rather than embarrassing. The property knowledge is sound. The trouble lives in two questions the pitch never asks: what does it cost to run, and what does it cost to trust?

TWO MACHINES, TWO BILLS

A chatbot you use as a thinking partner answers in one pass. You read it, you push back, you decide. An agent works differently. It plans, calls tools, retries, checks itself, and drags a growing pile of context through every step. A single task can fan out into dozens or hundreds of model calls. So the running cost lands one or two orders of magnitude above a chatbot’s, and a system of seven agents plus an always-on orchestrator is the most expensive shape you can build: continuous machine thinking applied to data that mostly sits still.
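To make the fan-out concrete, here is a rough sketch of the token arithmetic. Every number in it is an assumption chosen for illustration (prompt size, step count, tokens added per step), and the function names are hypothetical; the point is only the shape of the curve when an agent re-reads a growing context at each step.

```python
# Illustrative only: every number here is an assumption, not a measurement.

def single_pass_tokens(prompt: int, answer: int) -> int:
    """A chatbot used as a thinking partner: one prompt in, one answer out."""
    return prompt + answer


def agent_tokens(prompt: int, steps: int, added_per_step: int) -> int:
    """An agent re-reads its whole accumulated context at every step.

    Each step reads the original prompt plus everything earlier steps
    produced, then adds `added_per_step` tokens of its own
    (plans, tool results, self-checks).
    """
    total = 0
    context = prompt
    for _ in range(steps):
        total += context + added_per_step  # read the pile, write the new output
        context += added_per_step          # the pile grows for the next step
    return total


chat = single_pass_tokens(prompt=1_500, answer=500)
agent = agent_tokens(prompt=1_500, steps=30, added_per_step=500)
ratio = agent / chat
print(f"chat: {chat:,} tokens | agent: {agent:,} tokens | ratio: {ratio:.0f}x")
```

Under these assumed inputs, thirty steps already put the agent roughly two orders of magnitude above the single pass. Change the step count and the ratio moves faster than linearly, because the context is carried through every step.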

At twenty pounds a month, AI as a thinking partner has return economics that are almost embarrassing to write down. Make a skilled analyst even a few per cent sharper and the maths is over before it begins, because the human already owns the judgement and, sitting right next to the work, checks the machine cheaply.
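The maths really is short. A sketch, assuming a fully loaded analyst cost of six thousand pounds a month and a three per cent uplift (both figures are my illustrations; only the twenty pounds comes from the argument above):

```python
def payback_multiple(person_cost_per_month: int, uplift_pct: int, tool_cost_per_month: int) -> float:
    """Value of the uplift divided by the cost of the tool, per month."""
    uplift_value = person_cost_per_month * uplift_pct / 100
    return uplift_value / tool_cost_per_month


# Assumed: a 6,000 pound-a-month analyst made 3 per cent sharper by a 20 pound tool.
multiple = payback_multiple(person_cost_per_month=6_000, uplift_pct=3, tool_cost_per_month=20)
print(f"Each pound spent returns about {multiple:.0f} pounds of analyst value")
```

Swap in your own salary and uplift figures; the multiple stays comfortable across most plausible inputs, which is the point.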

The cheap tool captures value per pound.

The expensive one earns its keep only where the human bottleneck is worth removing, and there a second truth takes over: at the apex of value, the compute cost stops mattering at all. Run a complex deal through three models wearing three analytical personas, and if one of them surfaces a structural risk that would have sunk a fifty-million-pound purchase, the token bill is a rounding error against the disaster avoided.

The waste was never the compute. The waste is spending it on lease comparables. Heavy machine thinking belongs where the stakes are highest and the judgement densest, nowhere near the work a junior clears in ten minutes. The bill only matters when the work doesn’t.

THE QUESTION THAT SORTS IT

So when does an agent earn its tokens? Two things decide it, and neither was on this vendor’s slide.

The first is how cheap the output is to check. If you can verify a result at a glance, you can let a machine run and simply catch the errors as they come. If checking the work costs nearly as much as doing it, the agent has saved you nothing and added a layer of risk for the privilege.

The second is how expensive the human time is that the agent displaces. Free a senior underwriter from an afternoon of model-building and the tokens are a rounding error. Automate a task a junior does in ten cheap minutes and you are paying a fortune to save loose change.

Put the two together and you have a rule you can carry into any vendor meeting:

*agents pay where the work is cheap to verify and the human time is expensive; everywhere else, augment the human and keep them in the loop.*

This is my CRE Automation Matrix doing its job. Sort the work by whether you can check it and what kind of work it is, and the safe ground for autonomy separates cleanly from the ground where AI should challenge a human and never replace one.
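For those who prefer a rule written down as logic, a minimal sketch follows. The `Task` fields and the example classifications are my own shorthand for the two questions above, not a formal statement of the matrix.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cheap_to_verify: bool        # can a human check the output at a glance?
    expensive_human_time: bool   # does it displace senior, costly hours?


def recommend(task: Task) -> str:
    """Agents pay only when both conditions hold; otherwise augment the human."""
    if task.cheap_to_verify and task.expensive_human_time:
        return "agent"
    return "augment, human in the loop"


# Hypothetical examples, classified the way the argument above would read them.
examples = [
    Task("Mechanical model build in the house template", True, True),
    Task("Ten-minute junior task", True, False),
    Task("Screening inbound deals", False, True),
]

for task in examples:
    print(f"{task.name}: {recommend(task)}")
```

The function is deliberately blunt. The hard part is answering the two booleans honestly for each task on your own desk.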

And before anyone objects that the cost is falling: the price of a given capability is dropping fast, but that is not the same as your bill falling. As tokens cheapen, consumption explodes to match (the Jevons paradox, in silicon): a hundred agents run around the clock where today you would balk at one, and the firms selling the tokens have every reason to cheer that on. Unit price drops; total spend holds or climbs. What cheaper compute never does is make unverifiable work checkable or turn a commodity into your edge. So whichever way the price curve runs, the discipline is the same: design efficient systems. Verification and differentiation decide whether to trust an agent at all; how well it is built decides what it costs you to run. And building it well is a lever only the firm that owns the system ever gets to pull.
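The Jevons point is easiest to see as arithmetic. In this sketch the tenfold price drop is an assumption, and the jump from one agent to a hundred is the scenario described above; the price units are arbitrary.

```python
def monthly_spend(price_per_agent_month: int, agents_running: int) -> int:
    """Total bill is unit price times how much you consume."""
    return price_per_agent_month * agents_running


# Today: one agent at an (arbitrary) price of 100 units a month.
spend_today = monthly_spend(price_per_agent_month=100, agents_running=1)

# Later: assume the unit price falls tenfold, and usage grows to a hundred agents.
spend_later = monthly_spend(price_per_agent_month=10, agents_running=100)

print(f"today: {spend_today} | later: {spend_later} | "
      f"bill multiplied by {spend_later // spend_today}")
```

Unit price down by a factor of ten, bill up by a factor of ten. Whether your own usage grows that fast is an open question, but the direction is the one to plan for.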

NOW READ THE STACK

Run those seven ‘AI employees’ through the rule and they sort into three piles.

Put asset management in the pile marked safe. It comes last on the list, and it is the strongest thing on it. Tracking KPIs against plans is verifiable to the penny, the work is a genuine grind, and handing it to a machine is pure gain. The two monitoring agents, market and intelligence, belong there too, on one condition: a human acts on the flag, rather than the machine acting for them.

Put underwriting and comps in the pile marked handle with care. Building a model inside the firm’s own template is genuinely useful and easy to check, so long as the agent does the mechanical build and a human still owns the assumptions: the rent growth, the exit yield, the voids. The moment it sets those for you, it has automated the judgement and left you checking arithmetic, and a tidy model carrying plausible wrong numbers is more dangerous than an obviously broken one. Comps has the same shape plus a data problem: private-market comparables are lumpy and lagged, and ‘around the clock’ is mostly marketing.

Then the pile I would mark dangerous, which is, tellingly, the one the pitch leads with. Sourcing sits first on the vendor slide, and ‘kills 40-50% of deals before a human looks’ is sold as efficiency when it is really a hidden tax on your edge. A tired associate skimming a hundred memos misses good deals too, and brings their own biases to it: favoured sponsors, familiar geographies, the assets that look normal. But those biases are plural and contestable. A team carries many of them, they argue in the room, and a killed deal can be reopened. A model collapses all of that into one bias, applied the same way to every deal in the funnel, with nobody watching it happen.

Picture a rifle sighted three inches left: it misses identically, every shot, without a sound. In a business where the return so often lives in the deal that looks wrong on paper, that single silent screen running across the whole funnel is how you automate away the very thing you are paid to spot. A panel of models does not save it: they are trained on the same internet and tuned toward the same notion of sensible, so they share most of the blind spots and merely reject more confidently. And a sharper screen is the wrong goal at the mouth of the funnel, where the job is to keep the odd-looking deal alive rather than bin it more efficiently.

The IC agent fails in a subtler way. A machine helping to write the memo is fine; the danger is the memo becoming a substitute for the analyst’s conviction. An investment memo should be the compression of hard-won judgement. If the machine produces the prose before the human has done the thinking, you get confident ‘Workslop’: shallow analysis laundered into the house style of rigour.

WHO CHECKS IT, AND IS IT WORTH IT

Notice what the rule forces you to hold in mind: two separate questions, not one. Can you trust the output? And is it worth the spend? The vendor only ever pitches the second, and answers it with a saving on cheap human attention.

There is a deeper point, and it is the one I would watch most closely. Where a task lands on that matrix is not fixed by the task. It is decided by how well the system is built. A poorly designed memo agent that emits polished prose from a black box is unverifiable by construction, and lives in the danger zone. A beautifully built one that shows its sources, exposes its reasoning, and stops to ask a human to sign off the assumptions becomes checkable, and moves the same task onto safe ground. Same job, opposite outcome. The difference is craft.

Which is why most of these stacks disappoint. They take the firm’s existing process and bolt agents onto it. We spent two decades ‘digitising the past’; this is the same instinct, now ‘AI’ing the past’: automating a workflow that should have been redesigned. The orchestrator’s ‘command centre’ is the tell. It sells leadership the feeling of control while removing the human from the loop, then stacks seven error rates on top of one another and hides the total under a tidy dashboard.
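What stacking seven error rates looks like depends on how the agents are wired together, which the pitch does not say. As a rough sketch, assume each agent is right 95 per cent of the time, that errors are independent, and that a clean result needs all seven to be right. All three are assumptions of mine.

```python
def chance_all_correct(per_agent_accuracy: float, agents: int) -> float:
    """Probability that every agent in the stack is right, assuming independence."""
    return per_agent_accuracy ** agents


per_agent = 0.95  # assumed accuracy of each individual agent
stack = chance_all_correct(per_agent, agents=7)

print(f"one agent right: {per_agent:.0%} | all seven right: {stack:.0%}")
```

Each agent looks respectable on its own. The stack as a whole is right far less often, and the dashboard shows none of that.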

THE EDGE IS THE LAYER

This reframes the build-versus-buy question every firm is now circling. The instinct is to ask whether to build a stack in-house or buy one from a PropTech. Both framings miss where the value sits.

A PropTech selling one stack to many firms has to generalise somewhere. The real question is whether it generalises away the very thing that makes you distinctive. It can hire a clever ex-broker, but it cannot encode your particular judgement and then sell that same product to your competitor. The generic stack, by its nature, cannot carry your edge. Build in-house and you meet the opposite trap: deep property knowledge wedded to naive system design, reproducing the vendor’s mistakes bespoke and at greater cost. Domain experts reliably underrate the craft of building these systems, for the same reason the vendor underrates the verification economics. Each side is strong precisely where the other is weak.

So the better question is about neither building nor buying. Both miss the point. The value sits in who owns the layer where your domain knowledge meets the machine’s capability: the instructions, the checks, the inputs, the process design. That layer is the moat. Give it a name: a Skill (the format devised by Anthropic, now an open standard). A portable, refined body of instruction that captures how your firm reads a building, prices a risk and walks away from a deal, the explicit rules and the tacit gut of your best operators alike, and rides on top of whatever foundation model is state of the art this quarter. The models will keep leapfrogging one another and converging on price. The Skill is the part that stays yours. It is also, conveniently, the one part you can own without building the underlying technology at all, by taking a general-purpose model and pouring your own expertise into how you instruct it and how you verify it. Your edge comes from nailing every input and every process. The model is a commodity. The judgement you encode into how it is used is not. Human is the new luxury, made operational.

But own that layer only where the work actually sets you apart. This is the old distinction between core and context, and it cuts both ways. Where a task is the source of your edge, the judgement your clients pay you for, build it, encode it, guard it, and never let a vendor turn it into something it also sells the firm down the road. Where a task is commodity work that every firm does much the same way, do not build it yourself either. Buy the generic tool and point your scarce design effort at the work that compounds.

This answers the value question as much as the build question. Your real differentiators tend to live in the hard-to-verify work: the investment thesis, the taste, the relationship, the call on the deal that reads wrong on paper. That is the work to automate least and own most, because it is where your returns concentrate and the one thing a competitor cannot buy off the same shelf. Spend there. Industrialise everything else.

START WITH YOUR OWN DESK

So before you buy a single ‘AI employee’, do the unglamorous work. Write down your firm’s real tasks. Against each one, ask three things:

- How cheaply can you check the output?
- How expensive is the person doing it now?
- Is this where your edge actually lives?

Augment everywhere: that twenty-pound thinking partner belongs on every desk. Hand to an agent only the work that is cheap to verify and dear in hours. Build and own only where the work sets you apart. Buy the rest, and feel no shame in it.
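The worksheet can be as simple as this sketch. The task names and the answers against them are invented for illustration; the function only encodes the three questions and the four instructions above.

```python
from typing import NamedTuple


class Answers(NamedTuple):
    cheap_to_check: bool    # how cheaply can you check the output?
    expensive_person: bool  # how expensive is the person doing it now?
    is_edge: bool           # is this where your edge actually lives?


def triage(a: Answers) -> tuple[str, str]:
    """Return (how to run the task, who should own the tooling)."""
    # Augmentation is the default on every desk; agents are the exception.
    mode = "agent" if a.cheap_to_check and a.expensive_person else "augment"
    ownership = "build and own" if a.is_edge else "buy"
    return mode, ownership


# A hypothetical task inventory; fill in your own firm's real tasks.
inventory = {
    "Monthly KPI tracking against plan": Answers(True, True, False),
    "Investment thesis on an odd-looking deal": Answers(False, True, True),
    "Formatting a standard report": Answers(True, False, False),
}

for name, answers in inventory.items():
    mode, ownership = triage(answers)
    print(f"{name}: {mode}, {ownership}")
```

A spreadsheet does the same job. What matters is that the three answers are written down per task before any vendor is in the room.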

Antony Slumbers

