Chapter 14 — Trade-offs You Can't Avoid

Taylor's 3 a.m.: a 47% spike in delete-my-data requests.

Taylor's AI budgeting coach launched two weeks earlier. Engagement: 4.2x the old static tips. CEO: “This is the moat.” Then: “How does it know I'm buying a house?” Delete-my-data requests up 47%. Personalization honeymoon over. Speed tanked. Costs doubled.

Cost vs. Intelligence

Frontier models give reasoning depth. They also make variable costs a hostage situation.

Chasing intelligence

Users love it. But the heaviest users are the most expensive. One PM saw inference costs jump 340%, driven by 90-minute coaching sessions.

Going cheap

Context loss, dumber follow-ups, hallucinated numbers. Feature becomes a toy.

The paradox: best users = most expensive. Margins crushed where retention should be highest.
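The paradox is easy to see with a little arithmetic. Here is a minimal sketch, assuming a flat monthly subscription and an inference cost that scales linearly with session minutes. The price, the per-minute cost, and the usage figures are hypothetical, chosen only to illustrate the shape of the problem.

```python
def monthly_margin(price: float, minutes: float, cost_per_minute: float) -> float:
    """Gross margin for one subscriber on a flat monthly price."""
    return price - minutes * cost_per_minute


# Hypothetical figures, not taken from the chapter.
PRICE = 12.00            # flat subscription, dollars per month
COST_PER_MINUTE = 0.02   # inference cost, dollars per session minute

light_user = monthly_margin(PRICE, minutes=60, cost_per_minute=COST_PER_MINUTE)
heavy_user = monthly_margin(PRICE, minutes=900, cost_per_minute=COST_PER_MINUTE)

print(f"light user margin: {light_user:+.2f}")
print(f"heavy user margin: {heavy_user:+.2f}")
```

Under these assumptions the light user is profitable and the heavy user, the one most likely to stay, loses money every month.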

Sam, Series B logistics. Moved route optimization to quantized Llama. Saved 68%. Month three: accuracy fell 17%. Revenue leakage exceeded savings.

Sam's lesson: 68% cost savings. 17% accuracy drop. Revenue leakage exceeded savings.
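A rough way to check a move like Sam's before making it is to put savings and leakage on the same scale. The sketch below uses the 68% savings figure from the story. The baseline cost, the baseline revenue, and the revenue loss rate are hypothetical, since the chapter does not give them, and the function names are illustrative.

```python
def net_monthly_impact(
    baseline_cost: float,
    cost_saving_rate: float,
    baseline_revenue: float,
    revenue_loss_rate: float,
) -> float:
    """Savings from the cheaper model minus revenue lost to worse output."""
    savings = baseline_cost * cost_saving_rate
    leakage = baseline_revenue * revenue_loss_rate
    return savings - leakage


def break_even_loss_rate(
    baseline_cost: float, cost_saving_rate: float, baseline_revenue: float
) -> float:
    """Largest revenue loss rate the savings can absorb."""
    return baseline_cost * cost_saving_rate / baseline_revenue


# 0.68 comes from the story; every other number is a hypothetical placeholder.
impact = net_monthly_impact(
    baseline_cost=10_000,
    cost_saving_rate=0.68,
    baseline_revenue=200_000,
    revenue_loss_rate=0.05,
)
threshold = break_even_loss_rate(10_000, 0.68, 200_000)

print(f"net monthly impact: {impact:+,.0f}")
print(f"break-even revenue loss rate: {threshold:.1%}")
```

When inference is a small share of revenue, even a modest revenue loss rate can wipe out a large percentage saving on cost.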

Figure 14.1 — Cost–Intelligence Hockey Stick. Past the frontier threshold, costs go vertical while perceived quality flattens.

Speed vs. Quality

Users wait 800ms for search. They won't wait 4.2s for a suggestion that might be wrong.

Prioritizing speed

Single-pass, smaller context. Snappy but quality craters on non-trivial tasks.

Prioritizing quality

Multi-step reasoning, retrieval. Smart but drop-off spikes at 2.5s.
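One way to work within this trade-off is to route each request by latency budget: use the highest-quality strategy that still fits, and fall back to the fastest one otherwise. This is a minimal sketch, not a production router. The strategy names and quality scores are hypothetical, and the latencies reuse the 800ms and 4.2s figures above purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    expected_latency_s: float
    quality_score: float  # hypothetical offline eval score, higher is better


def pick_strategy(strategies: list[Strategy], latency_budget_s: float) -> Strategy:
    """Return the best-quality strategy that fits the latency budget."""
    affordable = [s for s in strategies if s.expected_latency_s <= latency_budget_s]
    if not affordable:
        # Nothing fits the budget: degrade to the fastest option.
        return min(strategies, key=lambda s: s.expected_latency_s)
    return max(affordable, key=lambda s: s.quality_score)


# Hypothetical strategies for illustration.
STRATEGIES = [
    Strategy("single_pass", expected_latency_s=0.8, quality_score=0.6),
    Strategy("multi_step", expected_latency_s=4.2, quality_score=0.9),
]

inline_suggestion = pick_strategy(STRATEGIES, latency_budget_s=2.5)
background_report = pick_strategy(STRATEGIES, latency_budget_s=10.0)

print(inline_suggestion.name)
print(background_report.name)
```

With a 2.5s budget the router picks the single-pass strategy. A task that can run in the background gets the multi-step one.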

Navneet's numbers: 2.5–3× slower with chain-of-thought. Wild capability, impractical latency.

“Reasoning is wild, but latency from chain-of-thought: 2.5–3x slower.” (Navneet, Amazon scale, Feb 2026)
