Taylor's AI budgeting coach launched two weeks earlier. Engagement: 4.2x the old static tips. CEO: “This is the moat.” Then, from users: “How does it know I'm buying a house?” Delete-my-data requests up 47%. Personalization honeymoon over. Speed tanked. Costs doubled.
Cost vs. Intelligence
Frontier models give reasoning depth. They also make variable costs a hostage situation.
Stay on the frontier model: users love it. But heaviest users = most expensive. One PM: inference costs +340% from 90-minute coaching sessions.
Cut costs instead: context loss, dumber follow-ups, hallucinated numbers. Feature becomes a toy.
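Why do the heaviest users cost the most? A rough sketch, assuming a chat loop that re-sends the full history on every turn, with no prompt caching or summarization. The function names, the 500 tokens per turn, and the price are hypothetical, chosen only to show the shape of the curve.

```python
def session_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed over a session that re-sends history each turn.

    Assumption: every turn adds `tokens_per_turn` tokens to the history,
    so turn k sends k * tokens_per_turn tokens.
    """
    # Sum of 1..turns, times tokens per turn: grows with the square of turns.
    return tokens_per_turn * turns * (turns + 1) // 2


def session_cost(turns: int, tokens_per_turn: int, usd_per_million_tokens: float) -> float:
    """Input-token cost of one session at a hypothetical flat price."""
    return session_input_tokens(turns, tokens_per_turn) * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    short = session_input_tokens(turns=10, tokens_per_turn=500)
    long = session_input_tokens(turns=60, tokens_per_turn=500)
    print(short, long, round(long / short, 1))
```

Under these assumptions a session six times longer bills roughly 33x the input tokens. Long sessions don't scale linearly, which is why a handful of power users can dominate the bill.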
Sam, Series B logistics. Moved route optimization to quantized Llama. Saved 68%. Month three: accuracy fell 17%. Revenue leakage exceeded savings.
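Sam's math, as a back-of-envelope check. A minimal sketch: only the 68% savings rate comes from the story above. The dollar figures and the leakage rate are hypothetical placeholders, and the function names are illustrative.

```python
def net_monthly_impact(inference_cost: float, savings_rate: float,
                       revenue: float, leakage_rate: float) -> float:
    """Savings from the cheaper model minus revenue lost to worse output.

    Positive: the switch paid off. Negative: leakage ate the savings.
    """
    savings = inference_cost * savings_rate
    leakage = revenue * leakage_rate
    return savings - leakage


def break_even_leakage_rate(inference_cost: float, savings_rate: float,
                            revenue: float) -> float:
    """Share of revenue you can afford to lose before the switch goes negative."""
    return inference_cost * savings_rate / revenue


if __name__ == "__main__":
    # Hypothetical: $50k/month inference, $2M/month revenue, 2% leakage.
    print(net_monthly_impact(50_000, 0.68, 2_000_000, 0.02))
    print(break_even_leakage_rate(50_000, 0.68, 2_000_000))
```

With these placeholder numbers, $34k saved against $40k leaked is a $6k monthly loss, and break-even sits at 1.7% of revenue. The point: when inference is a small slice of revenue, a small quality-driven leak wipes out a large percentage saving.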
Speed vs. Quality
Users will wait 800ms for search. They won't wait 4.2s for a suggestion that might be wrong.
Single-pass, smaller context. Snappy but quality craters on non-trivial tasks.
Multi-step reasoning, retrieval. Smart but drop-off spikes at 2.5s.
“Reasoning is wild, but latency from chain-of-thought: 2.5–3x slower.” (Navneet, Amazon scale, Feb 2026)
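One way to live with both paths: route by latency budget. A minimal sketch, not a recommendation. The 2.5s default comes from the drop-off figure above, and the deep path's 2.75x multiplier is the midpoint of the quoted 2.5–3x. The 1.0s fast-path latency, the `Path` type, and `choose_path` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    expected_latency_s: float


# Hypothetical latencies: fast path at 1.0s, deep path 2.75x slower.
FAST = Path("single_pass", 1.0)
DEEP = Path("multi_step", 1.0 * 2.75)


def choose_path(needs_reasoning: bool, latency_budget_s: float = 2.5) -> Path:
    """Use the deep path only when the task needs it and the budget allows it."""
    if needs_reasoning and DEEP.expected_latency_s <= latency_budget_s:
        return DEEP
    return FAST


if __name__ == "__main__":
    print(choose_path(needs_reasoning=True).name)                         # interactive
    print(choose_path(needs_reasoning=True, latency_budget_s=10.0).name)  # background
```

Under these assumptions the interactive request falls back to the fast path, because the deep path would blow the 2.5s budget, while a background job with a looser budget gets the deep path. The tradeoff stays; the sketch only makes it explicit per request.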