Alex was midway through the weekly leadership sync when the demo went sideways. He was Product Lead at Supportify, a Series B helpdesk startup that had just bet big on an AI ticket router. "Watch," he told the room, uploading a sample P1 outage ticket. The model tagged it "feature request" with 87% confidence. The CEO leaned in. "That's… not great."
Two engineers started typing furiously. The head of support muttered, "We saw 94% in staging." Alex felt the familiar stomach drop. Same input, different day, different output. The model hadn't "broken." It was doing exactly what probabilistic systems do.
Here's the thing: Alex wasn't dumb. He'd read the papers, watched the Andrew Ng talks, even sat through a weekend prompt-engineering workshop. But none of that prepared him for the moment the fundamentals actually mattered in front of his CEO.
AI: Input X → Output Y-ish, probably, today.
This chapter is for every PM who's been in that room. We're going to give you the exact mental models you need — not to become a data scientist, but to ship AI features without getting destroyed in a demo or in production.
The AI PM Trinity
And why it beats every other framework.
Every AI product decision sits at the intersection of three things. We introduced this in Chapter 1; now we go deeper. Data is the foundation. Models are the engine. UX is the only thing users ever see. Nail the balance and you ship. Get it wrong and you get 10,000 angry support tickets.
What Are We Even Talking About?
The PM translation guide.
- Predictive ML: Trains on labeled data so the system can predict or classify new stuff. Think fraud detection, churn scoring, recommendation engines. Deterministic-ish — same input gives roughly the same output.
- Generative AI: Creates new content — text, images, code, summaries. The magic (and the risk) is that it's generative and probabilistic. Same prompt can give you three different answers.
- Agentic AI: Not just answering, but acting — calling tools, looping, planning. Most production agents today are heavily constrained (74%+ stay in narrow lanes).
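The generative row above is the one to internalize. A toy sampler makes it concrete: a generative system draws from a distribution rather than looking up a single answer. The prompt, completions, and probabilities below are invented for illustration.

```python
import random

# Toy completion distribution for one prompt; values are invented.
COMPLETIONS = {
    "Summarize this ticket": [
        ("Customer reports a login outage.", 0.5),
        ("User cannot log in; possible outage.", 0.3),
        ("Login page is down for one customer.", 0.2),
    ],
}

def generate(prompt, rng):
    """Sample one completion, weighted by probability."""
    options, weights = zip(*COMPLETIONS[prompt])
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random()  # unseeded on purpose: output varies run to run
for _ in range(3):
    print(generate("Summarize this ticket", rng))
```

Run it twice and you may get different summaries in a different order. That is Alex's demo in miniature: nothing is broken, the system is sampling.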
What "Using AI" Actually Means
AI isn't synonymous with LLMs or chatbots.
AI ≠ LLM ≠ chatbot. AI is the broad umbrella. An LLM is one powerful tool under it. A chatbot is simply one possible interface on top of an LLM.
Six application classes you'll actually ship
- Conversational interfaces: chatbots, copilots, and voice assistants
- Search and retrieval: semantic search and RAG systems that understand intent
- Ranking and recommendations: personalized feeds, next-best-action
- Prediction and optimization: churn scoring, fraud detection, demand forecasting
- Generation: writing assistants, image/video creators, code completion
- Automation and decision support: agents, intelligent routing, anomaly detection
Each class stresses the Trinity differently. Know which game you're playing — it changes your metrics, failure modes, and how you design the experience.
Data Is Your PRD
Priya was PM for a radiology AI startup in 2024. The model was state-of-the-art — 99.2% accuracy on clean public datasets. They raised a $42M Series B on that number.
Then they shipped to three hospital systems. Accuracy dropped to 81%. Radiologists started routing everything back to humans.
The model wasn't broken. The data was never production-ready. Real hospitals sent blurry scans, rotated labels, inconsistent formats, missing metadata.
Priya's fix? She stopped treating data as "engineering's problem" and made it her PRD. Every sprint started with a data audit. They built synthetic edge cases. Six months later, accuracy stabilized at 96%.
"I used to write 40-page PRDs. Now my first 10 pages are all data requirements. The model is almost an afterthought." — Priya, PM at a radiology AI startup
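Priya's sprint-opening data audit can start as a script that counts known failure patterns before anyone touches the model. A minimal sketch, assuming hypothetical record fields like `blur_score`, `rotation_deg`, and `metadata` (not her actual schema):

```python
from collections import Counter

def audit(records):
    """Count the data-quality issues that sank the launch: blur,
    rotated labels, and missing metadata."""
    issues = Counter()
    for r in records:
        if r.get("blur_score", 0) > 0.5:
            issues["blurry_scan"] += 1
        if r.get("rotation_deg", 0) not in (0, None):
            issues["rotated_label"] += 1
        if not r.get("metadata"):
            issues["missing_metadata"] += 1
    return dict(issues)

sample = [
    {"blur_score": 0.7, "rotation_deg": 0, "metadata": {"site": "A"}},
    {"blur_score": 0.1, "rotation_deg": 90, "metadata": None},
]
print(audit(sample))  # {'blurry_scan': 1, 'rotated_label': 1, 'missing_metadata': 1}
```

The point isn't the script; it's that the PM owns the list of checks, because each check is a product requirement in disguise.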
Non-Determinism Isn't a Bug
It's the feature. And the nightmare.
With AI the demo is easy, the product is hard. We will need a new class of product managers to build trusted interfaces for pseudo-non-deterministic systems.
— Matthew Prince, CEO of Cloudflare

The AI PM's job: design for what happens when it doesn't work that way.
You will ship something that is 97% correct 99% of the time and still get destroyed by that tail. Users don't remember the 97%. They remember the time it confidently told their biggest customer their $2M deal was cancelled.
This is the fundamental shift from traditional software PM work. You are no longer specifying deterministic behavior. You are designing systems that handle uncertainty gracefully — for your users and for your business.
The math that should scare you: A 97%-accurate model serving 100,000 requests/day generates 3,000 wrong answers daily. If even 1% are high-severity, that's 30 incidents a day.
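The arithmetic above is worth keeping as a reusable back-of-envelope calculator: accuracy, traffic, and severity mix jointly determine your daily incident count.

```python
def daily_incidents(requests_per_day, accuracy, high_severity_rate):
    """Wrong answers per day, and how many of them are high-severity."""
    wrong = requests_per_day * (1 - accuracy)
    return wrong, wrong * high_severity_rate

# The chapter's example: 97% accuracy, 100k requests/day, 1% severe.
wrong, incidents = daily_incidents(100_000, 0.97, 0.01)
print(round(wrong), round(incidents))  # 3000 30
```

Before any AI launch review, run your own numbers. If the incident count makes you wince, the fix is usually UX and escalation design, not another point of accuracy.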
Precision vs. Recall
This one lives at the Models ↔ UX edge of the Trinity. The slider between precision and recall is always a product decision.
Marcus and the fraud model
Marcus, a fintech PM, had a fraud model with 99.1% precision — almost never flagged legitimate transactions. But it missed 19% of actual fraud. Customers got hit, chargebacks spiked.
He flipped the dial to higher recall. False positives jumped. Users started getting "your card was declined" on coffee runs. Churn went up.
The winning move wasn't "make the model better." It was designing the UX around the uncertainty: soft flags, one-tap confirmations, trust-building micro-copy ("We're 94% sure this is fraud — want to review?"). Precision/recall became a product decision, not an ML one.
One line of micro-copy turned a model trade-off into a trust-building moment.
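Marcus's dial is just a classification threshold. A minimal sketch, with made-up scores and labels, shows how one model yields two very different products depending on where you set it:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when flagging every score >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, labels))
    fp = sum(f and not y for f, y in zip(flagged, labels))
    fn = sum((not f) and y for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.95, 0.90, 0.60, 0.55, 0.30, 0.20]  # model's fraud scores
labels = [1,    1,    1,    0,    0,    0]      # 1 = actual fraud

print(precision_recall(scores, labels, 0.8))  # strict: no false alarms, fraud slips through
print(precision_recall(scores, labels, 0.5))  # loose: all fraud caught, coffee runs get declined
```

The threshold is a single number, but moving it is a product decision: which failure your users feel, and what the UX does about it.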
One sentence improved recall by 14 points.
Ivan Leo was debugging an LLM classifier for e-commerce categories. Recall was stuck at 0.86. He added one sentence to the system prompt: "Consider the full range of bottoms: jeans, shorts, pants, skirts, etc. before assigning a category."
Recall jumped to 1.0 on that test set. That's context engineering, not magic.
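Context engineering often looks this mundane in practice: the fix lives in the prompt's context, not in the model. A sketch of the pattern — the category list and base wording here are illustrative, not Ivan Leo's actual setup:

```python
BASE_PROMPT = "Classify the product title into exactly one category: {cats}."

# The one-sentence hint from the example above.
CONTEXT_HINT = (
    " Consider the full range of bottoms: jeans, shorts, pants, "
    "skirts, etc. before assigning a category."
)

def build_system_prompt(categories, with_hint=False):
    """Assemble the system prompt, optionally enriched with context."""
    prompt = BASE_PROMPT.format(cats=", ".join(categories))
    return prompt + (CONTEXT_HINT if with_hint else "")

cats = ["tops", "bottoms", "shoes", "accessories"]
print(build_system_prompt(cats, with_hint=True))
```

The leverage is in versioning these hints and measuring recall on an eval set before and after each change — the same discipline you'd apply to any other product change.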
Define Your AI's Job Description
Your Monday-morning checklist. Takes 20 minutes. Saves months.
1. Job title & success criteria. "Route 80% of non-urgent tickets with <3% FP rate."
2. Input & output contract. What data do we have in production? What format? How will we surface confidence?
3. Trinity check. Data: enough labeled edge cases? Models: acceptable non-determinism? UX: recovery when wrong?
4. Failure modes & human-in-loop. List the three worst things. Design the escape hatch first.
5. Metrics that matter. Support deflection rate. Time-to-resolution. User trust score. Accuracy is table stakes — not the goal.
Print this. Tape it above your monitor. Make your team run it before any AI spike.
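The checklist also works as a structured artifact your team can version-control and review like code. A sketch of one possible shape — the field names are a suggestion, not a standard — filled in for a ticket router like Supportify's:

```python
from dataclasses import dataclass

@dataclass
class AIJobDescription:
    job_title: str                 # 1. what the AI is hired to do
    success_criteria: str          # 1. the measurable bar
    inputs: list[str]              # 2. production data contract
    outputs: str                   # 2. format + how confidence is surfaced
    trinity_risks: dict[str, str]  # 3. data / models / ux
    failure_modes: list[str]       # 4. the three worst things
    escape_hatch: str              # 4. human-in-the-loop design
    metrics: list[str]             # 5. outcomes, not just accuracy

ticket_router = AIJobDescription(
    job_title="Route non-urgent support tickets",
    success_criteria="Route 80% of non-urgent tickets with <3% FP rate",
    inputs=["ticket subject", "ticket body", "customer tier"],
    outputs="queue label plus a confidence score shown to agents",
    trinity_risks={"data": "few labeled P1 edge cases",
                   "models": "same ticket, different label on retry",
                   "ux": "one-click reroute when wrong"},
    failure_modes=["P1 outage tagged as feature request"],
    escape_hatch="low-confidence tickets go to a human triage queue",
    metrics=["deflection rate", "time-to-resolution", "trust score"],
)
print(ticket_router.success_criteria)
```

A blank field is a red flag: if nobody can fill in `escape_hatch`, you're not ready to build.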
Common Mistakes
And the scars that taught them.
| Mistake | Fix |
|---|---|
| Treating the demo as the product. | Production data ≠ demo data. Instrument from day one. |
| Ignoring model drift. | Design for drift from day one. Models degrade. Plan for it. |
| Chasing accuracy instead of user outcomes. | 99% sounds great — until the 1% are highest-value customers. |
| Prompt hacking instead of context engineering. | Real leverage is in data and workflows, not clever prompt tricks. |
| Skipping the data audit. | Fastest way to an expensive pivot. Ask Priya. |
| No eval = no PRD. | Can't measure it? You don't have a product. |
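"No eval = no PRD" can be operationalized as a golden-set gate that runs before every release. A minimal sketch with an invented golden set and a keyword stand-in for the model:

```python
def run_eval(model_fn, golden_set, min_accuracy=0.95):
    """Score a model against a golden set and decide if it ships."""
    hits = sum(model_fn(x) == y for x, y in golden_set)
    accuracy = hits / len(golden_set)
    return accuracy, accuracy >= min_accuracy

# Stand-in "model": routes by keyword, just enough to show the gate.
def toy_router(ticket):
    return "urgent" if "outage" in ticket else "non-urgent"

golden = [
    ("P1 outage in EU region", "urgent"),
    ("please add dark mode", "non-urgent"),
    ("billing question", "non-urgent"),
]

accuracy, shippable = run_eval(toy_router, golden)
print(accuracy, shippable)  # 1.0 True
```

The golden set is the PRD in executable form: every case in it is a requirement, and the PM owns which cases go in.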
Key Takeaways
Data is your PRD. Models are interchangeable.
Non-determinism is not a bug to fix; it's the material you build with.
Precision/recall are UX decisions, not ML ones.
Context engineering > prompt engineering.
The demo is easy. The product is hard. Your job is the hard part.
You don't need to become an ML expert. You need to translate between probabilistic reality and business outcomes.
Ask Your DS Team (Next 1:1)
Do this and the next demo won't make you sweat.
1. "What does our data actually look like in production versus training? Show me the last three drift incidents."
2. "If we dial precision up 5 points, what happens to recall and latency? Walk me through the UX impact."
3. "What's our plan for the 3% tail where the model is confidently wrong?"
Because you understood the fundamentals well enough to hide the mess.
CHAPTER 2 AT A GLANCE
| Element | Summary |
|---|---|
| Core mental model | The AI PM Trinity — Data × Models × UX |
| Key framework | AI Job Description (5-point checklist) |
| Stories | Alex (demo disaster), Priya (data is your PRD), Marcus (precision/recall), Ivan Leo (one-sentence fix) |
| Critical shift | Non-determinism is the material, not the bug |
| New PM skill | Translate probabilistic reality → business outcomes |