Minimal Theory Insert · After Chapter 13

What Fine-Tuning Actually Does

Fine-tuning is not magic education. It’s behavioral steering. Treat it like tailoring an off-the-rack suit, not building a new person.

📖 ~5 min read · Theory Insert
Lena’s $42K lesson: Perfect brand voice. Then it confidently invented clauses that never existed. The model overfit to patterns and filled gaps with nonsense.

Lena thought fine-tuning would be her silver bullet. As PM at a fast-growing legaltech startup, she was tired of the base model ignoring their clause library. “Just fine-tune it on our 5,000 approved contracts,” she told engineering. Six weeks, $42K in labeling + GPU time later, the model went live.

First week: brand voice finally perfect. Second week: it confidently invented clauses that never existed. Legal almost had a heart attack. The model hadn’t “learned” new facts — it overfit to patterns and filled gaps with high-confidence nonsense.

Fine-tuning takes a pre-trained model and continues training on a small, high-quality dataset of your input–output pairs. You’re nudging the probability distribution so outputs look more like yours.
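Concretely, those input–output pairs are usually shipped as JSONL, one example per line. A minimal sketch; the `prompt`/`completion` field names vary by provider and are assumptions here, not any specific API’s schema:

```python
import json

# Hypothetical fine-tuning records: each pairs a prompt with the exact
# output you want the model to imitate (tone, format, structure).
examples = [
    {"prompt": "Summarize this NDA clause for a customer email.",
     "completion": "Short version: both sides agree to keep shared docs private."},
    {"prompt": "Draft a renewal reminder in our brand voice.",
     "completion": "Hi! Your plan renews on the 1st. No action needed."},
]

# Most fine-tuning pipelines ingest JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(len(jsonl.splitlines()))  # → 2
```

Every line in the file is one nudge to the probability distribution, which is why consistency across examples matters more than cleverness in any single one.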

Does well

Tone, style, voice consistency. Format adherence (JSON, templates). Domain adaptation (legal, medical, jargon). Efficiency on narrow tasks.

Does not do

Reliably add new factual knowledge (use RAG). Fix reasoning weaknesses. Make a mediocre model brilliant.

Santiago’s rule: Try prompting, in-context examples, RAG, guardrails, and chaining first. Fine-tuning is the last resort.
“99% of problems don’t require fine-tuning… Fine-tuning should be your last resort, not the first step.” — Santiago (@svpino), June 2025

When it is the right move, production wins come from LoRA or QLoRA: tiny adapter layers trained at roughly 1/100th the cost of full fine-tuning. Elliot Arledge’s comparison makes the point: instead of spending $10K to fine-tune a 32B model, he ran multiple rollouts plus introspection for $18.

LoRA economics: 1/100th the cost of full fine-tuning. Tiny adapter layers that punch above their weight on narrow tasks.
[Diagram: the base model is a wide highway (general knowledge); a LoRA adapter branches it onto your domain’s narrow road (brand voice, format). Before: a swerving generic path. After: a laser-straight branded path. ⚠ May forget side roads.]
Fine-Tuning as Steering. It narrows and specializes behavior — it does not expand intelligence. Warning: catastrophic forgetting.

Trinity Impact

Data: Where most teams die. You need hundreds to thousands of clean, consistent examples. Garbage in = very expensive garbage out.

Models: You trade some general capability for narrow reliability. Expect to re-tune every quarter as your product evolves.

UX: More consistent, on-brand outputs = faster trust. But failures now feel more confident, which hurts worse.
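The “garbage in” failure is usually mundane: duplicate pairs and empty completions, not exotic label noise. A hypothetical hygiene pass over the training set might look like:

```python
# Toy hygiene checks for a fine-tuning dataset: catch the cheap problems
# (duplicates, empty outputs) before they quietly poison training.
examples = [
    {"prompt": "Summarize clause 4.", "completion": "Clause 4 limits liability."},
    {"prompt": "Summarize clause 4.", "completion": "Clause 4 limits liability."},  # duplicate
    {"prompt": "Draft reminder.", "completion": ""},                                # empty output
]

seen, clean, dropped = set(), [], []
for ex in examples:
    key = (ex["prompt"], ex["completion"])
    if key in seen or not ex["completion"].strip():
        dropped.append(ex)
        continue
    seen.add(key)
    clean.append(ex)

print(len(clean), len(dropped))  # → 1 2
```

Real pipelines add length outlier and format checks on top, but even this level of screening catches a surprising share of the “very expensive garbage out.”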

Kai’s real win: Fine-tuning forced the data cleanup that should have happened anyway. The model improvement was a bonus.

Kai, at a DTC apparel brand, tried fine-tuning for product descriptions. It worked, but only after three months spent cleaning historical data. His exact words: “Fine-tuning didn’t save us time. It finally forced us to fix our data mess. That was the real win.”

PM Monday Checklist

Before you greenlight fine-tuning, answer all four honestly.

The $10K vs $18 test: Elliot Arledge’s comparison: instead of $10K to fine-tune a 32B model, multiple rollouts plus introspection cost $18. Try cheaper methods first.
  1. Have you maxed out prompting + RAG? If not, start there. Most teams skip this step and regret it.

  2. Do you have 500+ gold-standard examples? Clean, consistent, representative of production. Not “we can scrape some.”

  3. Is the use case high-volume and repetitive? Fine-tuning pays off on narrow, repeated tasks, not broad, creative ones.

  4. Will you monitor for drift? Catastrophic forgetting and distribution shift are real. Budget for quarterly re-tuning.

Quarterly re-tune: Your product evolves. Your data distribution shifts. If you’re not re-tuning, you’re drifting.
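One cheap way to notice drift before users do is to track a proxy metric over time and alarm on shifts. A toy sketch, assuming output length is the proxy and the 20% threshold is a made-up starting point to tune for your product:

```python
import statistics

# Toy drift check: compare a cheap proxy metric (tokens per response)
# between a launch-week baseline and this week's production sample.
baseline = [42, 45, 40, 44, 43, 41, 46, 44]   # tokens per response at launch
current  = [58, 61, 55, 60, 63, 57, 59, 62]   # tokens per response this week

def drifted(baseline, current, max_shift=0.2):
    """Flag when the mean shifts more than max_shift (20%) from baseline."""
    shift = abs(statistics.mean(current) - statistics.mean(baseline))
    return shift / statistics.mean(baseline) > max_shift

print(drifted(baseline, current))  # → True: outputs grew ~38%, investigate
```

Production setups compare richer signals (refusal rates, format-validation failures, embedding distances), but the shape is the same: a frozen baseline, a rolling sample, and a threshold someone owns.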

Ask Your DS Team

1. “What’s the realistic effort to collect and clean 800 high-quality examples — and how will we keep them fresh?”

2. “LoRA or full? What’s the inference cost delta at our projected volume?”

3. “How do we catch distribution shift post-deployment before users do?”
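The cost-delta question is answerable with arithmetic before anyone trains anything. A back-of-envelope sketch; every price, volume, and token count below is a placeholder assumption, not real provider pricing:

```python
# Back-of-envelope inference cost model. All numbers are made-up
# placeholders — plug in your provider's real prices and your volume.
requests_per_day = 50_000

price_per_1k = {
    "prompted_base": 0.0020,  # base model, big few-shot prompt
    "lora_tuned":    0.0015,  # tuned adapter, same base weights
}
# A fine-tuned model often needs far fewer prompt tokens per request:
tokens = {"prompted_base": 800, "lora_tuned": 300}

def monthly_cost(variant):
    daily = requests_per_day * tokens[variant] / 1000 * price_per_1k[variant]
    return daily * 30

for v in price_per_1k:
    print(v, round(monthly_cost(v)))  # → prompted_base 2400, lora_tuned 675
```

Note that most of the delta here comes from shorter prompts, not cheaper tokens; that is often the honest answer to “LoRA or full?” at moderate volume.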

Fine-tuning is powerful when you treat it like tailoring an off-the-rack suit, not building a new person. Do it at the right moment and you get the crisp, on-brand experience your users actually trust.
Continue to Chapter 14: Trade-offs You Can’t Avoid. Cost vs intelligence. Speed vs quality. Control vs convenience. No best practices, only consequences. →