Practitioner Notes

Fine-Tuning Without the Hype

Fine-tuning is the most over-pitched and under-understood capability in financial AI. A practitioner-level look at when it actually moves the needle, when it does not, and what to ask before signing the work order.


Every vendor pitch deck includes the phrase "fine-tuned to your firm." Most of the time, what that actually means is closer to retrieval over a directory of your own documents than any meaningful change to the model itself. Both are useful; only one is fine-tuning. The conflation matters because the two solve different problems, and treating them as interchangeable is how firms end up paying for capabilities they never receive.

What fine-tuning actually does

Fine-tuning shifts how a model writes. Given a corpus of a firm's prior memos, IC decks, and reviewer notes, the model learns the cadence, the framing, the order in which a firm presents an investment case. It does not learn new facts in any reliable way — that is what retrieval is for. It learns voice. The output of a properly fine-tuned model reads like the firm; an off-the-shelf model with retrieval reads like an essay about the firm. The distinction is invisible from a demo and obvious from a draft IC memo.

Side-by-side: a draft from an off-the-shelf model versus a fine-tuned counterpart. The facts are the same; the register is not.

The honest answer to "do we need fine-tuning?" depends on what the output is for. If diligence output goes into a structured workflow — a memo, a one-pager, a credit grid — fine-tuning earns its keep within weeks. If it stays as ad-hoc Q&A in a chat window, retrieval alone is usually sufficient. Spending the engineering hours on fine-tuning when nobody downstream cares about voice is a category error.

The unsexy truth: most of the value in production AI for finance is in the retrieval, the citations, and the access controls. Fine-tuning is the last 20% — the part that makes the output yours.

Questions to ask before commissioning fine-tuning

Three questions separate firms that get value from fine-tuning from those that pay for an expensive non-event. First, do you have a corpus that is genuinely representative — at least a few hundred memos or decks, with consistent quality? Models trained on inconsistent inputs produce inconsistent outputs. Second, can the corpus be cleaned? Old templates, redacted sections, and merger-era artifacts should be excluded; otherwise the model learns those too. Third, who will evaluate the output? Fine-tuning without a senior reviewer in the loop tends to drift in directions nobody intended.
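The first two questions can be partly checked mechanically before anyone signs a work order. What follows is a minimal sketch, not a prescribed pipeline. The `Memo` fields, the redaction markers, the template names, and the 200-document floor (a stand-in for "a few hundred") are all illustrative assumptions that a firm would replace with its own conventions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical values for illustration; each firm would set its own.
REDACTION_MARKERS = ("[REDACTED]", "XXXX")
MIN_CORPUS_SIZE = 200  # assumed stand-in for "a few hundred" documents


@dataclass(frozen=True)
class Memo:
    doc_id: str
    written: date
    template: str  # name of the template the memo was written in
    text: str


def exclusion_reason(
    memo: Memo, current_templates: Set[str], cutoff: date
) -> Optional[str]:
    """Return why a memo should be left out of training, or None to keep it."""
    if memo.written < cutoff:
        return "predates cutoff"
    if memo.template not in current_templates:
        return "retired template"
    if any(marker in memo.text for marker in REDACTION_MARKERS):
        return "contains redactions"
    return None


def clean_corpus(
    memos: Iterable[Memo], current_templates: Set[str], cutoff: date
) -> Tuple[List[Memo], Dict[str, str]]:
    """Split memos into a kept list and a doc_id -> reason map of exclusions."""
    kept: List[Memo] = []
    excluded: Dict[str, str] = {}
    for memo in memos:
        reason = exclusion_reason(memo, current_templates, cutoff)
        if reason is None:
            kept.append(memo)
        else:
            excluded[memo.doc_id] = reason
    return kept, excluded


def is_large_enough(kept: List[Memo], minimum: int = MIN_CORPUS_SIZE) -> bool:
    """Check the cleaned corpus against the assumed size floor."""
    return len(kept) >= minimum
```

Recording a reason for every excluded document, rather than silently dropping it, gives the senior reviewer from the third question something concrete to audit. A filter like this catches only the mechanical problems. Whether the remaining memos are of consistent quality is still a judgment call.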

Done well, fine-tuning is the difference between an AI that helps the firm and an AI that sounds like the firm. Done poorly, it is a line item that buys nothing the team can point to. The path between the two is clearer than the marketing makes it sound, but it requires asking the questions most decks would prefer you did not.
