Why I fine-tune small models instead of prompting big ones

Fri, 22 May 2026 00:00:00 +0000

About the ELM work: why bother? You can prompt Claude or GPT-4 to generate an account brief. It works. Why spend weeks building a fine-tuning pipeline?

Three reasons.

Cost at scale. A frontier API call costs money. Every call. When you’re generating account briefs for hundreds of accounts before a QBR cycle, that adds up fast. A locally deployed 7B model costs only the electricity to run it. On our inference setup that rounds to roughly $0.00 per call.
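To make the arithmetic concrete, here is a rough sketch of the comparison. Every number in it is a placeholder I made up for illustration (token counts, API prices, GPU wattage, seconds per call, electricity rate), not a measurement from our setup or anyone's actual price list. It also ignores hardware purchase and the time spent building the pipeline. Plug in your own values.

```python
def api_cost(num_calls, tokens_in, tokens_out, usd_per_mtok_in, usd_per_mtok_out):
    """Total API spend in USD for a batch of calls, billed per million tokens."""
    per_call = (tokens_in * usd_per_mtok_in + tokens_out * usd_per_mtok_out) / 1_000_000
    return num_calls * per_call


def local_cost(num_calls, seconds_per_call, gpu_watts, usd_per_kwh):
    """Electricity cost in USD for running the same batch on a local GPU."""
    # watt-seconds -> kilowatt-hours: divide by 3600 (s/h) and 1000 (W/kW)
    kwh = num_calls * seconds_per_call * gpu_watts / 3_600_000
    return kwh * usd_per_kwh


# Hypothetical inputs: 500 briefs, one call each.
calls = 500
api_total = api_cost(calls, tokens_in=4000, tokens_out=1000,
                     usd_per_mtok_in=3.0, usd_per_mtok_out=15.0)
local_total = local_cost(calls, seconds_per_call=20, gpu_watts=300, usd_per_kwh=0.15)

print(f"API:   ${api_total:.2f} total, ${api_total / calls:.4f} per call")
print(f"Local: ${local_total:.2f} total, ${local_total / calls:.4f} per call")
```

With placeholder values like these, the local per-call figure is a fraction of a cent, which is what "roughly $0.00" means in practice: not free, but small enough to round away.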

Posts on Jeff Geiser
