Jeff Geiser

Why I fine-tune small models instead of prompting big ones

Fri, 22 May 2026 00:00:00 +0000

About the ELM work: why bother? You can prompt Claude or GPT-4 to generate an account brief. It works. Why spend weeks building a fine-tuning pipeline?

Three reasons.

Cost at scale. A frontier API call costs money. Every call. When you’re generating account briefs for hundreds of accounts before a QBR cycle, that adds up fast. A locally deployed 7B model costs only the electricity to run it. On our inference setup that rounds to roughly $0.00 per call.
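To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical placeholder (token counts, per-million-token prices, power draw, generation time, electricity rate, account count), not a measurement from my setup or a quote from any provider's price list. Swap in your own figures.

```python
# Back-of-the-envelope cost comparison: hosted API call vs. local inference.
# All constants below are hypothetical placeholders, not measured values.

def api_cost_per_brief(input_tokens, output_tokens,
                       usd_per_m_input, usd_per_m_output):
    """Cost of one API call, given per-million-token prices in USD."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def local_cost_per_brief(watts, seconds, usd_per_kwh):
    """Electricity cost of one local generation."""
    kwh = watts * seconds / 3_600_000  # watt-seconds -> kilowatt-hours
    return kwh * usd_per_kwh


ACCOUNTS = 500  # hypothetical size of one QBR cycle

# Hypothetical: 4,000 prompt tokens, 800 output tokens, $3 / $15 per million.
api = api_cost_per_brief(4_000, 800, 3.00, 15.00)

# Hypothetical: 200 W draw for 30 seconds at $0.15 per kWh.
local = local_cost_per_brief(200, 30, 0.15)

print(f"API:   ${api:.5f} per brief, ${api * ACCOUNTS:.2f} per cycle")
print(f"Local: ${local:.5f} per brief, ${local * ACCOUNTS:.2f} per cycle")
```

Under these assumed numbers the local figure is a small fraction of a cent, which is why it rounds to $0.00 per call. The sketch counts electricity only and leaves out the hardware itself.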

About

I’m Jeff Geiser — VP, Customer Engineering at Zenlayer, building in public at the intersection of enterprise AI and sovereign inference. Based in Northern Virginia.

We work with large companies to deploy training and inference, focused primarily on edge inference.

On the side I’m building Wicklee, an observability platform for local AI: watts per token, thermal state, routing decisions. Taarn is the personal AI OS I’ve always wanted: it runs on my Mac Mini, knows my goals, texts me every morning, and monitors the corners of the internet I care about. compass-md is the open spec underneath both: portable context files any AI tool can read.

Now

Updated May 2026

ELM work — finishing account-intelligence-7b-v1. Fine-tuned Qwen2.5-7B on a synthetic dataset of 568 examples across six surfaces: meeting prep, QBR, handoff, renewal alert, onboarding, escalation triage. LoRA training on DGX Spark. Eval harness running DeepEval + LLM-as-judge with 8 locked metrics. Q4_K_M GGUF release when it clears eval.

Taarn: morning brief agent wired to my live compass directory. Texts me at 7am with what matters. Adding the monitor agent next, covering r/LocalLLaMA, arXiv, and the GitHub repos I track. Building in public at taarn.ai.

Projects
