Why my free meeting transcriber costs me $75 a month, not $1,365
The local vs cloud vs hybrid cost call behind a free, no-login meeting transcriber I built in a week, plus the privacy wedge and the mobile wall I had to own.
Hey!
My name is Lambert, and I run growth at Kai.

I gave a meeting transcriber away for free. A Granola alternative that runs entirely in your browser: open a tab, hit record, and when you stop you get a clean transcript with an AI summary and action items. No login, no download, no signup.
Built the obvious way, on a cloud speech-to-text API, it would have cost me about $1,365 a month to run. The way I actually built it costs about $75. Same tool, roughly 18 times cheaper, and your audio never touches a server.
I built it in a week during a hackathon, and I’m not an engineer. The surprising part was that the code was the easy part. The real week went into the cost decision above, and into the small UX details that make a free tool worth using. Here is the full breakdown, including what broke.
This playbook is for you if:
- You want to ship an AI product but you don’t write code for a living.
- You care about local-first and privacy, and you want the real number on what it costs to run.
- You use free tools as a growth channel and you want the honest version, including what broke.
Let’s go.
Part 1: The constraint that shaped everything
The first thing I learned is what a browser physically cannot do. On a Mac, a web page can record your microphone, but it cannot capture the audio coming out of a native app like the Zoom or Teams desktop client. The only sound it reliably gets is your own voice through the mic. Wear headphones, and the other side of the call is gone.
That one limit shaped the whole product. Instead of fighting it, I leaned in: this would be an honest demo of what our app does. Two things I could deliver well were a model that transcribes your voice locally, and an LLM that writes the same summary and action items Kai produces in-app, from the same prompts. So I decided what the tool could not do, and said it plainly in the copy. That one choice turns “this is broken” into “this is fair.”
Part 2: Running Whisper locally, and hiding the 80 MB
Transcription runs on your device. I use Whisper through Hugging Face’s Transformers.js (the Xenova/whisper-base model, served from the Hugging Face Hub). It is about 80 MB on first load.
My first instinct was to run the summary model locally too, something like Llama, so nothing would ever leave the browser. But the download was huge and slow in my tests, and a long download on an anonymous tool is a dead end for conversion.
So the job became hiding it. This is the first place the work was less “write code” and more “steer the model”: my early versions showed a blunt “loading 80 MB” bar, and I kept redirecting Claude until the load disappeared behind the button. Now you click record, a short screen shows for a beat while the model loads, and you are recording in about a second. You load it once. Unless you wipe your data, you never wait again.
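A minimal sketch of the "load behind the button" idea, not the tool's actual code. It assumes the model load is an async function you can call on the first click; `createLazyLoader`, `getTranscriber`, and `onRecordClick` are hypothetical names, and the loader body is a stand-in for the real download.

```typescript
// Memoize an expensive async load: it starts on first use and is shared afterwards.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      const started = load().catch((err) => {
        // If the download fails, forget it so the next click can retry.
        pending = null;
        throw err;
      });
      pending = started;
      return started;
    }
    return pending;
  };
}

// Hypothetical stand-in for the real model download (about 80 MB in the article).
let loadCount = 0;
const getTranscriber = createLazyLoader(async () => {
  loadCount += 1;
  return { name: "whisper-base" };
});

// Wired to the record button: the first click triggers the load,
// later clicks reuse the same promise and do not download again.
async function onRecordClick(): Promise<string> {
  const model = await getTranscriber();
  return `recording with ${model.name}`;
}
```

In a real page the loader body would wrap the Transformers.js call that creates the speech recognition pipeline. This sketch only covers reuse within one page visit. Skipping the download on later visits depends on the browser keeping the model files cached.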

Takeaway: Hide the infrastructure the user never asked to think about. A model download is plumbing, not a step they should approve.
Part 3: Local vs cloud vs hybrid, the real cost
I wanted a fully local tool that costs nothing. Reality pushed back: to keep the summary quality high, I needed a paid model for that one step. So I modeled every option and picked with real numbers, not vibes.
| Architecture | Transcription · Summary | $/session | $/month* | Audio on device |
|---|---|---|---|---|
| Pure local | Whisper Base (local) · Llama 3.2 3B (local) | ~$0 | ~$0** | Yes |
| Hybrid · cheapest | Whisper Base (local) · Gemini 2.5 Flash Lite | $0.003 | ~$45 | Yes |
| Hybrid · what I shipped | Whisper Base (local) · Gemini 2.5 Flash | $0.005 | ~$75 | Yes |
| Hybrid · premium | Whisper Base (local) · Claude Haiku 4.5 | $0.010 | ~$150 | Yes |
| Hybrid · max accuracy | Whisper Turbo (local) · Claude Sonnet 4.6 | $0.044 | ~$660 | Yes |
| Pure cloud · AssemblyAI | AssemblyAI · Claude Haiku 4.5 | $0.083 | ~$1,245 | No |
| Pure cloud · OpenAI STT | gpt-4o-mini-transcribe · Gemini 2.5 Flash Lite | $0.091 | ~$1,365 | No |
* At 500 sessions/day, 30-minute average session, around 5-6k input and 1-1.5k output tokens per summary. Pricing as of June 2026.
** Local compute is free; pure local’s only real cost is shipping the 1.78 GB local bundle (Whisper Base plus the Llama summary model), which is ~$0 on a free model CDN like Hugging Face. I skipped it for the download size, not the cost.
The expensive part is cloud transcription. At this volume, the moment you send audio to a paid speech-to-text API, you are well over a thousand dollars a month. Run transcription on the device with Whisper and that cost goes to zero. The privacy story and the cost story turned out to be the same decision: keep audio on the device, keep the bill sane.
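The monthly column is just the per-session cost times volume. Here is a small sketch of that arithmetic, using the per-session figures and the 500 sessions/day assumption from the table; the function and variable names are mine, for illustration.

```typescript
// Volume assumptions from the table footnote.
const SESSIONS_PER_DAY = 500;
const DAYS_PER_MONTH = 30;

// Per-session costs in USD, copied from the table above.
const architectures: { label: string; perSessionUsd: number }[] = [
  { label: "Hybrid, cheapest", perSessionUsd: 0.003 },
  { label: "Hybrid, shipped", perSessionUsd: 0.005 },
  { label: "Hybrid, premium", perSessionUsd: 0.01 },
  { label: "Hybrid, max accuracy", perSessionUsd: 0.044 },
  { label: "Pure cloud, AssemblyAI", perSessionUsd: 0.083 },
  { label: "Pure cloud, OpenAI STT", perSessionUsd: 0.091 },
];

// Monthly cost in USD, rounded to cents to avoid floating-point noise.
function monthlyCost(perSessionUsd: number): number {
  const raw = perSessionUsd * SESSIONS_PER_DAY * DAYS_PER_MONTH;
  return Math.round(raw * 100) / 100;
}

architectures.forEach((a) => {
  console.log(`${a.label}: $${monthlyCost(a.perSessionUsd)} per month`);
});
```

Change `SESSIONS_PER_DAY` to your own traffic estimate and the gap between local and cloud transcription scales with it.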
To be exact about privacy: your audio never leaves your device, but the transcript text does go to Gemini for the summary. That is the one cloud hop. Pure local keeps even the text on-device, and the table shows why I didn’t take it.
I shipped the hybrid: Whisper local for transcription, Gemini 2.5 Flash (the same model the Kai app uses) for the summary. About half a cent a session, roughly $75 a month at launch volume, versus roughly 17 to 18 times that for full cloud ($1,245 to $1,365).
One honest sub-story: an anonymous endpoint that calls a paid model gets abused. So a real chunk of the week went to the boring defense: the key stays server-side, locked to one model, with a daily spending cap and a per-IP rate limit. None of it shows in the UI. Most of “shipping a free AI tool” is exactly this.
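Here is a rough sketch of what that defense can look like on the server, under stated assumptions: one pinned model, a daily budget, and a sliding one-hour window per IP. Every limit, the `SpendGuard` class, and the `summaryGuard` instance are hypothetical, not the settings or code the tool actually runs. State is kept in memory, which only works for a single server process.

```typescript
interface GuardConfig {
  allowedModel: string;
  dailyBudgetUsd: number;
  maxRequestsPerIpPerHour: number;
  estimatedCostPerRequestUsd: number;
}

type Decision = "ok" | "wrong_model" | "rate_limited" | "budget_exhausted";

class SpendGuard {
  private config: GuardConfig;
  private spentToday = 0;
  private currentDay = "";
  private hits = new Map<string, number[]>();

  constructor(config: GuardConfig) {
    this.config = config;
  }

  // Decide whether one request may call the paid model. nowMs is epoch milliseconds.
  check(ip: string, model: string, nowMs: number): Decision {
    // The endpoint is locked to a single model.
    if (model !== this.config.allowedModel) return "wrong_model";

    // Reset the spend counter when the UTC day changes.
    const day = new Date(nowMs).toISOString().slice(0, 10);
    if (day !== this.currentDay) {
      this.currentDay = day;
      this.spentToday = 0;
    }
    const projected = this.spentToday + this.config.estimatedCostPerRequestUsd;
    if (projected > this.config.dailyBudgetUsd) return "budget_exhausted";

    // Sliding one-hour window per IP.
    const windowStart = nowMs - 60 * 60 * 1000;
    const recent = (this.hits.get(ip) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.config.maxRequestsPerIpPerHour) {
      this.hits.set(ip, recent);
      return "rate_limited";
    }

    recent.push(nowMs);
    this.hits.set(ip, recent);
    this.spentToday = projected;
    return "ok";
  }
}

// Hypothetical settings: the numbers are placeholders, not the real limits.
const summaryGuard = new SpendGuard({
  allowedModel: "gemini-2.5-flash",
  dailyBudgetUsd: 5,
  maxRequestsPerIpPerHour: 10,
  estimatedCostPerRequestUsd: 0.005,
});
```

The server would call `summaryGuard.check(...)` before forwarding a transcript to the model, and return an error for anything other than `"ok"`. The API key itself never appears in this code path on the client.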
Takeaway: Local-first is often the cheapest and the most private option at once. Model the boring alternatives before you commit. The winner is rarely the obvious one.
Part 4: Stop gating, start demoing
Pretty quickly the real question showed up: how do you make people care? My first plan was the classic growth move: hide the transcript and force a signup or an email to unlock it. Always an exchange.
I dropped it. The tool became a full, free demo you can use whenever you want. You can even email yourself the notes or send them to a colleague. We store nothing. The value is not in withholding the transcript. It is in everything that comes after: polishing the output, turning a meeting into a task, building real docs from a call, storing every meeting, and talking to an assistant that has access to all of them. That is where Kai stops being a demo and becomes the product you pay for.

Want to see it? Open a tab and talk for ten seconds: hirekai.ai/tools/meeting-transcription.
Takeaway: For a top-of-funnel tool, give the value away. Convert on depth, not on a gate. It is the same bet I made with my free SEO tools.
Part 5: Why it doesn’t work on mobile
Mobile was a real fight, and I lost it. I spent a day going back and forth with Claude on it: the model would hang at 99 percent, then the tab would die a few seconds into recording. My first guess was WebGPU, but iOS Safari ships WebGPU now, so that wasn’t it. The real wall is memory. Loading the model and running inference blows past what iOS gives a single browser tab, and the tab gets killed. I tried a lighter fallback model. Same wall.
That stings, because the tool is built for SEO and mobile traffic matters; a page that doesn’t work on mobile is a red flag for Google. But I can’t fake physics. On mobile it doesn’t work, and if you want it on your phone you use the app. Testing on a real device, not the simulator, is the only reason I know. And refusing to claim “works on mobile” for something that crashes saved me more than admitting the limit cost me.
Part 6: The small details that make people stay
This is the lesson I underestimated. After a lot of feedback, I found where a good experience quietly turns bad: you refresh the page and lose everything. So I added local storage for your last meeting. Refresh, lose connection, or come back three days later, and it is still there.
There is a whole layer of touches like that. A loading animation while the summary is written, so the wait feels alive. The little banner that brings your last meeting back. None of these are features you would list on a landing page. But they are why people stay on the page longer. And for a growth person that matters twice: time on page is a signal Google rewards, and it tells anyone who lands there that the real product is built with care. It is the same instinct behind my AI-native website playbook.
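The last-meeting persistence described above can be sketched in a few lines. This is an illustration, not the tool's code: the key name, the `SavedMeeting` shape, and the helper names are assumptions. The store is passed in, so the same functions work with `window.localStorage` in a browser or with an in-memory stand-in.

```typescript
// Minimal subset of the Web Storage API used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface SavedMeeting {
  transcript: string;
  summary: string;
  savedAt: number; // epoch milliseconds
}

const LAST_MEETING_KEY = "lastMeeting"; // hypothetical key name

function saveLastMeeting(store: KeyValueStore, meeting: SavedMeeting): void {
  store.setItem(LAST_MEETING_KEY, JSON.stringify(meeting));
}

// Returns the saved meeting, or null if nothing usable is stored.
function loadLastMeeting(store: KeyValueStore): SavedMeeting | null {
  const raw = store.getItem(LAST_MEETING_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw);
    const valid =
      typeof parsed?.transcript === "string" &&
      typeof parsed?.summary === "string" &&
      typeof parsed?.savedAt === "number";
    return valid ? (parsed as SavedMeeting) : null;
  } catch {
    // Corrupted entry: behave as if nothing was saved.
    return null;
  }
}

// In-memory stand-in, handy outside a browser.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

On page load, a non-null result from `loadLastMeeting(window.localStorage)` is the cue to show the "bring back your last meeting" banner.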

Takeaway: The UX details are not polish you do later. For a free tool, they are the conversion.
Part 7: What I actually learned
Here is what surprised me. The MVP was the cheap part. I shipped this in a week without being an engineer, and the building itself was rarely the bottleneck.
What ate the time was the two things you can’t hand to the model: steering it well when it goes sideways (the mobile crash above cost me a full day), and having the taste to know which UX details actually matter. A few years ago the craft was building the MVP. Now the MVP is nearly free, and the craft has moved: to directing the AI, and to the experience around it. That is the real skill now, and it is not the one I expected to be using.

The system
If I distilled the week into steps:
- Find the hard constraint first, and design around it instead of fighting it
- Pick local-first when you can. It is usually the cheapest and most private at once
- Hide the infrastructure (downloads, model loads) the user never asked about
- Put a real spending wall on any anonymous endpoint that costs money
- Give the value away for free. Convert on depth, not on a gate
- Be honest about what doesn’t work. It costs less than faking it
- Sweat the small UX details. They are the conversion and the SEO signal
- Spend your real time on the two things the model can’t do for you: steering it, and the experience
About Kai
Kai is an AI assistant that does things for people, built by the team behind Morgen. Free tools like this one are one piece of how we think about growth. The companion Chrome extension, which captures full tab audio, is a separate project.
Thanks for reading
This isn’t theory. It’s a documented week of building a real tool, including the parts that broke. If it was useful, share it with someone who’s about to build their first AI tool, or message me on LinkedIn.