Building interm.ai – A Real-Time Helper with Electron, Swift & Bolt


⚡ TL;DR
interm.ai is my cross-platform AI assistant that listens to calls and whispers suggestions in real time. Think Cluely, but laser-focused on interviews & sales demos.
Electron desktop app for macOS / Windows
Swift CLI on macOS for lossless system audio + screenshots
Google Speech-to-Text (Deepgram soon) → GPT-4o for sub-500 ms suggestions
Landing page on Bolt.new; demo videos generated by Claude Code
🛠️ What We Built
| Layer | Stack | Why It Matters |
| --- | --- | --- |
| Desktop app | Electron 30 + React | One codebase → .dmg, .exe, AppImage |
| Real-time AI | Google Speech-to-Text → GPT-4o (Deepgram planned) | < 500 ms latency |
| macOS helper | Swift CLI (ScreenCaptureKit) | Captures system audio + screenshots that Electron can’t |
| Marketing & demos | Bolt.new site + Claude Code | Ship landing pages & walkthroughs in hours |
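The table leaves open how the Swift helper hands its captures to the Electron app. One plausible wiring, sketched here purely as an assumption and not as interm.ai's actual design, is newline-delimited JSON on the helper's stdout, which Electron can read from the spawned child process. `HelperEvent`, `encodeLine`, and `decodeCompleteLines` are hypothetical names.

```swift
import Foundation

// Hypothetical helper → Electron protocol: one JSON object per line on stdout.
enum HelperEvent: Codable, Equatable {
    case audioChunk(seq: Int, base64PCM: String) // one frame of audio, base64-encoded
    case screenshot(path: String)                // PNG the helper just wrote to disk
}

/// Serializes one event as a single newline-terminated JSON line.
func encodeLine(_ event: HelperEvent) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys, .withoutEscapingSlashes]
    var line = try encoder.encode(event)
    line.append(0x0A) // "\n" delimiter; JSONEncoder emits no raw newline without .prettyPrinted
    return line
}

/// Decodes every complete line in `buffer` and returns any trailing partial line
/// so the caller can prepend it to the next chunk read from the pipe.
func decodeCompleteLines(_ buffer: Data) throws -> (events: [HelperEvent], remainder: Data) {
    guard let lastNewline = buffer.lastIndex(of: 0x0A) else { return ([], buffer) }
    let decoder = JSONDecoder()
    let events = try buffer[..<lastNewline]
        .split(separator: 0x0A)
        .map { try decoder.decode(HelperEvent.self, from: Data($0)) }
    return (events, Data(buffer[buffer.index(after: lastNewline)...]))
}
```

Framing on `\n` works because `JSONEncoder` escapes newlines inside strings and emits no line breaks unless `.prettyPrinted` is set; the returned `remainder` carries a half-received line over to the next read. On the Electron side the equivalent is splitting stdout on `\n` and running `JSON.parse` on each complete line.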
🌐 Demos
Architecture Overview 🛠️
Swift CLI: Capturing macOS Audio & Screenshots 🎙️🖼️
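```swift
// AudioScreenshot.swift: macOS 14+, requires Screen Recording permission
import CoreMedia
import Foundation
import ImageIO
import ScreenCaptureKit
@main
final class AudioScreenshot: NSObject, SCStreamOutput {
    // 1️⃣ Receives raw system-audio buffers on a background queue
    func stream(_ stream: SCStream, didOutputSampleBuffer buffer: CMSampleBuffer, of type: SCStreamOutputType) {
        guard type == .audio, buffer.isValid else { return } // hand `buffer` to speech-to-text here
    }
    static func main() async throws {
        guard let display = try await SCShareableContent.current.displays.first else { return }
        let filter = SCContentFilter(display: display, excludingWindows: [])
        let config = SCStreamConfiguration()
        config.capturesAudio = true // system audio only; the microphone is not captured
        let sink = AudioScreenshot()
        let stream = SCStream(filter: filter, configuration: config, delegate: nil)
        try stream.addStreamOutput(sink, type: .audio, sampleHandlerQueue: .global())
        try await stream.startCapture()
        let url = URL.cachesDirectory.appending(path: "interm/frame.png") // ~/Library/Caches/interm/frame.png
        try FileManager.default.createDirectory(at: url.deletingLastPathComponent(), withIntermediateDirectories: true)
        while !Task.isCancelled { // 2️⃣ screenshot the main display every 5 s
            let image = try await SCScreenshotManager.captureImage(contentFilter: filter, configuration: config)
            if let dest = CGImageDestinationCreateWithURL(url as CFURL, "public.png" as CFString, 1, nil) {
                CGImageDestinationAddImage(dest, image, nil)
                CGImageDestinationFinalize(dest)
            }
            try await Task.sleep(for: .seconds(5))
        }
        try stream.removeStreamOutput(sink, type: .audio) // also keeps `stream` and `sink` alive
        try await stream.stopCapture()
    }
}
```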
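The Swift CLI captures raw system audio, but before it can be streamed to Google Speech-to-Text it needs an encoding the API accepts, such as LINEAR16 (16-bit signed little-endian PCM), cut into short frames; Google's streaming guidance suggests roughly 100 ms per frame as a latency/efficiency trade-off. The sketch below assumes the stream is configured for mono 16 kHz (`config.sampleRate = 16_000`, `config.channelCount = 1`) and that the Float32 samples have already been copied out of each `CMSampleBuffer` into a `[Float]`. `linear16(from:)` and `FrameChunker` are hypothetical helpers, not code from interm.ai.

```swift
import Foundation

// Hypothetical helpers; assumes mono 16 kHz Float32 samples in [-1, 1].

/// Converts Float32 samples to LINEAR16: 16-bit signed, little-endian PCM.
func linear16(from samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * 2)
    for sample in samples {
        let clamped = max(-1, min(1, sample)) // out-of-range input would overflow Int16
        let value = Int16(clamped * Float(Int16.max)).littleEndian
        withUnsafeBytes(of: value) { data.append(contentsOf: $0) }
    }
    return data
}

/// Buffers PCM bytes and emits fixed-size frames, carrying leftovers to the next push.
struct FrameChunker {
    let frameBytes: Int
    private var pending = Data()

    init(sampleRate: Int, frameMillis: Int) {
        frameBytes = sampleRate * frameMillis / 1000 * 2 // 2 bytes per mono Int16 sample
        precondition(frameBytes > 0, "a frame must hold at least one sample")
    }

    mutating func push(_ bytes: Data) -> [Data] {
        pending.append(bytes)
        var frames: [Data] = []
        while pending.count >= frameBytes {
            frames.append(Data(pending.prefix(frameBytes)))
            pending.removeFirst(frameBytes)
        }
        return frames
    }
}
```

At 16 kHz mono, each 100 ms frame is 1,600 samples, or 3,200 bytes, which is what `frameBytes` works out to; any bytes short of a full frame wait in `pending` for the next buffer.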
Roadmap 🗺️
On-device fallback with a tiny LLM for offline use, e.g. on flights.
CRM webhook – push call highlights straight into HubSpot.
Switch the speech backend to Deepgram once their low-latency best-word model exits beta.
Open-source the Swift CLI (after tidying up the build script).
Try it Out & Give Feedback 🙌
Written by Xin Shen
