transcribe.dev - Voice Dictation Platform
Project Overview
transcribe.dev is a system-wide voice-to-text dictation tool (similar to Wispr Flow) that transcribes speech in real time, then uses AI to clean, format, and adapt tone based on context. It works in any application.
Tech Stack
- Desktop App: Electron + Next.js + TypeScript
- Backend: Convex (real-time database, actions, file storage)
- Voice Capture: Web Audio API
- STT Engine: Deepgram (primary cloud) / whisper.cpp (local/privacy mode)
- AI Cleanup: Claude Haiku API (via Convex actions)
- Text Injection: robotjs or nut.js (cross-platform keystroke simulation)
- Auth: WorkOS AuthKit
- Mobile (future): React Native
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ ELECTRON APP │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Voice Input │→ │ Deepgram WS │→ │ Convex Action (cleanup) │ │
│ │ (mic) │ │ (streaming) │ │ Claude Haiku │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ System Tray │ │ Text Injection (nut.js) │ │
│ │ + Hotkeys │ │ → Active Application │ │
│ └─────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CONVEX BACKEND │
│ Tables: users, dictionaries, snippets, appTones, history │
│ Actions: transcribeAudio, cleanupTranscript │
│ Real-time sync across all user devices │
└─────────────────────────────────────────────────────────────────┘
Project Structure
transcribe-dev/
├── apps/
│ └── desktop/ # Electron + Next.js app
│ ├── main/ # Electron main process
│ │ ├── index.ts # Main entry, window management
│ │ ├── tray.ts # System tray
│ │ ├── hotkeys.ts # Global hotkey registration
│ │ ├── audio.ts # Audio capture
│ │ └── injection.ts # Text injection via nut.js
│ ├── renderer/ # Next.js renderer
│ │ ├── app/
│ │ │ ├── page.tsx # Main dictation UI
│ │ │ ├── settings/ # Settings pages
│ │ │ └── layout.tsx
│ │ ├── components/
│ │ │ ├── Waveform.tsx
│ │ │ ├── DictationOverlay.tsx
│ │ │ └── SettingsPanel.tsx
│ │ └── hooks/
│ │ ├── useVoiceCapture.ts
│ │ ├── useDeepgram.ts
│ │ └── useTextInjection.ts
│ ├── electron-builder.json
│ └── package.json
├── convex/ # Convex backend
│ ├── schema.ts # Database schema
│ ├── users.ts # User queries/mutations
│ ├── dictionaries.ts # Personal dictionary
│ ├── snippets.ts # Voice shortcuts
│ ├── appTones.ts # Per-app tone settings
│ ├── history.ts # Dictation history
│ └── voice.ts # Actions: transcribe, cleanup
├── packages/
│ └── shared/ # Shared types and utilities
│ ├── types.ts
│ └── constants.ts
├── package.json # Monorepo root
├── turbo.json # Turborepo config
└── README.md
Convex Schema
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    authId: v.string(),
    email: v.string(),
    plan: v.union(v.literal("free"), v.literal("pro"), v.literal("team")),
    settings: v.object({
      defaultLanguage: v.string(),
      localModeEnabled: v.boolean(),
      autoCapitalize: v.boolean(),
      autoPunctuation: v.boolean(),
      holdToTalk: v.boolean(),
      hotkey: v.string(),
    }),
    createdAt: v.number(),
  }).index("by_auth", ["authId"]),

  dictionaries: defineTable({
    userId: v.id("users"),
    word: v.string(),
    pronunciation: v.optional(v.string()),
    category: v.optional(v.string()),
  }).index("by_user", ["userId"]),

  snippets: defineTable({
    userId: v.id("users"),
    trigger: v.string(),
    expansion: v.string(),
    isEnabled: v.boolean(),
  }).index("by_user", ["userId"]),

  appTones: defineTable({
    userId: v.id("users"),
    appIdentifier: v.string(),
    tone: v.union(
      v.literal("casual"),
      v.literal("professional"),
      v.literal("technical"),
      v.literal("friendly")
    ),
    customInstructions: v.optional(v.string()),
  }).index("by_user_app", ["userId", "appIdentifier"]),

  history: defineTable({
    userId: v.id("users"),
    rawTranscript: v.string(),
    cleanedText: v.string(),
    targetApp: v.optional(v.string()),
    durationMs: v.number(),
    createdAt: v.number(),
  }).index("by_user_time", ["userId", "createdAt"]),
});
Key Convex Actions
// convex/voice.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import Anthropic from "@anthropic-ai/sdk";

export const cleanupTranscript = action({
  args: {
    rawTranscript: v.string(),
    tone: v.string(),
    customInstructions: v.optional(v.string()),
    userDictionary: v.array(v.string()),
  },
  handler: async (ctx, args) => {
    const anthropic = new Anthropic();
    const systemPrompt = `You are a dictation cleanup assistant. Transform spoken text into clean, written text.
Rules:
- Remove filler words (um, uh, like, you know)
- Fix grammar and punctuation
- Maintain the speaker's voice and intent
- Tone: ${args.tone}
- Known words/names to preserve exactly: ${args.userDictionary.join(", ")}
${args.customInstructions ? `- Additional instructions: ${args.customInstructions}` : ""}
Return ONLY the cleaned text, nothing else.`;

    const response = await anthropic.messages.create({
      model: "claude-3-haiku-20240307",
      max_tokens: 1024,
      system: systemPrompt,
      messages: [{ role: "user", content: args.rawTranscript }],
    });

    return response.content[0].type === "text"
      ? response.content[0].text
      : args.rawTranscript;
  },
});

export const transcribeAudio = action({
  args: {
    audioBase64: v.string(),
    language: v.optional(v.string()),
  },
  handler: async (ctx, args) => {
    const response = await fetch(
      "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true",
      {
        method: "POST",
        headers: {
          Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
          "Content-Type": "audio/webm",
        },
        body: Buffer.from(args.audioBase64, "base64"),
      }
    );
    const result = await response.json();
    // Guard every level: an empty channels/alternatives array would otherwise throw
    return result.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
  },
});
Electron Main Process
// apps/desktop/main/index.ts
import path from "path";
import { app, BrowserWindow, ipcMain } from "electron";
import { createTray } from "./tray";
import { registerHotkeys } from "./hotkeys";
import { AudioCapture } from "./audio";
import { TextInjector } from "./injection";

let mainWindow: BrowserWindow | null = null;
let audioCapture: AudioCapture;
let textInjector: TextInjector;

async function createWindow() {
  mainWindow = new BrowserWindow({
    width: 400,
    height: 600,
    frame: false,
    transparent: true,
    alwaysOnTop: true,
    webPreferences: {
      nodeIntegration: false,
      contextIsolation: true,
      preload: path.join(__dirname, "preload.js"),
    },
  });

  // Load Next.js app
  if (process.env.NODE_ENV === "development") {
    mainWindow.loadURL("http://localhost:3000");
  } else {
    mainWindow.loadFile("renderer/out/index.html");
  }
}

app.whenReady().then(async () => {
  await createWindow();
  createTray(mainWindow);
  audioCapture = new AudioCapture();
  textInjector = new TextInjector();

  registerHotkeys({
    onStartRecording: () => audioCapture.start(),
    onStopRecording: () => audioCapture.stop(),
  });

  // IPC handlers
  ipcMain.handle("inject-text", async (_, text: string) => {
    await textInjector.inject(text);
  });
  ipcMain.handle("get-active-app", async () => {
    return textInjector.getActiveApp();
  });
});
Deepgram Streaming Hook
// apps/desktop/renderer/hooks/useDeepgram.ts
import { useCallback, useRef, useState } from "react";

interface UseDeepgramOptions {
  onTranscript: (text: string, isFinal: boolean) => void;
  onError: (error: Error) => void;
}

export function useDeepgram({ onTranscript, onError }: UseDeepgramOptions) {
  const wsRef = useRef<WebSocket | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  const connect = useCallback(async () => {
    // Browsers can't set custom headers on WebSockets, so Deepgram accepts
    // the API key via the WebSocket subprotocol
    const ws = new WebSocket(
      "wss://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&interim_results=true",
      ["token", process.env.NEXT_PUBLIC_DEEPGRAM_API_KEY!]
    );
    ws.onopen = () => setIsConnected(true);
    ws.onclose = () => setIsConnected(false);
    ws.onerror = () => onError(new Error("WebSocket error"));
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      const transcript = data.channel?.alternatives?.[0]?.transcript;
      const isFinal = data.is_final;
      if (transcript) {
        onTranscript(transcript, isFinal);
      }
    };
    wsRef.current = ws;
  }, [onTranscript, onError]);

  const sendAudio = useCallback((audioData: ArrayBuffer) => {
    if (wsRef.current?.readyState === WebSocket.OPEN) {
      wsRef.current.send(audioData);
    }
  }, []);

  const disconnect = useCallback(() => {
    wsRef.current?.close();
    wsRef.current = null;
  }, []);

  return { connect, sendAudio, disconnect, isConnected };
}
Environment Variables
# .env.local (Convex)
CONVEX_DEPLOYMENT=your-deployment-name
WORKOS_CLIENT_ID=your-workos-client-id
DEEPGRAM_API_KEY=your-deepgram-key
ANTHROPIC_API_KEY=your-anthropic-key
# .env.local (Desktop app / Next.js renderer)
NEXT_PUBLIC_CONVEX_URL=https://your-deployment.convex.cloud
WORKOS_CLIENT_ID=your-workos-client-id
WORKOS_API_KEY=your-workos-api-key
WORKOS_COOKIE_PASSWORD=your-32-char-random-string
NEXT_PUBLIC_WORKOS_REDIRECT_URI=http://localhost:3012/callback
NEXT_PUBLIC_DEEPGRAM_API_KEY=your-deepgram-key
Getting Started Commands
# 1. Create the monorepo
mkdir transcribe-dev && cd transcribe-dev
pnpm init

# 2. Set up Turborepo
pnpm add -D turbo
echo '{"$schema": "https://turbo.build/schema.json", "tasks": {"build": {}, "dev": {"cache": false}}}' > turbo.json

# 3. Initialize Convex
pnpm create convex@latest
cd convex && pnpm install && cd ..

# 4. Create Electron app with Next.js
mkdir -p apps/desktop
cd apps/desktop
pnpm create next-app@latest renderer --typescript --tailwind --eslint --app --no-src-dir
pnpm add electron electron-builder
pnpm add -D concurrently wait-on

# 5. Install key dependencies
pnpm add convex @workos-inc/authkit-nextjs @anthropic-ai/sdk
pnpm add @nut-tree/nut-js  # For text injection

# 6. Run development
pnpm dev
MVP Feature Checklist
Phase 1: Core Dictation
- Electron app with system tray
- Global hotkey (Cmd+Shift+Space) to start/stop
- Deepgram streaming transcription
- Basic text injection into active app
- Minimal floating UI showing waveform
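The hotkey is stored as a user-facing string (the `users.settings.hotkey` field, e.g. "Cmd+Shift+Space"), while Electron's `globalShortcut.register` expects an accelerator string. A minimal sketch of the mapping, assuming a hypothetical `toAccelerator` helper in `packages/shared` (the alias table is illustrative):

```typescript
// Hypothetical helper: normalize a user-facing hotkey string (as stored in
// users.settings.hotkey) into an Electron accelerator. Electron's
// "CommandOrControl" lets one stored setting work on macOS and Windows/Linux.
const KEY_ALIASES: Record<string, string> = {
  cmd: "CommandOrControl",
  command: "CommandOrControl",
  ctrl: "Control",
  opt: "Alt",
  option: "Alt",
  space: "Space",
};

export function toAccelerator(hotkey: string): string {
  return hotkey
    .split("+")
    .map((part) => {
      const key = part.trim();
      const alias = KEY_ALIASES[key.toLowerCase()];
      // Unknown keys pass through with their first letter capitalized
      return alias ?? key.charAt(0).toUpperCase() + key.slice(1);
    })
    .join("+");
}
```

In `hotkeys.ts` this would feed `globalShortcut.register(toAccelerator(settings.hotkey), handler)`.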
Phase 2: AI Cleanup
- Claude Haiku cleanup via Convex action
- Basic filler word removal
- Grammar correction
- Punctuation
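Filler-word removal can also run locally as a cheap fallback when the Claude call fails or local/privacy mode is enabled. A sketch, with an illustrative (not exhaustive) filler list:

```typescript
// Illustrative local fallback for Phase 2: strip common filler words and fix
// spacing/capitalization without any AI call. The filler list is an assumption.
const FILLERS = /\b(?:um+|uh+|erm|you know|i mean)\b\s*|\blike,\s*/gi;

export function basicCleanup(raw: string): string {
  const stripped = raw
    .replace(FILLERS, "")
    .replace(/\s{2,}/g, " ") // collapse doubled spaces left behind
    .trim();
  if (stripped.length === 0) return stripped;
  // Capitalize the first character; leave everything else to the AI pass
  return stripped.charAt(0).toUpperCase() + stripped.slice(1);
}
```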
Phase 3: Personalization
- Personal dictionary (sync via Convex)
- Snippets/voice shortcuts
- Per-app tone settings
- Settings UI
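Snippets from the `snippets` table could be applied to the cleaned text just before injection. A sketch assuming case-insensitive, whole-word trigger matching (the matching rule here is an assumption, not settled design):

```typescript
// Sketch of snippet expansion for Phase 3, matching the snippets table shape
// (trigger, expansion, isEnabled) from the Convex schema.
interface Snippet {
  trigger: string;
  expansion: string;
  isEnabled: boolean;
}

export function applySnippets(text: string, snippets: Snippet[]): string {
  return snippets
    .filter((s) => s.isEnabled)
    .reduce((acc, s) => {
      // Escape regex metacharacters in the spoken trigger
      const escaped = s.trigger.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      return acc.replace(new RegExp(`\\b${escaped}\\b`, "gi"), s.expansion);
    }, text);
}
```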
Phase 4: Polish
- Onboarding flow
- WorkOS AuthKit integration
- Usage history
- Stripe billing integration
- Auto-updates via electron-builder
API Keys Needed
- Deepgram - https://console.deepgram.com (free tier: $200 credit)
- Anthropic - https://console.anthropic.com (for Claude Haiku cleanup)
- WorkOS - https://dashboard.workos.com (auth)
- Convex - https://dashboard.convex.dev (generous free tier)
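Since each of these keys is required at runtime, a fail-fast check at startup avoids confusing downstream errors. A sketch with a hypothetical `missingKeys` helper; names mirror the .env.local examples above:

```typescript
// Hypothetical startup check: report any of the keys listed above that are
// missing or empty in the environment.
const REQUIRED_KEYS = [
  "DEEPGRAM_API_KEY",
  "ANTHROPIC_API_KEY",
  "WORKOS_CLIENT_ID",
  "CONVEX_DEPLOYMENT",
] as const;

export function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => !env[key] || env[key]!.trim() === "");
}

// Usage at app startup:
// const missing = missingKeys(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```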
Notes
- Start with cloud-only (Deepgram) for MVP, add local Whisper later for privacy mode
- Use `nut-js` over `robotjs` - better maintained, TypeScript support
- Use Electron's `globalShortcut` for hotkeys, not browser shortcuts
- Consider `electron-store` for local preferences that don't need sync
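The electron-store suggestion pairs naturally with the `settings` object in the Convex schema: local preferences can be a `Partial` of the same shape, merged over defaults. A sketch (the default values are illustrative):

```typescript
// Sketch: merge partial, locally stored preferences (e.g. from electron-store)
// over defaults that mirror users.settings in the Convex schema.
interface Settings {
  defaultLanguage: string;
  localModeEnabled: boolean;
  autoCapitalize: boolean;
  autoPunctuation: boolean;
  holdToTalk: boolean;
  hotkey: string;
}

export const DEFAULT_SETTINGS: Settings = {
  defaultLanguage: "en",
  localModeEnabled: false,
  autoCapitalize: true,
  autoPunctuation: true,
  holdToTalk: true,
  hotkey: "Cmd+Shift+Space",
};

export function resolveSettings(stored: Partial<Settings>): Settings {
  // Spread keeps defaults for anything the local store has not set yet
  return { ...DEFAULT_SETTINGS, ...stored };
}
```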