Phase 1 Implementation Summary
This document describes the technical implementation of Phase 1 (Desktop Dictation App) as of December 2024.
Overview
Phase 1 establishes the foundation for the talk.dev desktop dictation application, a Wispr Flow competitor that provides system-wide voice-to-text with AI cleanup.
Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| Build Tool | electron-vite | Fast Electron + Vite bundling |
| Desktop Framework | Electron 33+ | Cross-platform desktop app |
| UI Framework | React 19 | Renderer UI components |
| Styling | Tailwind CSS v4 | Utility-first CSS |
| Backend | Convex | Real-time database & serverless functions |
| Auth | WorkOS AuthKit | User authentication (shared with do.dev) |
| Types | TypeScript 5.7 | Type safety throughout |
Project Structure
talk-dev/
├── apps/
│   ├── talk/                   # Marketing website (Next.js)
│   └── desktop/                # Electron dictation app
│       ├── src/
│       │   ├── main/           # Electron main process
│       │   ├── preload/        # Context bridge (IPC)
│       │   └── renderer/       # React UI
│       ├── electron.vite.config.ts
│       ├── electron-builder.yml
│       └── package.json
├── convex/                     # Shared Convex backend
│   ├── schema.ts               # Database schema
│   ├── users.ts                # User CRUD
│   ├── dictionaries.ts         # Custom words
│   ├── snippets.ts             # Voice shortcuts
│   ├── appTones.ts             # Per-app tone settings
│   ├── history.ts              # Dictation history
│   └── voice.ts                # Deepgram + Claude actions
├── packages/
│   ├── shared/                 # Shared types & constants
│   │   └── src/
│   │       ├── types/          # TypeScript interfaces
│   │       └── constants.ts    # App constants
│   └── ui/                     # ShadCN UI components
└── docs/                       # Documentation
Convex Schema
The backend uses Convex for real-time data sync. The schema is defined in convex/schema.ts:
Tables
- users - User profiles
  - authId - WorkOS authentication ID
  - email, name, imageUrl - Profile info
  - plan - "free" | "pro" | "team"
  - settings - User preferences (language, hotkey, etc.)
  - Indexes: by_auth, by_email
- dictionaries - Custom words for transcription
  - userId, word, pronunciation, category
  - Indexes: by_user, by_user_word
- snippets - Voice shortcuts (trigger → expansion)
  - userId, trigger, expansion, description, isEnabled
  - Indexes: by_user, by_user_trigger
- appTones - Per-application tone settings
  - userId, appIdentifier, appName, tone, customInstructions
  - Indexes: by_user, by_user_app
- history - Dictation history
  - userId, rawTranscript, cleanedText, targetApp, tone
  - durationMs, characterCount, createdAt
  - Indexes: by_user, by_user_time
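To make the table shapes concrete, the sketch below mirrors the `snippets` fields as a plain TypeScript type and models the `by_user_trigger` index as a composite-key map. It is an illustration only, not the contents of convex/schema.ts: the field types, the case-insensitive matching, and the `indexKey`, `buildSnippetIndex`, and `expandTrigger` helpers are all assumptions. In the real backend the lookup would be an indexed Convex query rather than an in-memory `Map`.

```typescript
// Hypothetical row shape mirroring the `snippets` fields listed above.
// Field types are assumed; the real definitions live in convex/schema.ts.
interface SnippetRow {
  userId: string;
  trigger: string;
  expansion: string;
  description?: string;
  isEnabled: boolean;
}

// Composite key standing in for the `by_user_trigger` index.
// JSON.stringify avoids collisions that a plain separator could cause.
// Lower-casing the trigger is an assumption about how matching should work.
function indexKey(userId: string, trigger: string): string {
  return JSON.stringify([userId, trigger.toLowerCase()]);
}

// Build an in-memory lookup table keyed by (userId, trigger).
function buildSnippetIndex(rows: SnippetRow[]): Map<string, SnippetRow> {
  const index = new Map<string, SnippetRow>();
  for (const row of rows) {
    index.set(indexKey(row.userId, row.trigger), row);
  }
  return index;
}

// Resolve a spoken trigger to its expansion, skipping disabled snippets.
function expandTrigger(
  index: Map<string, SnippetRow>,
  userId: string,
  spoken: string,
): string | null {
  const row = index.get(indexKey(userId, spoken.trim()));
  if (!row || !row.isEnabled) {
    return null;
  }
  return row.expansion;
}

const snippetIndex = buildSnippetIndex([
  { userId: "u1", trigger: "my email", expansion: "me@example.com", isEnabled: true },
  { userId: "u1", trigger: "sig", expansion: "Best regards", isEnabled: false },
]);
```

Keying on the pair (userId, trigger) is what lets two users define the same trigger without clashing, which is presumably why the schema carries a compound index next to the plain `by_user` one.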
Desktop App Architecture
Main Process (src/main/index.ts)
- Window management (400x600 initial size)
- Global hotkey registration (CommandOrControl+Shift+Space)
- IPC handlers for dictation state
- App lifecycle management
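The hotkey and IPC responsibilities above can be sketched as a small toggle controller. This is a minimal sketch under stated assumptions, not the code in src/main/index.ts: `createDictationController`, the channel names `dictation:start` and `dictation:stop`, and the injected `send` function are hypothetical. In the Electron main process the toggle would typically be wired to `globalShortcut.register(accelerator, callback)` and `send` to a window's `webContents.send`; both are injected here so the logic runs without Electron.

```typescript
// Stand-in for webContents.send(channel); injected so the sketch has no Electron dependency.
type Send = (channel: string) => void;

function createDictationController(send: Send) {
  let active = false;

  return {
    // Would back a `dictation.getState()`-style IPC handler.
    getState(): boolean {
      return active;
    },
    // Called each time the global hotkey fires: flips the state and notifies the renderer.
    toggle(): boolean {
      active = !active;
      send(active ? "dictation:start" : "dictation:stop");
      return active;
    },
  };
}

// Record the channels that would be sent to the renderer.
const sentChannels: string[] = [];
const controller = createDictationController((channel) => {
  sentChannels.push(channel);
});

controller.toggle(); // first hotkey press: start dictation
controller.toggle(); // second hotkey press: stop dictation
```

Keeping the state in the main process and only broadcasting changes means the renderer can always recover the current value through `getState()` after a reload.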
Preload Script (src/preload/index.ts)
Exposes a restricted API to the renderer via Electron's context bridge:
window.api = {
  dictation: {
    getState(): Promise<boolean>,
    onStart(callback): () => void,
    onStop(callback): () => void,
  },
  hotkey: {
    register(hotkey): Promise<boolean>,
    getCurrent(): Promise<string>,
  },
  textInjection: {
    inject(text): Promise<void>,
    getActiveApp(): Promise<string | null>,
  },
}
Renderer (src/renderer/)
- React 19 with TypeScript
- Tailwind CSS v4 for styling
- Shows dictation state (idle, listening, processing, injecting, error)
- Displays current hotkey and transcript
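The renderer learns about state changes through the `onStart` and `onStop` subscriptions from the preload script, each of which returns a function (presumably an unsubscribe handle). The sketch below types that part of the API and uses an in-memory stand-in to show the subscribe and clean-up pattern. The callback signature `() => void`, the `createFakeDictationApi` helper, and its `emitStart` and `emitStop` triggers are assumptions for illustration; the real object is provided by src/preload/index.ts.

```typescript
type Unsubscribe = () => void;

// Assumed typing of the `dictation` part of window.api.
interface DictationApi {
  getState(): Promise<boolean>;
  onStart(callback: () => void): Unsubscribe;
  onStop(callback: () => void): Unsubscribe;
}

// In-memory stand-in for the preload bridge (hypothetical, for illustration only).
function createFakeDictationApi() {
  let active = false;
  const startListeners = new Set<() => void>();
  const stopListeners = new Set<() => void>();

  const api: DictationApi = {
    getState: async () => active,
    onStart(callback) {
      startListeners.add(callback);
      return () => {
        startListeners.delete(callback);
      };
    },
    onStop(callback) {
      stopListeners.add(callback);
      return () => {
        stopListeners.delete(callback);
      };
    },
  };

  return {
    api,
    // Simulate the main process announcing a state change.
    emitStart(): void {
      active = true;
      startListeners.forEach((listener) => listener());
    },
    emitStop(): void {
      active = false;
      stopListeners.forEach((listener) => listener());
    },
  };
}

// Renderer-side usage: keep a UI label in sync and retain the unsubscribe handles.
const fake = createFakeDictationApi();
let label: string = "idle";
const offStart = fake.api.onStart(() => {
  label = "listening";
});
const offStop = fake.api.onStop(() => {
  label = "idle";
});
```

In a React component the two subscriptions would usually be created inside a `useEffect`, with `offStart` and `offStop` called from the effect's cleanup function so listeners do not accumulate across re-renders.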
Shared Types
Located in packages/shared/src/types/:
User Types (user.ts)
- User - Full user profile
- UserSettings - Preferences (language, hotkey, etc.)
- UserPlan - "free" | "pro" | "team"
- DEFAULT_USER_SETTINGS - Default configuration
Dictation Types (dictation.ts)
- DictionaryWord - Custom vocabulary entry
- Snippet - Voice shortcut mapping
- AppTone - Per-app tone configuration
- HistoryEntry - Past dictation record
- UsageStats - Aggregated statistics
- DictationState - "idle" | "listening" | "processing" | "injecting" | "error"
- DictationSession - Current session state
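The five `DictationState` values suggest a linear pipeline with an error branch. One way to make the allowed transitions explicit is a small transition table, sketched below. Only the state names come from dictation.ts as described here; the event names, the specific transitions, and the `nextState` helper are assumptions.

```typescript
type DictationState = "idle" | "listening" | "processing" | "injecting" | "error";

// Hypothetical events; the real app may name or group these differently.
type DictationEvent = "START" | "STOP" | "CLEANED" | "INJECTED" | "FAIL" | "RESET";

// Allowed transitions per state. A missing entry means the event is ignored.
const TRANSITIONS: Record<DictationState, Partial<Record<DictationEvent, DictationState>>> = {
  idle: { START: "listening" },
  listening: { STOP: "processing", FAIL: "error" },
  processing: { CLEANED: "injecting", FAIL: "error" },
  injecting: { INJECTED: "idle", FAIL: "error" },
  error: { RESET: "idle" },
};

// Return the next state, or the current one if the event is not valid here.
function nextState(state: DictationState, event: DictationEvent): DictationState {
  return TRANSITIONS[state][event] ?? state;
}

// Walk the happy path from idle through a full dictation and back to idle.
const happyPath: DictationEvent[] = ["START", "STOP", "CLEANED", "INJECTED"];
let current: DictationState = "idle";
const visited: DictationState[] = [current];
for (const event of happyPath) {
  current = nextState(current, event);
  visited.push(current);
}
```

Ignoring unknown events instead of throwing keeps a stray hotkey press during `processing` from corrupting the session, at the cost of hiding logic errors; a stricter variant could log or throw in development builds.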
Constants
Defined in packages/shared/src/constants.ts:
- DEFAULT_HOTKEY - "CommandOrControl+Shift+Space"
- SUPPORTED_LANGUAGES - Array of language codes with names
- TONE_OPTIONS - casual, professional, technical, friendly
- AUDIO_CONFIG - Sample rate, channels, MIME type
- DEEPGRAM_CONFIG - Nova-2 model settings
- COMMON_APPS - App identifier → display name mapping
- PLAN_LIMITS - Usage limits per plan tier
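Constants such as `TONE_OPTIONS` are convenient to declare with `as const`, so that the tone union type is derived from the array rather than maintained separately. A minimal sketch, assuming the four tones listed above are stored as plain strings; the `Tone` type, the `isTone` guard, and the `resolveTone` fallback (including its default of "professional") are hypothetical and may not match constants.ts.

```typescript
// The four tone names come from the list above; storing them as a readonly tuple is an assumption.
const TONE_OPTIONS = ["casual", "professional", "technical", "friendly"] as const;

// Union type derived from the array: "casual" | "professional" | "technical" | "friendly".
type Tone = (typeof TONE_OPTIONS)[number];

// Runtime guard for values read from storage or user input.
function isTone(value: string): value is Tone {
  return (TONE_OPTIONS as readonly string[]).includes(value);
}

// Fall back to a default when stored data holds an unknown tone. The default is assumed.
function resolveTone(value: string, fallback: Tone = "professional"): Tone {
  return isTone(value) ? value : fallback;
}
```

Deriving the type from the array means that adding a fifth tone in one place updates both the runtime list and the compile-time union.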
Environment Variables
Required in .env.local:
# Convex
CONVEX_DEPLOYMENT="dev:your-project"
NEXT_PUBLIC_CONVEX_URL="https://your-project.convex.cloud"
# WorkOS AuthKit
WORKOS_CLIENT_ID="client_..."
WORKOS_API_KEY="sk_test_..."
WORKOS_COOKIE_PASSWORD="your-32-char-random-string"
NEXT_PUBLIC_WORKOS_REDIRECT_URI="http://localhost:3012/callback"
# Voice APIs (for Convex actions)
DEEPGRAM_API_KEY="..."
ANTHROPIC_API_KEY="..."
Build Commands
# Development
pnpm dev # All apps
pnpm dev:desktop # Desktop app only
pnpm dev:talk # Marketing site only
pnpm convex # Convex dev server
# Production
pnpm build # Build all
pnpm convex:deploy # Deploy Convex functions
# Desktop builds
pnpm --filter desktop build:mac
pnpm --filter desktop build:win
pnpm --filter desktop build:linux
Next Steps (Not Yet Implemented)
- Audio Recording - Capture microphone input in main process
- Deepgram Integration - Real-time STT via WebSocket
- Claude Haiku Cleanup - Grammar, punctuation, tone adjustment
- Text Injection - Use nut.js to type into active app
- Settings UI - Configure hotkey, language, tone preferences
- Tray Icon - System tray for always-on access
- Local Mode - whisper.cpp for offline transcription
Related Documentation
- PRD - Full product specification
- Project Structure - Monorepo layout
- TypeScript Best Practices - Code guidelines