Phase 1 Implementation Summary

This document describes the technical implementation of Phase 1 (Desktop Dictation App) as of December 2024.

Overview

Phase 1 establishes the foundation for the talk.dev desktop dictation application - a Wispr Flow competitor that provides system-wide voice-to-text with AI cleanup.

Tech Stack

Component	Technology	Purpose
Build Tool	electron-vite	Fast Electron + Vite bundling
Desktop Framework	Electron 33+	Cross-platform desktop app
UI Framework	React 19	Renderer UI components
Styling	Tailwind CSS v4	Utility-first CSS
Backend	Convex	Real-time database & serverless functions
Auth	WorkOS AuthKit	User authentication (shared with do.dev)
Types	TypeScript 5.7	Type safety throughout

Project Structure

talk-dev/
├── apps/
│   ├── talk/                    # Marketing website (Next.js)
│   └── desktop/                 # Electron dictation app
│       ├── src/
│       │   ├── main/           # Electron main process
│       │   ├── preload/        # Context bridge (IPC)
│       │   └── renderer/       # React UI
│       ├── electron.vite.config.ts
│       ├── electron-builder.yml
│       └── package.json
├── convex/                      # Shared Convex backend
│   ├── schema.ts               # Database schema
│   ├── users.ts                # User CRUD
│   ├── dictionaries.ts         # Custom words
│   ├── snippets.ts             # Voice shortcuts
│   ├── appTones.ts             # Per-app tone settings
│   ├── history.ts              # Dictation history
│   └── voice.ts                # Deepgram + Claude actions
├── packages/
│   ├── shared/                 # Shared types & constants
│   │   └── src/
│   │       ├── types/          # TypeScript interfaces
│   │       └── constants.ts    # App constants
│   └── ui/                     # ShadCN UI components
└── docs/                       # Documentation

Convex Schema

The backend uses Convex for real-time data sync. The schema is defined in convex/schema.ts:

Tables

users - User profiles
- authId - WorkOS authentication ID
- email, name, imageUrl - Profile info
- plan - "free" | "pro" | "team"
- settings - User preferences (language, hotkey, etc.)
- Indexes: by_auth, by_email

dictionaries - Custom words for transcription
- userId, word, pronunciation, category
- Indexes: by_user, by_user_word

snippets - Voice shortcuts (trigger → expansion)
- userId, trigger, expansion, description, isEnabled
- Indexes: by_user, by_user_trigger

appTones - Per-application tone settings
- userId, appIdentifier, appName, tone, customInstructions
- Indexes: by_user, by_user_app

history - Dictation history
- userId, rawTranscript, cleanedText, targetApp, tone
- durationMs, characterCount, createdAt
- Indexes: by_user, by_user_time
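
The snippets table maps a spoken trigger to its expansion. As a rough sketch of how those rows might be applied to a transcript, the helper below replaces each enabled trigger with its expansion. The function names and the matching rule (case-insensitive, whole-word) are assumptions for illustration; the source does not specify how matching works.

```typescript
// Subset of the snippets table fields used by this sketch.
interface Snippet {
  trigger: string;
  expansion: string;
  isEnabled: boolean;
}

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace every enabled trigger with its expansion.
// Assumes triggers start and end with word characters so \b can anchor them.
function expandSnippets(text: string, snippets: Snippet[]): string {
  let result = text;
  for (const snippet of snippets) {
    if (!snippet.isEnabled || snippet.trigger.trim() === "") continue;
    const pattern = new RegExp(`\\b${escapeRegExp(snippet.trigger)}\\b`, "gi");
    // A replacer function keeps "$" in the expansion from being interpreted.
    result = result.replace(pattern, () => snippet.expansion);
  }
  return result;
}
```

Disabled rows are skipped, so toggling isEnabled turns a snippet off without deleting it.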

Desktop App Architecture

Main Process (`src/main/index.ts`)

Window management (400x600 initial size)
Global hotkey registration (CommandOrControl+Shift+Space)
IPC handlers for dictation state
App lifecycle management

Preload Script (`src/preload/index.ts`)

Exposes a restricted API to the renderer through Electron's contextBridge:

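// Shape of the object exposed as window.api.
// Parameter types are assumptions; the original sketch omitted them.
interface Api {
  dictation: {
    getState(): Promise<boolean>;
    onStart(callback: () => void): () => void; // presumably returns an unsubscribe function
    onStop(callback: () => void): () => void;
  };
  hotkey: {
    register(hotkey: string): Promise<boolean>;
    getCurrent(): Promise<string>;
  };
  textInjection: {
    inject(text: string): Promise<void>;
    getActiveApp(): Promise<string | null>;
  };
}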
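
The onStart and onStop methods take a callback and return a function, which suggests a subscribe/unsubscribe pattern. The sketch below illustrates that pattern under stated assumptions: FakeIpc is a minimal stand-in for Electron's ipcRenderer so the example runs on its own, and the channel names "dictation:start" and "dictation:stop" are hypothetical.

```typescript
type Listener = () => void;

// Minimal stand-in for Electron's ipcRenderer so the sketch runs without Electron.
class FakeIpc {
  private listeners = new Map<string, Set<Listener>>();

  on(channel: string, listener: Listener): void {
    const set = this.listeners.get(channel) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(channel, set);
  }

  off(channel: string, listener: Listener): void {
    this.listeners.get(channel)?.delete(listener);
  }

  emit(channel: string): void {
    const set = this.listeners.get(channel);
    if (!set) return;
    // Copy first so a listener may unsubscribe itself while being called.
    Array.from(set).forEach((listener) => listener());
  }
}

// Hypothetical factory: each subscription returns a function that removes it.
function createDictationApi(ipc: FakeIpc) {
  return {
    onStart(callback: Listener): () => void {
      ipc.on("dictation:start", callback);
      return () => ipc.off("dictation:start", callback);
    },
    onStop(callback: Listener): () => void {
      ipc.on("dictation:stop", callback);
      return () => ipc.off("dictation:stop", callback);
    },
  };
}
```

Returning the unsubscribe function fits React well: a component can return it directly from a useEffect cleanup.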

Renderer (`src/renderer/`)

React 19 with TypeScript
Tailwind CSS v4 for styling
Shows dictation state (idle, listening, processing, injecting, error)
Displays current hotkey and transcript

Shared Types

Located in packages/shared/src/types/:

User Types (`user.ts`)

User - Full user profile
UserSettings - Preferences (language, hotkey, etc.)
UserPlan - "free" | "pro" | "team"
DEFAULT_USER_SETTINGS - Default configuration

Dictation Types (`dictation.ts`)

DictionaryWord - Custom vocabulary entry
Snippet - Voice shortcut mapping
AppTone - Per-app tone configuration
HistoryEntry - Past dictation record
UsageStats - Aggregated statistics
DictationState - "idle" | "listening" | "processing" | "injecting" | "error"
DictationSession - Current session state
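
DictationState lists five states but not the moves between them. One way to keep the UI and main process consistent is an explicit transition table, sketched below. The allowed transitions and the helper names are assumptions for illustration, not taken from the shared package.

```typescript
type DictationState = "idle" | "listening" | "processing" | "injecting" | "error";

// Assumed transitions: the source names the states but not the allowed moves.
const TRANSITIONS: Record<DictationState, readonly DictationState[]> = {
  idle: ["listening"],
  listening: ["processing", "idle", "error"],
  processing: ["injecting", "error"],
  injecting: ["idle", "error"],
  error: ["idle"],
};

// True when moving from one state to another is allowed by the table.
function canTransition(from: DictationState, to: DictationState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the next state, or stays in the current one if the move is not allowed.
function nextState(current: DictationState, requested: DictationState): DictationState {
  return canTransition(current, requested) ? requested : current;
}
```

Because the table is typed as Record<DictationState, ...>, adding a new state to the union produces a compile error until its transitions are listed.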

Constants

Defined in packages/shared/src/constants.ts:

DEFAULT_HOTKEY - "CommandOrControl+Shift+Space"
SUPPORTED_LANGUAGES - Array of language codes with names
TONE_OPTIONS - casual, professional, technical, friendly
AUDIO_CONFIG - Sample rate, channels, MIME type
DEEPGRAM_CONFIG - Nova-2 model settings
COMMON_APPS - App identifier → display name mapping
PLAN_LIMITS - Usage limits per plan tier
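
TONE_OPTIONS and the per-app tone records work together: the cleanup step needs one tone for whichever app is active. A minimal sketch of that lookup follows. The resolveTone helper and the DEFAULT_TONE value are hypothetical; the source does not say which tone is the fallback.

```typescript
type Tone = "casual" | "professional" | "technical" | "friendly";

// Subset of the appTones fields used by this sketch.
interface AppTone {
  appIdentifier: string;
  appName: string;
  tone: Tone;
}

// Hypothetical fallback; the real default is not stated in the source.
const DEFAULT_TONE: Tone = "professional";

// Pick the tone configured for the active app, or fall back to the default.
function resolveTone(
  activeApp: string | null,
  appTones: AppTone[],
  fallback: Tone = DEFAULT_TONE,
): Tone {
  if (activeApp === null) return fallback;
  const match = appTones.find((entry) => entry.appIdentifier === activeApp);
  return match ? match.tone : fallback;
}
```

The activeApp parameter accepts null so that the result of textInjection.getActiveApp() can be passed in directly.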

Environment Variables

Required in .env.local:

# Convex
CONVEX_DEPLOYMENT="dev:your-project"
NEXT_PUBLIC_CONVEX_URL="https://your-project.convex.cloud"

# WorkOS AuthKit
WORKOS_CLIENT_ID="client_..."
WORKOS_API_KEY="sk_test_..."
WORKOS_COOKIE_PASSWORD="your-32-char-random-string"
NEXT_PUBLIC_WORKOS_REDIRECT_URI="http://localhost:3012/callback"

# Voice APIs (for Convex actions)
DEEPGRAM_API_KEY="..."
ANTHROPIC_API_KEY="..."
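
A missing variable tends to surface late, as a failed API call. A small startup check can report the gaps up front. The sketch below uses the variable names listed above; the findMissingEnvVars helper is hypothetical and takes the environment as a parameter so it can be tested without touching the real process environment.

```typescript
// Variable names taken from the .env.local listing above.
const REQUIRED_ENV_VARS = [
  "CONVEX_DEPLOYMENT",
  "NEXT_PUBLIC_CONVEX_URL",
  "WORKOS_CLIENT_ID",
  "WORKOS_API_KEY",
  "WORKOS_COOKIE_PASSWORD",
  "NEXT_PUBLIC_WORKOS_REDIRECT_URI",
  "DEEPGRAM_API_KEY",
  "ANTHROPIC_API_KEY",
] as const;

// Hypothetical helper: return the names that are unset or blank.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node context, passing process.env and throwing when the returned list is non-empty would stop the app before any request is made.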

Build Commands

# Development
pnpm dev              # All apps
pnpm dev:desktop      # Desktop app only
pnpm dev:talk         # Marketing site only
pnpm convex           # Convex dev server

# Production
pnpm build            # Build all
pnpm convex:deploy    # Deploy Convex functions

# Desktop builds
pnpm --filter desktop build:mac
pnpm --filter desktop build:win
pnpm --filter desktop build:linux

Next Steps (Not Yet Implemented)

Audio Recording - Capture microphone input in main process
Deepgram Integration - Real-time STT via WebSocket
Claude Haiku Cleanup - Grammar, punctuation, tone adjustment
Text Injection - Use nut.js to type into active app
Settings UI - Configure hotkey, language, tone preferences
Tray Icon - System tray for always-on access
Local Mode - whisper.cpp for offline transcription

Related Documentation

PRD - Full product specification
Project Structure - Monorepo layout
TypeScript Best Practices - Code guidelines
