Phase 1 Implementation Summary

This document describes the technical implementation of Phase 1 (Desktop Dictation App) as of December 2024.

Overview

Phase 1 establishes the foundation for the talk.dev desktop dictation application, a Wispr Flow competitor that provides system-wide voice-to-text with AI cleanup.

Tech Stack

Component           Technology        Purpose
Build Tool          electron-vite     Fast Electron + Vite bundling
Desktop Framework   Electron 33+      Cross-platform desktop app
UI Framework        React 19          Renderer UI components
Styling             Tailwind CSS v4   Utility-first CSS
Backend             Convex            Real-time database & serverless functions
Auth                WorkOS AuthKit    User authentication (shared with do.dev)
Types               TypeScript 5.7    Type safety throughout

Project Structure

talk-dev/
├── apps/
│   ├── talk/                    # Marketing website (Next.js)
│   └── desktop/                 # Electron dictation app
│       ├── src/
│       │   ├── main/           # Electron main process
│       │   ├── preload/        # Context bridge (IPC)
│       │   └── renderer/       # React UI
│       ├── electron.vite.config.ts
│       ├── electron-builder.yml
│       └── package.json
├── convex/                      # Shared Convex backend
│   ├── schema.ts               # Database schema
│   ├── users.ts                # User CRUD
│   ├── dictionaries.ts         # Custom words
│   ├── snippets.ts             # Voice shortcuts
│   ├── appTones.ts             # Per-app tone settings
│   ├── history.ts              # Dictation history
│   └── voice.ts                # Deepgram + Claude actions
├── packages/
│   ├── shared/                 # Shared types & constants
│   │   └── src/
│   │       ├── types/          # TypeScript interfaces
│   │       └── constants.ts    # App constants
│   └── ui/                     # ShadCN UI components
└── docs/                       # Documentation

Convex Schema

The backend uses Convex for real-time data sync. The schema is defined in convex/schema.ts:

Tables

  1. users - User profiles
    • authId - WorkOS authentication ID
    • email, name, imageUrl - Profile info
    • plan - "free" | "pro" | "team"
    • settings - User preferences (language, hotkey, etc.)
    • Indexes: by_auth, by_email

  2. dictionaries - Custom words for transcription
    • userId, word, pronunciation, category
    • Indexes: by_user, by_user_word

  3. snippets - Voice shortcuts (trigger → expansion)
    • userId, trigger, expansion, description, isEnabled
    • Indexes: by_user, by_user_trigger

  4. appTones - Per-application tone settings
    • userId, appIdentifier, appName, tone, customInstructions
    • Indexes: by_user, by_user_app

  5. history - Dictation history
    • userId, rawTranscript, cleanedText, targetApp, tone
    • durationMs, characterCount, createdAt
    • Indexes: by_user, by_user_time
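
To illustrate how the trigger → expansion mapping in the snippets table might be applied to a transcript, here is a minimal sketch. The Snippet shape mirrors a subset of the fields listed above; the expandSnippets helper and its matching rule (case-insensitive, bounded by non-word characters) are assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical shape mirroring a subset of the snippets table fields.
interface Snippet {
  trigger: string;
  expansion: string;
  isEnabled: boolean;
}

// Escape regex metacharacters so a trigger is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each enabled trigger with its expansion. Assumed rule: the
// trigger must be bounded by non-word characters or the string edges,
// and matching ignores case.
function expandSnippets(transcript: string, snippets: Snippet[]): string {
  return snippets
    .filter((s) => s.isEnabled && s.trigger.length > 0)
    .reduce((text, s) => {
      const pattern = new RegExp(
        `(^|\\W)${escapeRegExp(s.trigger)}(?=\\W|$)`,
        "gi"
      );
      // A replacer function keeps "$" in expansions from being interpreted.
      return text.replace(pattern, (_match: string, lead: string) => lead + s.expansion);
    }, transcript);
}
```

Snippets are applied in array order, so a later trigger can match text produced by an earlier expansion; whether that is desirable depends on the product design.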

Desktop App Architecture

Main Process (src/main/index.ts)

  • Window management (400x600 initial size)
  • Global hotkey registration (CommandOrControl+Shift+Space)
  • IPC handlers for dictation state
  • App lifecycle management

Preload Script (src/preload/index.ts)

Exposes secure APIs to the renderer through Electron's context bridge:

window.api = {
  dictation: {
    getState(): Promise<boolean>,
    onStart(callback): () => void,
    onStop(callback): () => void,
  },
  hotkey: {
    register(hotkey): Promise<boolean>,
    getCurrent(): Promise<string>,
  },
  textInjection: {
    inject(text): Promise<void>,
    getActiveApp(): Promise<string | null>,
  },
}
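
The onStart and onStop signatures above return a function, which suggests an unsubscribe handle. The following sketch shows that pattern on its own, without Electron; createSignal and the other names are hypothetical, and the real preload script would presumably wire the callbacks to IPC events rather than to an in-memory set.

```typescript
type Listener = () => void;

// Hypothetical helper: a tiny event source whose subscribe() returns an
// unsubscribe function, matching the shape of onStart/onStop above.
function createSignal() {
  const listeners = new Set<Listener>();
  return {
    subscribe(callback: Listener): () => void {
      listeners.add(callback);
      return () => {
        listeners.delete(callback);
      };
    },
    emit(): void {
      // Copy first so a listener may unsubscribe while we iterate.
      Array.from(listeners).forEach((listener) => listener());
    },
  };
}

// Usage: count start events until the caller unsubscribes.
const dictationStart = createSignal();
let starts = 0;
const unsubscribe = dictationStart.subscribe(() => {
  starts += 1;
});
dictationStart.emit(); // listener runs once
unsubscribe();
dictationStart.emit(); // listener already removed, so nothing runs
```

Returning the unsubscribe function lets a React component clean up inside a useEffect teardown without keeping a reference to the original callback.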

Renderer (src/renderer/)

  • React 19 with TypeScript
  • Tailwind CSS v4 for styling
  • Shows dictation state (idle, listening, processing, injecting, error)
  • Displays current hotkey and transcript

Shared Types

Located in packages/shared/src/types/:

User Types (user.ts)

  • User - Full user profile
  • UserSettings - Preferences (language, hotkey, etc.)
  • UserPlan - "free" | "pro" | "team"
  • DEFAULT_USER_SETTINGS - Default configuration

Dictation Types (dictation.ts)

  • DictionaryWord - Custom vocabulary entry
  • Snippet - Voice shortcut mapping
  • AppTone - Per-app tone configuration
  • HistoryEntry - Past dictation record
  • UsageStats - Aggregated statistics
  • DictationState - "idle" | "listening" | "processing" | "injecting" | "error"
  • DictationSession - Current session state
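
The DictationState union lends itself to a small state machine. The sketch below is one way to guard transitions; the union comes from the list above, but the transition table is an assumption, since this document names the states without specifying which moves between them are legal.

```typescript
type DictationState = "idle" | "listening" | "processing" | "injecting" | "error";

// Assumed transition table (hypothetical): each state maps to the states
// it may move to next.
const TRANSITIONS: Record<DictationState, readonly DictationState[]> = {
  idle: ["listening"],
  listening: ["processing", "idle", "error"],
  processing: ["injecting", "error"],
  injecting: ["idle", "error"],
  error: ["idle"],
};

// True when moving from `from` to `to` is allowed by the table.
function canTransition(from: DictationState, to: DictationState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Return the next state, or throw if the move is not allowed.
function transition(from: DictationState, to: DictationState): DictationState {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid dictation transition: ${from} -> ${to}`);
  }
  return to;
}
```

Typing the table as Record<DictationState, ...> means the compiler flags any state added to the union later that has no entry here.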

Constants

Defined in packages/shared/src/constants.ts:

  • DEFAULT_HOTKEY - "CommandOrControl+Shift+Space"
  • SUPPORTED_LANGUAGES - Array of language codes with names
  • TONE_OPTIONS - casual, professional, technical, friendly
  • AUDIO_CONFIG - Sample rate, channels, MIME type
  • DEEPGRAM_CONFIG - Nova-2 model settings
  • COMMON_APPS - App identifier → display name mapping
  • PLAN_LIMITS - Usage limits per plan tier
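
As an illustration of how a constant like TONE_OPTIONS can double as a type, here is a minimal sketch. The four tone values come from the list above; the `as const` layout, the isTone guard, the resolveTone helper, and its "professional" fallback are assumptions and may not match the real constants.ts.

```typescript
// Tone values from the list above; the `as const` layout is an assumption.
const TONE_OPTIONS = ["casual", "professional", "technical", "friendly"] as const;

// Union type derived from the array, so the list and the type stay in sync.
type Tone = (typeof TONE_OPTIONS)[number];

// Hypothetical guard for validating untrusted input such as stored settings.
function isTone(value: unknown): value is Tone {
  return (
    typeof value === "string" &&
    (TONE_OPTIONS as readonly string[]).indexOf(value) !== -1
  );
}

// Hypothetical helper: fall back to a default when the stored tone is
// invalid. The "professional" default is an assumption.
function resolveTone(value: unknown, fallback: Tone = "professional"): Tone {
  return isTone(value) ? value : fallback;
}
```

Deriving Tone from the array avoids maintaining a separate string union next to the runtime list.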

Environment Variables

Required in .env.local:

# Convex
CONVEX_DEPLOYMENT="dev:your-project"
NEXT_PUBLIC_CONVEX_URL="https://your-project.convex.cloud"

# WorkOS AuthKit
WORKOS_CLIENT_ID="client_..."
WORKOS_API_KEY="sk_test_..."
WORKOS_COOKIE_PASSWORD="your-32-char-random-string"
NEXT_PUBLIC_WORKOS_REDIRECT_URI="http://localhost:3012/callback"

# Voice APIs (for Convex actions)
DEEPGRAM_API_KEY="..."
ANTHROPIC_API_KEY="..."
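
A startup check can catch a missing variable before the first request fails. The sketch below uses the variable names from the listing above; the missingEnvVars helper itself is hypothetical and is not part of the project.

```typescript
// Variable names are taken from the listing above.
const REQUIRED_ENV_VARS = [
  "CONVEX_DEPLOYMENT",
  "NEXT_PUBLIC_CONVEX_URL",
  "WORKOS_CLIENT_ID",
  "WORKOS_API_KEY",
  "WORKOS_COOKIE_PASSWORD",
  "NEXT_PUBLIC_WORKOS_REDIRECT_URI",
  "DEEPGRAM_API_KEY",
  "ANTHROPIC_API_KEY",
] as const;

type EnvSource = Record<string, string | undefined>;

// Hypothetical helper: return the names of required variables that are
// unset or blank, so the caller can report them all at once.
function missingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node context this could be called as missingEnvVars(process.env), logging the returned names and exiting if the array is not empty.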

Build Commands

# Development
pnpm dev              # All apps
pnpm dev:desktop      # Desktop app only
pnpm dev:talk         # Marketing site only
pnpm convex           # Convex dev server

# Production
pnpm build            # Build all
pnpm convex:deploy    # Deploy Convex functions

# Desktop builds
pnpm --filter desktop build:mac
pnpm --filter desktop build:win
pnpm --filter desktop build:linux

Next Steps (Not Yet Implemented)

  1. Audio Recording - Capture microphone input in main process
  2. Deepgram Integration - Real-time STT via WebSocket
  3. Claude Haiku Cleanup - Grammar, punctuation, tone adjustment
  4. Text Injection - Use nut.js to type into active app
  5. Settings UI - Configure hotkey, language, tone preferences
  6. Tray Icon - System tray for always-on access
  7. Local Mode - whisper.cpp for offline transcription
