# MyVoxtral Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a macOS menu bar app that streams microphone audio to the Mistral Voxtral Realtime API and outputs transcribed text to a floating window or the active cursor.

**Architecture:** SwiftUI menu bar app with four layers — AudioCapture (AVAudioEngine), VoxtralWebSocketClient (URLSessionWebSocketTask), TranscriptionManager (ObservableObject orchestrator), and UI (MenuBarExtra + floating panel). No third-party dependencies.

**Tech Stack:** Swift, SwiftUI, AVFoundation, WebSocket (URLSession), CGEvent, macOS 14+

---

## File Structure

```
MyVoxtral/
├── MyVoxtralApp.swift               # App entry point, MenuBarExtra, app lifecycle
├── Models/
│   ├── TranscriptionManager.swift   # State machine orchestrator, owns audio + WS client
│   └── AppSettings.swift            # @AppStorage wrapper, settings model
├── Audio/
│   └── AudioCapture.swift           # AVAudioEngine mic tap, PCM conversion, async chunk stream
├── Network/
│   ├── VoxtralWebSocketClient.swift # WebSocket connect/send/receive, session management
│   └── VoxtralMessages.swift        # Codable structs for all WS JSON messages
├── Views/
│   ├── TranscriptionWindow.swift    # NSPanel-based floating text window
│   ├── SettingsView.swift           # API key, shortcut, mode, latency
│   └── MenuBarView.swift            # Menu bar dropdown content
├── Utilities/
│   ├── CursorInjector.swift         # CGEvent keystroke simulation
│   ├── GlobalShortcut.swift         # NSEvent global monitor, key combo storage
│   └── TranscriptionLogger.swift    # Append sessions to log file
└── Info.plist                       # Mic + accessibility usage descriptions
```

---

### Task 1: Xcode Project Scaffold

**Files:**

- Create: `MyVoxtral.xcodeproj` (via `swift package init` or Xcode project)
- Create: `MyVoxtral/MyVoxtralApp.swift`
- Create: `MyVoxtral/Info.plist`

- [ ] **Step 1: Create the Swift package / Xcode project**

Create a new macOS app project. Use the SwiftUI lifecycle.

```bash
mkdir -p MyVoxtral/MyVoxtral
```

Create `MyVoxtral/MyVoxtral/MyVoxtralApp.swift`:

```swift
import SwiftUI

@main
struct MyVoxtralApp: App {
    var body: some Scene {
        MenuBarExtra("MyVoxtral", systemImage: "mic.fill") {
            Text("MyVoxtral")
            Divider()
            Button("Quit") {
                NSApplication.shared.terminate(nil)
            }
        }
    }
}
```

- [ ] **Step 2: Create Info.plist with privacy descriptions**

Create `MyVoxtral/MyVoxtral/Info.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>NSMicrophoneUsageDescription</key>
    <string>MyVoxtral needs microphone access to transcribe your speech in real time.</string>
    <key>LSUIElement</key>
    <true/>
</dict>
</plist>
```

`LSUIElement` = true makes it a menu-bar-only app (no Dock icon).

- [ ] **Step 3: Create the Swift package manifest**

```bash
cd MyVoxtral
cat > Package.swift << 'SWIFT'
// swift-tools-version: 5.10
import PackageDescription

let package = Package(
    name: "MyVoxtral",
    platforms: [.macOS(.v14)],
    targets: [
        .executableTarget(
            name: "MyVoxtral",
            path: "MyVoxtral",
            exclude: ["Info.plist"]  // not a source file; avoids an unhandled-resource warning
        )
    ]
)
SWIFT
```

- [ ] **Step 4: Build and verify the empty shell runs**

```bash
swift build
```

Expected: Build succeeds. Running shows a mic icon in the menu bar with "MyVoxtral" and "Quit".
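As an optional sanity check, the plist from Step 2 can be round-tripped through `PropertyListSerialization` to confirm it is well-formed and that both keys read back. This is a standalone sketch (the XML literal simply repeats the file's intended contents), not part of the build:

```swift
import Foundation

// Parse the Info.plist XML and read back the two keys Task 1 defines.
let plistXML = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>NSMicrophoneUsageDescription</key>
    <string>MyVoxtral needs microphone access to transcribe your speech in real time.</string>
    <key>LSUIElement</key>
    <true/>
</dict>
</plist>
"""

let dict = try! PropertyListSerialization.propertyList(
    from: Data(plistXML.utf8), options: [], format: nil
) as! [String: Any]

// <true/> deserializes as a boolean NSNumber.
let isMenuBarOnly = (dict["LSUIElement"] as? NSNumber)?.boolValue ?? false
print(isMenuBarOnly)
```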
- [ ] **Step 5: Commit**

```bash
git init
git add -A
git commit -m "chore: scaffold MyVoxtral macOS menu bar app"
```

---

### Task 2: WebSocket Message Types

**Files:**

- Create: `MyVoxtral/MyVoxtral/Network/VoxtralMessages.swift`

- [ ] **Step 1: Define all outbound message types**

Create `MyVoxtral/MyVoxtral/Network/VoxtralMessages.swift`:

```swift
import Foundation

// MARK: - Outbound Messages (Client → Server)

struct AudioAppendMessage: Encodable {
    let type = "input_audio.append"
    let audio: String  // base64-encoded PCM
}

struct AudioFlushMessage: Encodable {
    let type = "input_audio.flush"
}

struct AudioEndMessage: Encodable {
    let type = "input_audio.end"
}

struct SessionUpdateMessage: Encodable {
    let type = "session.update"
    let session: SessionConfig
}

struct SessionConfig: Encodable {
    let audioFormat: AudioFormatConfig
    let targetStreamingDelayMs: Int

    enum CodingKeys: String, CodingKey {
        case audioFormat = "audio_format"
        case targetStreamingDelayMs = "target_streaming_delay_ms"
    }
}

struct AudioFormatConfig: Encodable {
    let encoding = "pcm_s16le"
    let sampleRate = 16000

    enum CodingKeys: String, CodingKey {
        case encoding
        case sampleRate = "sample_rate"
    }
}

// MARK: - Inbound Messages (Server → Client)

enum VoxtralEvent {
    case sessionCreated
    case textDelta(String)
    case segment(text: String, start: Double, end: Double)
    case language(String)
    case done(text: String)
    case error(String)
    case unknown(String)
}

struct IncomingEvent: Decodable {
    let type: String
}

struct TextDeltaEvent: Decodable {
    let text: String
}

struct LanguageEvent: Decodable {
    let audioLanguage: String

    enum CodingKeys: String, CodingKey {
        case audioLanguage = "audio_language"
    }
}

struct SegmentEvent: Decodable {
    let text: String
    let start: Double
    let end: Double
}

struct DoneEvent: Decodable {
    let text: String
}

struct ErrorEvent: Decodable {
    let error: ErrorDetail?
}

struct ErrorDetail: Decodable {
    let message: ErrorMessage?
}

struct ErrorMessage: Decodable {
    let detail: String?
}

// MARK: - Event Parsing

func parseVoxtralEvent(from data: Data) -> VoxtralEvent {
    guard let envelope = try? JSONDecoder().decode(IncomingEvent.self, from: data) else {
        return .unknown(String(data: data, encoding: .utf8) ?? "")
    }

    switch envelope.type {
    case "session.created":
        return .sessionCreated
    case "transcription.text.delta":
        guard let e = try? JSONDecoder().decode(TextDeltaEvent.self, from: data) else { return .unknown("") }
        return .textDelta(e.text)
    case "transcription.segment":
        guard let e = try? JSONDecoder().decode(SegmentEvent.self, from: data) else { return .unknown("") }
        return .segment(text: e.text, start: e.start, end: e.end)
    case "transcription.language":
        guard let e = try? JSONDecoder().decode(LanguageEvent.self, from: data) else { return .unknown("") }
        return .language(e.audioLanguage)
    case "transcription.done":
        guard let e = try? JSONDecoder().decode(DoneEvent.self, from: data) else { return .unknown("") }
        return .done(text: e.text)
    case "error":
        if let e = try? JSONDecoder().decode(ErrorEvent.self, from: data) {
            return .error(e.error?.message?.detail ?? "Unknown error")
        }
        return .error("Unknown error")
    default:
        return .unknown(envelope.type)
    }
}
```

- [ ] **Step 2: Build to verify it compiles**

```bash
swift build
```

Expected: Build succeeds.

- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Network/VoxtralMessages.swift
git commit -m "feat: add Voxtral WebSocket message types and parser"
```

---

### Task 3: WebSocket Client

**Files:**

- Create: `MyVoxtral/MyVoxtral/Network/VoxtralWebSocketClient.swift`

- [ ] **Step 1: Implement the WebSocket client**

Create `MyVoxtral/MyVoxtral/Network/VoxtralWebSocketClient.swift`:

```swift
import Foundation

@MainActor
final class VoxtralWebSocketClient {
    private var webSocketTask: URLSessionWebSocketTask?
    private var session: URLSession?
    private let encoder = JSONEncoder()

    var onEvent: ((VoxtralEvent) -> Void)?

    func connect(apiKey: String, delayMs: Int) {
        guard let url = URL(string: "wss://api.mistral.ai/v1/audio/transcriptions/realtime") else { return }
        var request = URLRequest(url: url)
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

        session = URLSession(configuration: .default)
        webSocketTask = session?.webSocketTask(with: request)
        webSocketTask?.resume()

        // Send session config
        let config = SessionUpdateMessage(
            session: SessionConfig(
                audioFormat: AudioFormatConfig(),
                targetStreamingDelayMs: delayMs
            )
        )
        sendJSON(config)

        // Start receiving
        receiveLoop()
    }

    func sendAudio(_ pcmData: Data) {
        let base64 = pcmData.base64EncodedString()
        let msg = AudioAppendMessage(audio: base64)
        sendJSON(msg)
    }

    func flush() {
        sendJSON(AudioFlushMessage())
    }

    func disconnect() {
        sendJSON(AudioEndMessage())
        webSocketTask?.cancel(with: .normalClosure, reason: nil)
        webSocketTask = nil
        session?.invalidateAndCancel()
        session = nil
    }

    private func sendJSON<T: Encodable>(_ value: T) {
        guard let data = try? encoder.encode(value),
              let string = String(data: data, encoding: .utf8) else { return }
        webSocketTask?.send(.string(string)) { error in
            if let error {
                print("WebSocket send error: \(error)")
            }
        }
    }

    private func receiveLoop() {
        webSocketTask?.receive { [weak self] result in
            switch result {
            case .success(let message):
                switch message {
                case .string(let text):
                    if let data = text.data(using: .utf8) {
                        let event = parseVoxtralEvent(from: data)
                        Task { @MainActor in self?.onEvent?(event) }
                    }
                case .data(let data):
                    let event = parseVoxtralEvent(from: data)
                    Task { @MainActor in self?.onEvent?(event) }
                @unknown default:
                    break
                }
                self?.receiveLoop()
            case .failure(let error):
                Task { @MainActor in
                    self?.onEvent?(.error("Connection lost: \(error.localizedDescription)"))
                }
            }
        }
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.
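To make the wire format concrete, this standalone sketch re-declares two of the Task 2 message types (assumption: same shapes as `VoxtralMessages.swift`) and shows one frame in each direction — an outbound audio append and an inbound text delta:

```swift
import Foundation

// Minimal mirrors of the Task 2 types, for illustration only.
struct AudioAppendMessage: Encodable {
    let type = "input_audio.append"
    let audio: String  // base64-encoded PCM
}
struct IncomingEvent: Decodable { let type: String }
struct TextDeltaEvent: Decodable { let text: String }

// Outbound: a (tiny, silent) PCM chunk becomes one JSON text frame.
let encoder = JSONEncoder()
encoder.outputFormatting = .sortedKeys  // deterministic key order for the example
let silence = Data(repeating: 0, count: 4)
let frame = String(
    data: try! encoder.encode(AudioAppendMessage(audio: silence.base64EncodedString())),
    encoding: .utf8
)!
print(frame)  // {"audio":"AAAAAA==","type":"input_audio.append"}

// Inbound: dispatch on the "type" envelope field, then decode the payload,
// mirroring parseVoxtralEvent's two-pass decode.
let incoming = Data(#"{"type":"transcription.text.delta","text":"Hello"}"#.utf8)
let envelope = try! JSONDecoder().decode(IncomingEvent.self, from: incoming)
if envelope.type == "transcription.text.delta" {
    let delta = try! JSONDecoder().decode(TextDeltaEvent.self, from: incoming)
    print(delta.text)  // Hello
}
```

The two-pass decode (envelope first, payload second) is why `IncomingEvent` only declares `type`: unknown fields are simply ignored by `JSONDecoder`.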
- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Network/VoxtralWebSocketClient.swift
git commit -m "feat: add Voxtral WebSocket client with connect/send/receive"
```

---

### Task 4: Audio Capture

**Files:**

- Create: `MyVoxtral/MyVoxtral/Audio/AudioCapture.swift`

- [ ] **Step 1: Implement AVAudioEngine mic capture**

Create `MyVoxtral/MyVoxtral/Audio/AudioCapture.swift`:

```swift
import AVFoundation

final class AudioCapture {
    private let engine = AVAudioEngine()
    private let targetSampleRate: Double = 16000
    private let chunkDurationMs: Double = 480

    var onChunk: ((Data) -> Void)?

    private var buffer = Data()
    private let bytesPerChunk: Int

    init() {
        // 16kHz * 2 bytes (16-bit) * 1 channel * 0.48s = 15360 bytes
        bytesPerChunk = Int(targetSampleRate * 2 * chunkDurationMs / 1000)
    }

    func start() throws {
        let inputNode = engine.inputNode
        let inputFormat = inputNode.outputFormat(forBus: 0)

        let targetFormat = AVAudioFormat(
            commonFormat: .pcmFormatInt16,
            sampleRate: targetSampleRate,
            channels: 1,
            interleaved: true
        )!

        guard let converter = AVAudioConverter(from: inputFormat, to: targetFormat) else {
            throw AudioCaptureError.converterCreationFailed
        }

        let bufferSize = AVAudioFrameCount(inputFormat.sampleRate * chunkDurationMs / 1000)

        inputNode.installTap(onBus: 0, bufferSize: bufferSize, format: inputFormat) { [weak self] pcmBuffer, _ in
            guard let self else { return }

            let frameCount = AVAudioFrameCount(
                Double(pcmBuffer.frameLength) * self.targetSampleRate / inputFormat.sampleRate
            )
            guard let convertedBuffer = AVAudioPCMBuffer(
                pcmFormat: targetFormat,
                frameCapacity: frameCount
            ) else { return }

            // Hand the tap's buffer to the converter exactly once; report
            // .noDataNow afterwards so the converter stops pulling instead of
            // re-consuming the same buffer.
            var consumed = false
            var error: NSError?
            let status = converter.convert(to: convertedBuffer, error: &error) { _, outStatus in
                if consumed {
                    outStatus.pointee = .noDataNow
                    return nil
                }
                consumed = true
                outStatus.pointee = .haveData
                return pcmBuffer
            }
            guard status != .error, error == nil else { return }

            let byteCount = Int(convertedBuffer.frameLength) * 2  // 16-bit = 2 bytes
            guard let int16Ptr = convertedBuffer.int16ChannelData?[0] else { return }
            let data = Data(bytes: int16Ptr, count: byteCount)

            self.buffer.append(data)
            while self.buffer.count >= self.bytesPerChunk {
                let chunk = self.buffer.prefix(self.bytesPerChunk)
                self.buffer = Data(self.buffer.dropFirst(self.bytesPerChunk))
                self.onChunk?(Data(chunk))
            }
        }

        engine.prepare()
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
        // Flush remaining buffer
        if !buffer.isEmpty {
            onChunk?(buffer)
            buffer = Data()
        }
    }
}

enum AudioCaptureError: Error, LocalizedError {
    case converterCreationFailed

    var errorDescription: String? {
        switch self {
        case .converterCreationFailed:
            return "Failed to create audio format converter"
        }
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.
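The chunk sizing above can be sanity-checked in isolation. This standalone sketch reproduces the arithmetic from `AudioCapture.init`:

```swift
import Foundation

// AudioCapture's chunk sizing: 16 kHz mono, 16-bit samples (2 bytes each),
// 480 ms per chunk.
let targetSampleRate = 16_000.0
let chunkDurationMs = 480.0
let bytesPerSample = 2

let bytesPerChunk = Int(targetSampleRate * Double(bytesPerSample) * chunkDurationMs / 1000)
print(bytesPerChunk)  // 15360, matching the comment in AudioCapture.init

// Equivalently: each chunk carries 7680 samples (16000 × 0.48).
let samplesPerChunk = bytesPerChunk / bytesPerSample
print(samplesPerChunk)  // 7680
```

Note that 480 ms also matches the default `streamingDelayMs` in Task 5, so roughly one chunk is in flight per server-side delay window.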
- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Audio/AudioCapture.swift
git commit -m "feat: add AudioCapture with AVAudioEngine mic tap and PCM conversion"
```

---

### Task 5: App Settings

**Files:**

- Create: `MyVoxtral/MyVoxtral/Models/AppSettings.swift`

- [ ] **Step 1: Implement settings model**

Create `MyVoxtral/MyVoxtral/Models/AppSettings.swift`:

```swift
import SwiftUI
import AppKit

enum OutputMode: String, CaseIterable {
    case textBox = "Text Window"
    case cursorInjection = "Type at Cursor"
}

final class AppSettings: ObservableObject {
    static let shared = AppSettings()

    @AppStorage("apiKey") var apiKey: String = ""
    @AppStorage("outputMode") var outputMode: OutputMode = .textBox
    @AppStorage("streamingDelayMs") var streamingDelayMs: Int = 480

    // @AppStorage supports Int but not UInt16/UInt, so the shortcut is
    // persisted as Int and exposed with the types the rest of the app expects.
    @AppStorage("shortcutKeyCode") private var storedShortcutKeyCode: Int = 0
    @AppStorage("shortcutModifiers") private var storedShortcutModifiers: Int = 0

    var shortcutKeyCode: UInt16 {
        get { UInt16(truncatingIfNeeded: storedShortcutKeyCode) }
        set { storedShortcutKeyCode = Int(newValue) }
    }

    var shortcutModifiers: UInt {
        get { UInt(bitPattern: storedShortcutModifiers) }
        set { storedShortcutModifiers = Int(bitPattern: newValue) }
    }

    var hasAPIKey: Bool { !apiKey.isEmpty }
    var hasShortcut: Bool { shortcutKeyCode != 0 || shortcutModifiers != 0 }

    var shortcutDisplayString: String {
        guard hasShortcut else { return "Not Set" }
        var parts: [String] = []
        let mods = NSEvent.ModifierFlags(rawValue: shortcutModifiers)
        if mods.contains(.control) { parts.append("^") }
        if mods.contains(.option) { parts.append("\u{2325}") }
        if mods.contains(.shift) { parts.append("\u{21E7}") }
        if mods.contains(.command) { parts.append("\u{2318}") }
        // Key codes are hardware scan codes, not characters; a faithful
        // character lookup requires Carbon's UCKeyTranslate. Show the raw
        // code as a placeholder.
        parts.append("Key\(shortcutKeyCode)")
        return parts.joined()
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.
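The modifier flags survive persistence because an `OptionSet` round-trips losslessly through its integer `rawValue`. This standalone sketch illustrates the scheme; `KeyMods` is a stand-in for `NSEvent.ModifierFlags` (assumption: the bit positions shown match AppKit's, but any `OptionSet` behaves the same way):

```swift
import Foundation

// Stand-in for NSEvent.ModifierFlags, for illustration only.
struct KeyMods: OptionSet {
    let rawValue: UInt
    static let shift   = KeyMods(rawValue: 1 << 17)
    static let control = KeyMods(rawValue: 1 << 18)
    static let option  = KeyMods(rawValue: 1 << 19)
    static let command = KeyMods(rawValue: 1 << 20)
}

let pressed: KeyMods = [.command, .shift]
let stored = pressed.rawValue            // the integer @AppStorage would persist
let restored = KeyMods(rawValue: stored) // what AppSettings reads back
print(restored.contains(.command), restored.contains(.shift), restored.contains(.option))
// true true false
```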
- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Models/AppSettings.swift
git commit -m "feat: add AppSettings with API key, shortcut, output mode, delay"
```

---

### Task 6: Transcription Logger

**Files:**

- Create: `MyVoxtral/MyVoxtral/Utilities/TranscriptionLogger.swift`

- [ ] **Step 1: Implement the log file writer**

Create `MyVoxtral/MyVoxtral/Utilities/TranscriptionLogger.swift`:

```swift
import Foundation

struct TranscriptionLogger {
    private static var logFileURL: URL {
        let appSupport = FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask).first!
        let dir = appSupport.appendingPathComponent("MyVoxtral", isDirectory: true)
        try? FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
        return dir.appendingPathComponent("transcription.log")
    }

    static func append(text: String) {
        let timestamp = ISO8601DateFormatter().string(from: Date())
        let entry = "[\(timestamp)]\n\(text)\n---\n\n"
        guard let data = entry.data(using: .utf8) else { return }

        if FileManager.default.fileExists(atPath: logFileURL.path) {
            if let handle = try? FileHandle(forWritingTo: logFileURL) {
                handle.seekToEndOfFile()
                handle.write(data)
                handle.closeFile()
            }
        } else {
            try? data.write(to: logFileURL)
        }
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.
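For reference, the exact entry format the logger appends can be pinned down with a small standalone sketch (the `logEntry` helper is hypothetical, introduced here only to make the format testable with a fixed date):

```swift
import Foundation

// Reproduces TranscriptionLogger's entry format: an ISO-8601 timestamp
// header, the transcript, then a "---" separator and a blank line.
func logEntry(text: String, date: Date) -> String {
    let timestamp = ISO8601DateFormatter().string(from: date)
    return "[\(timestamp)]\n\(text)\n---\n\n"
}

let entry = logEntry(text: "hello world", date: Date(timeIntervalSince1970: 0))
print(entry, terminator: "")
// [1970-01-01T00:00:00Z]
// hello world
// ---
```

`ISO8601DateFormatter` defaults to GMT, so entries sort chronologically regardless of the machine's time zone.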
- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Utilities/TranscriptionLogger.swift
git commit -m "feat: add TranscriptionLogger for append-only session log"
```

---

### Task 7: Cursor Injector

**Files:**

- Create: `MyVoxtral/MyVoxtral/Utilities/CursorInjector.swift`

- [ ] **Step 1: Implement CGEvent keystroke simulation**

Create `MyVoxtral/MyVoxtral/Utilities/CursorInjector.swift`:

```swift
import ApplicationServices

struct CursorInjector {
    static var isAccessibilityGranted: Bool {
        AXIsProcessTrusted()
    }

    static func promptAccessibilityPermission() {
        let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true] as CFDictionary
        AXIsProcessTrustedWithOptions(options)
    }

    static func typeText(_ text: String) {
        guard isAccessibilityGranted else { return }
        let source = CGEventSource(stateID: .hidSystemState)

        for character in text {
            let string = String(character)
            let event = CGEvent(keyboardEventSource: source, virtualKey: 0, keyDown: true)
            event?.keyboardSetUnicodeString(stringLength: string.utf16.count, unicodeString: Array(string.utf16))
            event?.post(tap: .cghidEventTap)

            let eventUp = CGEvent(keyboardEventSource: source, virtualKey: 0, keyDown: false)
            eventUp?.keyboardSetUnicodeString(stringLength: string.utf16.count, unicodeString: Array(string.utf16))
            eventUp?.post(tap: .cghidEventTap)
        }
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.

- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Utilities/CursorInjector.swift
git commit -m "feat: add CursorInjector with CGEvent keystroke simulation"
```

---

### Task 8: Global Shortcut

**Files:**

- Create: `MyVoxtral/MyVoxtral/Utilities/GlobalShortcut.swift`

- [ ] **Step 1: Implement global keyboard shortcut monitor**

Create `MyVoxtral/MyVoxtral/Utilities/GlobalShortcut.swift`:

```swift
import Cocoa

final class GlobalShortcut {
    private var globalMonitor: Any?
    private var localMonitor: Any?

    var onTrigger: (() -> Void)?

    func register(keyCode: UInt16, modifiers: UInt) {
        unregister()
        guard keyCode != 0 || modifiers != 0 else { return }

        let requiredFlags = NSEvent.ModifierFlags(rawValue: modifiers)
        let mask: NSEvent.ModifierFlags = [.command, .option, .control, .shift]

        func matches(_ event: NSEvent) -> Bool {
            event.keyCode == keyCode && event.modifierFlags.intersection(mask) == requiredFlags
        }

        // Fires when another app is frontmost (requires Accessibility permission).
        globalMonitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { [weak self] event in
            if matches(event) { self?.onTrigger?() }
        }

        // Global monitors don't see events delivered to our own app, so also
        // install a local monitor for when MyVoxtral itself is frontmost.
        localMonitor = NSEvent.addLocalMonitorForEvents(matching: .keyDown) { [weak self] event in
            if matches(event) {
                self?.onTrigger?()
                return nil  // consume the shortcut
            }
            return event
        }
    }

    func unregister() {
        if let globalMonitor { NSEvent.removeMonitor(globalMonitor) }
        if let localMonitor { NSEvent.removeMonitor(localMonitor) }
        globalMonitor = nil
        localMonitor = nil
    }

    deinit {
        unregister()
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.

- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Utilities/GlobalShortcut.swift
git commit -m "feat: add configurable global keyboard shortcut"
```

---

### Task 9: Transcription Manager

**Files:**

- Create: `MyVoxtral/MyVoxtral/Models/TranscriptionManager.swift`

- [ ] **Step 1: Implement the orchestrator**

Create `MyVoxtral/MyVoxtral/Models/TranscriptionManager.swift`:

```swift
import SwiftUI

enum RecordingState: Equatable {
    case idle
    case recording
    case error(String)
}

@MainActor
final class TranscriptionManager: ObservableObject {
    @Published var state: RecordingState = .idle
    @Published var currentText: String = ""

    private let audioCapture = AudioCapture()
    private let wsClient = VoxtralWebSocketClient()
    private let settings = AppSettings.shared
    private var hasRetried = false

    var isRecording: Bool { state == .recording }

    func toggle() {
        if isRecording { stop() } else { start() }
    }

    func start() {
        guard settings.hasAPIKey else {
            state = .error("No API key set. Open Settings.")
            return
        }

        currentText = ""
        hasRetried = false

        wsClient.onEvent = { [weak self] event in
            self?.handleEvent(event)
        }
        wsClient.connect(apiKey: settings.apiKey, delayMs: settings.streamingDelayMs)

        audioCapture.onChunk = { [weak self] chunk in
            Task { @MainActor in
                self?.wsClient.sendAudio(chunk)
            }
        }

        do {
            try audioCapture.start()
            state = .recording
        } catch {
            state = .error("Mic error: \(error.localizedDescription)")
        }
    }

    func stop() {
        audioCapture.stop()
        wsClient.flush()
        wsClient.disconnect()
        state = .idle

        if !currentText.isEmpty {
            TranscriptionLogger.append(text: currentText)
        }
    }

    private func handleEvent(_ event: VoxtralEvent) {
        switch event {
        case .sessionCreated:
            break
        case .textDelta(let text):
            currentText += text
            if settings.outputMode == .cursorInjection {
                CursorInjector.typeText(text)
            }
        case .segment:
            break  // text deltas already handle text accumulation
        case .language:
            break
        case .done(let text):
            if currentText.isEmpty { currentText = text }
        case .error(let message):
            if !hasRetried && state == .recording {
                hasRetried = true
                wsClient.disconnect()
                wsClient.connect(apiKey: settings.apiKey, delayMs: settings.streamingDelayMs)
            } else {
                state = .error(message)
                audioCapture.stop()
            }
        case .unknown:
            break
        }
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.

- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Models/TranscriptionManager.swift
git commit -m "feat: add TranscriptionManager orchestrating audio, WS, and output"
```

---

### Task 10: Floating Transcription Window

**Files:**

- Create: `MyVoxtral/MyVoxtral/Views/TranscriptionWindow.swift`

- [ ] **Step 1: Implement NSPanel-based floating window**

Create `MyVoxtral/MyVoxtral/Views/TranscriptionWindow.swift`:

```swift
import SwiftUI
import AppKit

struct TranscriptionContentView: View {
    @ObservedObject var manager: TranscriptionManager

    var body: some View {
        VStack(alignment: .leading, spacing: 8) {
            HStack {
                Circle()
                    .fill(manager.isRecording ? .red : .gray)
                    .frame(width: 8, height: 8)
                Text(manager.isRecording ? "Recording..." : "Idle")
                    .font(.caption)
                    .foregroundStyle(.secondary)
                Spacer()
                Button {
                    NSPasteboard.general.clearContents()
                    NSPasteboard.general.setString(manager.currentText, forType: .string)
                } label: {
                    Image(systemName: "doc.on.doc")
                }
                .buttonStyle(.borderless)
                .disabled(manager.currentText.isEmpty)
            }

            ScrollViewReader { proxy in
                ScrollView {
                    Text(manager.currentText.isEmpty ? "Transcription will appear here..." : manager.currentText)
                        .frame(maxWidth: .infinity, alignment: .leading)
                        .foregroundStyle(manager.currentText.isEmpty ? .secondary : .primary)
                        .textSelection(.enabled)
                        .id("bottom")
                }
                .onChange(of: manager.currentText) {
                    proxy.scrollTo("bottom", anchor: .bottom)
                }
            }
        }
        .padding()
        .frame(width: 320, height: 200)
    }
}

final class TranscriptionPanel {
    private var panel: NSPanel?
    private let manager: TranscriptionManager

    init(manager: TranscriptionManager) {
        self.manager = manager
    }

    func show() {
        if panel == nil {
            let panel = NSPanel(
                contentRect: NSRect(x: 0, y: 0, width: 320, height: 200),
                styleMask: [.titled, .closable, .resizable, .nonactivatingPanel, .utilityWindow],
                backing: .buffered,
                defer: false
            )
            panel.title = "MyVoxtral"
            panel.isFloatingPanel = true
            panel.level = .floating
            panel.contentView = NSHostingView(rootView: TranscriptionContentView(manager: manager))
            panel.center()
            self.panel = panel
        }
        panel?.orderFront(nil)
    }

    func hide() {
        panel?.orderOut(nil)
    }

    var isVisible: Bool {
        panel?.isVisible ?? false
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.
- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Views/TranscriptionWindow.swift
git commit -m "feat: add floating transcription panel with auto-scroll and copy"
```

---

### Task 11: Settings View

**Files:**

- Create: `MyVoxtral/MyVoxtral/Views/SettingsView.swift`

- [ ] **Step 1: Implement settings window**

Create `MyVoxtral/MyVoxtral/Views/SettingsView.swift`:

```swift
import SwiftUI
import AppKit

struct SettingsView: View {
    @ObservedObject var settings = AppSettings.shared
    @State private var isRecordingShortcut = false
    @State private var shortcutMonitor: Any?

    var body: some View {
        Form {
            Section("API") {
                SecureField("Mistral API Key", text: $settings.apiKey)
            }

            Section("Output") {
                Picker("Mode", selection: $settings.outputMode) {
                    ForEach(OutputMode.allCases, id: \.self) { mode in
                        Text(mode.rawValue).tag(mode)
                    }
                }
                .pickerStyle(.segmented)

                if settings.outputMode == .cursorInjection && !CursorInjector.isAccessibilityGranted {
                    HStack {
                        Image(systemName: "exclamationmark.triangle.fill")
                            .foregroundStyle(.yellow)
                        Text("Accessibility permission required")
                            .font(.caption)
                        Button("Grant") {
                            CursorInjector.promptAccessibilityPermission()
                        }
                        .font(.caption)
                    }
                }
            }

            Section("Shortcut") {
                HStack {
                    Text("Toggle Recording:")
                    Spacer()
                    Button(isRecordingShortcut ? "Press keys..." : settings.shortcutDisplayString) {
                        beginShortcutCapture()
                    }
                }
            }

            Section("Latency") {
                VStack(alignment: .leading) {
                    Text("Streaming delay: \(settings.streamingDelayMs)ms")
                        .font(.caption)
                    Slider(
                        value: Binding(
                            get: { Double(settings.streamingDelayMs) },
                            set: { settings.streamingDelayMs = Int($0) }
                        ),
                        in: 240...2400,
                        step: 120
                    )
                    HStack {
                        Text("Fast").font(.caption2).foregroundStyle(.secondary)
                        Spacer()
                        Text("Accurate").font(.caption2).foregroundStyle(.secondary)
                    }
                }
            }
        }
        .formStyle(.grouped)
        .frame(width: 360, height: 340)
    }

    private func beginShortcutCapture() {
        isRecordingShortcut = true
        // Capture the next keyDown with a local NSEvent monitor: NSEvent
        // exposes the hardware keyCode, which is what GlobalShortcut compares
        // against (SwiftUI's onKeyPress only provides characters, not key codes).
        shortcutMonitor = NSEvent.addLocalMonitorForEvents(matching: .keyDown) { event in
            settings.shortcutKeyCode = event.keyCode
            settings.shortcutModifiers = event.modifierFlags
                .intersection([.command, .option, .control, .shift]).rawValue
            isRecordingShortcut = false
            if let monitor = shortcutMonitor { NSEvent.removeMonitor(monitor) }
            shortcutMonitor = nil
            return nil  // swallow the keystroke
        }
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.

- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Views/SettingsView.swift
git commit -m "feat: add SettingsView with API key, shortcut, mode, and latency"
```

---

### Task 12: Menu Bar View

**Files:**

- Create: `MyVoxtral/MyVoxtral/Views/MenuBarView.swift`

- [ ] **Step 1: Implement the menu bar dropdown**

Create `MyVoxtral/MyVoxtral/Views/MenuBarView.swift`:

```swift
import SwiftUI

struct MenuBarView: View {
    @ObservedObject var manager: TranscriptionManager
    @ObservedObject var settings = AppSettings.shared
    let onShowTranscription: () -> Void
    let onShowSettings: () -> Void

    var body: some View {
        VStack(spacing: 4) {
            Button(manager.isRecording ? "Stop Recording" : "Start Recording") {
                manager.toggle()
            }
            .keyboardShortcut("r")

            if case .error(let msg) = manager.state {
                Text(msg)
                    .font(.caption)
                    .foregroundStyle(.red)
                    .lineLimit(2)
                    .padding(.horizontal, 8)
            }

            Divider()

            Button("Show Transcription") {
                onShowTranscription()
            }

            Button("Settings...") {
                onShowSettings()
            }

            Divider()

            Button("Quit") {
                NSApplication.shared.terminate(nil)
            }
            .keyboardShortcut("q")
        }
        .padding(.vertical, 4)
    }
}
```

- [ ] **Step 2: Build to verify**

```bash
swift build
```

Expected: Build succeeds.
- [ ] **Step 3: Commit**

```bash
git add MyVoxtral/Views/MenuBarView.swift
git commit -m "feat: add MenuBarView with start/stop, settings, and error display"
```

---

### Task 13: Wire Everything Together in App Entry

**Files:**

- Modify: `MyVoxtral/MyVoxtral/MyVoxtralApp.swift`

- [ ] **Step 1: Update the app entry point to integrate all components**

Replace the contents of `MyVoxtral/MyVoxtral/MyVoxtralApp.swift` with:

```swift
import SwiftUI

@main
struct MyVoxtralApp: App {
    @StateObject private var manager = TranscriptionManager()
    @StateObject private var settings = AppSettings.shared
    @State private var transcriptionPanel: TranscriptionPanel?
    @State private var settingsWindow: NSWindow?
    private let globalShortcut = GlobalShortcut()

    var body: some Scene {
        MenuBarExtra {
            MenuBarView(
                manager: manager,
                onShowTranscription: { showTranscriptionWindow() },
                onShowSettings: { showSettingsWindow() }
            )
        } label: {
            Image(systemName: manager.isRecording ? "mic.fill" : "mic")
                .symbolRenderingMode(.palette)
                .foregroundStyle(manager.isRecording ? .red : .primary)
                // The label view exists from launch, so it is a reliable hook
                // for one-time setup (Scene has no .onAppear of its own).
                .onAppear {
                    if !settings.hasAPIKey {
                        showSettingsWindow()
                    }
                    registerShortcut()
                }
        }
        .onChange(of: settings.shortcutKeyCode) { registerShortcut() }
        .onChange(of: settings.shortcutModifiers) { registerShortcut() }
        .onChange(of: manager.isRecording) {
            if manager.isRecording && settings.outputMode == .textBox {
                showTranscriptionWindow()
            }
        }
    }

    private func registerShortcut() {
        globalShortcut.register(
            keyCode: settings.shortcutKeyCode,
            modifiers: settings.shortcutModifiers
        )
        globalShortcut.onTrigger = { [weak manager] in
            Task { @MainActor in manager?.toggle() }
        }
    }

    private func showTranscriptionWindow() {
        if transcriptionPanel == nil {
            transcriptionPanel = TranscriptionPanel(manager: manager)
        }
        transcriptionPanel?.show()
    }

    private func showSettingsWindow() {
        if let settingsWindow, settingsWindow.isVisible {
            settingsWindow.makeKeyAndOrderFront(nil)
            return
        }
        let window = NSWindow(
            contentRect: NSRect(x: 0, y: 0, width: 360, height: 340),
            styleMask: [.titled, .closable],
            backing: .buffered,
            defer: false
        )
        window.title = "MyVoxtral Settings"
        window.contentView = NSHostingView(rootView: SettingsView())
        window.center()
        window.makeKeyAndOrderFront(nil)
        NSApp.activate(ignoringOtherApps: true)
        self.settingsWindow = window
    }
}
```

- [ ] **Step 2: Build the full app**

```bash
swift build
```

Expected: Build succeeds with all components linked.

- [ ] **Step 3: Run and verify basic functionality**

```bash
swift run
```

Expected: Menu bar icon appears. Clicking shows the dropdown with Start/Stop, Settings, Quit. Settings window opens if no API key.

- [ ] **Step 4: Commit**

```bash
git add MyVoxtral/MyVoxtralApp.swift
git commit -m "feat: wire all components into app entry with menu bar, shortcut, and windows"
```

---

### Task 14: End-to-End Smoke Test

- [ ] **Step 1: Verify the full recording flow**

```bash
swift run
```

Manual test checklist:

1. App launches as menu bar icon (no Dock icon)
2. Click icon → dropdown shows Start Recording, Settings, Quit
3. Open Settings → enter a valid Mistral API key
4. Set output mode to "Text Window"
5. Click "Start Recording" → mic icon turns red, transcription window opens
6. Speak into microphone → text appears in the floating window
7. Click "Stop Recording" → icon returns to normal
8. Verify `~/Library/Application Support/MyVoxtral/transcription.log` contains the session
9. Switch to "Type at Cursor" mode → grant Accessibility permission if prompted
10. Open TextEdit, click Start Recording, speak → text appears at cursor in TextEdit
11. Configure a keyboard shortcut in Settings → verify it toggles recording from any app

- [ ] **Step 2: Fix any issues found during smoke testing**

Address any build errors, runtime crashes, or connection issues.

- [ ] **Step 3: Final commit**

```bash
git add -A
git commit -m "test: verify end-to-end transcription flow"
```