# MyVoxtral Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Build a macOS menu bar app that streams microphone audio to Mistral Voxtral Realtime API and outputs transcribed text to a floating window or the active cursor.
**Architecture:** SwiftUI menu bar app with four layers — AudioCapture (AVAudioEngine), VoxtralWebSocketClient (URLSessionWebSocketTask), TranscriptionManager (ObservableObject orchestrator), and UI (MenuBarExtra + floating panel). No third-party dependencies.
**Tech Stack:** Swift, SwiftUI, AVFoundation, WebSocket (URLSession), CGEvent, macOS 14+
---
## File Structure
```
MyVoxtral/
├── MyVoxtralApp.swift # App entry point, MenuBarExtra, app lifecycle
├── Models/
│ ├── TranscriptionManager.swift # State machine orchestrator, owns audio + WS client
│ ├── AppSettings.swift # UserDefaults-backed settings model (ObservableObject)
├── Audio/
│ └── AudioCapture.swift # AVAudioEngine mic tap, PCM conversion, async chunk stream
├── Network/
│ ├── VoxtralWebSocketClient.swift # WebSocket connect/send/receive, session management
│ └── VoxtralMessages.swift # Codable structs for all WS JSON messages
├── Views/
│ ├── TranscriptionWindow.swift # NSPanel-based floating text window
│ ├── SettingsView.swift # API key, shortcut, mode, latency
│ └── MenuBarView.swift # Menu bar dropdown content
├── Utilities/
│ ├── CursorInjector.swift # CGEvent keystroke simulation
│ ├── GlobalShortcut.swift # NSEvent global monitor, key combo storage
│ └── TranscriptionLogger.swift # Append sessions to log file
└── Info.plist # Mic + accessibility usage descriptions
```
---
### Task 1: Xcode Project Scaffold
**Files:**
- Create: `Package.swift` (Swift Package Manager manifest)
- Create: `MyVoxtral/MyVoxtralApp.swift`
- Create: `MyVoxtral/Info.plist`
- [ ] **Step 1: Create the Swift package / Xcode project**
Create a new macOS app project. Use SwiftUI lifecycle.
```bash
mkdir -p MyVoxtral/MyVoxtral
```
Create `MyVoxtral/MyVoxtral/MyVoxtralApp.swift`:
```swift
import SwiftUI

@main
struct MyVoxtralApp: App {
    var body: some Scene {
        MenuBarExtra("MyVoxtral", systemImage: "mic.fill") {
            Text("MyVoxtral")
            Divider()
            Button("Quit") {
                NSApplication.shared.terminate(nil)
            }
        }
    }
}
```
- [ ] **Step 2: Create Info.plist with privacy descriptions**
Create `MyVoxtral/MyVoxtral/Info.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>NSMicrophoneUsageDescription</key>
    <string>MyVoxtral needs microphone access to transcribe your speech in real time.</string>
    <key>LSUIElement</key>
    <true/>
</dict>
</plist>
```
`LSUIElement` = true makes it a menu-bar-only app (no Dock icon).
- [ ] **Step 3: Create the Swift package manifest**
```bash
cd MyVoxtral
cat > Package.swift << 'SWIFT'
// swift-tools-version: 5.10
import PackageDescription

let package = Package(
    name: "MyVoxtral",
    platforms: [.macOS(.v14)],
    targets: [
        .executableTarget(
            name: "MyVoxtral",
            path: "MyVoxtral",
            exclude: ["Info.plist"]  // keep SPM from treating the plist as a source file
        )
    ]
)
SWIFT
```
- [ ] **Step 4: Build and verify the empty shell runs**
```bash
swift build
```
Expected: Build succeeds. `swift run` then shows a mic icon in the menu bar with "MyVoxtral" and a "Quit" item.
- [ ] **Step 5: Commit**
```bash
git init
git add -A
git commit -m "chore: scaffold MyVoxtral macOS menu bar app"
```
---
### Task 2: WebSocket Message Types
**Files:**
- Create: `MyVoxtral/MyVoxtral/Network/VoxtralMessages.swift`
- [ ] **Step 1: Define all outbound message types**
Create `MyVoxtral/MyVoxtral/Network/VoxtralMessages.swift`:
```swift
import Foundation

// MARK: - Outbound Messages (Client → Server)

struct AudioAppendMessage: Encodable {
    let type = "input_audio.append"
    let audio: String // base64-encoded PCM
}

struct AudioFlushMessage: Encodable {
    let type = "input_audio.flush"
}

struct AudioEndMessage: Encodable {
    let type = "input_audio.end"
}

struct SessionUpdateMessage: Encodable {
    let type = "session.update"
    let session: SessionConfig
}

struct SessionConfig: Encodable {
    let audioFormat: AudioFormatConfig
    let targetStreamingDelayMs: Int

    enum CodingKeys: String, CodingKey {
        case audioFormat = "audio_format"
        case targetStreamingDelayMs = "target_streaming_delay_ms"
    }
}

struct AudioFormatConfig: Encodable {
    let encoding = "pcm_s16le"
    let sampleRate = 16000

    enum CodingKeys: String, CodingKey {
        case encoding
        case sampleRate = "sample_rate"
    }
}

// MARK: - Inbound Messages (Server → Client)

enum VoxtralEvent {
    case sessionCreated
    case textDelta(String)
    case segment(text: String, start: Double, end: Double)
    case language(String)
    case done(text: String)
    case error(String)
    case unknown(String)
}

struct IncomingEvent: Decodable {
    let type: String
}

struct TextDeltaEvent: Decodable {
    let text: String
}

struct LanguageEvent: Decodable {
    let audioLanguage: String

    enum CodingKeys: String, CodingKey {
        case audioLanguage = "audio_language"
    }
}

struct SegmentEvent: Decodable {
    let text: String
    let start: Double
    let end: Double
}

struct DoneEvent: Decodable {
    let text: String
}

struct ErrorEvent: Decodable {
    let error: ErrorDetail?
}

struct ErrorDetail: Decodable {
    let message: ErrorMessage?
}

struct ErrorMessage: Decodable {
    let detail: String?
}

// MARK: - Event Parsing

func parseVoxtralEvent(from data: Data) -> VoxtralEvent {
    guard let envelope = try? JSONDecoder().decode(IncomingEvent.self, from: data) else {
        return .unknown(String(data: data, encoding: .utf8) ?? "")
    }
    switch envelope.type {
    case "session.created":
        return .sessionCreated
    case "transcription.text.delta":
        guard let e = try? JSONDecoder().decode(TextDeltaEvent.self, from: data) else { return .unknown("") }
        return .textDelta(e.text)
    case "transcription.segment":
        guard let e = try? JSONDecoder().decode(SegmentEvent.self, from: data) else { return .unknown("") }
        return .segment(text: e.text, start: e.start, end: e.end)
    case "transcription.language":
        guard let e = try? JSONDecoder().decode(LanguageEvent.self, from: data) else { return .unknown("") }
        return .language(e.audioLanguage)
    case "transcription.done":
        guard let e = try? JSONDecoder().decode(DoneEvent.self, from: data) else { return .unknown("") }
        return .done(text: e.text)
    case "error":
        if let e = try? JSONDecoder().decode(ErrorEvent.self, from: data) {
            return .error(e.error?.message?.detail ?? "Unknown error")
        }
        return .error("Unknown error")
    default:
        return .unknown(envelope.type)
    }
}
```
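The two-pass decode (envelope `type` first, then the typed payload) can be exercised in a scratch file. The sample JSON below is hand-written to mirror the structs above, not captured from the real API; `Envelope` and `Delta` are minimal stand-ins so the sketch runs alone:

```swift
import Foundation

// Minimal stand-ins for IncomingEvent / TextDeltaEvent.
struct Envelope: Decodable { let type: String }
struct Delta: Decodable { let text: String }

// Hypothetical wire payload shaped like the Codable structs above.
let sample = #"{"type":"transcription.text.delta","text":"hello "}"#
let data = Data(sample.utf8)

// Pass 1: peek at the discriminator only.
let envelope = try! JSONDecoder().decode(Envelope.self, from: data)
assert(envelope.type == "transcription.text.delta")

// Pass 2: decode the full payload for that type.
let delta = try! JSONDecoder().decode(Delta.self, from: data)
assert(delta.text == "hello ")
```

Decoding the envelope separately keeps each payload struct small and lets unknown event types degrade to `.unknown` instead of failing the whole parse.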
- [ ] **Step 2: Build to verify it compiles**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Network/VoxtralMessages.swift
git commit -m "feat: add Voxtral WebSocket message types and parser"
```
---
### Task 3: WebSocket Client
**Files:**
- Create: `MyVoxtral/MyVoxtral/Network/VoxtralWebSocketClient.swift`
- [ ] **Step 1: Implement the WebSocket client**
Create `MyVoxtral/MyVoxtral/Network/VoxtralWebSocketClient.swift`:
```swift
import Foundation

@MainActor
final class VoxtralWebSocketClient {
    private var webSocketTask: URLSessionWebSocketTask?
    private var session: URLSession?
    private let encoder = JSONEncoder()

    var onEvent: ((VoxtralEvent) -> Void)?

    func connect(apiKey: String, delayMs: Int) {
        guard let url = URL(string: "wss://api.mistral.ai/v1/audio/transcriptions/realtime") else { return }
        var request = URLRequest(url: url)
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        session = URLSession(configuration: .default)
        webSocketTask = session?.webSocketTask(with: request)
        webSocketTask?.resume()
        // Send session config (URLSession queues sends until the handshake completes)
        let config = SessionUpdateMessage(
            session: SessionConfig(
                audioFormat: AudioFormatConfig(),
                targetStreamingDelayMs: delayMs
            )
        )
        sendJSON(config)
        // Start receiving
        receiveLoop()
    }

    func sendAudio(_ pcmData: Data) {
        let base64 = pcmData.base64EncodedString()
        let msg = AudioAppendMessage(audio: base64)
        sendJSON(msg)
    }

    func flush() {
        sendJSON(AudioFlushMessage())
    }

    func disconnect() {
        sendJSON(AudioEndMessage())
        webSocketTask?.cancel(with: .normalClosure, reason: nil)
        webSocketTask = nil
        session?.invalidateAndCancel()
        session = nil
    }

    private func sendJSON<T: Encodable>(_ value: T) {
        guard let data = try? encoder.encode(value),
              let string = String(data: data, encoding: .utf8) else { return }
        webSocketTask?.send(.string(string)) { error in
            if let error {
                print("WebSocket send error: \(error)")
            }
        }
    }

    private func receiveLoop() {
        webSocketTask?.receive { [weak self] result in
            switch result {
            case .success(let message):
                switch message {
                case .string(let text):
                    if let data = text.data(using: .utf8) {
                        let event = parseVoxtralEvent(from: data)
                        Task { @MainActor in
                            self?.onEvent?(event)
                        }
                    }
                case .data(let data):
                    let event = parseVoxtralEvent(from: data)
                    Task { @MainActor in
                        self?.onEvent?(event)
                    }
                @unknown default:
                    break
                }
                // receive's completion runs off the main actor; hop back
                // before re-arming the main-actor-isolated receive loop.
                Task { @MainActor in
                    self?.receiveLoop()
                }
            case .failure(let error):
                Task { @MainActor in
                    self?.onEvent?(.error("Connection lost: \(error.localizedDescription)"))
                }
            }
        }
    }
}
```
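What actually goes over the socket for each chunk is small enough to check by hand: raw PCM bytes, base64, then a JSON envelope. A self-contained sketch (using `JSONSerialization` for brevity; the production path uses the `Encodable` structs from Task 2, and four fake bytes stand in for a real 15,360-byte chunk):

```swift
import Foundation

// Four fake PCM bytes stand in for a full audio chunk.
let pcm = Data([0x00, 0x01, 0x02, 0x03])
let base64 = pcm.base64EncodedString()
assert(base64 == "AAECAw==")

// The resulting frame mirrors AudioAppendMessage's encoded form.
let payload: [String: String] = ["type": "input_audio.append", "audio": base64]
let json = try! JSONSerialization.data(withJSONObject: payload)
let text = String(data: json, encoding: .utf8)!
assert(text.contains("input_audio.append"))
```

Base64 inflates the audio by ~33%, which is the usual cost of binary-over-JSON framing; the API's text protocol makes this unavoidable here.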
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Network/VoxtralWebSocketClient.swift
git commit -m "feat: add Voxtral WebSocket client with connect/send/receive"
```
---
### Task 4: Audio Capture
**Files:**
- Create: `MyVoxtral/MyVoxtral/Audio/AudioCapture.swift`
- [ ] **Step 1: Implement AVAudioEngine mic capture**
Create `MyVoxtral/MyVoxtral/Audio/AudioCapture.swift`:
```swift
import AVFoundation

final class AudioCapture {
    private let engine = AVAudioEngine()
    private let targetSampleRate: Double = 16000
    private let chunkDurationMs: Double = 480

    var onChunk: ((Data) -> Void)?

    private var buffer = Data()
    private let bytesPerChunk: Int

    init() {
        // 16kHz * 2 bytes (16-bit) * 1 channel * 0.48s = 15360 bytes
        bytesPerChunk = Int(targetSampleRate * 2 * chunkDurationMs / 1000)
    }

    func start() throws {
        let inputNode = engine.inputNode
        let inputFormat = inputNode.outputFormat(forBus: 0)
        let targetFormat = AVAudioFormat(
            commonFormat: .pcmFormatInt16,
            sampleRate: targetSampleRate,
            channels: 1,
            interleaved: true
        )!
        guard let converter = AVAudioConverter(from: inputFormat, to: targetFormat) else {
            throw AudioCaptureError.converterCreationFailed
        }
        let bufferSize = AVAudioFrameCount(inputFormat.sampleRate * chunkDurationMs / 1000)
        inputNode.installTap(onBus: 0, bufferSize: bufferSize, format: inputFormat) { [weak self] pcmBuffer, _ in
            guard let self else { return }
            let frameCount = AVAudioFrameCount(
                Double(pcmBuffer.frameLength) * self.targetSampleRate / inputFormat.sampleRate
            )
            guard let convertedBuffer = AVAudioPCMBuffer(
                pcmFormat: targetFormat,
                frameCapacity: frameCount
            ) else { return }
            var error: NSError?
            // The input block may be called more than once per convert();
            // hand the tap buffer over exactly once or audio gets duplicated.
            var consumed = false
            let status = converter.convert(to: convertedBuffer, error: &error) { _, outStatus in
                if consumed {
                    outStatus.pointee = .noDataNow
                    return nil
                }
                consumed = true
                outStatus.pointee = .haveData
                return pcmBuffer
            }
            guard status != .error, error == nil else { return }
            let byteCount = Int(convertedBuffer.frameLength) * 2 // 16-bit = 2 bytes
            guard let int16Ptr = convertedBuffer.int16ChannelData?[0] else { return }
            let data = Data(bytes: int16Ptr, count: byteCount)
            self.buffer.append(data)
            while self.buffer.count >= self.bytesPerChunk {
                let chunk = self.buffer.prefix(self.bytesPerChunk)
                self.buffer = Data(self.buffer.dropFirst(self.bytesPerChunk))
                self.onChunk?(Data(chunk))
            }
        }
        engine.prepare()
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
        // Flush remaining buffer
        if !buffer.isEmpty {
            onChunk?(buffer)
            buffer = Data()
        }
    }
}

enum AudioCaptureError: Error, LocalizedError {
    case converterCreationFailed

    var errorDescription: String? {
        switch self {
        case .converterCreationFailed:
            return "Failed to create audio format converter"
        }
    }
}
```
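The chunk-size arithmetic in the initializer is worth sanity-checking on its own: 480 ms of 16 kHz mono 16-bit audio is exactly 15,360 bytes, i.e. 7,680 frames at 2 bytes per frame.

```swift
let sampleRate = 16_000.0   // Hz
let bytesPerFrame = 2.0     // 16-bit mono PCM
let chunkMs = 480.0

let bytesPerChunk = Int(sampleRate * bytesPerFrame * chunkMs / 1000)
let framesPerChunk = Int(sampleRate * chunkMs / 1000)

assert(bytesPerChunk == 15_360)
assert(framesPerChunk == 7_680)
```

Keeping the chunk an integer number of frames matters: the re-chunking loop slices the byte buffer at `bytesPerChunk` boundaries, and an odd byte count would split a 16-bit sample across two messages.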
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Audio/AudioCapture.swift
git commit -m "feat: add AudioCapture with AVAudioEngine mic tap and PCM conversion"
```
---
### Task 5: App Settings
**Files:**
- Create: `MyVoxtral/MyVoxtral/Models/AppSettings.swift`
- [ ] **Step 1: Implement settings model**
Create `MyVoxtral/MyVoxtral/Models/AppSettings.swift`:
```swift
import SwiftUI
import AppKit

enum OutputMode: String, CaseIterable {
    case textBox = "Text Window"
    case cursorInjection = "Type at Cursor"
}

final class AppSettings: ObservableObject {
    static let shared = AppSettings()
    private let defaults = UserDefaults.standard

    // @AppStorage has no UInt16/UInt overloads and does not publish changes
    // from inside an ObservableObject, so persist manually via UserDefaults.
    @Published var apiKey: String {
        didSet { defaults.set(apiKey, forKey: "apiKey") }
    }
    @Published var outputMode: OutputMode {
        didSet { defaults.set(outputMode.rawValue, forKey: "outputMode") }
    }
    @Published var streamingDelayMs: Int {
        didSet { defaults.set(streamingDelayMs, forKey: "streamingDelayMs") }
    }
    @Published var shortcutKeyCode: UInt16 {
        didSet { defaults.set(Int(shortcutKeyCode), forKey: "shortcutKeyCode") }
    }
    @Published var shortcutModifiers: UInt {
        didSet { defaults.set(Int(shortcutModifiers), forKey: "shortcutModifiers") }
    }

    private init() {
        apiKey = defaults.string(forKey: "apiKey") ?? ""
        outputMode = OutputMode(rawValue: defaults.string(forKey: "outputMode") ?? "") ?? .textBox
        let delay = defaults.integer(forKey: "streamingDelayMs")
        streamingDelayMs = delay == 0 ? 480 : delay
        shortcutKeyCode = UInt16(defaults.integer(forKey: "shortcutKeyCode"))
        shortcutModifiers = UInt(defaults.integer(forKey: "shortcutModifiers"))
    }

    var hasAPIKey: Bool { !apiKey.isEmpty }
    var hasShortcut: Bool { shortcutKeyCode != 0 || shortcutModifiers != 0 }

    var shortcutDisplayString: String {
        guard hasShortcut else { return "Not Set" }
        var parts: [String] = []
        let mods = NSEvent.ModifierFlags(rawValue: shortcutModifiers)
        if mods.contains(.control) { parts.append("^") }
        if mods.contains(.option) { parts.append("\u{2325}") }
        if mods.contains(.shift) { parts.append("\u{21E7}") }
        if mods.contains(.command) { parts.append("\u{2318}") }
        // Virtual key codes are not Unicode scalars; a faithful mapping needs
        // UCKeyTranslate. Showing the raw code keeps this dependency-free.
        parts.append("Key \(shortcutKeyCode)")
        return parts.joined(separator: " ")
    }
}
```
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Models/AppSettings.swift
git commit -m "feat: add AppSettings with API key, shortcut, output mode, delay"
```
---
### Task 6: Transcription Logger
**Files:**
- Create: `MyVoxtral/MyVoxtral/Utilities/TranscriptionLogger.swift`
- [ ] **Step 1: Implement the log file writer**
Create `MyVoxtral/MyVoxtral/Utilities/TranscriptionLogger.swift`:
```swift
import Foundation

struct TranscriptionLogger {
    private static var logFileURL: URL {
        let appSupport = FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask).first!
        let dir = appSupport.appendingPathComponent("MyVoxtral", isDirectory: true)
        try? FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
        return dir.appendingPathComponent("transcription.log")
    }

    static func append(text: String) {
        let timestamp = ISO8601DateFormatter().string(from: Date())
        let entry = "[\(timestamp)]\n\(text)\n---\n\n"
        guard let data = entry.data(using: .utf8) else { return }
        if FileManager.default.fileExists(atPath: logFileURL.path) {
            if let handle = try? FileHandle(forWritingTo: logFileURL) {
                handle.seekToEndOfFile()
                handle.write(data)
                handle.closeFile()
            }
        } else {
            try? data.write(to: logFileURL)
        }
    }
}
```
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Utilities/TranscriptionLogger.swift
git commit -m "feat: add TranscriptionLogger for append-only session log"
```
---
### Task 7: Cursor Injector
**Files:**
- Create: `MyVoxtral/MyVoxtral/Utilities/CursorInjector.swift`
- [ ] **Step 1: Implement CGEvent keystroke simulation**
Create `MyVoxtral/MyVoxtral/Utilities/CursorInjector.swift`:
```swift
import ApplicationServices

struct CursorInjector {
    static var isAccessibilityGranted: Bool {
        AXIsProcessTrusted()
    }

    static func promptAccessibilityPermission() {
        let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true] as CFDictionary
        AXIsProcessTrustedWithOptions(options)
    }

    static func typeText(_ text: String) {
        guard isAccessibilityGranted else { return }
        let source = CGEventSource(stateID: .hidSystemState)
        for character in text {
            let string = String(character)
            let event = CGEvent(keyboardEventSource: source, virtualKey: 0, keyDown: true)
            event?.keyboardSetUnicodeString(stringLength: string.utf16.count, unicodeString: Array(string.utf16))
            event?.post(tap: .cghidEventTap)
            let eventUp = CGEvent(keyboardEventSource: source, virtualKey: 0, keyDown: false)
            eventUp?.keyboardSetUnicodeString(stringLength: string.utf16.count, unicodeString: Array(string.utf16))
            eventUp?.post(tap: .cghidEventTap)
        }
    }
}
```
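One subtlety in the injector above: `keyboardSetUnicodeString` takes a UTF-16 length, and characters outside the Basic Multilingual Plane occupy two UTF-16 code units, which is why the loop passes `string.utf16.count` instead of assuming one unit per character. A quick self-contained check:

```swift
let plain = "e"     // 1 character, 1 UTF-16 unit
let accented = "é"  // 1 character, still 1 UTF-16 unit (precomposed U+00E9)
let emoji = "🎤"    // 1 character, 2 UTF-16 units (surrogate pair)

assert(Array(plain.utf16).count == 1)
assert(Array(accented.utf16).count == 1)
assert(Array(emoji.utf16).count == 2)
```

Passing the wrong length would silently truncate or corrupt injected text the first time a transcription contains an emoji or other astral-plane character.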
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Utilities/CursorInjector.swift
git commit -m "feat: add CursorInjector with CGEvent keystroke simulation"
```
---
### Task 8: Global Shortcut
**Files:**
- Create: `MyVoxtral/MyVoxtral/Utilities/GlobalShortcut.swift`
- [ ] **Step 1: Implement global keyboard shortcut monitor**
Create `MyVoxtral/MyVoxtral/Utilities/GlobalShortcut.swift`:
```swift
import Cocoa

final class GlobalShortcut {
    private var monitor: Any?

    var onTrigger: (() -> Void)?

    func register(keyCode: UInt16, modifiers: UInt) {
        unregister()
        guard keyCode != 0 || modifiers != 0 else { return }
        let requiredFlags = NSEvent.ModifierFlags(rawValue: modifiers)
        // Note: global monitors require the Accessibility permission and do
        // not fire for events delivered to this app itself.
        monitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { [weak self] event in
            // Compare only the four standard modifier bits; raw event flags
            // carry extra device-dependent bits (caps lock, fn).
            let mask: NSEvent.ModifierFlags = [.command, .option, .control, .shift]
            if event.keyCode == keyCode && event.modifierFlags.intersection(mask) == requiredFlags {
                self?.onTrigger?()
            }
        }
    }

    func unregister() {
        if let monitor {
            NSEvent.removeMonitor(monitor)
        }
        monitor = nil
    }

    deinit {
        unregister()
    }
}
```
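The `intersection(mask)` comparison matters because a real `keyDown` event's flags include device-dependent bits (caps lock, fn) on top of the four standard modifiers, so plain equality would fail whenever caps lock is latched. A self-contained sketch using the raw bit values (⌘ is `1 << 20` and caps lock `1 << 16` in `NSEvent.ModifierFlags` on current macOS):

```swift
let command: UInt  = 1 << 20   // NSEvent.ModifierFlags.command.rawValue
let capsLock: UInt = 1 << 16   // .capsLock — set whenever the key is latched

let stored = command                  // user registered a plain ⌘-shortcut
let incoming = command | capsLock     // same shortcut typed with caps lock on

assert(incoming != stored)            // naive equality: spurious mismatch
assert(incoming & command == stored)  // masked comparison: still matches
```

The same masking has to happen on both sides: when the shortcut is recorded in Settings and when it is matched here, or a shortcut saved with stray bits could never fire.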
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Utilities/GlobalShortcut.swift
git commit -m "feat: add configurable global keyboard shortcut"
```
---
### Task 9: Transcription Manager
**Files:**
- Create: `MyVoxtral/MyVoxtral/Models/TranscriptionManager.swift`
- [ ] **Step 1: Implement the orchestrator**
Create `MyVoxtral/MyVoxtral/Models/TranscriptionManager.swift`:
```swift
import SwiftUI

enum RecordingState: Equatable {
    case idle
    case recording
    case error(String)
}

@MainActor
final class TranscriptionManager: ObservableObject {
    @Published var state: RecordingState = .idle
    @Published var currentText: String = ""

    private let audioCapture = AudioCapture()
    private let wsClient = VoxtralWebSocketClient()
    private let settings = AppSettings.shared
    private var hasRetried = false

    var isRecording: Bool { state == .recording }

    func toggle() {
        if isRecording {
            stop()
        } else {
            start()
        }
    }

    func start() {
        guard settings.hasAPIKey else {
            state = .error("No API key set. Open Settings.")
            return
        }
        currentText = ""
        hasRetried = false
        wsClient.onEvent = { [weak self] event in
            self?.handleEvent(event)
        }
        wsClient.connect(apiKey: settings.apiKey, delayMs: settings.streamingDelayMs)
        audioCapture.onChunk = { [weak self] chunk in
            Task { @MainActor in
                self?.wsClient.sendAudio(chunk)
            }
        }
        do {
            try audioCapture.start()
            state = .recording
        } catch {
            state = .error("Mic error: \(error.localizedDescription)")
        }
    }

    func stop() {
        audioCapture.stop()
        wsClient.flush()
        wsClient.disconnect()
        state = .idle
        if !currentText.isEmpty {
            TranscriptionLogger.append(text: currentText)
        }
    }

    private func handleEvent(_ event: VoxtralEvent) {
        switch event {
        case .sessionCreated:
            break
        case .textDelta(let text):
            currentText += text
            if settings.outputMode == .cursorInjection {
                CursorInjector.typeText(text)
            }
        case .segment:
            break // text deltas already handle text accumulation
        case .language:
            break
        case .done(let text):
            if currentText.isEmpty {
                currentText = text
            }
        case .error(let message):
            if !hasRetried && state == .recording {
                hasRetried = true
                wsClient.disconnect()
                wsClient.connect(apiKey: settings.apiKey, delayMs: settings.streamingDelayMs)
            } else {
                state = .error(message)
                audioCapture.stop()
            }
        case .unknown:
            break
        }
    }
}
```
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Models/TranscriptionManager.swift
git commit -m "feat: add TranscriptionManager orchestrating audio, WS, and output"
```
---
### Task 10: Floating Transcription Window
**Files:**
- Create: `MyVoxtral/MyVoxtral/Views/TranscriptionWindow.swift`
- [ ] **Step 1: Implement NSPanel-based floating window**
Create `MyVoxtral/MyVoxtral/Views/TranscriptionWindow.swift`:
```swift
import SwiftUI
import AppKit

struct TranscriptionContentView: View {
    @ObservedObject var manager: TranscriptionManager

    var body: some View {
        VStack(alignment: .leading, spacing: 8) {
            HStack {
                Circle()
                    .fill(manager.isRecording ? .red : .gray)
                    .frame(width: 8, height: 8)
                Text(manager.isRecording ? "Recording..." : "Idle")
                    .font(.caption)
                    .foregroundStyle(.secondary)
                Spacer()
                Button {
                    NSPasteboard.general.clearContents()
                    NSPasteboard.general.setString(manager.currentText, forType: .string)
                } label: {
                    Image(systemName: "doc.on.doc")
                }
                .buttonStyle(.borderless)
                .disabled(manager.currentText.isEmpty)
            }
            ScrollViewReader { proxy in
                ScrollView {
                    Text(manager.currentText.isEmpty ? "Transcription will appear here..." : manager.currentText)
                        .frame(maxWidth: .infinity, alignment: .leading)
                        .foregroundStyle(manager.currentText.isEmpty ? .secondary : .primary)
                        .textSelection(.enabled)
                        .id("bottom")
                }
                .onChange(of: manager.currentText) {
                    proxy.scrollTo("bottom", anchor: .bottom)
                }
            }
        }
        .padding()
        .frame(width: 320, height: 200)
    }
}

final class TranscriptionPanel {
    private var panel: NSPanel?
    private let manager: TranscriptionManager

    init(manager: TranscriptionManager) {
        self.manager = manager
    }

    func show() {
        if panel == nil {
            let panel = NSPanel(
                contentRect: NSRect(x: 0, y: 0, width: 320, height: 200),
                styleMask: [.titled, .closable, .resizable, .nonactivatingPanel, .utilityWindow],
                backing: .buffered,
                defer: false
            )
            panel.title = "MyVoxtral"
            panel.isFloatingPanel = true
            panel.level = .floating
            // Keep the panel alive when the user closes it so show() can reuse it.
            panel.isReleasedWhenClosed = false
            panel.contentView = NSHostingView(rootView: TranscriptionContentView(manager: manager))
            panel.center()
            self.panel = panel
        }
        panel?.orderFront(nil)
    }

    func hide() {
        panel?.orderOut(nil)
    }

    var isVisible: Bool {
        panel?.isVisible ?? false
    }
}
```
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Views/TranscriptionWindow.swift
git commit -m "feat: add floating transcription panel with auto-scroll and copy"
```
---
### Task 11: Settings View
**Files:**
- Create: `MyVoxtral/MyVoxtral/Views/SettingsView.swift`
- [ ] **Step 1: Implement settings window**
Create `MyVoxtral/MyVoxtral/Views/SettingsView.swift`:
```swift
import SwiftUI
import AppKit

struct SettingsView: View {
    @ObservedObject var settings = AppSettings.shared
    @State private var isRecordingShortcut = false

    var body: some View {
        Form {
            Section("API") {
                SecureField("Mistral API Key", text: $settings.apiKey)
            }
            Section("Output") {
                Picker("Mode", selection: $settings.outputMode) {
                    ForEach(OutputMode.allCases, id: \.self) { mode in
                        Text(mode.rawValue).tag(mode)
                    }
                }
                .pickerStyle(.segmented)
                if settings.outputMode == .cursorInjection && !CursorInjector.isAccessibilityGranted {
                    HStack {
                        Image(systemName: "exclamationmark.triangle.fill")
                            .foregroundStyle(.yellow)
                        Text("Accessibility permission required")
                            .font(.caption)
                        Button("Grant") {
                            CursorInjector.promptAccessibilityPermission()
                        }
                        .font(.caption)
                    }
                }
            }
            Section("Shortcut") {
                HStack {
                    Text("Toggle Recording:")
                    Spacer()
                    Button(isRecordingShortcut ? "Press keys..." : settings.shortcutDisplayString) {
                        guard !isRecordingShortcut else { return }
                        isRecordingShortcut = true
                        // Capture the next keystroke with a one-shot local
                        // monitor: NSEvent carries the hardware keyCode that
                        // GlobalShortcut compares against, which SwiftUI's
                        // onKeyPress does not expose.
                        var monitor: Any?
                        monitor = NSEvent.addLocalMonitorForEvents(matching: .keyDown) { event in
                            settings.shortcutKeyCode = event.keyCode
                            settings.shortcutModifiers = event.modifierFlags
                                .intersection([.command, .option, .control, .shift])
                                .rawValue
                            isRecordingShortcut = false
                            if let monitor { NSEvent.removeMonitor(monitor) }
                            monitor = nil
                            return nil // swallow the keystroke
                        }
                    }
                }
            }
            Section("Latency") {
                VStack(alignment: .leading) {
                    Text("Streaming delay: \(settings.streamingDelayMs)ms")
                        .font(.caption)
                    Slider(
                        value: Binding(
                            get: { Double(settings.streamingDelayMs) },
                            set: { settings.streamingDelayMs = Int($0) }
                        ),
                        in: 240...2400,
                        step: 120
                    )
                    HStack {
                        Text("Fast").font(.caption2).foregroundStyle(.secondary)
                        Spacer()
                        Text("Accurate").font(.caption2).foregroundStyle(.secondary)
                    }
                }
            }
        }
        .formStyle(.grouped)
        .frame(width: 360, height: 340)
    }
}
```
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Views/SettingsView.swift
git commit -m "feat: add SettingsView with API key, shortcut, mode, and latency"
```
---
### Task 12: Menu Bar View
**Files:**
- Create: `MyVoxtral/MyVoxtral/Views/MenuBarView.swift`
- [ ] **Step 1: Implement the menu bar dropdown**
Create `MyVoxtral/MyVoxtral/Views/MenuBarView.swift`:
```swift
import SwiftUI

struct MenuBarView: View {
    @ObservedObject var manager: TranscriptionManager
    @ObservedObject var settings = AppSettings.shared
    let onShowTranscription: () -> Void
    let onShowSettings: () -> Void

    var body: some View {
        VStack(spacing: 4) {
            Button(manager.isRecording ? "Stop Recording" : "Start Recording") {
                manager.toggle()
            }
            .keyboardShortcut("r")
            if case .error(let msg) = manager.state {
                Text(msg)
                    .font(.caption)
                    .foregroundStyle(.red)
                    .lineLimit(2)
                    .padding(.horizontal, 8)
            }
            Divider()
            Button("Show Transcription") {
                onShowTranscription()
            }
            Button("Settings...") {
                onShowSettings()
            }
            Divider()
            Button("Quit") {
                NSApplication.shared.terminate(nil)
            }
            .keyboardShortcut("q")
        }
        .padding(.vertical, 4)
    }
}
```
- [ ] **Step 2: Build to verify**
```bash
swift build
```
Expected: Build succeeds.
- [ ] **Step 3: Commit**
```bash
git add MyVoxtral/Views/MenuBarView.swift
git commit -m "feat: add MenuBarView with start/stop, settings, and error display"
```
---
### Task 13: Wire Everything Together in App Entry
**Files:**
- Modify: `MyVoxtral/MyVoxtral/MyVoxtralApp.swift`
- [ ] **Step 1: Update the app entry point to integrate all components**
Replace the contents of `MyVoxtral/MyVoxtral/MyVoxtralApp.swift` with:
```swift
import SwiftUI

@main
struct MyVoxtralApp: App {
    @StateObject private var manager = TranscriptionManager()
    @StateObject private var settings = AppSettings.shared
    @State private var transcriptionPanel: TranscriptionPanel?
    @State private var settingsWindow: NSWindow?
    private let globalShortcut = GlobalShortcut()

    var body: some Scene {
        MenuBarExtra {
            MenuBarView(
                manager: manager,
                onShowTranscription: { showTranscriptionWindow() },
                onShowSettings: { showSettingsWindow() }
            )
        } label: {
            Image(systemName: manager.isRecording ? "mic.fill" : "mic")
                .symbolRenderingMode(.palette)
                .foregroundStyle(manager.isRecording ? .red : .primary)
                // Scenes have no onAppear; the label view is rendered at
                // launch, so one-time setup hangs off it instead.
                .onAppear {
                    if !settings.hasAPIKey {
                        showSettingsWindow()
                    }
                    registerShortcut()
                }
        }
        .onChange(of: settings.shortcutKeyCode) { registerShortcut() }
        .onChange(of: settings.shortcutModifiers) { registerShortcut() }
        .onChange(of: manager.isRecording) {
            if manager.isRecording && settings.outputMode == .textBox {
                showTranscriptionWindow()
            }
        }
    }

    private func registerShortcut() {
        globalShortcut.register(
            keyCode: settings.shortcutKeyCode,
            modifiers: settings.shortcutModifiers
        )
        globalShortcut.onTrigger = { [weak manager] in
            Task { @MainActor in
                manager?.toggle()
            }
        }
    }

    private func showTranscriptionWindow() {
        if transcriptionPanel == nil {
            transcriptionPanel = TranscriptionPanel(manager: manager)
        }
        transcriptionPanel?.show()
    }

    private func showSettingsWindow() {
        if let settingsWindow, settingsWindow.isVisible {
            settingsWindow.makeKeyAndOrderFront(nil)
            return
        }
        let window = NSWindow(
            contentRect: NSRect(x: 0, y: 0, width: 360, height: 340),
            styleMask: [.titled, .closable],
            backing: .buffered,
            defer: false
        )
        window.title = "MyVoxtral Settings"
        // Avoid an over-release crash when the user closes the window.
        window.isReleasedWhenClosed = false
        window.contentView = NSHostingView(rootView: SettingsView())
        window.center()
        window.makeKeyAndOrderFront(nil)
        NSApp.activate(ignoringOtherApps: true)
        self.settingsWindow = window
    }
}
```
- [ ] **Step 2: Build the full app**
```bash
swift build
```
Expected: Build succeeds with all components linked.
- [ ] **Step 3: Run and verify basic functionality**
```bash
swift run
```
Expected: Menu bar icon appears. Clicking shows the dropdown with Start/Stop, Settings, Quit. Settings window opens if no API key.
- [ ] **Step 4: Commit**
```bash
git add MyVoxtral/MyVoxtralApp.swift
git commit -m "feat: wire all components into app entry with menu bar, shortcut, and windows"
```
---
### Task 14: End-to-End Smoke Test
- [ ] **Step 1: Verify the full recording flow**
```bash
swift run
```
Manual test checklist:
1. App launches as menu bar icon (no Dock icon)
2. Click icon → dropdown shows Start Recording, Settings, Quit
3. Open Settings → enter a valid Mistral API key
4. Set output mode to "Text Window"
5. Click "Start Recording" → mic icon turns red, transcription window opens
6. Speak into microphone → text appears in the floating window
7. Click "Stop Recording" → icon returns to normal
8. Verify `~/Library/Application Support/MyVoxtral/transcription.log` contains the session
9. Switch to "Type at Cursor" mode → grant Accessibility permission if prompted
10. Open TextEdit, click Start Recording, speak → text appears at cursor in TextEdit
11. Configure a keyboard shortcut in Settings → verify it toggles recording from any app
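For step 8, a terminal one-liner saves digging through Finder (the path matches `TranscriptionLogger`; it prints a placeholder if no session has been logged yet):

```shell
LOG="$HOME/Library/Application Support/MyVoxtral/transcription.log"
if [ -f "$LOG" ]; then tail -n 5 "$LOG"; else echo "no log yet"; fi
```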
- [ ] **Step 2: Fix any issues found during smoke testing**
Address any build errors, runtime crashes, or connection issues.
- [ ] **Step 3: Final commit**
```bash
git add -A
git commit -m "test: verify end-to-end transcription flow"
```