fluidaudio
SKILL.md
FluidAudio SDK
Swift SDK for fully local audio AI on Apple platforms. All inference runs on Apple Neural Engine (ANE) via CoreML — no cloud, no latency, no data leaves the device.
Repository: https://github.com/FluidInference/FluidAudio.git
Version: 0.12.1+
Platforms: macOS 14.0+ / iOS 17.0+ (arm64 only, Apple Silicon)
Quick Start
import FluidAudio
// ASR — batch transcription
let models = try await AsrModels.downloadAndLoad(version: .v3) // multilingual, 25 languages
let asr = AsrManager(config: .default)
try await asr.initialize(models: models)
let result = try await asr.transcribe(audioURL)
print(result.text)
Models auto-download from HuggingFace on first use, then cache at ~/.cache/fluidaudio/Models/.
Installation (SPM)
dependencies: [
.package(url: "https://github.com/FluidInference/FluidAudio.git", from: "0.12.1")
]
// Product: "FluidAudio" (core, Apache 2.0) or "FluidAudioTTS" (+ Kokoro TTS, GPL-3.0)
Components & Performance
| Component | Manager | RTFx (M4 Pro) | Key Metric |
|---|---|---|---|
| ASR (batch) | AsrManager |
~190x | WER 3.21% |
| ASR (streaming) | StreamingEouAsrManager |
real-time | 160ms–1.6s EOU latency |
| Diarization (offline) | OfflineDiarizerManager |
~150x | DER 13.89% |
| Diarization (online) | DiarizerManager |
~150x | DER 17.7% |
| VAD | VadManager |
~1000x+ | F1 0.85 |
| TTS (PocketTTS) | PocketTTS |
— | ~80ms latency |
| TTS (Kokoro) | KokoroModel |
— | High quality, SSML |
All audio auto-converts to 16kHz mono Float32 via AudioConverter.
Detailed References
Read the appropriate reference file based on your task:
- ASR (batch + streaming transcription): See references/asr.md — models, batch/streaming APIs, configuration, EOU detection, token merging, CLI
- Speaker diarization: See references/diarization.md — offline/online pipelines, Sortformer, configuration, CLI
- Voice activity detection: See references/vad.md — batch/streaming VAD, segmentation config, CLI
- Text-to-speech: See references/tts.md — PocketTTS vs Kokoro, SSML, licensing
- Infrastructure (install, models, audio, platform): See references/infrastructure.md — SPM/CocoaPods setup, model management, caching, AudioConverter, ANE optimization, package structure
Key Integration Patterns
Typical macOS Voice-to-Text App
import FluidAudio
class TranscriptionService {
private var asrManager: AsrManager?
func setup() async throws {
let models = try await AsrModels.downloadAndLoad(version: .v3)
let manager = AsrManager(config: .default)
try await manager.initialize(models: models)
self.asrManager = manager
}
func transcribe(url: URL) async throws -> String {
guard let manager = asrManager else { throw TranscriptionError.notInitialized }
let result = try await manager.transcribe(url)
return result.text
}
}
Real-Time Streaming with EOU Detection
let streaming = StreamingEouAsrManager(config: .streaming)
try await streaming.start(models: models)
Task {
for await update in await streaming.transcriptionUpdates {
if update.isConfirmed {
// Final text — safe to paste/display
} else {
// Volatile — may change, show as preview
}
}
}
// Feed microphone audio
for chunk in microphoneBufferStream {
await streaming.streamAudio(chunk)
}
let final = try await streaming.finish()
Combined ASR + Diarization
let samples = try AudioConverter().resampleAudioFile(path: "meeting.wav")
// Transcribe
let asrResult = try await asrManager.transcribe(samples)
// Identify speakers
let diarizer = OfflineDiarizerManager(config: OfflineDiarizerConfig())
try await diarizer.prepareModels()
let diarResult = try await diarizer.process(audio: samples)
Weekly Installs
4
Repository
bbssppllvv/esse…l-skillsGitHub Stars
1
First Seen
14 days ago
Security Audits
Installed on
opencode4
gemini-cli4
codebuddy4
github-copilot4
codex4
kimi-cli4