OmniVoice Studio: How to Use It
What is OmniVoice Studio?
OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine. No API key, cloud account, or subscription is required.
- Supports 646 languages for TTS via the default OmniVoice engine
- Supports 99 languages for transcription via WhisperX
- Available on macOS, Windows, and Linux
- GPU optional: the full pipeline runs on CPU
- Free for personal, educational, and research use (FSL-1.1-ALv2)
System requirements
A GPU is optional. Without one, TTS runs about 3 times slower on the CPU. If VRAM is 8 GB or less, TTS automatically offloads to the CPU during transcription. No configuration is required.
Component         Minimum                              Recommended
OS                Win 10 / macOS 12+ / Ubuntu 20.04+   Latest 64-bit OS
RAM               8 GB                                 16 GB+
VRAM              4 GB (auto offload)                  8 GB+ (RTX 3060+)
Free disk space   10 GB                                20 GB+ SSD
Python            3.10+                                3.11–3.12
GPU options       CUDA / MPS / ROCm
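The offload rule described in this section can be pictured as a small decision function. This is only a sketch of the stated behavior, not the project's actual code: the names plan_devices and Placement are hypothetical, and the only fact taken from the text is the 8 GB threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the text: at 8 GB of VRAM or less, TTS moves to the
# CPU while transcription is running.
OFFLOAD_THRESHOLD_GB = 8.0


@dataclass(frozen=True)
class Placement:
    """Where each stage runs ("gpu" or "cpu"). Hypothetical structure."""
    transcription: str
    tts: str


def plan_devices(vram_gb: Optional[float], transcribing: bool) -> Placement:
    """Decide device placement. vram_gb is None when no GPU is present."""
    if vram_gb is None:
        # No GPU at all: the whole pipeline runs on the CPU.
        return Placement(transcription="cpu", tts="cpu")
    if transcribing and vram_gb <= OFFLOAD_THRESHOLD_GB:
        # Small GPU and transcription is active: TTS yields the GPU.
        return Placement(transcription="gpu", tts="cpu")
    return Placement(transcription="gpu", tts="gpu")
```

With these assumptions, a 4 GB card keeps transcription on the GPU and moves TTS to the CPU, while a 12 GB card keeps both stages on the GPU.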
Installation
We recommend running the project from source. First, install three prerequisites: ffmpeg, Bun (JS runtime), and uv (Python package manager).
    git clone https://github.com/debpalash/OmniVoice-Studio.git
    cd OmniVoice-Studio
    uv sync
    bun install
    bun dev
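Before running the commands above, it can help to confirm that the three prerequisites are on your PATH and that your Python version meets the stated minimum of 3.10. The helper below is a sketch written for this guide, not part of the repository; the function names are hypothetical, and it only assumes the executables are called ffmpeg, bun, and uv.

```python
import shutil
import sys

# Executable names assumed from the prerequisites listed in the text.
REQUIRED_TOOLS = ("ffmpeg", "bun", "uv")
# Minimum Python version from the system requirements.
MIN_PYTHON = (3, 10)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= tuple(minimum)
```

Calling missing_tools() returns an empty list when all three tools are installed. A non-empty list names what still needs installing.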
The front end is served at http://localhost:5173. The API runs on port 8000.
Model weights are downloaded automatically on the first generation.
Pre-built installers are available for macOS (DMG), Windows (MSI), and Linux (AppImage and .deb). See the GitHub releases page.
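Once bun dev is running, a quick way to confirm that both services are up is to test whether their ports accept TCP connections. The sketch below uses only the standard library. The port numbers come from the text; the helper names and the choice of 127.0.0.1 as the host are assumptions.

```python
import socket

# Ports taken from the text: front end on 5173, API on 8000.
SERVICES = {"frontend": 5173, "api": 8000}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "127.0.0.1", services=SERVICES) -> dict:
    """Map each service name to whether its port is accepting connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}
```

An open port only shows that something is listening. It does not prove the listener is OmniVoice Studio.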
Voice cloning
Voice cloning uses zero-shot learning: a voice can be cloned from a clip as short as 3 seconds, with no prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.
1. In the UI, go to the [Voice Clone] tab.
2. Upload or record a 3-second audio clip of your target voice.
3. Enter your text and select your target language (646 languages available).
4. Click [Generate]. The output is saved to the project library.
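If you prepare reference clips outside the app, you may want to check their length before uploading. The sketch below measures a WAV file with Python's standard wave module and compares it with the 3-second figure from the text. Treating 3 seconds as a hard minimum is an assumption, and the helper names are hypothetical.

```python
import io
import wave

# Clip length mentioned in the text; used here as an assumed minimum.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(source, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Return True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(source) >= minimum


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, for trying the helpers."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer
```

For example, is_usable_reference(make_silent_wav(1.5)) is False, while a 3-second buffer passes. This checks length only, not audio quality.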
Audio Gallery: Build your audio library by searching YouTube, browsing categories, and downloading reference clips directly inside the app.
Video dubbing
The entire dubbing pipeline runs locally: transcription → translation → synthesis → muxing. Demucs separates the vocals so that the original background audio is preserved in the final export.
1. Go to the [Dubbing] tab, then paste a YouTube URL or upload a local file.
2. WhisperX transcribes the audio with word-level alignment.
3. Select your target language. Translation is performed automatically, and the TTS engine re-voices the transcript.
4. Demucs preserves the background audio.
5. Export the final MP4 with the dubbed audio mixed in.
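To make the final step concrete, here is one way the dubbed speech and the preserved background could be mixed back into the video with ffmpeg. It is a sketch of the general technique, not the command OmniVoice Studio actually runs: the file names are placeholders and build_mux_command is a hypothetical helper. It relies only on standard ffmpeg options (-i, -filter_complex with amix, -map, -c:v copy).

```python
from typing import List


def build_mux_command(video: str, dubbed_speech: str,
                      background: str, output: str) -> List[str]:
    """Build an ffmpeg argument list that mixes two audio tracks into a video.

    Input 0 is the original video, input 1 the synthesized speech, and
    input 2 the background track separated from the original audio.
    """
    return [
        "ffmpeg", "-y",                 # -y: overwrite the output if it exists
        "-i", video,
        "-i", dubbed_speech,
        "-i", background,
        # Mix the two audio inputs into one stream labelled [mix].
        "-filter_complex",
        "[1:a][2:a]amix=inputs=2:duration=longest[mix]",
        "-map", "0:v",                  # keep the video stream from input 0
        "-map", "[mix]",                # use the mixed audio as the soundtrack
        "-c:v", "copy",                 # copy video without re-encoding
        "-c:a", "aac",                  # encode the mixed audio as AAC
        output,
    ]
```

The list can be passed to subprocess.run(cmd, check=True). Building it as a list rather than a shell string avoids quoting problems with file names that contain spaces.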
Batch queue: Drop in up to 50 videos and walk away. Each job has its own progress bar that tracks the full pipeline.
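A per-job progress bar over a fixed pipeline can be modeled as a count of completed stages. The sketch below is illustrative only. The stage names and the 50-job limit come from the text, while the DubJob and BatchQueue classes are hypothetical and say nothing about how the app implements its queue.

```python
from dataclasses import dataclass, field
from typing import List

# Stage order taken from the pipeline described in the text.
STAGES = ("transcription", "translation", "synthesis", "muxing")
# Queue limit taken from the text.
MAX_JOBS = 50


@dataclass
class DubJob:
    source: str
    completed: int = 0  # number of stages finished so far

    def advance(self) -> None:
        """Mark the current stage as finished (no-op once all are done)."""
        if self.completed < len(STAGES):
            self.completed += 1

    @property
    def progress(self) -> float:
        """Fraction of the pipeline finished, from 0.0 to 1.0."""
        return self.completed / len(STAGES)

    @property
    def current_stage(self) -> str:
        """Name of the stage in progress, or "done" when finished."""
        if self.completed < len(STAGES):
            return STAGES[self.completed]
        return "done"


@dataclass
class BatchQueue:
    jobs: List[DubJob] = field(default_factory=list)

    def add(self, source: str) -> DubJob:
        """Queue a new video, refusing anything past the job limit."""
        if len(self.jobs) >= MAX_JOBS:
            raise ValueError(f"queue is full ({MAX_JOBS} jobs)")
        job = DubJob(source)
        self.jobs.append(job)
        return job
```

Each job tracks its own progress independently, so one slow video does not affect the figure shown for the others.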
Dictation and speaker diarization
Dictation works system-wide from any application. Diarization uses Pyannote + WhisperX to identify individual speakers in multi-speaker audio files.
- Press ⌘+⇧+Space (macOS) to open the floating dictation widget
- Audio is streamed via WebSocket and auto-pasted into the active input field
- Upload multi-speaker files to the diarization tab
- Pyannote identifies who said what, and each speaker gets an auto-extracted voice profile
- Assign a TTS voice per speaker for per-speaker dubbing
Pyannote diarization requires a Hugging Face token. See docs/setup/huggingface-token.md in the repository.
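The "who said what" step combines two outputs: word timestamps from transcription and speaker turns from diarization. A common way to join them is to give each word to the speaker whose turn overlaps it the most. The sketch below shows that idea in plain Python. The tuple layouts and the SPEAKER_00 style labels are assumptions made for illustration, and it does not call Pyannote or WhisperX.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start seconds, end seconds)
Turn = Tuple[str, float, float]  # (speaker label, start seconds, end seconds)


def overlap(a_start: float, a_end: float,
            b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word],
                    turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each word with the speaker whose turn overlaps it the most.

    A word that overlaps no turn at all is labelled None.
    """
    labelled = []
    for text, start, end in words:
        best_speaker = None
        best_overlap = 0.0
        for speaker, turn_start, turn_end in turns:
            shared = overlap(start, end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append((best_speaker, text))
    return labelled
```

A word that straddles a turn boundary goes to whichever speaker covers more of it, which is usually what a reader expects from a transcript.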
TTS engines
Six TTS engines are included. Switch between them under [Settings] → [TTS Engine], or with an environment variable.
OMNIVOICE_TTS_BACKEND=cosyvoice
Engine               Languages         Cloning   Platform
OmniVoice (default)  600+              ✓         CUDA / MPS / CPU
CosyVoice 3          9 + 18 dialects   ✓         CUDA / MPS / CPU
MLX-Audio            Multi             Varies    Apple Silicon only
VoxCPM2              30                ✓         CUDA / MPS / CPU
MOSS-TTS-Nano        20                ✓         CUDA / CPU
KittenTTS            English           ✗         CPU only
Custom engine: Subclass TTSBackend in backend/providers/tts_backend.py and add it to _REGISTRY. This takes roughly 50 lines of Python.
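The subclass-and-register pattern might look roughly like the sketch below. Only the names TTSBackend, _REGISTRY, and OMNIVOICE_TTS_BACKEND come from the text. The synthesize method, its signature, the SilenceBackend example, and the load_backend helper are all assumptions, so check the real base class in the repository before copying anything.

```python
import os
from abc import ABC, abstractmethod


class TTSBackend(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    name: str = ""

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return raw audio bytes for the given text (hypothetical method)."""


class SilenceBackend(TTSBackend):
    """Toy engine used only to show the pattern: it emits silence."""

    name = "silence"

    def synthesize(self, text: str, language: str) -> bytes:
        # 160 samples of 16-bit silence per character (10 ms at 16 kHz).
        return b"\x00\x00" * (160 * len(text))


# Maps the selection key to the backend class (assumed registry shape).
_REGISTRY = {SilenceBackend.name: SilenceBackend}


def load_backend(environ=os.environ, default: str = "silence") -> TTSBackend:
    """Instantiate the backend named by OMNIVOICE_TTS_BACKEND."""
    key = environ.get("OMNIVOICE_TTS_BACKEND", default)
    try:
        return _REGISTRY[key]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown TTS backend {key!r}; known: {known}") from None
```

Keeping the lookup in one registry means a new engine needs only a subclass and one dictionary entry, which fits the estimate of about 50 lines.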
MCP server and resources
OmniVoice Studio ships with a built-in MCP server that exposes voice and dubbing functionality to MCP-compatible clients (Claude, Cursor, or your own tools) without opening the desktop UI.
- The MCP server starts alongside the FastAPI backend when you run bun dev
- Point your MCP client at the local server to access all endpoints
- AudioSeal (Meta) embeds an invisible neural watermark in all generated audio for AI provenance
- GitHub: github.com/debpalash/OmniVoice-Studio
- Installation documentation: docs/install/ (macos / windows / linux / docker)
- Troubleshooting: docs/install/troubleshooting.md
- Discord: discord.gg/bzQavDfVV9


