Note
This project was previously named faster-whisper-server. I've decided to change the name from faster-whisper-server, as the project has evolved to support more than just transcription.
Note
These docs are a work in progress.
Speaches
speaches is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech uses piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.
Try it out on the HuggingFace Space
Features:
- GPU and CPU support.
- Deployable via Docker Compose / Docker
- Highly configurable
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with speaches.
- Streaming support (transcription is sent via SSE as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
- LocalAgreement2 (paper | original implementation) algorithm is used for live transcription.
- Live transcription support (audio is sent via websocket as it's generated).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
- Text-to-Speech via kokoro (Ranked #1 in the TTS Arena) and piper models.
- Coming soon: Audio generation (chat completions endpoint) | OpenAI Documentation
  - Generate a spoken audio summary of a body of text (text in, audio out)
  - Perform sentiment analysis on a recording (audio in, text out)
  - Async speech to speech interactions with a model (audio in, audio out)
- Coming soon: Realtime API | OpenAI Documentation
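
Because the server is OpenAI API compatible and loads models on demand, a client only needs to point at a speaches instance and name a model in the request. The sketch below builds a Text-to-Speech request against the OpenAI-style `/v1/audio/speech` endpoint using only the Python standard library. The base URL, the model ID, and the voice name are placeholders and assumptions, not values taken from this project; substitute whatever your deployment exposes.

```python
import json
import urllib.request

# Assumption: a speaches instance is reachable at this address.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, model, voice, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    # Field names follow OpenAI's audio speech endpoint: model, input, voice.
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "<tts-model-id>" and "<voice-name>" are hypothetical placeholders.
request = build_speech_request(
    "Hello from speaches!", model="<tts-model-id>", voice="<voice-name>"
)

# Sending the request requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     audio_bytes = response.read()
```

Since the model is named per request, switching models should only mean changing the `model` field. An official OpenAI SDK configured with a custom base URL is expected to work the same way.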
Please create an issue if you find a bug, have a question, or want to suggest a feature.
Demo
Streaming Transcription
TODO
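
As a rough illustration of the streaming behavior listed under Features, where transcription is sent via SSE as the audio is transcribed, the sketch below parses a Server-Sent Events stream into its `data` payloads. It is a generic SSE reader, not this project's client code. It assumes each event's `data` field carries a transcription fragment, and the sample lines are made up for the example.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event that is not terminated by a blank line is dropped.


# Made-up sample stream standing in for a real HTTP response body.
sample = ["data: Hello\n", "\n", ": keep-alive\n", "data: world\n", "\n"]
fragments = list(iter_sse_data(sample))
```

In practice the lines would come from a streaming HTTP response, so each fragment can be displayed as soon as it arrives instead of after the whole file has been transcribed.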