Deploy Speaches | Open-Source OpenAI Whisper Alternative on Railway

Self-Host Speaches — OpenAI-compatible STT/TTS API server


Deploy and Host Speaches on Railway

Deploy Speaches on Railway to run a fully self-hosted, OpenAI API-compatible speech-to-text and text-to-speech server. Speaches uses faster-whisper for transcription and Kokoro/Piper for speech synthesis — like Ollama, but for audio models. This template pre-configures Speaches with CPU-optimized int8 quantization, API key authentication, a persistent volume for model caching, and the Gradio web UI for interactive testing.

Self-host Speaches to process audio without sending data to third-party APIs. The deployment includes a single Speaches service with a HuggingFace model cache volume — no database required.

Speaches dashboard screenshot

Getting Started with Speaches on Railway

After deployment completes, open your Railway-generated URL to access the Speaches Gradio web UI. The web UI lets you test speech-to-text transcription and text-to-speech generation directly in your browser. To use the API, send requests to /v1/audio/transcriptions (STT) or /v1/audio/speech (TTS) with your API key in the Authorization: Bearer header. Models download automatically on first use and are cached in the persistent volume. Check the /docs endpoint for the full OpenAPI specification. Point any OpenAI-compatible SDK at your Speaches URL to start integrating.
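As a minimal sketch, a text-to-speech request can be built with Python's standard library alone. The base URL and API key below are placeholders for your own deployment's values, and the model ID and voice name are assumptions (the Kokoro model ID is the one mentioned later on this page; `af_heart` is a common Kokoro voice, but check `/v1/models` on your instance):

```python
import json
import urllib.request

BASE_URL = "https://your-app.up.railway.app"  # your Railway-generated URL
API_KEY = "your-secret-key"                   # must match your API_KEY variable

def speech_request(text: str, model: str, voice: str) -> urllib.request.Request:
    """Build a POST to /v1/audio/speech; the response body is the audio bytes."""
    payload = json.dumps({"model": model, "input": text, "voice": voice}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually synthesize speech against a live deployment:
# req = speech_request("Hello from Speaches",
#                      "speaches-ai/Kokoro-82M-v1.0-ONNX", "af_heart")
# with urllib.request.urlopen(req) as resp:
#     open("hello.mp3", "wb").write(resp.read())
```

Because the endpoint is OpenAI-compatible, the same request shape works with any OpenAI SDK once its base URL points at your instance.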

About Hosting Speaches

Speaches is an open-source (MIT) server that provides OpenAI-compatible speech-to-text and text-to-speech APIs. It dynamically loads and unloads models on demand — specify the model in your API request and Speaches downloads it from HuggingFace, runs inference, and optionally offloads it after a configurable TTL.

  • STT via faster-whisper (4x faster than standard Whisper on CPU)
  • TTS via Kokoro (#1 ranked on TTS Arena) and Piper (20+ languages)
  • Realtime WebSocket API at /v1/realtime for two-way voice conversations
  • Streaming transcription via Server-Sent Events
  • Dynamic model management — models auto-download and auto-unload based on TTL settings

Why Deploy Speaches on Railway

Railway handles infrastructure so you can focus on building voice-enabled applications.

  • Full data privacy — audio never leaves your server
  • Zero per-minute costs — pay only for Railway infrastructure
  • OpenAI API drop-in replacement — existing code works unchanged
  • Persistent model cache survives redeployments
  • API key authentication included out of the box

Common Use Cases for Self-Hosted Speaches

  • Voice-enabled AI agents: Use as the audio layer for LLM-powered assistants with real-time WebSocket support for two-way conversations
  • Meeting and call transcription: Stream audio for real-time transcription in internal tools, call centers, or accessibility captioning
  • Home automation voice control: Integrate with Home Assistant via the wyoming_openai proxy for private, cloud-free voice commands
  • Multilingual content production: Generate voiceovers for videos, e-learning, audiobooks, or IVR systems using Kokoro and Piper TTS models

Dependencies for Speaches on Railway

  • Speaches — ghcr.io/speaches-ai/speaches:0.9.0-rc.3-cpu (CPU-only build, ~1.2 GB image)
  • Volume — HuggingFace model cache at /home/ubuntu/.cache/huggingface/hub

Environment Variables Reference for Speaches

| Variable | Description | Default |
| --- | --- | --- |
| API_KEY | API key for endpoint authentication | (none — open) |
| ENABLE_UI | Enable Gradio web interface | true |
| WHISPER__COMPUTE_TYPE | Quantization type (int8 reduces memory ~40%) | default |
| WHISPER__INFERENCE_DEVICE | Inference device (cpu/cuda/auto) | auto |
| STT_MODEL_TTL | Seconds before STT model unloads (-1 = never) | 300 |
| TTS_MODEL_TTL | Seconds before TTS model unloads (-1 = never) | 300 |
| PRELOAD_MODELS | JSON array of model IDs to download at startup | [] |
| LOG_LEVEL | Logging level (debug/info/warning/error) | debug |

Deployment Dependencies

Hardware Requirements for Self-Hosting Speaches

| Resource | Minimum (tiny/base models) | Recommended (small + Kokoro) |
| --- | --- | --- |
| CPU | 2 vCPU | 4+ vCPU |
| RAM | 1 GB | 4 GB |
| Storage | 500 MB (model cache) | 5 GB (multiple models) |
| Runtime | Docker | Docker |

For the large-v3 Whisper model, allocate 8 GB RAM. Use WHISPER__COMPUTE_TYPE=int8 to reduce memory by approximately 40% on CPU deployments.

Self-Hosting Speaches

Pull and run the CPU image with Docker:

docker run -d \
  -p 8000:8000 \
  -v speaches-cache:/home/ubuntu/.cache/huggingface/hub \
  -e API_KEY=your-secret-key \
  -e WHISPER__COMPUTE_TYPE=int8 \
  ghcr.io/speaches-ai/speaches:0.9.0-rc.3-cpu

Or use docker-compose for a persistent setup:

services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:0.9.0-rc.3-cpu
    ports:
      - "8000:8000"
    environment:
      - API_KEY=your-secret-key
      - WHISPER__COMPUTE_TYPE=int8
      - WHISPER__INFERENCE_DEVICE=cpu
      - STT_MODEL_TTL=-1
    volumes:
      - hf-cache:/home/ubuntu/.cache/huggingface/hub
volumes:
  hf-cache:

How Much Does Speaches Cost to Self-Host?

Speaches is free and open-source under the MIT license — no per-minute or per-character fees. On Railway, you pay only for compute and storage. Cloud alternatives like the OpenAI Whisper API charge $0.006/minute for transcription and $15 per 1M characters for TTS. Self-hosting Speaches on Railway replaces those usage fees with a flat infrastructure cost, which typically pays off once you process more than a few hundred minutes of audio per month.

Speaches vs LocalAI for Self-Hosted Audio

| Feature | Speaches | LocalAI |
| --- | --- | --- |
| Focus | STT + TTS specialist | Multi-modal (LLM, images, audio, embeddings) |
| STT Engine | faster-whisper | whisper.cpp / faster-whisper |
| TTS Engines | Kokoro, Piper | Piper, Coqui, Kokoro |
| Realtime API | WebSocket at /v1/realtime | Not available |
| Model Management | Dynamic load/unload with TTL | Static configuration |
| Resource Usage | Lightweight (audio only) | Heavier (full LLM stack) |

Speaches is the better choice when you need a dedicated, lightweight audio server. LocalAI is better when you want a single server for LLMs, images, and audio combined.

FAQ for Speaches on Railway

What is Speaches and why self-host it? Speaches is an open-source server that provides OpenAI-compatible speech-to-text and text-to-speech APIs. Self-hosting gives you full data privacy, zero per-minute costs, and the ability to run audio processing without depending on cloud APIs.

What does this Railway template deploy? This template deploys a single Speaches container with CPU-optimized configuration, API key authentication, and a persistent volume for caching HuggingFace models. No database is required.

Why does the Speaches Railway template include a volume? The volume caches downloaded AI models (Whisper for STT, Kokoro/Piper for TTS) so they persist across redeployments. Without a volume, models would re-download on every container restart, adding minutes of cold-start delay.

How do I use Speaches as an OpenAI API drop-in replacement? Point your OpenAI SDK's base_url at your Speaches Railway URL (e.g. https://your-app.up.railway.app/v1) and set the SDK's API key to the value of your API_KEY environment variable. The /v1/audio/transcriptions, /v1/audio/speech, and /v1/models endpoints are all compatible.
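For transcription, a request can be sketched the same way with the standard library, building the multipart/form-data upload by hand; the base URL and API key are placeholders, and the default model ID is the one mentioned elsewhere on this page:

```python
import urllib.request

BASE_URL = "https://your-app.up.railway.app"  # your Railway-generated URL
API_KEY = "your-secret-key"                   # must match your API_KEY variable

def transcription_request(audio: bytes, filename: str,
                          model: str = "Systran/faster-whisper-small") -> urllib.request.Request:
    """Build a multipart/form-data POST to /v1/audio/transcriptions."""
    boundary = "speaches-example-boundary"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

# Against a live deployment:
# req = transcription_request(open("meeting.wav", "rb").read(), "meeting.wav")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())  # JSON containing the transcribed text
```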

Can I run Speaches on Railway without a GPU? Yes. This template uses the CPU-only image with int8 quantization for reduced memory usage. The small Whisper model processes audio at approximately 4x real-time speed on CPU. For production workloads requiring faster processing, consider a GPU-enabled host.

How do I preload models in self-hosted Speaches to avoid cold starts? Set the PRELOAD_MODELS environment variable to a JSON array of HuggingFace model IDs, for example ["Systran/faster-whisper-small","speaches-ai/Kokoro-82M-v1.0-ONNX"]. Models download during container startup and are ready for immediate use.
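With the Docker image from this page, a preloading setup might look like the following (the model IDs are the ones mentioned above; substitute the models you actually use):

```shell
# Preload the small Whisper STT model and the Kokoro TTS model at startup
docker run -d \
  -p 8000:8000 \
  -v speaches-cache:/home/ubuntu/.cache/huggingface/hub \
  -e API_KEY=your-secret-key \
  -e PRELOAD_MODELS='["Systran/faster-whisper-small","speaches-ai/Kokoro-82M-v1.0-ONNX"]' \
  ghcr.io/speaches-ai/speaches:0.9.0-rc.3-cpu
```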
