Deploy and Host Speaches: Self-Hosted STT/TTS API on Railway

Self-host OpenAI-compatible STT & TTS. No per-minute fee. Audio stays local.

Deploy and Host Speaches on Railway

Speaches is an open-source, OpenAI API-compatible speech server, like Ollama but for audio models. It is a drop-in replacement for OpenAI's /v1/audio/transcriptions (STT) and /v1/audio/speech (TTS) endpoints. It is powered by faster-whisper (up to 4× faster than the original Whisper implementation at the same accuracy), Kokoro (ranked #1 on TTS Arena), and Piper (20+ languages), all running locally on your Railway instance, so no audio ever leaves your server.

Self-host on Railway for ~$5–10/month versus OpenAI Whisper API at $0.36/hour or ElevenLabs at $5–22/month with character caps. Zero per-minute fees. Unlimited transcriptions and speech generation at flat compute cost.


What This Template Deploys

| Service | Purpose |
|---|---|
| Speaches 0.9.0-rc.3 (CPU build) | OpenAI-compatible STT/TTS server: faster-whisper transcription, Kokoro/Piper speech synthesis, Realtime WebSocket API, and Gradio web UI on port 8000 |
| HuggingFace model cache (/home/ubuntu/.cache/huggingface/hub) | Persistent volume: downloaded models survive redeploys, so nothing is re-downloaded on container restart |

Single-service architecture. No database, no Redis, no external services. Models download from HuggingFace on first use and are cached permanently on the volume.


About Hosting Speaches

Running a production STT/TTS server requires managing model downloads, CPU/GPU inference configuration, API authentication, and a public HTTPS endpoint for application integration. Without a managed host, you're configuring Docker, model storage volumes, SSL, and compute resource allocation manually.

This template pre-configures Speaches with CPU-optimized int8 quantization, an API_KEY variable for authentication, a persistent HuggingFace cache volume, and the Gradio web UI, all at deploy time.

Typical cost: ~$5–10/month on Railway's Hobby plan. OpenAI Whisper API costs $0.36/hour — 30 hours of audio costs $10.80 before any TTS usage. ElevenLabs caps at 30,000 characters on the starter plan. Speaches gives you unlimited STT and TTS at flat compute pricing.
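The same figures can be turned around into a break-even point. The Python sketch below uses only the prices quoted above ($0.36 per audio hour for the Whisper API and the template's ~$5–10/month Railway estimate). It ignores TTS usage and assumes the Railway bill stays flat, which in practice depends on the CPU and memory the service actually consumes.

```python
"""Rough break-even estimate: flat Railway cost vs. per-hour Whisper API pricing.

Assumption: prices are the ones quoted in this page; real Railway bills vary with usage.
"""

WHISPER_API_USD_PER_HOUR = 0.36  # $0.006 per minute * 60 minutes


def api_cost(audio_hours: float) -> float:
    """Monthly Whisper API cost in USD for the given number of audio hours."""
    return round(audio_hours * WHISPER_API_USD_PER_HOUR, 2)


def break_even_hours(railway_usd_per_month: float) -> float:
    """Audio hours per month at which the API bill equals the flat Railway bill."""
    return railway_usd_per_month / WHISPER_API_USD_PER_HOUR


if __name__ == "__main__":
    print(f"30 h via API: ${api_cost(30):.2f}")
    for monthly in (5.0, 10.0):
        hours = break_even_hours(monthly)
        print(f"${monthly:.0f}/month breaks even at {hours:.1f} audio hours")
```

With these inputs, 30 hours costs $10.80 through the API, and the flat Railway cost pays off above roughly 14 audio hours per month at $5 or roughly 28 hours at $10.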


Deploy in Under 5 Minutes

  1. Click Deploy on Railway — Speaches builds automatically (~2–3 minutes)
  2. Set API_KEY to a strong random string in the Variables tab — this secures all endpoints
  3. Open your Railway-assigned URL — the Gradio web UI loads immediately
  4. Test STT at /v1/audio/transcriptions and TTS at /v1/audio/speech with your API key
  5. Point any OpenAI-compatible SDK at your Railway URL (use https://<your-domain>/v1 as the base URL); existing code works unchanged

No SSH. No model management. No infrastructure configuration.
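For step 4, a TTS request can be sent with nothing but the Python standard library. This is a minimal sketch: the Bearer-token header follows the OpenAI convention, and the model ID speaches-ai/Kokoro-82M-v1.0-ONNX, the voice af_heart, and the SPEACHES_BASE_URL / SPEACHES_API_KEY environment variable names are assumptions for illustration, not values defined by this template.

```python
import json
import os
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str, model: str,
                         voice: str, response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-style /v1/audio/speech endpoint."""
    body = json.dumps({
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(req: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(req, timeout=300) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


if __name__ == "__main__":
    base_url = os.environ.get("SPEACHES_BASE_URL")  # hypothetical name; your Railway URL
    if base_url:
        req = build_speech_request(
            base_url,
            os.environ.get("SPEACHES_API_KEY", ""),  # hypothetical name; your API_KEY value
            "Hello from Speaches on Railway.",
            model="speaches-ai/Kokoro-82M-v1.0-ONNX",  # assumed Kokoro model ID
            voice="af_heart",                          # assumed Kokoro voice
        )
        synthesize(req, "hello.mp3")
```

Without SPEACHES_BASE_URL set, the script only defines the helpers and sends nothing. Keeping request construction separate from sending makes the payload easy to inspect or test before pointing it at a live deployment.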


Common Use Cases

  • Self-hosted alternative to OpenAI Whisper API — run faster-whisper locally at flat compute cost instead of $0.36/hour; existing code using OpenAI's audio endpoints works unchanged — just swap the base URL to your Railway domain
  • Self-hosted alternative to ElevenLabs — generate unlimited speech with Kokoro (#1 TTS Arena) and Piper (20+ languages) without $5–22/month subscription or character caps
  • Voice-enabled AI agent audio layer: use Speaches as the STT/TTS backend for LLM-powered voice assistants; the Realtime WebSocket API at /v1/realtime enables two-way voice conversations, with latency that depends on the chosen models and the CPU allocated to the service
  • Meeting and call transcription — stream audio for real-time transcription in internal tools, call centres, or accessibility captioning pipelines without per-minute API billing
  • Home Assistant private voice control — integrate via the wyoming_openai proxy for fully local, cloud-free voice commands; audio never leaves your network
  • Multilingual audio content production — generate voiceovers for video, e-learning, audiobooks, and IVR systems using Piper's 20+ language voices at no per-character cost

Configuration

| Variable | Required | Description |
|---|---|---|
| API_KEY | Recommended | Secures all API endpoints; set it before exposing your Railway URL publicly |
| ENABLE_UI | Optional | Set to false to disable the Gradio web UI in production and reduce memory usage |
| WHISPER__COMPUTE_TYPE | Optional | int8 (default, ~40% memory reduction), float16, or float32 |
| WHISPER__INFERENCE_DEVICE | Optional | cpu (default), cuda (requires a GPU), or auto |
| STT_MODEL_TTL | Optional | Seconds before the STT model unloads from memory; -1 keeps it loaded permanently |
| TTS_MODEL_TTL | Optional | Seconds before the TTS model unloads from memory; -1 keeps it loaded permanently |
| PRELOAD_MODELS | Optional | JSON array of HuggingFace model IDs to download at startup, avoiding cold-start delays |
| LOG_LEVEL | Optional | info for production, debug for troubleshooting |

PORT is injected automatically by Railway. The default Speaches port is 8000.
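Values for the Variables tab can be generated rather than typed by hand. This Python sketch creates a random API_KEY with the standard secrets module and serializes PRELOAD_MODELS as a JSON array, matching the descriptions in the table above. The helper name, the example model IDs, and the 300-second TTL are illustrative assumptions, not template defaults.

```python
import json
import secrets


def speaches_variables(preload: list[str], keep_loaded: bool = True,
                       enable_ui: bool = False) -> dict[str, str]:
    """Return example environment variables for a CPU-only Speaches service (hypothetical helper)."""
    ttl = "-1" if keep_loaded else "300"  # -1 keeps models loaded; 300 s is an arbitrary example
    return {
        "API_KEY": secrets.token_urlsafe(32),   # strong random key (43 URL-safe chars)
        "PRELOAD_MODELS": json.dumps(preload),  # JSON array of HuggingFace model IDs
        "STT_MODEL_TTL": ttl,
        "TTS_MODEL_TTL": ttl,
        "ENABLE_UI": "true" if enable_ui else "false",
        "WHISPER__COMPUTE_TYPE": "int8",
        "WHISPER__INFERENCE_DEVICE": "cpu",
        "LOG_LEVEL": "info",
    }


if __name__ == "__main__":
    env = speaches_variables([
        "Systran/faster-whisper-small",      # example STT model ID (assumption)
        "speaches-ai/Kokoro-82M-v1.0-ONNX",  # example TTS model ID (assumption)
    ])
    for name, value in env.items():
        print(f"{name}={value}")
```

Keeping both TTLs at -1 trades memory for latency: models stay resident, so requests after the first avoid reloading them, at the cost of a higher steady-state RAM footprint on Railway.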


Speaches vs. Managed STT/TTS APIs

| Feature | Speaches (Railway) | OpenAI Whisper API | ElevenLabs | Deepgram |
|---|---|---|---|---|
| Monthly cost | ~$5–10 flat | $0.36/hour | From $5/month | From $0.0043/min |
| Per-minute/character fees | ✅ None | ❌ $0.36/hr | ❌ Character caps | ❌ Per minute |
| Audio leaves your server | ✅ Never | ❌ OpenAI servers | ❌ ElevenLabs servers | ❌ Deepgram servers |
| OpenAI API drop-in | ✅ Yes, same endpoints | ✅ Yes | ❌ No | ❌ No |
| STT (transcription) | ✅ faster-whisper | ✅ Whisper | ✅ Scribe | ✅ Yes |
| TTS (speech synthesis) | ✅ Kokoro + Piper | ✅ TTS-1 | ✅ Yes | ✅ Aura |
| Realtime WebSocket | ✅ /v1/realtime | ✅ Yes | ❌ No | ✅ Yes |
| Self-hostable | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Multilingual (20+ langs) | ✅ Piper | ✅ 99 languages | ✅ Yes | ✅ Yes |

Dependencies for Speaches Hosting

  • Railway account — Hobby plan (~$5–10/month) covers the service and model cache volume
  • No external API keys required — models download from HuggingFace at runtime for free
  • Optional: NVIDIA GPU for accelerated inference (not available on Railway Hobby plan)

Implementation Details

This template deploys ghcr.io/speaches-ai/speaches:0.9.0-rc.3-cpu with a persistent Railway volume at /home/ubuntu/.cache/huggingface/hub. Models are downloaded from HuggingFace on first request and cached permanently — redeploys and version updates do not require re-downloading models. The CPU build uses int8 quantization by default, reducing memory usage by approximately 40% vs full precision.

The API server exposes OpenAI-compatible endpoints: /v1/audio/transcriptions for STT, /v1/audio/speech for TTS, and /v1/realtime for WebSocket-based two-way voice. Full OpenAPI documentation is available at /docs on your Railway domain after deploy.
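Transcription uses multipart/form-data rather than JSON, as in OpenAI's API. A stdlib-only Python sketch of a request to /v1/audio/transcriptions follows. The Bearer auth header, the Systran/faster-whisper-small model ID, the sample.wav file, the SPEACHES_* environment variable names, and the assumption that the default JSON response carries a text field are all illustrative, not taken from the template.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url: str, api_key: str, audio: bytes,
                                filename: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex  # random boundary; a collision with the payload is vanishingly unlikely
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio + tail,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(req: urllib.request.Request) -> str:
    """Send the request and return the transcript (assumes a JSON body with a 'text' key)."""
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["text"]


if __name__ == "__main__":
    base_url = os.environ.get("SPEACHES_BASE_URL")  # hypothetical variable name
    if base_url:
        with open("sample.wav", "rb") as f:  # hypothetical local audio file
            audio = f.read()
        req = build_transcription_request(
            base_url, os.environ.get("SPEACHES_API_KEY", ""), audio,
            filename="sample.wav", model="Systran/faster-whisper-small",  # assumed model ID
        )
        print(transcribe(req))
```

The full request and response schemas for your deployed version are listed at /docs, which is the place to confirm field names before relying on this sketch.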


Frequently Asked Questions

How much does Speaches cost on Railway vs OpenAI Whisper API? Speaches on Railway runs at ~$5–10/month flat with unlimited transcriptions and speech generation. OpenAI Whisper API charges $0.36/hour — at 30 hours of audio that's $10.80 in API costs before any TTS usage. ElevenLabs starts at $5/month but caps characters. Speaches gives you unlimited STT and TTS at flat compute pricing with no audio leaving your server.

Is Speaches a drop-in replacement for OpenAI's audio API? Yes. Speaches implements the same /v1/audio/transcriptions and /v1/audio/speech endpoints as OpenAI. Set your OpenAI SDK's base URL to your Railway domain with the /v1 path (for example https://<your-domain>/v1), set your API key, and use model IDs that your Speaches instance serves; the rest of your request code stays the same.

Do my audio files leave my Railway instance? No. All transcription and speech synthesis runs inside your Railway container using local model inference, and no audio data is sent to any external API. This makes Speaches well suited to internal voice tools and privacy-sensitive applications; for HIPAA-regulated audio, compliance also depends on your hosting agreements and controls, not only on where inference runs.

How long does the first transcription take? The first request triggers a model download from HuggingFace, so its duration depends mostly on model size: the faster-whisper tiny model is about 75 MB, small about 480 MB, and large-v3 about 3 GB. Later requests reuse the cached model and skip the download. Use PRELOAD_MODELS to download models at startup and avoid this delay on the first request.

What is the difference between Kokoro and Piper for TTS? Kokoro is ranked #1 on TTS Arena for voice quality — best for English speech generation where naturalness matters. Piper is optimised for speed and supports 20+ languages — best for multilingual applications or latency-sensitive deployments. Both are available in the same Speaches instance and selectable per API request.

Do I lose my downloaded models if Railway redeploys? No. All downloaded models are stored on the Railway persistent volume at /home/ubuntu/.cache/huggingface/hub, not inside the container. Redeploys, version updates, and container restarts do not require re-downloading models.


Why Deploy and Host Speaches on Railway?

Railway is a single platform for deploying your infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while allowing you to scale it vertically and horizontally.

By deploying Speaches on Railway, you get a fully OpenAI-compatible STT/TTS server — local faster-whisper transcription, Kokoro and Piper speech synthesis, Realtime WebSocket support, and persistent model caching — at ~$5–10/month flat with no per-minute fees and no audio ever leaving your infrastructure.

