Deploy and Host Speaches on Railway


Speaches is an open-source, OpenAI API-compatible speech server: like Ollama, but for audio models. It is a drop-in replacement for OpenAI's /v1/audio/transcriptions (STT) and /v1/audio/speech (TTS) endpoints, powered by faster-whisper (up to 4× faster than standard Whisper on CPU), Kokoro (ranked #1 on TTS Arena), and Piper (20+ languages). All inference runs locally on your Railway instance, so audio is never forwarded to a third-party API.

Self-host on Railway for ~$5–10/month versus OpenAI Whisper API at $0.36/hour or ElevenLabs at $5–22/month with character caps. Zero per-minute fees. Unlimited transcriptions and speech generation at flat compute cost.

What This Template Deploys

Service	Purpose
Speaches v0.9.0-rc.3 (CPU image)	OpenAI-compatible STT/TTS server: faster-whisper transcription, Kokoro/Piper speech synthesis, Realtime WebSocket API, and Gradio web UI on port `8000`
HuggingFace Model Cache (`/home/ubuntu/.cache/huggingface/hub`)	Persistent volume — downloaded models survive redeploys; no re-downloading on container restart

Single-service architecture. No database, no Redis, no external services. Models download from HuggingFace on first use and are cached permanently on the volume.

About Hosting Speaches

Running a production STT/TTS server requires managing model downloads, CPU/GPU inference configuration, API authentication, and a public HTTPS endpoint for application integration. Without a managed host, you're configuring Docker, model storage volumes, SSL, and compute resource allocation manually.

Railway pre-configures Speaches with CPU-optimized int8 quantization, API key auth, a persistent HuggingFace cache volume, and Gradio web UI — all at deploy time.

Typical cost: ~$5–10/month on Railway's Hobby plan. OpenAI Whisper API costs $0.36/hour — 30 hours of audio costs $10.80 before any TTS usage. ElevenLabs caps at 30,000 characters on the starter plan. Speaches gives you unlimited STT and TTS at flat compute pricing.
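The comparison above is simple arithmetic, and it can help to see where the break-even point falls. The sketch below uses only the $0.36/hour rate and the ~$5–10/month estimate quoted in the text; the function names are illustrative, and actual Railway bills depend on usage.

```python
# Rate quoted in the text above; the flat Railway figure is an estimate, not a fixed price.
WHISPER_API_USD_PER_HOUR = 0.36


def api_cost(hours_of_audio: float) -> float:
    """Metered cost of transcribing the given hours at the per-hour API rate."""
    return round(hours_of_audio * WHISPER_API_USD_PER_HOUR, 2)


def break_even_hours(flat_monthly_usd: float) -> float:
    """Hours of audio per month at which metered billing equals the flat cost."""
    return round(flat_monthly_usd / WHISPER_API_USD_PER_HOUR, 1)


if __name__ == "__main__":
    print(f"30 hours via the metered API: ${api_cost(30):.2f}")
    for flat in (5, 10):
        print(f"${flat}/month flat breaks even at {break_even_hours(flat)} hours")
```

Under these assumptions, self-hosting pays off on transcription alone once monthly audio exceeds roughly 14 to 28 hours; below that, the main benefits are privacy and the bundled TTS.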

Deploy in Under 5 Minutes

1. Click Deploy on Railway. Speaches builds automatically (~2–3 minutes).
2. Set API_KEY to a strong random string in the Variables tab. This secures all endpoints.
3. Open your Railway-assigned URL. The Gradio web UI loads immediately.
4. Test STT at /v1/audio/transcriptions and TTS at /v1/audio/speech with your API key.
5. Point any OpenAI-compatible SDK at your Railway URL. Existing code works once the base URL and API key are swapped.

No SSH. No model management. No infrastructure configuration.
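As a rough smoke test for the TTS step, the following sketch builds a request to /v1/audio/speech using only the Python standard library. Several details are assumptions rather than facts from this page: that the API key is sent as an OpenAI-style `Authorization: Bearer` header, and that the model ID `speaches-ai/Kokoro-82M-v1.0-ONNX` and voice `af_heart` are available on your instance. Check the /docs page of your deployment for the values it actually accepts.

```python
import json
import urllib.request


def build_tts_request(
    base_url: str,
    api_key: str,
    text: str,
    model: str = "speaches-ai/Kokoro-82M-v1.0-ONNX",  # assumed model ID
    voice: str = "af_heart",  # assumed voice name
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request: urllib.request.Request, out_path: str, timeout: int = 120) -> int:
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call such as `synthesize(build_tts_request("https://<your-railway-domain>", "<API_KEY>", "Hello"), "hello.audio")` would then write the audio to disk; the first call may be slow while the model downloads.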

Common Use Cases

Self-hosted alternative to OpenAI Whisper API — run faster-whisper locally at flat compute cost instead of $0.36/hour; existing code using OpenAI's audio endpoints works unchanged — just swap the base URL to your Railway domain
Self-hosted alternative to ElevenLabs — generate unlimited speech with Kokoro (#1 TTS Arena) and Piper (20+ languages) without $5–22/month subscription or character caps
Voice-enabled AI agent audio layer — use Speaches as the STT/TTS backend for LLM-powered voice assistants; the Realtime WebSocket API at /v1/realtime enables two-way voice conversations with sub-second latency
Meeting and call transcription — stream audio for real-time transcription in internal tools, call centres, or accessibility captioning pipelines without per-minute API billing
Home Assistant private voice control — integrate via the wyoming_openai proxy for fully local, cloud-free voice commands; audio never leaves your network
Multilingual audio content production — generate voiceovers for video, e-learning, audiobooks, and IVR systems using Piper's 20+ language voices at no per-character cost

Configuration

Variable	Required	Description
`API_KEY`	Recommended	Secures all API endpoints — set before exposing your Railway URL publicly
`ENABLE_UI`	Optional	Set to `false` to disable the Gradio web UI in production — reduces memory usage
`WHISPER__COMPUTE_TYPE`	Optional	`int8` (default, ~40% memory reduction), `float16`, or `float32`
`WHISPER__INFERENCE_DEVICE`	Optional	`cpu` (default), `cuda` (requires GPU), or `auto`
`STT_MODEL_TTL`	Optional	Seconds before STT model unloads from memory — `-1` to keep loaded permanently
`TTS_MODEL_TTL`	Optional	Seconds before TTS model unloads from memory — `-1` to keep loaded permanently
`PRELOAD_MODELS`	Optional	JSON array of HuggingFace model IDs to download at startup — avoids cold-start delays
`LOG_LEVEL`	Optional	`info` for production, `debug` for troubleshooting

PORT is injected automatically by Railway. The default Speaches port is 8000.
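One way to prepare these variables before pasting them into Railway's Variables tab is sketched below. The variable names come from the table above; the helper names are hypothetical, and the example model ID `Systran/faster-whisper-small` is an assumption about which model you want preloaded.

```python
import json
import secrets


def generate_api_key() -> str:
    """Return a URL-safe random string suitable for API_KEY."""
    return secrets.token_urlsafe(32)


def speaches_variables(api_key, preload_models, keep_loaded=True, enable_ui=False):
    """Assemble a dict of variable names to string values for a CPU deployment."""
    if not api_key:
        raise ValueError("API_KEY should be a non-empty random string")
    variables = {
        "API_KEY": api_key,
        "WHISPER__COMPUTE_TYPE": "int8",
        "WHISPER__INFERENCE_DEVICE": "cpu",
        # PRELOAD_MODELS is documented as a JSON array, so serialize it as one.
        "PRELOAD_MODELS": json.dumps(list(preload_models)),
        "LOG_LEVEL": "info",
    }
    if keep_loaded:
        # -1 keeps models in memory instead of unloading them after a timeout.
        variables["STT_MODEL_TTL"] = "-1"
        variables["TTS_MODEL_TTL"] = "-1"
    if not enable_ui:
        variables["ENABLE_UI"] = "false"
    return variables


def to_env_lines(variables) -> str:
    """Render the variables as KEY=value lines."""
    return "\n".join(f"{key}={value}" for key, value in variables.items())


if __name__ == "__main__":
    models = ["Systran/faster-whisper-small"]  # assumed model ID
    print(to_env_lines(speaches_variables(generate_api_key(), models)))
```

Keeping models loaded trades memory for latency, so on a small instance it may be preferable to pass `keep_loaded=False` and leave the TTL variables at their defaults.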

Speaches vs. Managed STT/TTS APIs

	Speaches (Railway)	OpenAI Whisper API	ElevenLabs	Deepgram
Monthly cost	~$5–10 flat	$0.36/hour	From $5/month	From $0.0043/min
Per-minute/character fees	✅ None	❌ $0.36/hr	❌ Character caps	❌ Per minute
Audio leaves your server	✅ Never	❌ OpenAI servers	❌ ElevenLabs servers	❌ Deepgram servers
OpenAI API drop-in	✅ Yes — same endpoints	✅ Yes	❌ No	❌ No
STT (transcription)	✅ faster-whisper	✅ Whisper	✅ Scribe	✅ Yes
TTS (speech synthesis)	✅ Kokoro + Piper	✅ TTS-1	✅ Yes	✅ Aura
Realtime WebSocket	✅ `/v1/realtime`	✅ Yes	❌ No	✅ Yes
Self-hostable	✅ Yes	❌ No	❌ No	❌ No
Multilingual (20+ langs)	✅ Piper	✅ 99 languages	✅ Yes	✅ Yes

Dependencies for Speaches Hosting

Railway account — Hobby plan (~$5–10/month) covers the service and model cache volume
No external API keys required — models download from HuggingFace at runtime for free
Optional: NVIDIA GPU for accelerated inference (not available on Railway Hobby plan)

Deployment Dependencies

Speaches GitHub Repository — source and releases
Speaches Documentation — full API and model configuration reference
HuggingFace — faster-whisper models — STT model options
HuggingFace — Kokoro TTS — TTS model
Railway Volumes Documentation — persistent storage setup

Implementation Details

This template deploys ghcr.io/speaches-ai/speaches:0.9.0-rc.3-cpu with a persistent Railway volume at /home/ubuntu/.cache/huggingface/hub. Models are downloaded from HuggingFace on first request and cached permanently — redeploys and version updates do not require re-downloading models. The CPU build uses int8 quantization by default, reducing memory usage by approximately 40% vs full precision.

The API server exposes OpenAI-compatible endpoints: /v1/audio/transcriptions for STT, /v1/audio/speech for TTS, and /v1/realtime for WebSocket-based two-way voice. Full OpenAPI documentation is available at /docs on your Railway domain after deploy.
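To illustrate what a call to the transcription endpoint looks like on the wire, here is a minimal sketch that assembles a multipart/form-data upload with the standard library. It assumes the endpoint accepts the same `file` and `model` form fields as OpenAI's API, that auth uses a Bearer header, and that `Systran/faster-whisper-small` is a model your instance can load; none of these are confirmed by this page.

```python
import urllib.request
import uuid


def build_transcription_request(
    base_url: str,
    api_key: str,
    audio_bytes: bytes,
    filename: str,
    model: str = "Systran/faster-whisper-small",  # assumed model ID
) -> urllib.request.Request:
    """Build (but do not send) a multipart upload for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex

    # Plain text form field carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")

    # Binary form field carrying the audio file itself.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio_bytes + b"\r\n"

    body = model_part + file_part + f"--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the upload; with OpenAI-compatible defaults the response body would be JSON containing a `text` field, though that too should be verified against /docs.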

Frequently Asked Questions

How much does Speaches cost on Railway vs OpenAI Whisper API? Speaches on Railway runs at ~$5–10/month flat with unlimited transcriptions and speech generation. OpenAI Whisper API charges $0.36/hour — at 30 hours of audio that's $10.80 in API costs before any TTS usage. ElevenLabs starts at $5/month but caps characters. Speaches gives you unlimited STT and TTS at flat compute pricing with no audio leaving your server.

Is Speaches a drop-in replacement for OpenAI's audio API? Yes. Speaches implements the same /v1/audio/transcriptions and /v1/audio/speech endpoints as OpenAI. Change the base URL in your OpenAI SDK to your Railway domain and set your API key; no other code changes are needed.
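With the official `openai` Python package, the swap might look like the sketch below. The SDK appends paths such as `audio/transcriptions` to its base URL, so the base URL should end in `/v1`. The helper name, the placeholder domain, and the model ID `Systran/faster-whisper-small` are assumptions for illustration.

```python
def openai_base_url(railway_domain: str) -> str:
    """Normalize a Railway domain into a base URL ending in /v1."""
    domain = railway_domain.strip().rstrip("/")
    if not domain.startswith(("http://", "https://")):
        domain = "https://" + domain
    if not domain.endswith("/v1"):
        domain += "/v1"
    return domain


def transcribe(railway_domain: str, api_key: str, audio_path: str,
               model: str = "Systran/faster-whisper-small") -> str:  # assumed model ID
    """Transcribe a local audio file through the OpenAI SDK pointed at Speaches."""
    from openai import OpenAI  # third-party package: pip install openai

    client = OpenAI(base_url=openai_base_url(railway_domain), api_key=api_key)
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

The only differences from code written against OpenAI are the `base_url` argument and the model name, which must refer to a model Speaches can download rather than `whisper-1`.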

Do my audio files leave my Railway instance? No. All transcription and speech synthesis runs inside your Railway container using local model inference. No audio data is sent to any external API. This makes Speaches a good fit for internal voice tools and privacy-sensitive applications; for regulated workloads such as HIPAA audio, compliance also depends on your agreement with the hosting provider.

How long does the first transcription take? The first request triggers a model download from HuggingFace, so its duration depends on model size: faster-whisper tiny is ~75 MB, small ~244 MB, and large-v3 ~1.5 GB. Subsequent requests use the cached model and skip the download. Use PRELOAD_MODELS to pre-download models at startup and eliminate cold-start delays on the first request.
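For a rough sense of the cold-start delay, the download time can be estimated from the model sizes quoted above and an assumed network throughput. The 100 Mbit/s figure below is purely illustrative, not a measured Railway value, and the estimate ignores model load time after the download finishes.

```python
# Approximate download sizes in megabytes, as quoted in the answer above.
MODEL_SIZES_MB = {"tiny": 75, "small": 244, "large-v3": 1500}


def download_seconds(size_mb: float, bandwidth_mbit_per_s: float) -> float:
    """Estimate download time: megabytes * 8 bits per byte / megabits per second."""
    if bandwidth_mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return round(size_mb * 8 / bandwidth_mbit_per_s, 1)


if __name__ == "__main__":
    assumed_bandwidth = 100  # Mbit/s, an illustrative assumption
    for name, size in MODEL_SIZES_MB.items():
        print(f"{name}: ~{download_seconds(size, assumed_bandwidth)} s")
```

At that assumed rate the tiny model arrives in a few seconds while large-v3 takes about two minutes, which is why preloading matters most for the larger models.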

What is the difference between Kokoro and Piper for TTS? Kokoro is ranked #1 on TTS Arena for voice quality — best for English speech generation where naturalness matters. Piper is optimised for speed and supports 20+ languages — best for multilingual applications or latency-sensitive deployments. Both are available in the same Speaches instance and selectable per API request.

Do I lose my downloaded models if Railway redeploys? No. All downloaded models are stored on the Railway persistent volume at /home/ubuntu/.cache/huggingface/hub, not inside the container. Redeploys, version updates, and container restarts do not require re-downloading models.

Why Deploy and Host Speaches on Railway?

Railway is a single platform for deploying your infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while still allowing you to scale it vertically and horizontally.

By deploying Speaches on Railway, you get a fully OpenAI-compatible STT/TTS server — local faster-whisper transcription, Kokoro and Piper speech synthesis, Realtime WebSocket support, and persistent model caching — at ~$5–10/month flat with no per-minute fees and no audio ever leaving your infrastructure.