Deploy and Host FreeLLMAPI on Railway

One OpenAI-compatible endpoint. Sixteen free LLM providers. ~1.7B tokens per month.

! (unofficial template a.k.a "port" of official repo https://github.com/tashfeenahmed/freellmapi) !

Fallback chain with per-provider token budget

About Hosting FreeLLMAPI

Aggregate the free tiers from Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, HuggingFace, Z.ai (Zhipu), Ollama, Kilo, Pollinations, LLM7, OVH AI Endpoints, and OpenCode Zen — plus custom OpenAI-compatible chat, embedding, image, and audio endpoints — behind a single /v1 API. Keys are stored encrypted. A router picks the best available model for each request, falls over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.

Common Use Cases

OpenAI-compatible — POST /v1/chat/completions and GET /v1/models work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change base_url.
Responses API — POST /v1/responses (the wire format current Codex CLI versions require) is implemented as a translating shim over the same router, with full streaming events and tool calls.
Editor autocomplete — POST /v1/completions translates legacy prompt/suffix requests into the same router, so VS Code ghost-text clients such as Continue can use FreeLLMAPI for inline suggestions.
Anthropic Messages API — POST /v1/messages (plus /v1/messages/count_tokens) speaks Anthropic's wire format over the same router, so Claude Code and the official Anthropic SDKs run against your free pool. GET /v1/models is content-negotiated (Anthropic shape when the client sends anthropic-version, OpenAI shape otherwise), and Claude families (opus / sonnet / haiku / default) map to auto or a pinned model on the Keys page. See Anthropic / Claude clients.
Image generation & text-to-speech — POST /v1/images/generations and POST /v1/audio/speech route across the providers that serve media models, including custom OpenAI-compatible media endpoints. Browse and toggle them on the dashboard's Models → Image / Audio tabs.
Streaming and non-streaming — Server-Sent Events for stream: true, JSON response otherwise. Every provider adapter implements both.
Tool calling — OpenAI-style tools / tool_choice requests are passed through, and assistant tool_calls + tool role follow-up messages round-trip across providers.
Embeddings — /v1/embeddings with family-based routing, including custom OpenAI-compatible embedding endpoints: failover only ever happens between providers serving the same model (vectors from different models are incompatible), never across models. See Embeddings.
Automatic fallover — If the chosen provider returns a 429, 5xx, or times out, the router skips it, puts the key on a short cooldown, and retries on the next model in your fallback chain (up to 20 attempts).
Per-key rate tracking — RPM, RPD, TPM, and TPD counters per (platform, model, key) so the router always picks a key that's under its caps.
Sticky sessions — Multi-turn conversations keep talking to the same model for 30 minutes to avoid the hallucination spike that comes from mid-conversation model switches.
Encrypted key storage — API keys are encrypted with AES-256-GCM before hitting SQLite; decryption happens in-memory just before a request.
Unified API key — Clients authenticate to your proxy with a single freellmapi-… bearer token. You never expose upstream provider keys to your apps.
Dashboard login — The admin UI and all /api/* routes are gated behind an email + password account (scrypt-hashed, session-token auth), set on first run. The /v1 proxy keeps its own unified-key auth for apps.
Health checks — Periodic probes mark keys as healthy, rate_limited, invalid, or error so the router skips dead ones automatically.
Admin dashboard — React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and run prompts in a playground. Dark mode included.
Analytics — Per-request logging with latency, token counts, success rate, and per-provider breakdowns.
Context handoff on model switch — Optional. When a session falls over to a different model, injects one compact system message so the new model knows it is continuing an existing task. Disabled by default; enable with FREELLMAPI_CONTEXT_HANDOFF=on_model_switch. See Context Handoff.
Runs anywhere — Even a small ARM SBC (Raspberry Pi included). ~40 MB RSS at idle behind PM2 / systemd / whatever supervisor you prefer.

Dependencies for FreeLLMAPI Hosting

Docker
Docker Compose
OpenSSL for generating ENCRYPTION_KEY

Deployment Dependencies

GitHub repository: https://github.com/tashfeenahmed/freellmapi
Live model catalog: https://freellmapi.co
Docker guide: https://github.com/tashfeenahmed/freellmapi/blob/main/docker/README.md

Implementation Details

FREEAPI_DB_PATH=/tmp/freellmapi.db
RAILWAY_RUN_UID=0

freellmapi-volume is required to be mounted at /tmp
SQLite data is persisted in the Railway volume so rebuilds do not reset the database (i hope)

Why Deploy FreeLLMAPI on Railway?

Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.

By deploying FreeLLMAPI on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.