Railway

Deploy FreeLLMAPI

One OpenAI endpoint. Sixteen free LLM providers. ~1.7B tokens per month.

Deploy FreeLLMAPI

Just deployed

/tmp

Deploy and Host FreeLLMAPI on Railway

One OpenAI-compatible endpoint. Sixteen free LLM providers. ~1.7B tokens per month.

! (unofficial template a.k.a "port" of official repo https://github.com/tashfeenahmed/freellmapi) !

Fallback chain with per-provider token budget

About Hosting FreeLLMAPI

Aggregate the free tiers from Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, HuggingFace, Z.ai (Zhipu), Ollama, Kilo, Pollinations, LLM7, OVH AI Endpoints, and OpenCode Zen — plus custom OpenAI-compatible chat, embedding, image, and audio endpoints — behind a single /v1 API. Keys are stored encrypted. A router picks the best available model for each request, falls over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.

Common Use Cases

  • OpenAI-compatiblePOST /v1/chat/completions and GET /v1/models work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change base_url.
  • Responses APIPOST /v1/responses (the wire format current Codex CLI versions require) is implemented as a translating shim over the same router, with full streaming events and tool calls.
  • Editor autocompletePOST /v1/completions translates legacy prompt/suffix requests into the same router, so VS Code ghost-text clients such as Continue can use FreeLLMAPI for inline suggestions.
  • Anthropic Messages APIPOST /v1/messages (plus /v1/messages/count_tokens) speaks Anthropic's wire format over the same router, so Claude Code and the official Anthropic SDKs run against your free pool. GET /v1/models is content-negotiated (Anthropic shape when the client sends anthropic-version, OpenAI shape otherwise), and Claude families (opus / sonnet / haiku / default) map to auto or a pinned model on the Keys page. See Anthropic / Claude clients.
  • Image generation & text-to-speechPOST /v1/images/generations and POST /v1/audio/speech route across the providers that serve media models, including custom OpenAI-compatible media endpoints. Browse and toggle them on the dashboard's Models → Image / Audio tabs.
  • Streaming and non-streaming — Server-Sent Events for stream: true, JSON response otherwise. Every provider adapter implements both.
  • Tool calling — OpenAI-style tools / tool_choice requests are passed through, and assistant tool_calls + tool role follow-up messages round-trip across providers.
  • Embeddings/v1/embeddings with family-based routing, including custom OpenAI-compatible embedding endpoints: failover only ever happens between providers serving the same model (vectors from different models are incompatible), never across models. See Embeddings.
  • Automatic fallover — If the chosen provider returns a 429, 5xx, or times out, the router skips it, puts the key on a short cooldown, and retries on the next model in your fallback chain (up to 20 attempts).
  • Per-key rate tracking — RPM, RPD, TPM, and TPD counters per (platform, model, key) so the router always picks a key that's under its caps.
  • Sticky sessions — Multi-turn conversations keep talking to the same model for 30 minutes to avoid the hallucination spike that comes from mid-conversation model switches.
  • Encrypted key storage — API keys are encrypted with AES-256-GCM before hitting SQLite; decryption happens in-memory just before a request.
  • Unified API key — Clients authenticate to your proxy with a single freellmapi-… bearer token. You never expose upstream provider keys to your apps.
  • Dashboard login — The admin UI and all /api/* routes are gated behind an email + password account (scrypt-hashed, session-token auth), set on first run. The /v1 proxy keeps its own unified-key auth for apps.
  • Health checks — Periodic probes mark keys as healthy, rate_limited, invalid, or error so the router skips dead ones automatically.
  • Admin dashboard — React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and run prompts in a playground. Dark mode included.
  • Analytics — Per-request logging with latency, token counts, success rate, and per-provider breakdowns.
  • Context handoff on model switch — Optional. When a session falls over to a different model, injects one compact system message so the new model knows it is continuing an existing task. Disabled by default; enable with FREELLMAPI_CONTEXT_HANDOFF=on_model_switch. See Context Handoff.
  • Runs anywhere — Even a small ARM SBC (Raspberry Pi included). ~40 MB RSS at idle behind PM2 / systemd / whatever supervisor you prefer.

Dependencies for FreeLLMAPI Hosting

  • Docker
  • Docker Compose
  • OpenSSL for generating ENCRYPTION_KEY

Deployment Dependencies

Implementation Details

FREEAPI_DB_PATH=/tmp/freellmapi.db
RAILWAY_RUN_UID=0
  • freellmapi-volume is required to be mounted at /tmp
  • SQLite data is persisted in the Railway volume so rebuilds do not reset the database (i hope)

Why Deploy FreeLLMAPI on Railway?

Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.

By deploying FreeLLMAPI on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.


Template Content

More templates in this category

View Template
Chat Chat
Chat Chat, your own unified chat and search to AI platform.

okisdev
112
View Template
Hermes Agent | OpenClaw Alternative with Dashboard
[Jun'26] Self-improving AI agent with memory, skills, and web dashboard 🤖

codestorm
42
View Template
EchoDeck
Generate a mp4 from powerpoint with TTS

Fixed Scope
7