
Deploy FreeLLMAPI
One OpenAI endpoint. Sixteen free LLM providers. ~1.7B tokens per month.
freellmapi
Just deployed
/tmp
Deploy and Host FreeLLMAPI on Railway
One OpenAI-compatible endpoint. Sixteen free LLM providers. ~1.7B tokens per month.
! (unofficial template a.k.a "port" of official repo https://github.com/tashfeenahmed/freellmapi) !

About Hosting FreeLLMAPI
Aggregate the free tiers from Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, HuggingFace, Z.ai (Zhipu), Ollama, Kilo, Pollinations, LLM7, OVH AI Endpoints, and OpenCode Zen — plus custom OpenAI-compatible chat, embedding, image, and audio endpoints — behind a single /v1 API. Keys are stored encrypted. A router picks the best available model for each request, falls over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.
Common Use Cases
- OpenAI-compatible —
POST /v1/chat/completionsandGET /v1/modelswork with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just changebase_url. - Responses API —
POST /v1/responses(the wire format current Codex CLI versions require) is implemented as a translating shim over the same router, with full streaming events and tool calls. - Editor autocomplete —
POST /v1/completionstranslates legacy prompt/suffix requests into the same router, so VS Code ghost-text clients such as Continue can use FreeLLMAPI for inline suggestions. - Anthropic Messages API —
POST /v1/messages(plus/v1/messages/count_tokens) speaks Anthropic's wire format over the same router, so Claude Code and the official Anthropic SDKs run against your free pool.GET /v1/modelsis content-negotiated (Anthropic shape when the client sendsanthropic-version, OpenAI shape otherwise), and Claude families (opus/sonnet/haiku/default) map toautoor a pinned model on the Keys page. See Anthropic / Claude clients. - Image generation & text-to-speech —
POST /v1/images/generationsandPOST /v1/audio/speechroute across the providers that serve media models, including custom OpenAI-compatible media endpoints. Browse and toggle them on the dashboard's Models → Image / Audio tabs. - Streaming and non-streaming — Server-Sent Events for
stream: true, JSON response otherwise. Every provider adapter implements both. - Tool calling — OpenAI-style
tools/tool_choicerequests are passed through, and assistanttool_calls+toolrole follow-up messages round-trip across providers. - Embeddings —
/v1/embeddingswith family-based routing, including custom OpenAI-compatible embedding endpoints: failover only ever happens between providers serving the same model (vectors from different models are incompatible), never across models. See Embeddings. - Automatic fallover — If the chosen provider returns a 429, 5xx, or times out, the router skips it, puts the key on a short cooldown, and retries on the next model in your fallback chain (up to 20 attempts).
- Per-key rate tracking — RPM, RPD, TPM, and TPD counters per
(platform, model, key)so the router always picks a key that's under its caps. - Sticky sessions — Multi-turn conversations keep talking to the same model for 30 minutes to avoid the hallucination spike that comes from mid-conversation model switches.
- Encrypted key storage — API keys are encrypted with AES-256-GCM before hitting SQLite; decryption happens in-memory just before a request.
- Unified API key — Clients authenticate to your proxy with a single
freellmapi-…bearer token. You never expose upstream provider keys to your apps. - Dashboard login — The admin UI and all
/api/*routes are gated behind an email + password account (scrypt-hashed, session-token auth), set on first run. The/v1proxy keeps its own unified-key auth for apps. - Health checks — Periodic probes mark keys as
healthy,rate_limited,invalid, orerrorso the router skips dead ones automatically. - Admin dashboard — React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and run prompts in a playground. Dark mode included.
- Analytics — Per-request logging with latency, token counts, success rate, and per-provider breakdowns.
- Context handoff on model switch — Optional. When a session falls over to a different model, injects one compact system message so the new model knows it is continuing an existing task. Disabled by default; enable with
FREELLMAPI_CONTEXT_HANDOFF=on_model_switch. See Context Handoff. - Runs anywhere — Even a small ARM SBC (Raspberry Pi included). ~40 MB RSS at idle behind PM2 / systemd / whatever supervisor you prefer.
Dependencies for FreeLLMAPI Hosting
- Docker
- Docker Compose
- OpenSSL for generating
ENCRYPTION_KEY
Deployment Dependencies
- GitHub repository: https://github.com/tashfeenahmed/freellmapi
- Live model catalog: https://freellmapi.co
- Docker guide: https://github.com/tashfeenahmed/freellmapi/blob/main/docker/README.md
Implementation Details
FREEAPI_DB_PATH=/tmp/freellmapi.db
RAILWAY_RUN_UID=0
- freellmapi-volume is required to be mounted at
/tmp - SQLite data is persisted in the Railway volume so rebuilds do not reset the database (i hope)
Why Deploy FreeLLMAPI on Railway?
Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.
By deploying FreeLLMAPI on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
freellmapi
ghcr.io/tashfeenahmed/freellmapi:latest