Deploy qwen3-embedding-4b
Deploy Qwen3-Embedding (or other models) using llama.cpp
qwen3-embedding-railway
Deploy and Host genuine-appreciation on Railway
Run a production-ready llama.cpp server that serves OpenAI-compatible embeddings from GGUF models. It auto-downloads the specified model from Hugging Face or ModelScope on first start and persists it in a Railway volume to avoid re-downloading on redeploys. Authentication is enforced via API key.
About Hosting genuine-appreciation
This template builds and deploys a Dockerized llama.cpp server configured for embeddings. On startup, it checks a persistent volume for the GGUF model; if missing, it fetches it from your chosen platform (Hugging Face or ModelScope). Core runtime settings—port, model name, filename, context size—are injected via environment variables. An API_KEY is required and used for Bearer-token auth. The template includes a healthcheck and uses Railway volumes to persist the model across deployments.
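The startup check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the template's actual entrypoint; the volume mount path, helper names, and the ModelScope URL pattern are assumptions (the Hugging Face `resolve` URL scheme is standard):

```python
import os
from pathlib import Path

# Assumed volume mount point; the template's actual path may differ.
MODEL_DIR = Path(os.environ.get("MODEL_DIR", "/models"))

def model_download_url(platform: str, model_name: str, filename: str) -> str:
    """Build the artifact URL for the configured platform.

    The Hugging Face 'resolve' URL scheme is standard; the ModelScope
    path shown here is an assumption.
    """
    if platform == "hf":
        return f"https://huggingface.co/{model_name}/resolve/main/{filename}"
    if platform == "modelscope":
        return f"https://modelscope.cn/models/{model_name}/resolve/master/{filename}"
    raise ValueError(f"unsupported PLATFORM: {platform}")

def ensure_model(platform: str, model_name: str, filename: str) -> Path:
    """Download the GGUF file into the persistent volume only if missing."""
    target = MODEL_DIR / filename
    if not target.exists():
        url = model_download_url(platform, model_name, filename)
        # download(url, target)  # hypothetical fetch step, e.g. curl/wget
        # with an Authorization header when HF_TOKEN is set
        print(f"fetching {url} -> {target}")
    return target
```

Because the check runs against the Railway volume, redeploys reuse the cached file instead of re-downloading several gigabytes.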
Common Use Cases
- Serve embeddings for applications (RAG, search, similarity) via an OpenAI-compatible API
- Host Qwen3 Embedding GGUF models with simple, scalable deployment
- Rapid prototyping of embedding-powered features without managing servers manually
Dependencies for genuine-appreciation Hosting
- Railway account (to deploy and manage environment variables/volumes)
- Optionally: Hugging Face token (HF_TOKEN) if downloading private or restricted models
Deployment Dependencies
- llama.cpp server image (Docker): ghcr.io/ggml-org/llama.cpp:server
- Hugging Face: https://huggingface.co/
- ModelScope: https://modelscope.cn/
- Railway Volumes: https://docs.railway.app/develop/volumes
- Railway Variables: https://docs.railway.app/develop/variables
Implementation Details
Environment variables:
- Required: API_KEY
- Defaults (overridable in Railway):
  - PORT=8080
  - PLATFORM=hf (supports hf or modelscope)
  - MODEL_NAME=Qwen/Qwen3-Embedding-4B-GGUF
  - MODEL_FILENAME=Qwen3-Embedding-4B-Q4_K_M.gguf
  - CONTEXT_SIZE=40960
- Optional: HF_TOKEN (for Hugging Face gated/private models), MODEL_URL (explicit artifact URL)

API:
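Reading these variables with their documented defaults might look like the following sketch (the function and dict keys are illustrative, not part of the template):

```python
import os

def runtime_config(env=os.environ) -> dict:
    """Collect the template's runtime settings, applying the documented defaults."""
    cfg = {
        "port": int(env.get("PORT", "8080")),
        "platform": env.get("PLATFORM", "hf"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen3-Embedding-4B-GGUF"),
        "model_filename": env.get("MODEL_FILENAME", "Qwen3-Embedding-4B-Q4_K_M.gguf"),
        "context_size": int(env.get("CONTEXT_SIZE", "40960")),
        "api_key": env.get("API_KEY"),    # required; no default
        "hf_token": env.get("HF_TOKEN"),  # optional, for gated/private models
    }
    if not cfg["api_key"]:
        raise RuntimeError("API_KEY must be set")
    return cfg
```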
Embeddings: POST /v1/embeddings with Authorization: Bearer ${API_KEY}
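A minimal client call against this endpoint could look like the sketch below (the base URL and API key are placeholders for your own deployment; stdlib `urllib` is used to keep it dependency-free):

```python
import json
import urllib.request

def build_embeddings_request(base_url: str, api_key: str, texts: list):
    """Assemble URL, headers, and JSON body for POST /v1/embeddings."""
    body = json.dumps({"input": texts}).encode()
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return f"{base_url}/v1/embeddings", headers, body

def embed(base_url: str, api_key: str, texts: list):
    """Send the request and return the list of embedding vectors."""
    url, headers, body = build_embeddings_request(base_url, api_key, texts)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return [d["embedding"] for d in json.load(resp)["data"]]

# Example (placeholder URL/key):
# vectors = embed("https://your-app.up.railway.app", "your-api-key", ["hello world"])
```

The response follows the OpenAI embeddings schema, so existing OpenAI-compatible clients can also be pointed at the deployment's base URL.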
Why Deploy genuine-appreciation on Railway?
Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.
By deploying genuine-appreciation on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
qwen3-embedding-railway
xkos/qwen3-embedding-railway (required variable: API_KEY)