Deploy qwen3-embedding-4b
Deploy Qwen3-Embedding (or other models) using llama.cpp
qwen3-embedding-railway
Deploy and Host genuine-appreciation on Railway
Run a production-ready llama.cpp server that serves OpenAI-compatible embeddings from GGUF models. It auto-downloads the specified model from Hugging Face or ModelScope on first start and persists it in a Railway volume to avoid re-downloading on redeploys. Authentication is enforced via API key.
About Hosting genuine-appreciation
This template builds and deploys a Dockerized llama.cpp server configured for embeddings. On startup, it checks a persistent volume for the GGUF model; if missing, it fetches it from your chosen platform (Hugging Face or ModelScope). Core runtime settings—port, model name, filename, context size—are injected via environment variables. An API_KEY is required and used for Bearer-token auth. The template includes a healthcheck and uses Railway volumes to persist the model across deployments.
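The startup check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the template's actual entrypoint; the volume mount path, helper names, and the ModelScope URL pattern are assumptions (the Hugging Face `resolve` URL scheme is standard):

```python
import os
from pathlib import Path

# Assumed volume mount point; the template's actual path may differ.
MODEL_DIR = Path(os.environ.get("MODEL_DIR", "/models"))

def model_download_url(platform: str, model_name: str, filename: str) -> str:
    """Build the artifact URL for the configured platform.

    The Hugging Face 'resolve' URL scheme is standard; the ModelScope
    path shown here is an assumption.
    """
    if platform == "hf":
        return f"https://huggingface.co/{model_name}/resolve/main/{filename}"
    if platform == "modelscope":
        return f"https://modelscope.cn/models/{model_name}/resolve/master/{filename}"
    raise ValueError(f"unsupported PLATFORM: {platform}")

def ensure_model(platform: str, model_name: str, filename: str) -> Path:
    """Download the GGUF file into the persistent volume only if missing."""
    target = MODEL_DIR / filename
    if not target.exists():
        url = model_download_url(platform, model_name, filename)
        # download(url, target)  # hypothetical fetch step, e.g. curl/wget
        # with an Authorization header when HF_TOKEN is set
        print(f"fetching {url} -> {target}")
    return target
```

Because the check runs against the Railway volume, redeploys reuse the cached file instead of re-downloading several gigabytes.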
Common Use Cases
- Serve embeddings for applications (RAG, search, similarity) via an OpenAI-compatible API
- Host Qwen3 Embedding GGUF models with simple, scalable deployment
- Rapid prototyping of embedding-powered features without managing servers manually
Dependencies for genuine-appreciation Hosting
- Railway account (to deploy and manage environment variables/volumes)
- Optionally: Hugging Face token (HF_TOKEN) if downloading private or restricted models
Deployment Dependencies
- llama.cpp server image (Docker): ghcr.io/ggml-org/llama.cpp:server
- Hugging Face: https://huggingface.co/
- ModelScope: https://modelscope.cn/
- Railway Volumes: https://docs.railway.app/develop/volumes
- Railway Variables: https://docs.railway.app/develop/variables
Implementation Details
Environment variables:
- Required: API_KEY
- Defaults (overridable in Railway):
  - PORT=8080
  - PLATFORM=hf (supports hf or modelscope)
  - MODEL_NAME=Qwen/Qwen3-Embedding-4B-GGUF
  - MODEL_FILENAME=Qwen3-Embedding-4B-Q4_K_M.gguf
  - CONTEXT_SIZE=40960
- Optional: HF_TOKEN (for Hugging Face gated/private models), MODEL_URL (explicit artifact URL)

API:
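Reading these variables with their documented defaults might look like the following sketch (the function and dict keys are illustrative, not part of the template):

```python
import os

def runtime_config(env=os.environ) -> dict:
    """Collect the template's runtime settings, applying the documented defaults."""
    cfg = {
        "port": int(env.get("PORT", "8080")),
        "platform": env.get("PLATFORM", "hf"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen3-Embedding-4B-GGUF"),
        "model_filename": env.get("MODEL_FILENAME", "Qwen3-Embedding-4B-Q4_K_M.gguf"),
        "context_size": int(env.get("CONTEXT_SIZE", "40960")),
        "api_key": env.get("API_KEY"),    # required; no default
        "hf_token": env.get("HF_TOKEN"),  # optional, for gated/private models
    }
    if not cfg["api_key"]:
        raise RuntimeError("API_KEY must be set")
    return cfg
```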
Embeddings: POST /v1/embeddings with Authorization: Bearer ${API_KEY}
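A minimal client call against this endpoint could look like the sketch below (the base URL and API key are placeholders for your own deployment; stdlib `urllib` is used to keep it dependency-free):

```python
import json
import urllib.request

def build_embeddings_request(base_url: str, api_key: str, texts: list):
    """Assemble URL, headers, and JSON body for POST /v1/embeddings."""
    body = json.dumps({"input": texts}).encode()
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return f"{base_url}/v1/embeddings", headers, body

def embed(base_url: str, api_key: str, texts: list):
    """Send the request and return the list of embedding vectors."""
    url, headers, body = build_embeddings_request(base_url, api_key, texts)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return [d["embedding"] for d in json.load(resp)["data"]]

# Example (placeholder URL/key):
# vectors = embed("https://your-app.up.railway.app", "your-api-key", ["hello world"])
```

The response follows the OpenAI embeddings schema, so existing OpenAI-compatible clients can also be pointed at the deployment's base URL.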
Why Deploy genuine-appreciation on Railway?
Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.
By deploying genuine-appreciation on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
qwen3-embedding-railway
xkos/qwen3-embedding-railway (required variable: API_KEY)