Deploy Qwen3 Chat
[Dec '25] Self-host Alibaba's Qwen3 models using Ollama + Open WebUI.
This template ships two services, each with a persistent volume:
– Open-WebUI (volume mounted at /app/backend/data)
– qwen3 (Ollama, volume mounted at /root/.ollama)
Deploy and Host Qwen
Qwen3 is Alibaba's open-source large language model series, spanning parameter sizes from 0.6B to 235B with strong efficiency at the small end. It excels at multilingual tasks, code generation, and reasoning while using significantly less memory than GPT-class models. You can run Qwen3 locally or on private infrastructure without API costs.
About Hosting Qwen
This template deploys Ollama, pulls Qwen3 automatically at boot, and starts Open WebUI so you have a clean chat interface the moment your service goes live. Instead of provisioning GPU servers or manually wiring API routes, Railway builds, boots, and networks both services for you.
You get:
– An Ollama runtime hosting the Qwen3 model
– A private API endpoint for inference
– A browser-based chat UI to test the model immediately
– Full flexibility to pull additional models, swap versions, or load your own Modelfile
This is one of the fastest ways to self-host an open-source model without dealing with CUDA versions, driver mismatches, or system-level setup.
Getting Started in 2 minutes
1. Click Deploy on Railway: The template creates two services (Ollama + Open WebUI) with pre-configured networking.
2. Wait for the model download: Ollama pulls qwen3:1.7b (~2GB). Check the deployment logs; you'll see "successfully loaded model" when it's ready (typically about 2 minutes).
3. Access Open WebUI: Click the public URL Railway generates for the Open WebUI service. You'll see a ChatGPT-like interface.
4. Create your account: Open WebUI requires signup on first visit. Your credentials stay local; no data leaves your Railway deployment.
5. Start chatting: Select "qwen3:1.7b" from the model dropdown. Your first message may take 10-15 seconds as Ollama loads the model into memory. Once it responds, you can also hit the API directly, as sketched below.
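Once the service is live, you can query the model programmatically as well. Here's a minimal sketch using Python's requests library against Ollama's /api/generate endpoint; the URL is a placeholder, so substitute your own service's domain (use the Railway private domain when calling from another Railway service):

```python
import requests

# Placeholder URL: swap in your Ollama service's domain.
OLLAMA_URL = "http://localhost:11434"

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "qwen3:1.7b",
        "prompt": "Say hello in three languages.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

With "stream": False, Ollama returns a single JSON object whose response field holds the full completion.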
Common Use Cases
• Running Qwen3 for chat, reasoning, or coding without paying for API tokens
• Prototyping apps powered by Qwen3 through a simple /api/generate endpoint
• Private inference for internal tools, customer data, or R&D
• RAG experiments using Qwen3 + embeddings via Ollama (see the sketch after this list)
• Hosting multiple models (Qwen, Llama, Mistral, DeepSeek) behind one API
• Self-hosted chatbot with OpenWebUI’s model switching and prompt tools
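For the RAG item above: Ollama exposes an /api/embeddings endpoint alongside generation. Qwen3 is a chat model, so a dedicated embedding model works better for retrieval; the sketch below assumes you've pulled one (nomic-embed-text here, an illustrative choice), and the URL is again a placeholder:

```python
import math
import requests

OLLAMA_URL = "http://localhost:11434"  # placeholder: use your Ollama endpoint

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Fetch an embedding vector from Ollama's /api/embeddings endpoint."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy retrieval: pick the stored passage most similar to the query.
docs = ["Qwen3 supports many languages.", "Railway volumes persist data across deploys."]
query_vec = embed("Which model is multilingual?")
print(max(docs, key=lambda d: cosine(query_vec, embed(d))))
```

In a real RAG pipeline you would embed documents once, store the vectors, and only embed the query at request time; this toy version re-embeds the documents on every call for brevity.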
Qwen3 vs Other Open-Source Models
Qwen3 competes directly with Llama 3.1, Mistral, Phi-3, and DeepSeek, and stands out for strong multilingual capabilities, excellent coding performance, and efficient small variants like 1.7B that run well in CPU-only environments. Compared to Llama and Mistral, Qwen3's smaller models tend to reason better and follow instructions more reliably. For cost-conscious or CPU-only deployments, Qwen3 is one of the best choices available.
Railway vs Other Hosting Options
Railway gives you immediate container deployments with private networking between Ollama and Open WebUI.
• Faster setup than AWS or GCP
• Fewer configuration steps than Docker on a VPS
• No GPU driver issues
• One-click redeploys and secret management
• Simple logs, persistent storage, and environment controls
For users who want to self-host a model quickly and reliably, Railway is a sweet spot between full infrastructure control and painless developer experience.
Dependencies for Qwen Hosting
Ollama: Model serving engine that downloads and runs Qwen3. Handles model loading, tokenization, and inference requests. Version 0.1.0+ required.
Open WebUI: ChatGPT-style frontend with conversation management, markdown rendering, and model switching. Connects to Ollama via internal Railway networking.
Qwen3 Model Files: Downloaded automatically at boot from Ollama's library. The 1.7B variant requires ~2GB disk space; larger variants (8B, 14B, 32B) scale roughly linearly with parameter count.
Railway Volumes (Recommended): Attach persistent storage to cache downloaded models. Without volumes, Railway re-downloads Qwen3 on every deployment, adding 2-5 minutes to startup time.
Deployment Dependencies
- Ollama Library: ollama.com/library – Browse all available models
- Open WebUI Docs: docs.openwebui.com – Frontend customization guides
- Qwen3 Model Cards: huggingface.co/Qwen – Technical specs and benchmarks
- Railway Documentation: docs.railway.app – Platform limits and pricing
Implementation Details
The template runs two containers: ollama/ollama, which serves the model API on port 11434, and ghcr.io/open-webui/open-webui, which connects to it over Railway's private network. Qwen3 is pulled automatically at boot based on the OLLAMA_DEFAULT_MODELS variable, and both services persist state to their attached volumes.
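For chat-style streaming, Ollama emits newline-delimited JSON: one object per token batch, ending with "done": true. A sketch of consuming that stream in Python (the endpoint URL is a placeholder for your deployment's domain):

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434"  # placeholder: use your Ollama endpoint

# Stream tokens as they are generated instead of waiting for the full reply.
with requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "qwen3:1.7b", "prompt": "Explain RAG in one paragraph."},
    stream=True,
    timeout=None,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a "response" fragment; print as they arrive.
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break
```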
Environment Variables
Your template comes preconfigured with the following variables for both Ollama and Open WebUI:
Ollama Variables
– OLLAMA_HOST: Allows Ollama to listen on all interfaces.
– OLLAMA_ORIGINS: Sets allowed CORS origins when OpenWebUI is hosted separately.
– OLLAMA_DEFAULT_MODELS: The model that Ollama will automatically download at boot (qwen3:1.7b).
Open WebUI Variables
– OLLAMA_BASE_URL: Points Open WebUI to the private internal Ollama API endpoint.
– WEBUI_SECRET_KEY: Used by Open WebUI to secure sessions and authentication.
– CORS_ALLOW_ORIGIN: Allows the UI to connect from any origin.
These variables are pre-filled so everything works out-of-the-box the moment the deployment boots.
Pricing and Cost Expectations
Running Qwen3 1.7B is extremely lightweight. You can usually run it comfortably on CPU-only Railway plans. Larger models may require higher RAM tiers. You pay only for the compute and storage you allocate—no per-token charges like cloud APIs.
Switching to Larger Qwen3 Models
Change OLLAMA_DEFAULT_MODELS to upgrade:
- qwen3:1.7b: Fast responses, good for simple queries (2GB RAM)
- qwen3:8b: Balanced performance, handles complex reasoning (8GB RAM)
- qwen3:14b: Near-GPT-3.5 quality, slower inference (16GB RAM)
- qwen3:32b: Best dense-model quality, requires a high-memory plan (32GB+ RAM)
Railway will re-download the model on the next deployment. Attach a volume to /root/.ollama/models to persist downloads across deploys. You can also pull a model at runtime without redeploying, as sketched below.
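A sketch of pulling a larger model through Ollama's /api/pull endpoint, which streams download progress as JSON lines (the model tag and URL are illustrative):

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434"  # placeholder: use your Ollama endpoint

# Pull a larger model at runtime; Ollama streams progress updates line by line.
with requests.post(
    f"{OLLAMA_URL}/api/pull",
    json={"model": "qwen3:8b"},
    stream=True,
    timeout=None,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status"))
```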
When to Use Qwen3
Best For:
- Teams building internal AI tools without per-token costs
- Developers testing LLM integrations before production
- Projects requiring data privacy (healthcare, legal, finance)
- Multilingual applications where Qwen3 outperforms English-only models
FAQ
Can I use multiple Qwen3 models simultaneously?
Yes. Set OLLAMA_DEFAULT_MODELS="qwen3:1.7b,qwen3:8b" to download multiple models. Open WebUI lets you switch between them in the interface dropdown. Each model consumes memory only when actively generating responses. You can confirm what's downloaded via the API, as sketched below.
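A quick sketch listing downloaded models via Ollama's /api/tags endpoint (URL is a placeholder):

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # placeholder: use your Ollama endpoint

# /api/tags lists every model currently present on the Ollama volume.
tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=30).json()
for model in tags["models"]:
    print(model["name"], f"{model['size'] / 1e9:.1f} GB")
```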
How much does this cost on Railway?
Qwen3:1.7b runs comfortably on the $5/month Hobby plan. Qwen3:8b needs the $20/month Pro plan. Add $1-3/month for persistent volumes to cache models.
Can I connect other frontends besides Open WebUI?
Absolutely. Ollama exposes an OpenAI-compatible API at http://${{ollama.RAILWAY_PRIVATE_DOMAIN}}:11434/v1. Use it with LibreChat, Chatbot UI, or custom React/Next.js apps by swapping the API endpoint.
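For example, a minimal sketch using the official openai Python SDK (the base_url is a placeholder for your Ollama endpoint; the api_key can be any non-empty string, since Ollama ignores it):

```python
from openai import OpenAI

# Point the OpenAI SDK at Ollama's OpenAI-compatible /v1 endpoint.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

chat = client.chat.completions.create(
    model="qwen3:1.7b",
    messages=[{"role": "user", "content": "What does a Railway volume do?"}],
)
print(chat.choices[0].message.content)
```

Because the endpoint is OpenAI-compatible, any frontend or SDK that accepts a custom base URL should work without code changes beyond the endpoint swap.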
How do I enable authentication for Open WebUI?
Open WebUI requires account creation by default. Configure admin-only signups by setting WEBUI_AUTH=true and ENABLE_SIGNUP=false in Open WebUI's environment variables. The first user becomes admin automatically.
Why Deploy Qwen on Railway?
Railway is a single platform for deploying your entire infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while letting you scale it vertically and horizontally.
By deploying Qwen on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
Open-WebUI: ghcr.io/open-webui/open-webui
qwen3: ollama/ollama
