Deploy Qwen3 Chat

[Dec '25] Self-host Alibaba's Qwen models using Ollama + Open WebUI.

Deploy and Host Qwen

Qwen3 is Alibaba's open-source large language model series, offering efficient performance at parameter counts from 0.6B to 235B. It excels at multilingual tasks, code generation, and reasoning while using significantly less memory than GPT-class models. You can run Qwen3 locally or on private infrastructure without API costs.

About Hosting Qwen

This template deploys Ollama, pulls Qwen3 automatically at boot, and starts Open WebUI so you have a clean chat interface the moment your service goes live. Instead of provisioning GPU servers or manually wiring API routes, Railway builds, boots, and networks both services for you.

You get:

  • An Ollama runtime hosting the Qwen3 model
  • A private API endpoint for inference
  • A browser-based chat UI to test the model immediately
  • Full flexibility to pull additional models, swap versions, or load your own modelfile

This is one of the fastest ways to self-host an open-source model without dealing with CUDA versions, driver mismatches, or system-level setup.

Getting Started in 2 minutes

  1. Click Deploy on Railway: The template creates two services (Ollama + Open WebUI) with pre-configured networking.

  2. Wait for Model Download: Ollama pulls qwen3:1.7b (~2GB). Check the deployment logs; you'll see "successfully loaded model" when it's ready (typically about 2 minutes).

  3. Access Open WebUI: Click the public URL Railway generates for the Open WebUI service. You'll see a ChatGPT-like interface.

  4. Create Your Account: Open WebUI requires signup on first visit. Your credentials stay local—no data leaves your Railway deployment.

  5. Start Chatting: Select "qwen3:1.7b" from the model dropdown. Your first message might take 10-15 seconds as Ollama loads the model into memory.
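
Once the UI is up, you can also verify the deployment programmatically. Below is a minimal sketch in Python that lists the models Ollama has downloaded, assuming you have exposed Ollama publicly or are running this from inside the Railway project; the URL is a placeholder for whatever domain your Ollama service gets.

```python
# Minimal smoke test: list the models Ollama has pulled so far.
# OLLAMA_URL is a placeholder; substitute your service's actual domain.
import requests

OLLAMA_URL = "https://your-ollama-service.up.railway.app"

models = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()
print([m["name"] for m in models.get("models", [])])
# Expect ['qwen3:1.7b'] once the download from step 2 has finished.
```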

Common Use Cases

  • Running Qwen3 for chat, reasoning, or coding without paying for API tokens
  • Prototyping apps powered by Qwen3 through a simple /api/generate endpoint
  • Private inference for internal tools, customer data, or R&D
  • RAG experiments using Qwen3 + embeddings via Ollama (see the sketch below)
  • Hosting multiple models (Qwen, Llama, Mistral, DeepSeek) behind one API
  • Self-hosted chatbot with Open WebUI's model switching and prompt tools
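
For the RAG bullet above, Ollama's /api/embeddings route returns a vector for a given prompt. A small sketch follows, with the same placeholder-URL caveat; note that embeddings from a chat-tuned model are fine for experiments, and many setups pull a dedicated embedding model alongside Qwen3.

```python
# Fetch an embedding vector from Ollama for use in a RAG prototype.
# OLLAMA_URL is a placeholder; swap in your service's domain.
import requests

OLLAMA_URL = "https://your-ollama-service.up.railway.app"

resp = requests.post(
    f"{OLLAMA_URL}/api/embeddings",
    json={"model": "qwen3:1.7b", "prompt": "Railway deploys containers."},
    timeout=60,
)
vector = resp.json()["embedding"]
print(len(vector))  # dimensionality of the returned vector
```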

Qwen3 vs Other Open-Source Models

Qwen3 competes directly with Llama 3.1, Mistral, Phi-3, and DeepSeek, but offers strong multilingual capabilities, excellent coding performance, and efficient small-model variants like 1.7B that run well on CPU-only environments. Compared to Llama and Mistral, Qwen3's smaller models tend to be stronger at reasoning and to follow instructions more reliably. For cost-conscious deployments or CPU-only environments, Qwen3 is one of the best choices available.

Railway vs Other Hosting Options

Railway gives you immediate container deployments with private networking between Ollama and Open WebUI.

  • Faster setup than AWS or GCP
  • Fewer configuration steps than Docker on a VPS
  • No GPU driver issues
  • One-click redeploys and secret management
  • Simple logs, persistent storage, and environment controls

For users who want to self-host a model quickly and reliably, Railway is a sweet spot between full infrastructure control and painless developer experience.

Dependencies for Qwen Hosting

Ollama: Model serving engine that downloads and runs Qwen3. Handles model loading, tokenization, and inference requests. Version 0.1.0+ required.

Open WebUI: ChatGPT-style frontend with conversation management, markdown rendering, and model switching. Connects to Ollama via internal Railway networking.

Qwen3 Model Files: Downloaded automatically at boot from Ollama's library. The 1.7B variant requires ~2GB of disk space; larger variants (8B, 14B, 32B) scale roughly linearly with parameter count.

Railway Volumes (Recommended): Attach persistent storage to cache downloaded models. Without volumes, Railway re-downloads Qwen3 on every deployment, adding 2-5 minutes to startup time.

Implementation Details

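The template needs no custom code, but here is a minimal sketch of calling the /api/generate endpoint mentioned above, assuming the placeholder URL is replaced with your Ollama service's domain.

```python
# Single non-streaming completion against Ollama's /api/generate endpoint.
# OLLAMA_URL is a placeholder; use the domain Railway assigns your service.
import requests

OLLAMA_URL = "https://your-ollama-service.up.railway.app"

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "qwen3:1.7b",
        "prompt": "Explain what a Railway volume is in one sentence.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```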

Environment Variables

Your template comes preconfigured with the following variables for both Ollama and Open WebUI:

Ollama Variables

  • OLLAMA_HOST: Allows Ollama to listen on all interfaces.
  • OLLAMA_ORIGINS: Sets allowed CORS origins when Open WebUI is hosted separately.
  • OLLAMA_DEFAULT_MODELS: The model Ollama automatically downloads at boot (qwen3:1.7b).

Open WebUI Variables

  • OLLAMA_BASE_URL: Points Open WebUI to the private internal Ollama API endpoint.
  • WEBUI_SECRET_KEY: Used by Open WebUI to secure sessions and authentication.
  • CORS_ALLOW_ORIGIN: Allows the UI to connect from any origin.

These variables are pre-filled so everything works out-of-the-box the moment the deployment boots.
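
As a sanity check, any service in the same project can read the pre-filled OLLAMA_BASE_URL and confirm Ollama is reachable over the private network. A sketch, assuming Python and the requests library are available in that container:

```python
# Health check against the private Ollama endpoint configured by the template.
import os
import requests

base_url = os.environ["OLLAMA_BASE_URL"]  # pre-filled by the template

resp = requests.get(f"{base_url}/api/version", timeout=5)
print("Ollama reachable, version:", resp.json()["version"])
```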

Pricing and Cost Expectations

Running Qwen3 1.7B is extremely lightweight. You can usually run it comfortably on CPU-only Railway plans. Larger models may require higher RAM tiers. You pay only for the compute and storage you allocate—no per-token charges like cloud APIs.

Switching to Larger Qwen3 Models

Change OLLAMA_DEFAULT_MODELS to upgrade:

  • qwen3:1.7b: Fast responses, good for simple queries (2GB RAM)
  • qwen3:8b: Balanced performance, handles complex reasoning (8GB RAM)
  • qwen3:14b: Near-GPT-3.5 quality, slower inference (16GB RAM)
  • qwen3:32b: Best quality, requires Railway Pro plan (32GB+ RAM)

Railway will re-download the model on next deployment. Attach a volume to /root/.ollama/models to persist downloads across deploys.
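
If you would rather not redeploy at all, you can also ask the running instance to pull a model over the API. A sketch using Ollama's /api/pull route, with the usual placeholder URL:

```python
# Pull a larger model into the running Ollama instance without redeploying.
# OLLAMA_URL is a placeholder; progress arrives as streamed JSON lines.
import json
import requests

OLLAMA_URL = "https://your-ollama-service.up.railway.app"

with requests.post(
    f"{OLLAMA_URL}/api/pull",
    json={"name": "qwen3:8b"},
    stream=True,
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status"))
```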

When to Use Qwen3

Best For:

  • Teams building internal AI tools without per-token costs
  • Developers testing LLM integrations before production
  • Projects requiring data privacy (healthcare, legal, finance)
  • Multilingual applications where Qwen3 outperforms English-only models

FAQ

Can I use multiple Qwen3 models simultaneously?
Yes. Set OLLAMA_DEFAULT_MODELS="qwen3:1.7b,qwen3:8b" to download multiple models. Open WebUI lets you switch between them in the interface dropdown. Each model consumes memory only while it is loaded to generate responses (see the sketch below).
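
You can confirm which models are actually resident in memory with Ollama's /api/ps route, which lists running models (unlike /api/tags, which lists everything on disk). A sketch with the usual placeholder URL:

```python
# List models currently loaded into memory, not just downloaded to disk.
import requests

OLLAMA_URL = "https://your-ollama-service.up.railway.app"  # placeholder

running = requests.get(f"{OLLAMA_URL}/api/ps", timeout=10).json()
print([m["name"] for m in running.get("models", [])])
```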

How much does this cost on Railway?
qwen3:1.7b runs comfortably on the $5/month Hobby plan. qwen3:8b needs the $20/month Pro plan. Add $1-3/month for persistent volumes to cache models.

Can I connect other frontends besides Open WebUI?
Absolutely. Ollama exposes an OpenAI-compatible API at http://${{ollama.RAILWAY_PRIVATE_DOMAIN}}:11434/v1. Use it with LibreChat, Chatbot UI, or custom React/Next.js apps by swapping the API endpoint.
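
For example, the official openai Python client works against that route. A sketch, with a placeholder public URL standing in for the private domain above; Ollama ignores the API key, but the client requires one.

```python
# Point the openai client at Ollama's OpenAI-compatible /v1 route.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ollama-service.up.railway.app/v1",  # placeholder
    api_key="ollama",  # ignored by Ollama, required by the client
)

chat = client.chat.completions.create(
    model="qwen3:1.7b",
    messages=[{"role": "user", "content": "Say hello in three languages."}],
)
print(chat.choices[0].message.content)
```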

How do I enable authentication for Open WebUI?
Open WebUI requires account creation by default. Configure admin-only signups by setting WEBUI_AUTH=true and ENABLE_SIGNUP=false in Open WebUI's environment variables. The first user becomes admin automatically.

Why Deploy Qwen on Railway?

Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.

By deploying Qwen on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.

