Deploy Llama in Cloud

[Nov '25] Host Meta's Llama models privately, using Ollama + OpenWebUI.


Deploy and Host Llama on Railway

Run Meta's Llama models with an Ollama backend and an OpenWebUI frontend on Railway's infrastructure. This template gives you a private, ChatGPT-like interface for running Llama 3.2 and other open-source models without sending data to third-party APIs.

What is Llama?

Llama is Meta's family of open-source large language models, ranging from 1B to 405B parameters. These models handle text generation, coding assistance, and reasoning tasks. Unlike proprietary APIs, Llama runs entirely on your infrastructure—no external dependencies, no usage caps, no data leaving your server.

About Hosting Llama

Deploying Llama requires two components: an inference engine (Ollama) and a web interface (OpenWebUI). Ollama downloads and serves the model via API, while OpenWebUI provides the chat interface. Railway handles both services, connects them privately, and manages model downloads at boot time. The template pre-configures environment variables so Ollama listens on all network interfaces and OpenWebUI connects through Railway's internal networking—no manual configuration needed.
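
Under the hood, OpenWebUI is just an HTTP client for Ollama. As a rough sketch of that traffic, the Python script below calls Ollama's /api/generate endpoint directly. The private-network URL assumes the service is named ollama as in this template, and the model name assumes the default variant.

```python
# Minimal sketch of the kind of API call OpenWebUI makes to Ollama.
# Assumptions: the Ollama service is named "ollama" (so Railway's private
# DNS resolves ollama.railway.internal) and the default model is installed.
import json
import urllib.request

OLLAMA_URL = "http://ollama.railway.internal:11434"

payload = json.dumps({
    "model": "llama3.2:3b",
    "prompt": "Explain private networking in one sentence.",
    "stream": False,  # return one JSON object instead of a stream
}).encode()

req = urllib.request.Request(
    f"{OLLAMA_URL}/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```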

Getting Started

1. Deploy the Template: Click Railway's one-click deploy button. Railway provisions two services, ollama and openwebui, connects them via private networking, and attaches storage volumes.

2. Configure Your Model: Set OLLAMA_DEFAULT_MODELS to your preferred variant:

  • llama3.2:1b - Fastest, 2GB RAM, good for simple tasks
  • llama3.2:3b - Default, 4GB RAM, best quality-to-cost ratio
  • llama3.1:70b - Powerful, 40GB RAM, requires Railway Pro plan

First deployment takes 2-5 minutes as Ollama downloads model files.

3. Access OpenWebUI: Railway generates a public URL for OpenWebUI (check service logs). First visit prompts you to create an admin account—this is local to your instance, not shared with Railway.

4. Start Using Llama: OpenWebUI automatically detects models available through Ollama. Select your model from the dropdown and start chatting. Chat history persists across deployments.
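
To double-check that the models OpenWebUI should detect are actually installed, you can query Ollama's /api/tags endpoint, which lists downloaded models. A minimal sketch, assuming the template's private-network URL and the default llama3.2:3b model:

```python
# Poll Ollama until the default model finishes downloading.
# The URL and model name below are assumptions matching this template.
import json
import time
import urllib.request

OLLAMA_URL = "http://ollama.railway.internal:11434"
WANTED = "llama3.2:3b"  # whatever you set in OLLAMA_DEFAULT_MODELS

for _ in range(60):  # poll for up to ~10 minutes
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        installed = [m["name"] for m in json.loads(resp.read()).get("models", [])]
    if WANTED in installed:
        print(f"{WANTED} is ready")
        break
    print(f"still downloading; installed so far: {installed}")
    time.sleep(10)
```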

Environment Variables Explained

Ollama Service:

  • OLLAMA_HOST="::" - Binds Ollama to all IPv6 interfaces (Railway's private network runs over IPv6) so OpenWebUI can reach it
  • OLLAMA_ORIGINS="http://${{RAILWAY_PRIVATE_DOMAIN}}:*" - CORS whitelist for OpenWebUI's internal requests
  • OLLAMA_DEFAULT_MODELS="llama3.2:3b" - Model(s) to auto-download at boot (comma-separated for multiple)

OpenWebUI Service:

  • OLLAMA_BASE_URL="http://${{ollama.RAILWAY_PRIVATE_DOMAIN}}:11434" - Points to Ollama via Railway's private DNS
  • WEBUI_SECRET_KEY="${{secret(32)}}" - Signs session tokens (Railway auto-generates a random value)
  • CORS_ALLOW_ORIGIN="*" - Allows browser access from OpenWebUI's public URL
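
To verify this wiring from inside the OpenWebUI container (for example via a shell on the service), you can read OLLAMA_BASE_URL from the environment and hit Ollama's /api/version endpoint. A minimal sketch; the fallback URL is an assumption:

```python
# Connectivity check: reuse the OLLAMA_BASE_URL Railway injects and ping
# Ollama's version endpoint. The fallback URL is an assumed service name.
import os
import urllib.request

base_url = os.environ.get("OLLAMA_BASE_URL", "http://ollama.railway.internal:11434")

try:
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        print("Ollama reachable:", resp.read().decode())
except OSError as exc:
    print("Cannot reach Ollama:", exc)
```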

🔧 How to Add or Modify Models in OpenWebUI

  1. Go to Profile → Admin Panel → Settings → Models.
  2. Click Manage Models on the top right.
  3. You’ll already see your Ollama endpoint prefilled as: http://ollama.railway.internal:11434
  4. Pull any model you like from the Ollama Model Library.
    (Make sure your instance has enough RAM to fit the model.)
  5. Return to the WebUI homepage — your new model will appear on the top left.

✅ That’s it! You can now chat, test, and prototype AI ideas right in your browser.
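
If you'd rather script this than click through the UI, models can also be pulled through Ollama's /api/pull endpoint, which streams JSON progress lines. A hedged sketch; the model name mistral and the URL are placeholders:

```python
# Pull a model via the Ollama API instead of the OpenWebUI admin panel.
# /api/pull streams newline-delimited JSON progress updates.
import json
import urllib.request

OLLAMA_URL = "http://ollama.railway.internal:11434"  # assumed service name

req = urllib.request.Request(
    f"{OLLAMA_URL}/api/pull",
    data=json.dumps({"model": "mistral"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:  # one JSON object per line
        if line.strip():
            print(json.loads(line).get("status"))
```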

Common Use Cases

  • Private AI Assistant: Run models for internal teams without sending prompts to OpenAI or Anthropic. Your conversations stay on Railway's infrastructure with no external logging.

  • Prototyping AI Features: Test different Llama variants (3.2:1b for speed, 3.2:3b for quality) before committing to a commercial API. Switch models with one environment variable change.

  • Compliance-Sensitive Work: Healthcare, legal, or financial use cases that prohibit sending data to third-party LLM providers. Llama on Railway keeps everything self-hosted.

  • Custom Model Fine-Tuning: Load your own fine-tuned Llama variants through Ollama's model library system. OpenWebUI connects to any model Ollama serves.

Dependencies for Llama Hosting

  • Ollama: Inference engine that downloads, loads, and serves Llama models via HTTP API
  • OpenWebUI: Self-hosted web interface with chat history, model switching, and RAG support
  • Railway Private Networking: Internal DNS that connects OpenWebUI to Ollama without exposing public endpoints
  • Persistent Storage: Railway volumes for model files (Ollama) and user data (OpenWebUI)

Implementation Details

Llama vs Competing Models

| Feature | Llama 3.2 | GPT-4 | Claude 3.5 | Mistral | DeepSeek |
| --- | --- | --- | --- | --- | --- |
| Hosting | Self-hosted | API only | API only | Self-hosted | Self-hosted |
| Cost at Scale | Fixed infra cost | $10-60 per 1M tokens | $3-15 per 1M tokens | Fixed infra cost | Fixed infra cost |
| Data Privacy | Full control | Sent to OpenAI | Sent to Anthropic | Full control | Full control |
| Model Sizes | 1B to 405B | Unknown | Unknown | 7B to 22B | 1.5B to 671B |
| Offline Use | Yes | No | No | Yes | Yes |
| Customization | Fine-tuning allowed | Forbidden | Forbidden | Fine-tuning allowed | Fine-tuning allowed |


Railway vs Other Deployment Platforms

| Platform | Best For | Pricing Model | Llama Support |
| --- | --- | --- | --- |
| Railway | Long-running services | Pay-per-use (CPU/RAM) | Native container support |
| Vercel | Frontend apps | Serverless functions | Not suitable (15-min timeout) |
| Heroku | Traditional apps | Fixed dyno pricing | Works but expensive at scale |
| DigitalOcean | Full control | Fixed droplet pricing | Manual setup required |
| Fly.io | Edge deployment | Pay-per-use (similar to Railway) | Good for multi-region |

When This Template Is the Right Choice

✓ Use this if:

  • Monthly token usage exceeds 500k (cost breaks even vs APIs)
  • You need guaranteed data privacy (healthcare, legal, internal tools)
  • You want to fine-tune models on proprietary data
  • You're prototyping before committing to expensive APIs
  • Your use case requires offline deployment

✗ Don't use this if:

  • You need absolute best-in-class quality (GPT-4 still wins)
  • Monthly usage is under 100k tokens (APIs cheaper for low volume)
  • You lack technical resources to debug deployment issues
  • You need instant cold-start responses (Ollama requires loaded model)

Troubleshooting

"Cannot connect to Ollama API": Check that OLLAMA_BASE_URL uses Railway's private domain variable, not a hardcoded IP. Railway reassigns internal IPs on redeploys.

Model download stuck: Ollama downloads run in the background. Check the Ollama service logs for progress. Large models (70b+) take 10-20 minutes depending on network speed.

Out of memory errors: Your selected model exceeds available RAM. Switch to a smaller variant (llama3.2:1b) or upgrade your Railway plan for a larger memory allocation.

OpenWebUI blank screen: Usually a CORS misconfiguration. Verify that CORS_ALLOW_ORIGIN="*" is set. For production, narrow this to your domain.
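
When working through the issues above, it helps to see what Ollama actually has on disk versus loaded into memory. The sketch below queries /api/tags (installed models) and /api/ps (models currently in RAM); the URL again assumes the template's default service name:

```python
# Quick diagnostic: compare installed models with models loaded in memory.
import json
import urllib.request

OLLAMA_URL = "http://ollama.railway.internal:11434"  # assumed service name

def get_json(path: str) -> dict:
    with urllib.request.urlopen(f"{OLLAMA_URL}{path}", timeout=5) as resp:
        return json.loads(resp.read())

print("installed:", [m["name"] for m in get_json("/api/tags").get("models", [])])
print("loaded in RAM:", [m["name"] for m in get_json("/api/ps").get("models", [])])
```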

FAQ

Q: Can I use models other than Llama? A: Yes. Set OLLAMA_DEFAULT_MODELS to any model from ollama.com/library (Mistral, DeepSeek, CodeLlama, etc.). OpenWebUI works with all Ollama-compatible models.

Q: How do I update to newer Llama versions? A: Change OLLAMA_DEFAULT_MODELS to the new version tag (e.g., llama3.3:latest). Redeploy and Ollama downloads the update automatically.

Q: Does this work for production apps? A: Yes, but add monitoring. Railway provides basic metrics; consider adding Sentry or Prometheus for request tracking. Also set up automated backups for OpenWebUI's volume (chat history).

Q: Can I connect my own frontend instead of OpenWebUI? A: Absolutely. Ollama exposes an OpenAI-compatible API at /v1 (http://ollama.railway.internal:11434/v1 on the private network). Any OpenAI SDK works with minimal config changes.
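
For example, a minimal sketch with the official openai Python SDK, assuming you're calling from another service on Railway's private network (the API key is a dummy value because Ollama doesn't validate it):

```python
# Use the OpenAI SDK against Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://ollama.railway.internal:11434/v1",  # assumed service name
    api_key="ollama",  # placeholder; Ollama ignores the key
)

resp = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```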

Why Deploy Llama on Railway?

Railway is a single platform for deploying your entire infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while letting you scale it vertically and horizontally.

By deploying Llama on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.

