Deploy GPT-OSS 120B

Self-host GPT-OSS 120B on Railway with a chat UI.


Deploy and Host GPT-OSS 120B on Railway

GPT-OSS 120B is a powerful open-weight, 120-billion-parameter large language model designed for reasoning, coding, and chat-based interactions. With this template, you can deploy it in minutes on Railway, complete with a built-in API and browser-based chat interface powered by Ollama and OpenWebUI.


About Hosting GPT-OSS 120B

Hosting GPT-OSS 120B on Railway gives you a fully self-contained AI stack. It uses Ollama as the backend model server and OpenWebUI as the chat interface, preconfigured to run together automatically. Once deployed, Ollama will pull and serve the gpt-oss:120b model while OpenWebUI provides a clean interface for chat. You’ll also get a ready-to-use API endpoint, allowing you to call the model directly from any app, service, or workflow.

The setup includes persistent storage for models, so downloads only happen once, and the stack can scale up easily by adjusting your Railway plan.
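For example, since recent Ollama releases also expose an OpenAI-compatible `/v1` endpoint, any OpenAI SDK client can point at the deployed service. Below is a minimal sketch; the hostname `ollama.railway.internal` is a placeholder for your own service's private or public URL:

```python
# Minimal sketch: call the deployed model through Ollama's
# OpenAI-compatible endpoint. The hostname is a placeholder;
# substitute the private or public URL of your Ollama service.
from openai import OpenAI

client = OpenAI(
    base_url="http://ollama.railway.internal:11434/v1",  # placeholder host
    api_key="ollama",  # Ollama ignores the key, but the SDK requires one
)

response = client.chat.completions.create(
    model="gpt-oss:120b",
    messages=[{"role": "user", "content": "Summarize what GPT-OSS 120B is."}],
)
print(response.choices[0].message.content)
```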


System Requirements

| Resource | Recommended | Notes |
|----------|-------------|-------|
| CPU | 8–16 vCPUs | Essential for smooth model inference |
| RAM | 64 GB+ | GPT-OSS 120B (≈65 GB quantized) requires high memory availability |
| Disk | 100 GB+ | Model weights are stored at /root/.ollama |

💡 Railway Hosting Tip: GPT-OSS 120B is a large model and requires significant resources. Railway's free plan does not provide enough compute or memory to run it effectively, so a Pro Plan or higher with extended RAM and CPU is strongly recommended. If you just want to test the setup or UI, you can deploy the stack (Link) on the free tier to check the configuration, but for actual model usage, upgrade to a Pro Plan.


Common Use Cases

  • 🧠 Host a private ChatGPT-style assistant using GPT-OSS 120B
  • ⚙️ Call the API endpoint from LangChain, Flowise, or any external application (see the sketch after this list)
  • 💬 Prototype and test custom LLM agents or workflows using open-weight models
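As a sketch of the LangChain use case above, the `langchain-ollama` integration can target the deployed endpoint directly; the hostname is again a placeholder for your own service URL:

```python
# Sketch: using the deployed endpoint from LangChain via the
# langchain-ollama integration (pip install langchain-ollama).
# The base_url host is a placeholder for your Ollama service URL.
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="gpt-oss:120b",
    base_url="http://ollama.railway.internal:11434",  # placeholder host
)

reply = llm.invoke("Write a haiku about open-weight models.")
print(reply.content)
```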

Dependencies for GPT-OSS 120B Hosting

  • Ollama — model server handling GPT-OSS 120B inference and API hosting
  • OpenWebUI — web-based chat interface for interacting with GPT-OSS 120B



FAQ

1. What is GPT-OSS 120B?
GPT-OSS 120B is an open-weight large language model with 120 billion parameters, built for text, reasoning, and code generation. It’s an open alternative to GPT-style models that can run locally via Ollama.

2. Is GPT-OSS 120B free to use?
Yes. The model is open-weight and free to use; you only pay for the hosting resources you consume on Railway.

3. Can I deploy GPT-OSS 120B on Railway’s free plan?
You can deploy it, but performance will be limited due to memory constraints. For smooth usage, upgrade to a Pro Plan with higher CPU and RAM.

4. Do I need a GPU to run GPT-OSS 120B?
No — Ollama supports CPU inference, though GPU acceleration improves performance significantly.

5. How can I access the GPT-OSS API?
After deployment, use the endpoint provided in Railway:

http://<your-ollama-service>.railway.internal:11434

You can send POST requests to /api/generate or connect directly from LangChain or other frameworks.
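A minimal sketch of such a request using Python's `requests` library, with the same placeholder hostname and `stream` disabled so a single JSON object is returned:

```python
# Sketch: raw call to Ollama's native generate endpoint.
# Replace the placeholder hostname with your own service's URL.
import requests

resp = requests.post(
    "http://ollama.railway.internal:11434/api/generate",  # placeholder host
    json={
        "model": "gpt-oss:120b",
        "prompt": "Explain mixture-of-experts in two sentences.",
        "stream": False,  # return one JSON object instead of a stream
    },
    timeout=300,  # large models can take a while, especially on CPU
)
resp.raise_for_status()
print(resp.json()["response"])
```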

6. What happens after redeploys?
Downloaded models remain stored in Railway’s persistent volume, so they don’t re-download each time.


Why Deploy GPT-OSS 120B on Railway?

Railway is a single platform for deploying your entire infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while letting you scale it vertically and horizontally as needed.

By deploying GPT-OSS 120B on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.

