Deploy and Host Tabby on Railway

Deploy Tabby on Railway to run your own private AI coding assistant with full data control. Self-host Tabby and get intelligent code completions, inline chat, and codebase-aware suggestions without sending code to third-party cloud services.

This Railway template deploys Tabby with StarCoder-1B for code completion, Qwen2-1.5B-Instruct for chat, and Nomic-Embed-Text for code search — all running on CPU with persistent model storage via Railway Volumes.

Getting Started with Tabby on Railway

After deployment completes, open your Railway-assigned URL to access the Tabby admin dashboard. On first visit, you'll be guided through creating an admin account — this is required since JWT authentication is enabled by default.

Once registered, navigate to the Settings page to configure repository indexing and generate API tokens. Install the Tabby IDE extension (VS Code, JetBrains, Vim, or Emacs), then paste your server URL and token into the extension settings. Start coding — completions and chat are now powered by your self-hosted instance.

Note: First boot takes 2-3 minutes while the models are downloaded and loaded into RAM. Subsequent restarts are faster because the downloaded models are cached on the persistent volume.
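While the models load, the Railway URL may answer with a gateway error or not at all. A small polling loop can replace manual refreshing. This is only a sketch: the function name `wait_for_tabby`, the attempt count, and the delay are illustrative, and treating any 2xx, 3xx, or 401 response from the root path as "up" is an assumption about how the server and the Railway proxy behave.

```shell
# Poll the server root until it answers with a non-error status.
# Usage: wait_for_tabby BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_tabby() {
  local base_url="$1"
  local attempts="${2:-30}"
  local delay="${3:-10}"
  local i=1
  local code
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the HTTP status; curl prints 000 when it cannot connect.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$base_url/" || true)"
    case "$code" in
      2??|3??|401)
        echo "up (HTTP $code) after $i attempt(s)"
        return 0
        ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no usable response after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_tabby https://tabby-production.up.railway.app 30 10` would check every 10 seconds for up to 5 minutes.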

About Hosting Tabby

Tabby is an open-source, self-hosted AI coding assistant that provides GitHub Copilot-like functionality without external dependencies. Built in Rust by TabbyML, it runs inference locally using GGUF-quantized models through an integrated llama.cpp server.

Key features:

Code completion — context-aware inline suggestions as you type
Chat interface — ask questions about code, get explanations, generate snippets
Codebase indexing — RAG-powered answers grounded in your private repositories
Multi-IDE support — VS Code, JetBrains, Vim/Neovim, Emacs via LSP
Admin dashboard — team management, usage analytics, model configuration
Fine-tuning support — train on your private codebase for domain-specific completions
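Code stays local: completions and chat run entirely on your infrastructure (Tabby's anonymous usage reporting can be turned off by setting TABBY_DISABLE_USAGE_COLLECTION=1)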

Why Deploy Tabby on Railway

One-click deploy with pre-configured CPU inference and persistent storage
Complete data privacy — code never leaves your Railway project
Automatic model management with volume-backed caching
Built-in JWT auth protects your API endpoints
Scale memory up to 8GB for larger models when needed

Common Use Cases for Self-Hosted Tabby

Private codebase autocomplete — get suggestions trained on internal code patterns without cloud exposure
Enterprise compliance — meet data residency and security requirements that prohibit cloud AI tools
Team coding assistant — shared self-hosted instance with usage analytics and access control
Offline development — AI-powered completions without internet dependency after models are cached

Dependencies for Tabby on Railway

tabbyml/tabby — main application image (AI inference server, web UI, API)
Railway Volume at /data — stores downloaded models, SQLite database, configuration

Environment Variables Reference for Self-Hosted Tabby

Variable	Value	Purpose
`PORT`	`8080`	HTTP server port
`TABBY_ROOT`	`/data`	Root data directory
`TABBY_WEBSERVER_JWT_TOKEN_SECRET`	UUID string	Persistent JWT signing key
`RAILWAY_RUN_UID`	`0`	Run as root for volume access
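The JWT secret matters because a new signing key on each boot would invalidate existing sessions, which is why the table calls for a persistent value. One way to generate a suitable UUID is sketched below. The helper name `new_jwt_secret` is hypothetical, and the availability of `uuidgen` or `/proc/sys/kernel/random/uuid` is an assumption about your machine.

```shell
# Print a random lowercase UUID for TABBY_WEBSERVER_JWT_TOKEN_SECRET.
new_jwt_secret() {
  if command -v uuidgen >/dev/null 2>&1; then
    # macOS prints uppercase UUIDs; normalise to lowercase.
    uuidgen | tr '[:upper:]' '[:lower:]'
  elif [ -r /proc/sys/kernel/random/uuid ]; then
    # Linux kernel UUID source, used when uuidgen is not installed.
    cat /proc/sys/kernel/random/uuid
  else
    echo "no UUID source found" >&2
    return 1
  fi
}
```

Generate the value once and store it, for example by pasting it into the Railway service variables or by passing `-e TABBY_WEBSERVER_JWT_TOKEN_SECRET=<value>` to `docker run`. Generating a fresh value on every start would defeat the purpose.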

Deployment Dependencies

Runtime: Rust binary + llama.cpp (embedded)
Docker Hub: tabbyml/tabby
GitHub: TabbyML/tabby (32,000+ stars)
Docs: tabby.tabbyml.com

Hardware Requirements for Self-Hosting Tabby

Resource	Minimum (1B model)	Recommended (1.5B+ models)
CPU	2 vCPU	4 vCPU
RAM	4 GB	8 GB
Storage	5 GB	10 GB
GPU	Not required (CPU mode)	NVIDIA GPU with 8GB+ VRAM

CPU inference is workable for personal use but noticeably slower than GPU inference. For team deployments that need sub-second latency, GPU acceleration is recommended.

Self-Hosting Tabby with Docker

Run Tabby locally with Docker for development or testing:

docker run -d \
  --name tabby \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  -e TABBY_ROOT=/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cpu --host 0.0.0.0

For GPU-accelerated deployment with NVIDIA:

docker run -d \
  --gpus all \
  --name tabby \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  -e TABBY_ROOT=/data \
  tabbyml/tabby \
  serve --model StarCoder-7B --chat-model Qwen2-1.5B-Instruct --device cuda --host 0.0.0.0
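Once either container is running and you have created an admin account, the API can be exercised directly before wiring up an IDE. The sketch below posts to Tabby's `/v1/completions` endpoint with a bearer token. The helper names `completion_payload` and `tabby_complete` are hypothetical, and the payload builder inserts the prefix verbatim, so it only works for prefixes without quotes, backslashes, or newlines.

```shell
# Build the JSON body for a completion request.
# Caveat: no JSON escaping is done; keep the prefix free of ", \ and newlines.
completion_payload() {
  printf '{"language":"%s","segments":{"prefix":"%s"}}' "$1" "$2"
}

# Usage: tabby_complete BASE_URL TOKEN LANGUAGE PREFIX
tabby_complete() {
  curl -fsS -X POST "$1/v1/completions" \
    -H "Authorization: Bearer $2" \
    -H "Content-Type: application/json" \
    -d "$(completion_payload "$3" "$4")"
}
```

For example, `tabby_complete http://localhost:8080 "$TABBY_TOKEN" python 'def fib(n):'` should return a JSON response containing the suggested completion. Here `TABBY_TOKEN` is a placeholder for a token generated in the admin UI.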

Connect your IDE after startup:

# VS Code: Install "Tabby" extension, then set:
# Server URL: http://localhost:8080
# Generate a token from the Tabby admin UI
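Editors without a settings UI (Vim, Emacs) typically read the server details from the Tabby client config file, which to my knowledge lives at `~/.tabby-client/agent/config.toml` and uses a `[server]` section. The helper below writes such a file. The name `write_tabby_client_config` is hypothetical, and both the path and the key names should be checked against the extension documentation for your editor.

```shell
# Write a minimal Tabby client config.
# Usage: write_tabby_client_config CONFIG_DIR SERVER_URL TOKEN
write_tabby_client_config() {
  local dir="$1"
  local endpoint="$2"
  local token="$3"
  mkdir -p "$dir"
  printf '[server]\nendpoint = "%s"\ntoken = "%s"\n' "$endpoint" "$token" \
    > "$dir/config.toml"
  # The file holds a secret, so restrict it to the current user.
  chmod 600 "$dir/config.toml"
}
```

A typical call would be `write_tabby_client_config "$HOME/.tabby-client/agent" http://localhost:8080 "$TABBY_TOKEN"`. Note that it overwrites any existing config.toml in that directory.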

Is Tabby Free to Self-Host?

Tabby's core is open source under the Apache 2.0 license, and code completion, chat, and codebase indexing are available in the free self-hosted version. Some team and enterprise capabilities (such as SSO or larger seat counts) may fall under TabbyML's separate licensing, so check the current terms before rolling out to a larger team. Beyond that, your main cost is infrastructure: on Railway, expect roughly $5-15/month depending on usage and memory allocation.

Tabby vs GitHub Copilot for Self-Hosted Teams

Feature	Tabby	GitHub Copilot
Hosting	Self-hosted / on-prem	Cloud only
Data privacy	Code stays on your infra	Sent to GitHub/OpenAI
Cost	Free (infra only)	$10-39/user/month
Custom models	Yes (fine-tune on private code)	No
IDE support	VS Code, JetBrains, Vim, Emacs	VS Code, JetBrains, Vim
Offline capable	Yes (after model download)	No
Enterprise SSO	Yes	Enterprise plan only

Tabby is ideal for teams prioritizing data sovereignty and cost control. Copilot offers better out-of-the-box polish and stronger base models.

FAQ

What is Tabby and why would I self-host it on Railway? Tabby is an open-source AI coding assistant that provides code completion, chat, and codebase-aware suggestions. Self-hosting on Railway gives you complete data privacy — your code never leaves your infrastructure — plus one-click deployment with automatic scaling.

What does this Railway template deploy for Tabby? This template deploys a single Tabby service with CPU inference enabled, a Railway Volume for persistent model storage, and JWT authentication. It includes StarCoder-1B for code completion, Qwen2-1.5B-Instruct for chat, and Nomic-Embed-Text for code search embeddings.

Why does Tabby need a persistent volume on Railway? Tabby downloads ML models (1-4 GB each) on first startup and stores them locally. Without a volume, models would re-download on every container restart, adding 2-3 minutes to boot time. The volume also persists your SQLite database (users, chat history, settings).

How do I connect VS Code or JetBrains to my self-hosted Tabby instance? Install the Tabby extension from the VS Code Marketplace or JetBrains Plugin Repository. In extension settings, set the server URL to your Railway domain (e.g. https://tabby-production.up.railway.app) and paste an API token generated from the Tabby admin dashboard.

Can I use larger models with Tabby on Railway for better code completions? Yes, but larger models require more RAM. The default 8GB memory limit comfortably supports 1B-1.5B parameter models. 7B models realistically need GPU acceleration, which Railway doesn't provide by default. Consider using Tabby's HTTP model backend to offload inference to an external API (OpenAI, Ollama) while keeping the orchestration layer on Railway.
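As a rough illustration of the HTTP backend, the helper below appends a chat backend section to `config.toml` under the Tabby root directory. The section name `[model.chat.http]` and its keys follow Tabby's model configuration docs as I recall them, so verify them against the current documentation. The function name `add_http_chat_backend` is hypothetical, and the model name in the usage example is a placeholder.

```shell
# Append an OpenAI-compatible chat backend to $TABBY_ROOT/config.toml.
# Usage: add_http_chat_backend TABBY_ROOT API_ENDPOINT MODEL_NAME API_KEY
add_http_chat_backend() {
  local root="$1"
  local endpoint="$2"
  local model="$3"
  local api_key="$4"
  local config="$root/config.toml"
  # Refuse to add a second chat section to an existing config.
  if [ -f "$config" ] && grep -q '^\[model\.chat\.http\]' "$config"; then
    echo "chat backend already configured in $config" >&2
    return 1
  fi
  mkdir -p "$root"
  printf '\n[model.chat.http]\nkind = "openai/chat"\nmodel_name = "%s"\napi_endpoint = "%s"\napi_key = "%s"\n' \
    "$model" "$endpoint" "$api_key" >> "$config"
}
```

For example, `add_http_chat_backend /data https://api.openai.com/v1 gpt-4o "$OPENAI_API_KEY"`. With chat served remotely, the `--chat-model` flag in the serve command would presumably no longer be needed, which frees RAM for the completion model.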

How do I enable codebase indexing in self-hosted Tabby? After deployment, go to Settings → Repositories in the Tabby admin dashboard. Add your Git repositories (public or private with access tokens). Tabby will index the codebase and use it for context-aware completions and chat answers via its built-in RAG pipeline.
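The dashboard is the route described above. Tabby has also accepted repository entries in `config.toml` under a `[[repositories]]` table, which can be handy for scripted setups. The sketch below assumes that file-based option is still supported in your Tabby version. The helper name `add_tabby_repository` is hypothetical.

```shell
# Append a repository entry to $TABBY_ROOT/config.toml.
# Usage: add_tabby_repository TABBY_ROOT NAME GIT_URL
add_tabby_repository() {
  local root="$1"
  local name="$2"
  local git_url="$3"
  local config="$root/config.toml"
  # Skip entries that are already present so the call is safe to repeat.
  if [ -f "$config" ] && grep -Fq "git_url = \"$git_url\"" "$config"; then
    echo "repository already listed: $git_url"
    return 0
  fi
  mkdir -p "$root"
  printf '\n[[repositories]]\nname = "%s"\ngit_url = "%s"\n' "$name" "$git_url" >> "$config"
}
```

For example, `add_tabby_repository /data tabby https://github.com/TabbyML/tabby.git`. The server would likely need a restart before it picks up the new entry.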