Deploy replicate-openai

OpenAI-compatible gateway for Replicate models. BYOK supported.


Deploy and Host replicate-openai on Railway

replicate-openai is an OpenAI-compatible API gateway for Replicate models. It lets you use any Replicate model — text, image, audio — with tools and SDKs that expect an OpenAI-compatible endpoint. Just change the base URL and API key; no other code changes are required.

About Hosting replicate-openai

Hosting replicate-openai gives you a persistent, always-on gateway that any OpenAI-compatible client can connect to. The server runs as a lightweight FastAPI app inside Docker. It supports both streaming and non-streaming responses, pre-configured aliases for popular models like Llama 3, Mistral, Flux, and SDXL, and a BYOK mode where each client passes their own Replicate token — meaning the server owner pays nothing for inference.
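The BYOK flow described above can be sketched roughly as follows. This is a minimal illustration of the idea, not the gateway's actual internals: it assumes the client sends its Replicate token in the standard Authorization: Bearer header (which is how the OpenAI SDK transmits api_key), and that the gateway extracts and forwards that token to Replicate.

def extract_bearer_token(authorization: str | None) -> str | None:
    """Pull the client's Replicate token out of an OpenAI-style
    Authorization header (hypothetical helper, for illustration)."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    return token

# The OpenAI SDK sends api_key as "Authorization: Bearer <key>",
# so in BYOK mode each request carries its owner's own Replicate token
# and the server operator is never billed for inference.
print(extract_bearer_token("Bearer r8_example_token"))

Because each request is authenticated with the caller's own token, the gateway itself can stay stateless with respect to billing.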

Common Use Cases

  • Using Replicate models inside AI coding tools like Cursor, Kilo Code, or Continue that require an OpenAI base URL
  • Running image generation via the /v1/images/generations endpoint with Flux, SDXL, or Imagen
  • Self-hosting a shared AI gateway for a team, where each member brings their own Replicate token

Dependencies for replicate-openai Hosting

  • A Replicate account and API token
  • Docker (handled automatically by Railway)

Implementation Details

Point any OpenAI SDK at your Railway deployment:

from openai import OpenAI

client = OpenAI(
    base_url="https://your-app.railway.app/v1",
    api_key="your-replicate-token",  # in BYOK mode
)

response = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Streaming:

stream = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; content may be None
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Image generation:

response = client.images.generate(
    model="flux-schnell",
    prompt="a cinematic sunset over mountains",
)
print(response.data[0].url)

Why Deploy replicate-openai on Railway?

Railway is a single platform for deploying your entire infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while letting you scale it both vertically and horizontally.

By deploying replicate-openai on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.

