Deploy replicate-openai

OpenAI-compatible gateway for Replicate models. BYOK supported.


Deploy and Host replicate-openai on Railway

replicate-openai is an OpenAI-compatible API gateway for Replicate models. It lets you use any Replicate model — text, image, audio — with tools and SDKs that expect an OpenAI-compatible endpoint. Just change the base URL and API key; no other code changes are required.

About Hosting replicate-openai

Hosting replicate-openai gives you a persistent, always-on gateway that any OpenAI-compatible client can connect to. The server runs as a lightweight FastAPI app inside Docker. It supports both streaming and non-streaming responses, pre-configured aliases for popular models like Llama 3, Mistral, Flux, and SDXL, and a BYOK mode where each client passes their own Replicate token — meaning the server owner pays nothing for inference.
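The BYOK flow described above can be sketched roughly as follows. This is a minimal illustration of the idea, not the gateway's actual internals: it assumes the client sends its Replicate token in the standard Authorization: Bearer header (which is how the OpenAI SDK transmits api_key), and that the gateway extracts and forwards that token to Replicate.

def extract_bearer_token(authorization: str | None) -> str | None:
    """Pull the client's Replicate token out of an OpenAI-style
    Authorization header (hypothetical helper, for illustration)."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    return token

# The OpenAI SDK sends api_key as "Authorization: Bearer <key>",
# so in BYOK mode each request carries its owner's own Replicate token
# and the server operator is never billed for inference.
print(extract_bearer_token("Bearer r8_example_token"))

Because each request is authenticated with the caller's own token, the gateway itself can stay stateless with respect to billing.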

Common Use Cases

  • Using Replicate models inside AI coding tools like Cursor, Kilo Code, or Continue that require an OpenAI base URL
  • Running image generation via the /v1/images/generations endpoint with Flux, SDXL, or Imagen
  • Self-hosting a shared AI gateway for a team, where each member brings their own Replicate token

Dependencies for replicate-openai Hosting

  • A Replicate account and API token
  • Docker (handled automatically by Railway)

Implementation Details

Point any OpenAI SDK at your Railway deployment:

from openai import OpenAI

client = OpenAI(
    base_url="https://your-app.railway.app/v1",
    api_key="your-replicate-token",  # in BYOK mode
)

response = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Streaming:

stream = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; content may be None
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Image generation:

response = client.images.generate(
    model="flux-schnell",
    prompt="a cinematic sunset over mountains",
)
print(response.data[0].url)

Why Deploy replicate-openai on Railway?

Railway is a single platform for deploying your entire infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while letting you scale it both vertically and horizontally.

By deploying replicate-openai on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.

