Deploy replicate-openai
OpenAI-compatible gateway for Replicate models. BYOK supported.
Deploy and Host replicate-openai on Railway
replicate-openai is an OpenAI-compatible API gateway for Replicate models. It lets you use any Replicate model — text, image, audio — with tools and SDKs that expect an OpenAI-compatible endpoint. Just change the base URL and API key; no other code changes are required.
About Hosting replicate-openai
Hosting replicate-openai gives you a persistent, always-on gateway that any OpenAI-compatible client can connect to. The server runs as a lightweight FastAPI app inside Docker. It supports both streaming and non-streaming responses, pre-configured aliases for popular models like Llama 3, Mistral, Flux, and SDXL, and a BYOK mode where each client passes their own Replicate token — meaning the server owner pays nothing for inference.
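In BYOK mode, the client's own Replicate token travels in the standard OpenAI Authorization header and the gateway forwards the call to Replicate, so inference is billed to the caller rather than the server owner. A minimal sketch of the raw request shape any OpenAI-compatible client sends (the deployment URL and token below are placeholders):

```python
import json

# Placeholders: substitute your Railway deployment URL and your Replicate token.
BASE_URL = "https://your-app.railway.app/v1"
REPLICATE_TOKEN = "r8_your_token_here"

# Standard OpenAI chat-completions request; the Bearer token is the
# client's Replicate token rather than an OpenAI key (BYOK).
headers = {
    "Authorization": f"Bearer {REPLICATE_TOKEN}",
    "Content-Type": "application/json",
}
payload = {
    "model": "llama-3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}
print(json.dumps({"url": f"{BASE_URL}/chat/completions",
                  "headers": headers,
                  "body": payload}, indent=2))
```

Because the request shape is the stock OpenAI one, any client that lets you override the base URL and API key can talk to the gateway unmodified.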
Common Use Cases
- Using Replicate models inside AI coding tools like Cursor, Kilo Code, or Continue that require an OpenAI base URL
- Running image generation via the /v1/images/generations endpoint with Flux, SDXL, or Imagen
- Self-hosting a shared AI gateway for a team, where each member brings their own Replicate token
Dependencies for replicate-openai Hosting
- A Replicate account and API token
- Docker (handled automatically by Railway)
Implementation Details
Point any OpenAI SDK at your Railway deployment:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-app.railway.app/v1",
    api_key="your-replicate-token",  # in BYOK mode
)

response = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Streaming:
stream = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Image generation:
response = client.images.generate(
    model="flux-schnell",
    prompt="a cinematic sunset over mountains",
)
print(response.data[0].url)
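The OpenAI images API can also return the image inline as base64 via response_format="b64_json"; assuming the gateway passes that parameter through, the bytes can be decoded and written to disk directly. A sketch with a stand-in payload in place of a live response:

```python
import base64

# Stand-in for response.data[0].b64_json from a request made with
# response_format="b64_json" (support for this is an assumption here).
b64_json = base64.b64encode(b"\x89PNG\r\n\x1a\n...image bytes...").decode()

image_bytes = base64.b64decode(b64_json)  # raw PNG bytes
with open("sunset.png", "wb") as f:
    f.write(image_bytes)
```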
Why Deploy replicate-openai on Railway?
Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.
By deploying replicate-openai on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
replicate-openai
CookieShualon/replicate-openai