
Deploy LlamaIndex Apps
Sample LlamaIndex apps for chatting with PDFs and summarizing URLs.
Deploy and Host LlamaIndex Apps on Railway
LlamaIndex is an open-source framework for building LLM-powered applications over your own data. This template deploys two Streamlit apps — one for chatting with PDFs using LlamaParse, and one for summarizing URLs using Google Gemini — each demonstrating a core LlamaIndex pattern: document ingestion, indexing, and retrieval-augmented generation (RAG).
About Hosting LlamaIndex Apps
Both apps are single-service Streamlit deployments with no persistent storage or database required. The PDF chat app uses LlamaParse to parse uploaded PDFs and OpenAI to answer queries over the indexed content. The URL summarizer uses Google Gemini to generate summaries of web pages fetched at runtime. API keys are entered via the sidebar at runtime — nothing is stored server-side. Railway handles HTTPS and port binding automatically via the railway.toml start command in each app.
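For reference, a minimal railway.toml along these lines binds Streamlit to the PORT Railway injects; the entrypoint filename here is a placeholder, not necessarily the one used in the repository:

```toml
# Sketch of a Railway start command for a Streamlit service.
# "app.py" is illustrative; each app in the template has its own entrypoint.
[deploy]
startCommand = "streamlit run app.py --server.port $PORT --server.address 0.0.0.0"
```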
Common Use Cases
- PDF Q&A — upload any PDF and ask natural language questions against its content; LlamaParse handles complex layouts, tables, and multi-column text that standard PDF parsers struggle with, before indexing with OpenAI embeddings for retrieval
- URL summarization — paste any public URL and get a concise 200-250 word summary generated by Gemini 2.5 Flash, useful for quickly digesting articles, documentation, or research papers
- LlamaIndex RAG starter — use either app as a reference implementation for building your own LlamaIndex-powered Streamlit tools on Railway, with the VectorStoreIndex and SummaryIndex patterns already wired up
Dependencies for LlamaIndex Apps Hosting
- LlamaIndex (llama-index, llama-index-core) — core RAG framework
- LlamaCloud (llama-cloud-services) — LlamaParse API for PDF parsing; requires a LlamaCloud API key
- OpenAI (llama-index-llms-openai, llama-index-embeddings-openai) — LLM and embeddings for the PDF chat app; requires an OpenAI API key
- Google Gemini (llama-index-llms-google-genai) — LLM for the URL summarizer app; requires a Google AI Studio API key
- No database or persistent volume required
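Taken together, a requirements file for the two apps would look roughly like the following; versions are left unpinned here, and the web reader package is an assumption based on the SimpleWebPageReader usage described below (check the repository for the exact list):

```text
streamlit
llama-index
llama-index-core
llama-cloud-services
llama-index-llms-openai
llama-index-embeddings-openai
llama-index-llms-google-genai
# assumed: provides SimpleWebPageReader in recent llama-index versions
llama-index-readers-web
```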
Deployment Dependencies
- LlamaIndex documentation
- LlamaCloud API key
- OpenAI API key
- Google AI Studio API key
- alphasecio/llama-index GitHub repository
- alphasec guide: Chat with PDF using LlamaIndex and LlamaParse
- alphasec guide: Blinkist for URLs with LlamaIndex and Google Gemini
Implementation Details
Chat with PDF — uses LlamaParse from llama-cloud-services to parse the uploaded PDF into markdown, builds a VectorStoreIndex from the parsed content, and queries it with similarity_top_k=5 and tree_summarize response mode. The document is parsed and indexed once per session; subsequent queries reuse the cached index without re-uploading.
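A minimal sketch of that flow, assuming the current llama-cloud-services and llama-index package layouts; key values, file names, and the sample query are illustrative rather than taken from the repository:

```python
from llama_cloud_services import LlamaParse
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

openai_api_key = "sk-..."       # entered via the Streamlit sidebar in the app
llamacloud_api_key = "llx-..."  # LlamaCloud key used by LlamaParse

# OpenAI handles both answer generation and embeddings for retrieval.
Settings.llm = OpenAI(api_key=openai_api_key)
Settings.embed_model = OpenAIEmbedding(api_key=openai_api_key)

# Parse the uploaded PDF into markdown with LlamaParse.
parser = LlamaParse(api_key=llamacloud_api_key, result_type="markdown")
documents = parser.load_data("uploaded.pdf")

# Index once per session, then reuse the index for every follow-up question.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    similarity_top_k=5, response_mode="tree_summarize"
)
answer = query_engine.query("What are the key findings in this document?")
```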
Summarize URL — uses SimpleWebPageReader to fetch and extract plain text from the provided URL, builds a SummaryIndex, and queries it with a fixed prompt for a 200-250 word summary using Gemini 2.5 Flash.
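A comparable sketch for the summarizer, assuming SimpleWebPageReader comes from the llama-index-readers-web package and that the Gemini 2.5 Flash mention maps to the "gemini-2.5-flash" model string; the URL and prompt are placeholders:

```python
from llama_index.core import SummaryIndex, Settings
from llama_index.llms.google_genai import GoogleGenAI
from llama_index.readers.web import SimpleWebPageReader

google_api_key = "AIza..."  # Google AI Studio key, entered via the sidebar in the app

# Use Gemini 2.5 Flash for summarization.
Settings.llm = GoogleGenAI(model="gemini-2.5-flash", api_key=google_api_key)

# Fetch the page and reduce it to plain text.
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://example.com/article"]
)

# A SummaryIndex passes the whole document to the LLM instead of retrieving top-k chunks.
index = SummaryIndex.from_documents(documents)
query_engine = index.as_query_engine()
summary = query_engine.query("Summarize the content of this page in 200-250 words.")
```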
API keys are entered via the sidebar at runtime and stored in st.session_state for the duration of the session. They are never persisted to disk or logged.
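In Streamlit terms, that pattern looks roughly like the following; the widget labels and session-state key names are illustrative:

```python
import streamlit as st

with st.sidebar:
    # Masked inputs; values live only in st.session_state for this browser session.
    st.session_state["llamacloud_api_key"] = st.text_input("LlamaCloud API key", type="password")
    st.session_state["openai_api_key"] = st.text_input("OpenAI API key", type="password")

# Refuse to run until keys are provided; nothing is written to disk or logged.
if not (st.session_state["llamacloud_api_key"] and st.session_state["openai_api_key"]):
    st.warning("Enter your API keys in the sidebar to continue.")
    st.stop()
```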
Why Deploy LlamaIndex Apps on Railway?
Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.
By deploying LlamaIndex Apps on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
chat-with-pdf
alphasecio/llama-index
summarize-url
alphasecio/llama-index