
Deploy LightCrawl
Playwright-based lightweight scraping API and MCP server
Deploy and Host LightCrawl on Railway
LightCrawl is a lightweight, self-hostable web scraping API and Model Context Protocol (MCP) server that converts web pages into clean Markdown. Optimized for low-resource environments, it is a minimal, secure, and cost-effective alternative to Firecrawl, well suited to local development and LLM integration.
About Hosting LightCrawl
Deploying LightCrawl on Railway is straightforward and usually takes less than a minute. Because the project includes a fully configured Dockerfile, Railway automatically detects the environment, builds the container, and downloads only the required Chromium browser binary to keep the footprint small.
You only need to configure three environment variables: PORT, NODE_ENV (set to production), and API_KEY (which Railway can generate automatically) to secure your HTTP API endpoints. Once deployed, Railway exposes a public URL, so you can use the service immediately as a standard REST API or integrate it as an MCP server with AI tools like Cursor or Claude Desktop.
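As a rough illustration of how a Node service might validate these three variables at startup, here is a minimal sketch. The `loadConfig` helper and its error messages are hypothetical and are not taken from the LightCrawl source; only the variable names PORT, NODE_ENV, and API_KEY come from the description above.

```javascript
// Hypothetical startup helper (not LightCrawl's actual code): validates the
// three environment variables named above and returns a plain config object.
function loadConfig(env) {
  // PORT must parse to a valid TCP port number.
  const port = Number.parseInt(env.PORT ?? '', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error('PORT must be an integer between 1 and 65535');
  }

  // API_KEY must be present and non-empty, otherwise the API would be open.
  const apiKey = (env.API_KEY ?? '').trim();
  if (apiKey === '') {
    throw new Error('API_KEY must be set to secure the HTTP endpoints');
  }

  return {
    port,
    apiKey,
    // Anything other than the exact string "production" counts as non-production.
    isProduction: env.NODE_ENV === 'production',
  };
}
```

A service would typically call `loadConfig(process.env)` once at startup, so that a missing API_KEY fails the deployment immediately instead of leaving an unprotected endpoint running.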
Common Use Cases
- LLM Context Injection: Scrape target web pages and extract clean Markdown content to feed directly into LLM prompt contexts.
- MCP Server for AI Agents: Register LightCrawl as an MCP tool inside AI clients (Cursor, Claude Desktop) to let agents scrape the web in real time.
- Secure Sandboxed Scraping: Protect client environments from malicious scripts by running execution-heavy browser sessions inside an isolated Railway container.
Dependencies for LightCrawl Hosting
- Playwright (Chromium): Headless browser engine used to render pages and bypass bot detection.
- Mozilla Readability: Content extraction engine that strips page boilerplate (menus, ads, footers) and returns clean semantic content.
Deployment Dependencies
Implementation Details
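The service operates as a hybrid server. It starts an Express API server while simultaneously setting up a stdio transport channel for MCP clients. Below is the simplified structure of the scraping logic and HTML-to-Markdown pipeline:
import { chromium } from 'playwright';
import { JSDOM } from 'jsdom'; // assumed DOM provider for the `dom` object below
import { Readability } from '@mozilla/readability';
import TurndownService from 'turndown';
// 1. Launch headless Chromium using Playwright (stealth setup omitted here); `url` is the target page
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'domcontentloaded' });
const html = await page.content(); // rendered HTML of the loaded page
await browser.close();
// 2. Extract content using Mozilla Readability
const dom = new JSDOM(html, { url });
const parsed = new Readability(dom.window.document).parse();
// 3. Convert clean HTML to Markdown via Turndown
const markdown = new TurndownService().turndown(parsed.content);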
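One detail the simplified pipeline glosses over is that Readability's `parse()` returns `null` when it cannot find an article on the page, so reading `parsed.content` directly can throw. The sketch below shows one way to guard that step. The `articleToMarkdown` helper and the shape of its return value are hypothetical, not LightCrawl's actual response format; the converter is passed in so the function does not depend on any particular library.

```javascript
// Hypothetical helper (not from the LightCrawl source): turns a Readability
// result into a small result object, handling the "nothing extracted" case.
// `convert` is any HTML-to-Markdown function, for example a thin wrapper
// around a Turndown instance's turndown() method.
function articleToMarkdown(parsed, convert) {
  // parse() yields null when no readable article is detected.
  if (parsed === null || !parsed.content) {
    return { ok: false, error: 'No readable content found on this page' };
  }
  return {
    ok: true,
    title: parsed.title ?? '',
    markdown: convert(parsed.content).trim(),
  };
}
```

Returning a result object rather than throwing lets both the HTTP handler and the MCP tool handler decide for themselves how to report an empty page.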
Why Deploy LightCrawl on Railway?
Railway is a single platform for deploying your infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while still allowing you to scale it vertically and horizontally.
By deploying LightCrawl on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
LightCrawl
yosuke1024/LightCrawl