Deploy PostgreSQL with pg_duckdb and pgvectorscale
PostgreSQL 17 with pg_duckdb and pgvectorscale extensions.
pgduckdb-pgvectorscale
joeychilson/railway-pgduckdb-pgvectorscale:sha-733ce3e
Just deployed
Deploy and Host PostgreSQL with pg_duckdb and pgvectorscale on Railway
PostgreSQL 17 combining pg_duckdb's analytical power with pgvectorscale's AI-optimized vector search. Run fast OLAP queries on data lakes while performing semantic search and embeddings operations—all in one database without separate infrastructure.
About Hosting PostgreSQL with pg_duckdb and pgvectorscale
This template combines two powerful PostgreSQL extensions into a unified database solution. pg_duckdb embeds DuckDB's analytical engine for lightning-fast queries on S3 data lakes, Parquet files, and MotherDuck datasets. pgvectorscale adds StreamingDiskANN indexing for high-performance vector similarity search, extending pgvector's capabilities. Together, they enable hybrid workloads: analytical queries on massive datasets alongside AI-powered semantic search and embeddings operations. The container is built from source using multi-stage builds, includes automatic extension initialization, and supports optional S3 configuration. Perfect for modern data applications requiring both analytical processing and AI capabilities without managing multiple database systems.
Common Use Cases
- AI Analytics Platforms: Combine vector search for RAG/semantic queries with analytical queries on user behavior and metrics
- Intelligent Data Lakes: Query S3/Parquet data with DuckDB while performing similarity search on document embeddings
- Real-time AI Dashboards: Run analytical aggregations alongside nearest-neighbor searches for recommendations
- Hybrid Search Systems: Blend traditional analytics (trends, aggregations) with semantic search (embeddings, similarity)
- MLOps Pipelines: Store model embeddings and analyze training metrics/performance data in a single database
Dependencies for PostgreSQL with pg_duckdb and pgvectorscale Hosting
- PostgreSQL 17: Latest major version with enhanced performance
- Build Tools: Rust toolchain, PostgreSQL development headers, and build essentials for compiling extensions from source
Deployment Dependencies
- pg_duckdb Documentation - DuckDB analytical engine integration
- pgvector Documentation - Core vector extension
- pgvectorscale Documentation - StreamingDiskANN indexing
- DuckDB Extensions - Additional DuckDB capabilities
Implementation Details
Environment Variables:
# PostgreSQL Configuration
POSTGRES_PASSWORD=your_secure_password
POSTGRES_USER=postgres
POSTGRES_DB=your_database
# S3 Configuration (optional, for pg_duckdb)
S3_ACCESS_KEY_ID=your_access_key
S3_SECRET_ACCESS_KEY=your_secret_key
S3_REGION=us-east-1
S3_ENDPOINT=optional_custom_endpoint
S3_URL_STYLE=path
# DuckDB Extensions (optional, comma-separated)
DUCKDB_EXTENSIONS=httpfs,parquet,json
Example Usage - Analytical Queries with pg_duckdb:
-- Query Parquet files from S3
SELECT * FROM read_parquet('s3://bucket/analytics/*.parquet')
WHERE event_date >= '2024-01-01';
-- Aggregate data from remote sources
SELECT
date_trunc('month', event_time) as month,
COUNT(*) as events
FROM read_csv('https://example.com/data.csv')
GROUP BY month;
Example Usage - Vector Search with pgvectorscale:
import psycopg2
from pgvector.psycopg2 import register_vector
conn = psycopg2.connect(DATABASE_PUBLIC_URL)
register_vector(conn)
cur = conn.cursor()
# Create table with vector embeddings
cur.execute("""
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding VECTOR(1536)
)
""")
# Create StreamingDiskANN index for scalable search
cur.execute("""
CREATE INDEX ON documents
USING diskann (embedding)
""")
# Perform similarity search
cur.execute("""
SELECT id, content
FROM documents
ORDER BY embedding <-> %s
LIMIT 10
""", (query_embedding,))
Hybrid Query Example:
-- Combine analytical aggregation with vector search
WITH similar_docs AS (
SELECT id, content
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 100
)
SELECT
date_trunc('day', created_at) as date,
COUNT(*) as similar_doc_count
FROM similar_docs
JOIN events ON events.doc_id = similar_docs.id
GROUP BY date
ORDER BY date;
Volume Configuration:
Mount a volume at /var/lib/postgresql/data for persistent storage of both analytical data and vector embeddings.
Why Deploy PostgreSQL with pg_duckdb and pgvectorscale on Railway?
Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.
By deploying PostgreSQL with pg_duckdb and pgvectorscale on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.
Template Content
pgduckdb-pgvectorscale
ghcr.io/joeychilson/railway-pgduckdb-pgvectorscale:sha-733ce3e