Deploy PostgreSQL with pg_duckdb and pgvectorscale

PostgreSQL 17 with pg_duckdb and pgvectorscale extensions.

Deploy PostgreSQL with pg_duckdb and pgvectorscale

pgduckdb-pgvectorscale

joeychilson/railway-pgduckdb-pgvectorscale:sha-733ce3e

Just deployed

Deploy and Host PostgreSQL with pg_duckdb and pgvectorscale on Railway

PostgreSQL 17 combining pg_duckdb's analytical power with pgvectorscale's AI-optimized vector search. Run fast OLAP queries on data lakes while performing semantic search and embeddings operations—all in one database without separate infrastructure.

About Hosting PostgreSQL with pg_duckdb and pgvectorscale

This template combines two powerful PostgreSQL extensions into a unified database solution. pg_duckdb embeds DuckDB's analytical engine for lightning-fast queries on S3 data lakes, Parquet files, and MotherDuck datasets. pgvectorscale adds StreamingDiskANN indexing for high-performance vector similarity search, extending pgvector's capabilities. Together, they enable hybrid workloads: analytical queries on massive datasets alongside AI-powered semantic search and embeddings operations. The container is built from source using multi-stage builds, includes automatic extension initialization, and supports optional S3 configuration. Perfect for modern data applications requiring both analytical processing and AI capabilities without managing multiple database systems.

Common Use Cases

  • AI Analytics Platforms: Combine vector search for RAG/semantic queries with analytical queries on user behavior and metrics
  • Intelligent Data Lakes: Query S3/Parquet data with DuckDB while performing similarity search on document embeddings
  • Real-time AI Dashboards: Run analytical aggregations alongside nearest-neighbor searches for recommendations
  • Hybrid Search Systems: Blend traditional analytics (trends, aggregations) with semantic search (embeddings, similarity)
  • MLOps Pipelines: Store model embeddings and analyze training metrics/performance data in a single database

Dependencies for PostgreSQL with pg_duckdb and pgvectorscale Hosting

  • PostgreSQL 17: Latest major version with enhanced performance
  • Build Tools: Rust toolchain, PostgreSQL development headers, and build essentials for compiling extensions from source

Deployment Dependencies

Implementation Details

Environment Variables:

# PostgreSQL Configuration
POSTGRES_PASSWORD=your_secure_password
POSTGRES_USER=postgres
POSTGRES_DB=your_database

# S3 Configuration (optional, for pg_duckdb)
S3_ACCESS_KEY_ID=your_access_key
S3_SECRET_ACCESS_KEY=your_secret_key
S3_REGION=us-east-1
S3_ENDPOINT=optional_custom_endpoint
S3_URL_STYLE=path

# DuckDB Extensions (optional, comma-separated)
DUCKDB_EXTENSIONS=httpfs,parquet,json

Example Usage - Analytical Queries with pg_duckdb:

-- Query Parquet files from S3
SELECT * FROM read_parquet('s3://bucket/analytics/*.parquet')
WHERE event_date >= '2024-01-01';

-- Aggregate data from remote sources
SELECT
    date_trunc('month', event_time) as month,
    COUNT(*) as events
FROM read_csv('https://example.com/data.csv')
GROUP BY month;

Example Usage - Vector Search with pgvectorscale:

import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect(DATABASE_PUBLIC_URL)
register_vector(conn)

cur = conn.cursor()

# Create table with vector embeddings
cur.execute("""
    CREATE TABLE documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding VECTOR(1536)
    )
""")

# Create StreamingDiskANN index for scalable search
cur.execute("""
    CREATE INDEX ON documents
    USING diskann (embedding)
""")

# Perform similarity search
cur.execute("""
    SELECT id, content
    FROM documents
    ORDER BY embedding <-> %s
    LIMIT 10
""", (query_embedding,))

Hybrid Query Example:

-- Combine analytical aggregation with vector search
WITH similar_docs AS (
    SELECT id, content
    FROM documents
    ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
    LIMIT 100
)
SELECT
    date_trunc('day', created_at) as date,
    COUNT(*) as similar_doc_count
FROM similar_docs
JOIN events ON events.doc_id = similar_docs.id
GROUP BY date
ORDER BY date;

Volume Configuration:

Mount a volume at /var/lib/postgresql/data for persistent storage of both analytical data and vector embeddings.

Why Deploy PostgreSQL with pg_duckdb and pgvectorscale on Railway?

Railway is a singular platform to deploy your infrastructure stack. Railway will host your infrastructure so you don't have to deal with configuration, while allowing you to vertically and horizontally scale it.

By deploying PostgreSQL with pg_duckdb and pgvectorscale on Railway, you are one step closer to supporting a complete full-stack application with minimal burden. Host your servers, databases, AI agents, and more on Railway.


Template Content

More templates in this category

View Template
Postgres-to-R2 Backup
Auto back up PostgreSQL databases to Cloudflare R2 with optional encryption

View Template
ReadySet
A lightweight caching engine for Postgres

View Template
Simple S3
Deploy a S3-compatible storage service with a pre-named bucket.