Deploy ClickHouse Cluster

3-shard ClickHouse cluster with 2x replication

This template deploys the following services from railwayapp-templates/clickhouse-cluster: ClickHouse S1R1, ClickHouse S1R2, ClickHouse S2R1, ClickHouse S2R2, ClickHouse S3R1, ClickHouse S3R2, ClickHouse Keeper K1, ClickHouse Keeper K2, ClickHouse Keeper K3, Main Proxy, and Direct Proxy.

Deploy and Host ClickHouse Cluster on Railway

ClickHouse is a fast open-source column-oriented database management system that provides high-performance analytics and real-time data processing capabilities. It is designed for online analytical processing (OLAP) workloads and is widely used for data warehousing, business intelligence, and real-time analytics applications that require processing large volumes of data.

About Hosting ClickHouse Cluster

Hosting a ClickHouse cluster gives you access to a distributed analytical database capable of handling a high volume of concurrent queries, persisting terabyte-scale datasets, and maintaining high availability across multiple nodes. This template provides a pre-configured cluster with 3 shards, 2 replicas per shard, and efficient columnar storage with zstd compression enabled by default. The database excels at real-time analytics, complex aggregation queries, and distributed query processing across cluster nodes. ClickHouse cluster deployments benefit from scalable CPU, RAM, and storage resources while supporting network security through Railway's private networking. Railway provides automated backups and comprehensive logging to support your distributed database operations.

Common Use Cases

  • Real-time Analytics and Business Intelligence: Powering dashboards, reporting systems, and data visualization tools that require sub-second query response times across billions of records for e-commerce analytics, user behavior tracking, and operational monitoring.

  • Data Warehousing and ETL Processing: Serving as the primary analytical database for data lakes, ETL pipelines, and data transformation workflows that process large volumes of structured and semi-structured data from multiple sources.

  • Time-Series and Event Data Analysis: Managing high-velocity time-series data, application logs, IoT sensor data, and event streams that require efficient compression, fast ingestion, and complex temporal queries.

  • Machine Learning and Data Science: Supporting feature engineering, model training data preparation, and real-time scoring pipelines that require fast aggregations and statistical computations across large datasets.

Dependencies for ClickHouse Cluster Hosting

  • clickhouse-keeper - For cluster coordination and metadata management
  • haproxy - For load balancing and connection management

Implementation Details

This template deploys a ClickHouse cluster with 3 shards and 2 replicas per shard, totaling 6 ClickHouse server nodes, plus a 3-node ClickHouse Keeper ensemble for coordination and HAProxy for load balancing.

Cluster Architecture

The cluster is configured with the following topology:

  • 3 Shards: Data is horizontally partitioned across three shard groups
  • 2 Replicas per Shard: Each shard has two replica nodes for high availability
  • ClickHouse Keeper Ensemble: 3-node ClickHouse Keeper cluster for metadata and coordination
  • HAProxy Load Balancer: Routes client connections across healthy ClickHouse nodes

Cluster Layout:

Shard 1: ClickHouse S1R1, ClickHouse S1R2
Shard 2: ClickHouse S2R1, ClickHouse S2R2
Shard 3: ClickHouse S3R1, ClickHouse S3R2
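
Once the services are up, you can verify this topology by querying the system.clusters table from any node (or through the main proxy). A minimal check, assuming the cluster definition shipped in config.xml, looks like this:

-- List every shard/replica pair the cluster definition knows about
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
ORDER BY shard_num, replica_num;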

Configuration Files

The deployment includes custom configuration files:

  • config.xml - Main server configuration with cluster definition, ClickHouse Keeper settings, and network configuration
  • users.xml - User authentication and permission settings
  • cluster.xml - Distributed table configuration and shard mappings

ClickHouse Keeper Integration

ClickHouse Keeper is used for:

  • Replica synchronization and consistency
  • Distributed DDL operations
  • Leader election for ReplicatedMergeTree tables
  • Cluster metadata management
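
Since the replication paths used later in this guide live under /clickhouse/ in Keeper, an illustrative way to peek at that coordination data is the system.zookeeper table:

-- Inspect the coordination nodes that ClickHouse stores in Keeper
-- (the /clickhouse path matches the ReplicatedMergeTree paths used below)
SELECT name, numChildren
FROM system.zookeeper
WHERE path = '/clickhouse';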

Data Distribution

In the following example, we will create a database, a local table, and a distributed table on the cluster.

Tables are created using the Distributed engine with automatic sharding based on a hash function of the primary key. ReplicatedMergeTree tables ensure data replication within each shard.

Notes:

  • The '{cluster}' macro is not a placeholder you need to replace with the cluster name; ClickHouse automatically substitutes the configured cluster name when the query is executed.
  • The CODEC(ZSTD) clause compresses the column data further, which is worthwhile for large tables with a lot of data.
  • The TTL toDateTime(TimeStamp) + INTERVAL 90 DAY clause deletes data older than 90 days.
  • The PARTITION BY toYYYYMM(TimeStamp) clause partitions the data by month.
  • The ORDER BY (TimeStamp, EventId) clause sorts the data by timestamp and event ID; ClickHouse also builds its primary index on these columns.
  • The ReplicatedMergeTree engine replicates the data across the replicas of each shard.

Create a database on the cluster:

CREATE DATABASE IF NOT EXISTS events_database ON CLUSTER '{cluster}';

This database is where you will create your local and distributed tables.

Create a local table on the cluster:

CREATE TABLE IF NOT EXISTS events_database.events_local ON CLUSTER '{cluster}' (
    -- timestamp
    TimeStamp DateTime64(3) CODEC (Delta, ZSTD),

    -- event data
    EventId String CODEC (ZSTD),
    EventType String CODEC (ZSTD),
    EventData String CODEC (ZSTD),
    EventSource String CODEC (ZSTD),
    EventSeverity String CODEC (ZSTD),
    EventStatus String CODEC (ZSTD),
    EventTags String CODEC (ZSTD)
) ENGINE = ReplicatedMergeTree('/clickhouse/{installation}/{cluster}/tables/{shard}-{uuid}/{database}/{table}', '{replica}')
PARTITION BY toYYYYMM(TimeStamp)
ORDER BY (TimeStamp, EventId)
TTL toDateTime(TimeStamp) + INTERVAL 90 DAY;

This table is the backing table for the distributed table, and it has a TTL of 90 days, meaning data older than 90 days will be automatically deleted.
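
Should you need a different retention window later, the TTL can be changed in place. A hypothetical adjustment to 30 days, following the same pattern, would be:

-- Shorten the retention window for the local table on every node
ALTER TABLE events_database.events_local ON CLUSTER '{cluster}'
    MODIFY TTL toDateTime(TimeStamp) + INTERVAL 30 DAY;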

Create a distributed table on the cluster:

CREATE TABLE IF NOT EXISTS events_database.events ON CLUSTER '{cluster}'
    AS events_database.events_local ENGINE = Distributed('{cluster}', events_database, events_local, rand());

This events_database.events table is the table your application code should read from and write to.
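
As a rough usage sketch (the values below are invented for illustration), inserts and queries both go through events_database.events:

-- Write one event through the distributed table
INSERT INTO events_database.events
    (TimeStamp, EventId, EventType, EventData, EventSource, EventSeverity, EventStatus, EventTags)
VALUES
    (now64(3), 'evt-0001', 'login', '{"user":"alice"}', 'auth-service', 'info', 'ok', 'auth');

-- Read back an aggregate across all shards
SELECT EventType, count() AS events
FROM events_database.events
WHERE TimeStamp >= now() - INTERVAL 1 DAY
GROUP BY EventType
ORDER BY events DESC;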

Modifying data in the distributed table

ALTER TABLE events_database.events_local ON CLUSTER '{cluster}' DELETE WHERE EventType = 'error';

This will delete all rows where the EventType is 'error'.

Note: SQL that modifies the data must be run on the local table, not the distributed table.
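
Because the table is partitioned by month, another common way to remove old data in bulk is to drop an entire partition on the local table. A sketch, where 202401 is just an example toYYYYMM value, would be:

-- Drop the January 2024 partition on every replica of every shard
ALTER TABLE events_database.events_local ON CLUSTER '{cluster}' DROP PARTITION 202401;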

High Availability Features

  • Automatic Failover: If a replica fails, queries automatically route to healthy replicas
  • Data Synchronization: ReplicatedMergeTree ensures eventual consistency across replicas
  • Rolling Updates: The cluster can be updated one node at a time without downtime (this also applies if you ever need to grow a volume)
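
To get a rough picture of replica health at any point, you can query the system.replicas table on a node (shown here filtered to the example events_database):

-- Check replication status for the example database
SELECT database, table, replica_name, is_leader, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE database = 'events_database';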

Proxy Configuration

This template comes with two proxies, a direct proxy and a main proxy.

  • Direct Proxy - Provides a direct connection to the ClickHouse S1R1 node; use this for running migrations.

  • Main Proxy - Provides a round-robin connection across the ClickHouse S1R1, S2R1, and S3R1 nodes; use this for running queries and inserting data.

Environment Variables

Key environment variables provided by the template:

On the Direct Proxy:

  • CLICKHOUSE_PUBLIC_DIRECT_HTTP_URL - The public HTTP URL of the direct proxy.

  • CLICKHOUSE_PRIVATE_DIRECT_HTTP_URL - The private HTTP URL of the direct proxy (a reference variable to the ClickHouse S1R1 node's private domain and port).

  • CLICKHOUSE_PUBLIC_DIRECT_TCP_URL - The public TCP URL of the direct proxy.

  • CLICKHOUSE_PRIVATE_DIRECT_TCP_URL - The private TCP URL of the direct proxy (a reference variable to the ClickHouse S1R1 node's private domain and port).

On the Main Proxy:

  • CLICKHOUSE_PUBLIC_HTTP_URL - The public HTTP URL of the main proxy.

  • CLICKHOUSE_PRIVATE_HTTP_URL - The private HTTP URL of the main proxy.

  • CLICKHOUSE_PUBLIC_TCP_URL - The public TCP URL of the main proxy.

  • CLICKHOUSE_PRIVATE_TCP_URL - The private TCP URL of the main proxy.

On the ClickHouse S1R1 node:

  • CH_USER - The user to connect to the database.

  • CH_PASSWORD - The automatically generated password to connect to the database.

Note: The TCP and HTTP URLs already have the CH_USER and CH_PASSWORD variables injected into them.

Why Deploy ClickHouse Cluster on Railway?

Railway is a single platform for deploying your entire infrastructure stack. Railway hosts your infrastructure so you don't have to deal with configuration, while still allowing you to scale it vertically.

By deploying a ClickHouse cluster on Railway, you are one step closer to supporting a complete analytical data stack with minimal burden. Host your servers, databases, AI agents, and more on Railway.
