---
title: "Step-by-step guide to self-host ClickHouse® for beginners (2026)"
excerpt: "Learn to self-host ClickHouse® with our complete 2026 guide covering installation, configuration, monitoring, and production-ready deployment strategies."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-11-07 17:24:11"
publishedOn: "2025-11-07 17:24:11"
updatedOn: "2025-11-07 17:24:11"
status: "published"
---

Self-hosting ClickHouse® means installing and running the database on infrastructure you control, rather than using a managed cloud service. You handle installation, configuration, backups, and scaling yourself, which gives you complete control over performance tuning and data location but requires ongoing operational work.

This guide walks through the complete process of setting up a self-hosted ClickHouse® deployment, from initial installation and configuration to production hardening, replication, monitoring, and backup strategies.

## What it takes to self-host ClickHouse®

Self-hosting ClickHouse® isn't for the faint of heart. You are responsible for setting up and running the database on servers you control, rather than using a managed service, though there are [multiple deployment options](https://www.tinybird.co/blog/clickhouse-deployment-options) to consider. You handle the installation, configuration, and maintenance yourself, whether that's on physical machines in a data center, virtual machines from a cloud provider, or containers on Kubernetes.

This approach makes sense in specific situations. If your company has strict data residency rules that require data to stay within certain geographic boundaries or private networks, self-hosting gives you that control. If you have existing infrastructure expertise you might prefer self-hosting to fine-tune performance settings and manage costs directly, especially when processing very large data volumes where managed service pricing can add up quickly. Or if you're building a commercial open-source SaaS product, you might want to keep your stack completely self-managed.

The tradeoff is operational work. You’re responsible for installation, security patches, backups, monitoring, and scaling. For teams building applications rather than managing databases, [managed services like Tinybird](https://www.tinybird.co/blog/managed-clickhouse-options) handle this complexity while keeping ClickHouse®’s performance characteristics intact.

| Consideration | Self-hosted | Managed (Tinybird) |
| --- | --- | --- |
| Setup time | Hours to days | Minutes |
| Operational overhead | Patching, monitoring, scaling | Fully managed |
| Infrastructure control | Full control | Abstracted |
| Cost structure | Infrastructure + personnel | Usage-based |
| API layer | Build your own | Built-in HTTP APIs |

## Prerequisites for a safe install

Before installing ClickHouse®, check that your system meets the minimum requirements. Inadequate resources or incorrect system settings can lead to poor performance, data corruption, or installation failures.

### 1. CPU and memory sizing

ClickHouse® needs at least 4 CPU cores and 8GB of RAM for basic workloads. Analytical queries load data into memory for processing, so more RAM directly improves query speed. Production deployments typically start with 16GB and scale up based on concurrent queries and dataset size.
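To keep the server from being killed by the OS under memory pressure, ClickHouse® can cap its own memory usage relative to available RAM. A minimal sketch for a file like `/etc/clickhouse-server/config.d/memory.xml` (0.9 is a common starting point, not a universal rule):

```xml
<clickhouse>
    <!-- Limit total server memory to 90% of physical RAM -->
    <max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>
</clickhouse>
```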

### 2. Disk type and layout

SSDs are strongly recommended over mechanical drives because ClickHouse® performs many random reads during query execution; hard drives create bottlenecks that can slow queries by 10x or more. Separating data and log directories onto different mount points improves performance and makes troubleshooting easier when disk issues occur.

### 3. Network and firewall rules

ClickHouse® listens on port 9000 for native protocol connections (used by `clickhouse-client`) and port 8123 for HTTP connections (used by applications). In production, restrict these ports to trusted networks using firewall rules. Opening port 9000 to the public internet without authentication creates a security vulnerability.
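As a concrete example, with `ufw` on Ubuntu you might allow only an internal subnet; the `10.0.0.0/8` range below is a placeholder for your own trusted network:

```bash
# Allow ClickHouse ports only from the internal network
sudo ufw allow from 10.0.0.0/8 to any port 9000 proto tcp   # native protocol
sudo ufw allow from 10.0.0.0/8 to any port 8123 proto tcp   # HTTP interface
sudo ufw enable
```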

### 4. Kernel and sysctl settings

Linux kernel parameters affect how ClickHouse® handles file operations and network connections. Increase the maximum number of open files to at least 262144 with `ulimit -n 262144` or by editing `/etc/security/limits.conf`. TCP settings like `net.ipv4.tcp_keepalive_time` help maintain long-running connections for streaming data.

```bash
# Add to /etc/sysctl.conf, then apply with: sudo sysctl -p
net.ipv4.tcp_keepalive_time = 300
net.core.somaxconn = 4096
vm.max_map_count = 262144
```

## Step-by-step installation methods

ClickHouse® can be installed using package managers, containers, or orchestration tools. The method you pick depends on your existing infrastructure and comfort level with different deployment approaches.

### 1. APT or YUM on popular Linux distros

The official ClickHouse® repository provides packages for Ubuntu, Debian, CentOS, and RHEL. This method integrates with your system’s package manager for easy updates.

For Ubuntu or Debian:

```bash
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
```

After installation, start the server with `sudo systemctl start clickhouse-server` and enable it to start on boot with `sudo systemctl enable clickhouse-server`.

### 2. Docker Compose quick start

Docker Compose provides the fastest way to get ClickHouse® running for development or testing (though keep in mind that self-hosted ClickHouse® on Docker isn't the only [Docker-based ClickHouse® solution](https://www.tinybird.co/blog/tinybird-local-docker-container) for local dev). Create a `docker-compose.yml` file:

```yaml
version: '3.8'

services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - clickhouse_data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse_data:
```

Run `docker-compose up -d` to start the container in the background. This works well for local development but requires additional configuration for production use.

### 3. Kubernetes Helm chart

The official ClickHouse® Helm chart simplifies deployment in container-native environments. Add the ClickHouse® repository:

```bash
helm repo add clickhouse https://charts.clickhouse.com
helm repo update
helm install my-clickhouse clickhouse/clickhouse
```

Production deployments require customizing values for replica counts, storage classes, and resource limits by creating a `values.yaml` file.

## Step-by-step first query and sample data

After installation, verify ClickHouse® works by creating a database, inserting data, and running a query. This confirms the installation succeeded before moving to production configuration.

### 1. Create a database and table

Connect to ClickHouse® using `clickhouse-client` and create a database:

```sql
CREATE DATABASE test_db;
```

Create a table using the [`MergeTree`](https://www.tinybird.co/blog/clickhouse-create-table-example) engine, which is ClickHouse®’s most common table engine:

```sql
CREATE TABLE test_db.events (
    event_id String,
    user_id String,
    event_type String,
    event_time DateTime,
    value Float64
) ENGINE = MergeTree()
ORDER BY (event_type, event_time);
```

The `ORDER BY` clause defines how data is sorted on disk. Queries that filter or aggregate by `event_type` and `event_time` will be fast because data is physically organized in that order.

### 2. Insert sample rows

Insert a few rows to test the table:

```sql
INSERT INTO test_db.events VALUES
    ('evt_001', 'user_123', 'pageview', '2025-01-15 10:30:00', 1.0),
    ('evt_002', 'user_456', 'click', '2025-01-15 10:31:00', 2.5),
    ('evt_003', 'user_123', 'purchase', '2025-01-15 10:32:00', 49.99);
```

For larger datasets, ClickHouse® supports batch inserts from CSV, JSON, or Parquet files.
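For example, `clickhouse-client` can load a local file directly with `FROM INFILE`; this sketch assumes an `events.csv` whose header row matches the table's columns:

```sql
-- Parsed by clickhouse-client: reads events.csv from the client machine
INSERT INTO test_db.events FROM INFILE 'events.csv' FORMAT CSVWithNames;
```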

### 3. Run a `SELECT` with `clickhouse-client`

Query the data to confirm everything works:

```sql
SELECT event_type, count() AS event_count, avg(value) AS avg_value
FROM test_db.events
GROUP BY event_type
ORDER BY event_count DESC;
```

If the query returns data, your ClickHouse® installation is working.

## Core configuration tweaks for production safety

Default ClickHouse® settings work for development but require adjustment for production. These changes prevent common issues like data loss and performance degradation, and there are [specific optimization steps](https://www.tinybird.co/blog/optimize-clickhouse-cluster) to achieve peak performance.

### 1. Enable asynchronous inserts

Asynchronous inserts batch multiple small inserts together before writing to disk, which significantly improves write throughput. Because `async_insert` is a query-level setting, it belongs in a user profile; add this to [`/etc/clickhouse-server/users.d/async_insert.xml`](https://www.tinybird.co/blog/clickhouse-config-xml-example-explainer):

```xml
<clickhouse>
    <profiles>
        <default>
            <async_insert>1</async_insert>
            <wait_for_async_insert>1</wait_for_async_insert>
            <async_insert_max_data_size>10485760</async_insert_max_data_size>
        </default>
    </profiles>
</clickhouse>
```

The `wait_for_async_insert` setting ensures clients receive confirmation after data is written to disk, not just buffered in memory.
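Async inserts can also be enabled per query rather than server-wide, which is handy for testing the behavior before changing any config files:

```sql
INSERT INTO test_db.events
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES ('evt_004', 'user_789', 'pageview', '2025-01-15 10:33:00', 1.0);
```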

### 2. Tune MergeTree part size

ClickHouse® stores data in parts that are periodically merged together. Too many small parts slow down queries because ClickHouse® reads from multiple files. Configure merge settings in `/etc/clickhouse-server/config.d/merge_settings.xml`:

```xml
<clickhouse>
    <merge_tree>
        <max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
    </merge_tree>
</clickhouse>

```
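You can check whether small parts are accumulating by querying `system.parts`; consistently high counts usually mean inserts are too small or too frequent:

```sql
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC;
```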

### 3. Raise max open files

ClickHouse® opens many files simultaneously during query execution. Increase system limits to prevent "too many open files" errors. Edit `/etc/security/limits.conf`:

```bash
clickhouse soft nofile 262144
clickhouse hard nofile 262144

```

If running ClickHouse® via systemd, also edit the service file to include `LimitNOFILE=262144`.
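With systemd, the cleanest approach is a drop-in override rather than editing the packaged unit file. Running `sudo systemctl edit clickhouse-server` creates one; its contents would look like:

```ini
# /etc/systemd/system/clickhouse-server.service.d/override.conf
[Service]
LimitNOFILE=262144
```

Run `sudo systemctl daemon-reload` and restart the service for the new limit to take effect.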

### 4. Disable swap and enforce fsync

Swap degrades ClickHouse® performance because the database expects data in RAM for fast access. Disable swap entirely on database servers with `sudo swapoff -a` and remove swap entries from `/etc/fstab`. For data protection, enable `fsync_after_insert` to ensure writes are flushed to disk before acknowledging success, though this reduces insert throughput.
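`fsync_after_insert` is a MergeTree-level setting, so it can be applied per table rather than globally; a sketch for the sample table created earlier:

```sql
ALTER TABLE test_db.events MODIFY SETTING fsync_after_insert = 1;
```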

## Adding replication and high availability

Single-node ClickHouse® works for development but creates a single point of failure in production. Replication distributes data across multiple nodes so queries continue working even if one server fails.

### 1. Set up ClickHouse® Keeper or ZooKeeper

ClickHouse® uses a coordination service to manage replication metadata and ensure consistency across nodes. ClickHouse® Keeper is the modern, built-in option that's easier to operate than ZooKeeper. Add this to `/etc/clickhouse-server/config.d/keeper.xml` on three separate servers:

```xml
<clickhouse>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>keeper1.example.com</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>keeper2.example.com</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>keeper3.example.com</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
```

Keep the `raft_configuration` block identical on all three servers, but set `server_id` to 2 and 3 on the second and third nodes. Three nodes provide fault tolerance for one node failure.
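Each ClickHouse® server also needs to know where the Keeper ensemble lives. A sketch, reusing the hostnames above, for `/etc/clickhouse-server/config.d/use_keeper.xml`:

```xml
<clickhouse>
    <zookeeper>
        <node>
            <host>keeper1.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper2.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper3.example.com</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
```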

### 2. Create replicated tables

Convert tables to replicated versions using the `ReplicatedMergeTree` engine:

```sql
CREATE TABLE test_db.events_replicated (
    event_id String,
    user_id String,
    event_type String,
    event_time DateTime,
    value Float64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (event_type, event_time);
```

The first argument is the ZooKeeper path where replication metadata is stored, and the second identifies this specific replica.
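The `{shard}` and `{replica}` placeholders are macros that each server substitutes from its own configuration; without them, the `CREATE TABLE` fails. A minimal per-server fragment (the values shown are examples, and must differ per replica):

```xml
<clickhouse>
    <macros>
        <shard>01</shard>
        <replica>replica-1</replica>
    </macros>
</clickhouse>
```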

### 3. Test node failover

Stop the ClickHouse® service on one node with `sudo systemctl stop clickhouse-server` and run queries against the remaining nodes. Reads should continue without errors, though writes may pause briefly while the cluster detects the failure.
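To watch the cluster re-synchronize once the stopped node comes back, `system.replicas` reports per-table replication health; a quick read-only check:

```sql
-- Tables still catching up will show a nonzero delay or queue
SELECT database, table, is_leader, absolute_delay, queue_size
FROM system.replicas
WHERE absolute_delay > 0 OR queue_size > 0;
```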

## Monitoring and alerting with Prometheus and Grafana

Observability helps catch performance issues and capacity problems before they affect users. ClickHouse® exposes metrics that Prometheus can scrape, and Grafana provides visualization dashboards.

### 1. Expose ClickHouse® metrics

ClickHouse® includes a built-in metrics endpoint on port 9363. Enable it by adding this to `/etc/clickhouse-server/config.d/prometheus.xml`:

```xml
<clickhouse>
    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>9363</port>
        <metrics>true</metrics>
    </prometheus>
</clickhouse>

```

Configure Prometheus to scrape this endpoint by adding a job to `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'clickhouse'
    static_configs:
      - targets: ['clickhouse-server:9363']
```

### 2. Import community dashboards

The ClickHouse® community maintains Grafana dashboards that visualize key metrics. Import dashboard ID 14192 from Grafana's dashboard repository for a comprehensive overview. Watch for queries per second, query duration percentiles, and memory usage.

### 3. Set latency and disk alerts

Create Prometheus alert rules for conditions that indicate problems:

```yaml
- alert: HighQueryLatency
  # Average query time in microseconds over the last 5m; 5e6 µs = 5 s
  expr: rate(ClickHouseProfileEvents_QueryTimeMicroseconds[5m]) / rate(ClickHouseProfileEvents_Query[5m]) > 5e6
  for: 5m
  annotations:
    summary: "Average query latency above 5 seconds"

```
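For disk capacity, a companion rule can fire before the data volume fills. This sketch assumes node_exporter is running on the ClickHouse® hosts (the `node_filesystem_*` metrics come from it, not from ClickHouse® itself):

```yaml
- alert: LowDiskSpace
  expr: node_filesystem_avail_bytes{mountpoint="/var/lib/clickhouse"} / node_filesystem_size_bytes{mountpoint="/var/lib/clickhouse"} < 0.10
  for: 10m
  annotations:
    summary: "Less than 10% disk space left on the ClickHouse data volume"
```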

## Backup and restore strategies

Regular backups protect against data loss from hardware failure, software bugs, or operator errors. ClickHouse®'s backup tools support both full and incremental backups.

### 1. Snapshot with `clickhouse-backup`

The `clickhouse-backup` tool creates consistent snapshots of ClickHouse® tables without stopping the server. Configure it by creating `/etc/clickhouse-backup/config.yml`:

```yaml
general:
  remote_storage: s3
  backups_to_keep_local: 3
  backups_to_keep_remote: 30

s3:
  bucket: clickhouse-backups
  region: us-east-1

```

Create a backup with `clickhouse-backup create backup_name`.

### 2. Ship snapshots to object storage

Storing backups on the same server as the database doesn't protect against hardware failure. Upload backups to S3 with `clickhouse-backup upload backup_name`. Schedule regular backups using cron:

```bash
0 2 * * * /usr/bin/clickhouse-backup create && /usr/bin/clickhouse-backup upload $(clickhouse-backup list local | tail -n1)

```

### 3. Restore to a new node

Test your restore process regularly to verify backups work. Download a backup with `clickhouse-backup download backup_name` and restore it with `clickhouse-backup restore backup_name`. The restore process recreates tables and copies data files into place.

## When NOT to self-host

Self-hosting makes sense when you have the team and expertise to manage infrastructure, but many teams find the operational overhead outweighs the benefits. If you're spending more time managing ClickHouse® than building features, a managed service might be a better fit.

[Tinybird](https://www.tinybird.co) provides [managed ClickHouse®](https://www.tinybird.co/product/managed-clickhouse) with developer-focused features that self-hosted deployments require significant engineering effort to replicate. The platform handles infrastructure scaling, backup management, and monitoring automatically. Tinybird's API layer lets you expose ClickHouse® queries as REST endpoints without building your own API server, which accelerates development for teams integrating analytics into applications.

For teams without dedicated database administrators or those prioritizing speed over infrastructure control, [managed services](https://www.tinybird.co/blog/best-cloud-managed-clickhouse) eliminate complexity while maintaining ClickHouse®'s performance. [Sign up for a free Tinybird account](https://cloud.tinybird.co/signup) to explore managed ClickHouse® without the operational overhead of self-hosting.

## Frequently asked questions about self-hosting ClickHouse®

### Can I run ClickHouse® on ARM processors?

Yes, ClickHouse® supports ARM64 architecture, including Apple Silicon and AWS Graviton processors. On Graviton, ClickHouse® has reported [up to ~25% higher QPS](https://clickhouse.com/blog/graviton-boosts-clickhouse-cloud-performance) than comparable x86 instances, and ARM instances often cost less in cloud environments.

### How does ClickHouse® storage cost compare to cloud warehouses?

Self-hosted ClickHouse® typically costs less for storage because you control the infrastructure and can choose cheaper storage tiers. ClickHouse®'s compression ratios are often 10-20x better than traditional warehouses, further reducing storage costs.

### Does ClickHouse® support ACID transactions?

ClickHouse® provides atomic inserts and consistent reads but not full ACID transactions across multiple tables. It's designed for analytical workloads where data is inserted in batches rather than transactional systems that require row-level locking and rollback capabilities.

