Self-hosting ClickHouse means installing and running the database on infrastructure you control, rather than using a managed cloud service. You handle installation, configuration, backups, and scaling yourself, which gives you complete control over performance tuning and data location but requires ongoing operational work.
This guide walks through the complete process of setting up a self-hosted ClickHouse deployment, from initial installation and configuration to production hardening, replication, monitoring, and backup strategies.
What it takes to self-host ClickHouse
Self-hosting ClickHouse isn't for the faint of heart. You're responsible for setting up and running the database on servers you control rather than relying on a managed service, and you handle installation, configuration, and maintenance yourself, whether that's on physical machines in a data center, virtual machines from a cloud provider, or containers on Kubernetes.
This approach makes sense in specific situations. If your company has strict data residency rules that require data to stay within certain geographic boundaries or private networks, self-hosting gives you that control. If you have existing infrastructure expertise, you might prefer self-hosting to fine-tune performance settings and manage costs directly, especially when processing very large data volumes where managed service pricing can add up quickly. Or if you're building a commercial open-source SaaS product, you might want to keep your stack completely self-managed.
The tradeoff is operational work. You’re responsible for installation, security patches, backups, monitoring, and scaling. For teams building applications rather than managing databases, managed services like Tinybird handle this complexity while keeping ClickHouse’s performance characteristics intact.
| Consideration | Self-hosted | Managed (Tinybird) |
|---|---|---|
| Setup time | Hours to days | Minutes |
| Operational overhead | Patching, monitoring, scaling | Fully managed |
| Infrastructure control | Full control | Abstracted |
| Cost structure | Infrastructure + personnel | Usage-based |
| API layer | Build your own | Built-in HTTP APIs |
Prerequisites for a safe install
Before installing ClickHouse, check that your system meets the minimum requirements. Inadequate resources or incorrect system settings can lead to poor performance, data corruption, or installation failures.
1. CPU and memory sizing
ClickHouse needs at least 4 CPU cores and 8GB of RAM for basic workloads. Analytical queries load data into memory for processing, so more RAM directly improves query speed. Production deployments typically start with 16GB and scale up based on concurrent queries and dataset size.
2. Disk type and layout
SSDs are strongly recommended over mechanical drives because ClickHouse performs many random reads during query execution. Hard drives create bottlenecks that can slow queries by 10x or more, while SSDs let ClickHouse sustain scan rates in the hundreds of millions of rows per second. Separating data and log directories onto different mount points improves performance and makes troubleshooting easier when disk issues occur.
3. Network and firewall rules
ClickHouse listens on port 9000 for native protocol connections (used by clickhouse-client) and port 8123 for HTTP connections (used by applications). In production, restrict these ports to trusted networks using firewall rules. Opening port 9000 to the public internet without authentication creates a security vulnerability.
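For example, with ufw you can restrict both ports to a trusted private range; the 10.0.0.0/8 CIDR below is a placeholder for whatever network your application servers actually live on:
# Allow ClickHouse ports only from a trusted private network
sudo ufw allow from 10.0.0.0/8 to any port 9000 proto tcp
sudo ufw allow from 10.0.0.0/8 to any port 8123 proto tcp
sudo ufw enable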
4. Kernel and sysctl settings
Linux kernel parameters affect how ClickHouse handles file operations and network connections. Increase the maximum number of open files to at least 262144 with ulimit -n 262144 or by editing /etc/security/limits.conf. TCP settings like net.ipv4.tcp_keepalive_time help maintain long-running connections for streaming data.
# Add to /etc/sysctl.conf
net.ipv4.tcp_keepalive_time = 300
net.core.somaxconn = 4096
vm.max_map_count = 262144
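Apply the settings without rebooting and confirm they took effect:
sudo sysctl -p
sysctl net.ipv4.tcp_keepalive_time net.core.somaxconn vm.max_map_count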
Step-by-step installation methods
ClickHouse can be installed using package managers, containers, or orchestration tools. The method you pick depends on your existing infrastructure and comfort level with different deployment approaches.
1. APT or YUM on popular Linux distros
The official ClickHouse repository provides packages for Ubuntu, Debian, CentOS, and RHEL. This method integrates with your system’s package manager for easy updates.
For Ubuntu or Debian:
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
After installation, start the server with sudo systemctl start clickhouse-server and enable it to start on boot with sudo systemctl enable clickhouse-server.
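A quick way to confirm the service is healthy is to check its status and run a trivial query:
sudo systemctl status clickhouse-server --no-pager
clickhouse-client --query "SELECT version()"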
2. Docker Compose quick start
Docker Compose provides the fastest way to get ClickHouse running for development or testing (though keep in mind that self-hosted ClickHouse on Docker isn't the only Docker-based ClickHouse solution for local dev). Create a docker-compose.yml file:
version: '3.8'
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - clickhouse_data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse_data:
Run docker-compose up -d to start the container in the background. This works well for local development but requires additional configuration for production use.
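Once the container is running, connect with the bundled client or check the HTTP interface from the host; the clickhouse service name matches the Compose file above:
# Open an interactive SQL session inside the container
docker-compose exec clickhouse clickhouse-client
# Or verify the HTTP interface from the host
curl http://localhost:8123/ping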
3. Kubernetes Helm chart
The official ClickHouse Helm chart simplifies deployment in container-native environments. Add the ClickHouse repository:
helm repo add clickhouse https://charts.clickhouse.com
helm repo update
helm install my-clickhouse clickhouse/clickhouse
Production deployments require customizing values for replica counts, storage classes, and resource limits by creating a values.yaml file.
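One common workflow is to export the chart's default values, edit them, and apply the file on install or upgrade; the exact keys available depend on the chart version, so check the exported file:
helm show values clickhouse/clickhouse > values.yaml
# Edit replica counts, storage classes, and resource limits in values.yaml, then:
helm upgrade --install my-clickhouse clickhouse/clickhouse -f values.yaml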
Step-by-step first query and sample data
After installation, verify ClickHouse works by creating a database, inserting data, and running a query. This confirms the installation succeeded before moving to production configuration.
1. Create a database and table
Connect to ClickHouse using clickhouse-client and create a database:
CREATE DATABASE test_db;
Create a table using the MergeTree engine, which is ClickHouse’s most common table engine:
CREATE TABLE test_db.events (
    event_id String,
    user_id String,
    event_type String,
    event_time DateTime,
    value Float64
) ENGINE = MergeTree()
ORDER BY (event_type, event_time);
The ORDER BY clause defines how data is sorted on disk. Queries that filter or aggregate by event_type and event_time will be fast because data is physically organized in that order.
2. Insert sample rows
Insert a few rows to test the table:
INSERT INTO test_db.events VALUES
('evt_001', 'user_123', 'pageview', '2025-01-15 10:30:00', 1.0),
('evt_002', 'user_456', 'click', '2025-01-15 10:31:00', 2.5),
('evt_003', 'user_123', 'purchase', '2025-01-15 10:32:00', 49.99);
For larger datasets, ClickHouse supports batch inserts from CSV, JSON, or Parquet files.
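For example, assuming an events.csv file whose columns match the table definition, you can stream it in through clickhouse-client:
clickhouse-client --query "INSERT INTO test_db.events FORMAT CSV" < events.csv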
3. Run a select with clickhouse-client
Query the data to confirm everything works:
SELECT event_type, count() AS event_count, avg(value) AS avg_value
FROM test_db.events
GROUP BY event_type
ORDER BY event_count DESC;
If the query returns data, your ClickHouse installation is working.
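Because the table is ordered by event_type and event_time, queries that filter on those columns read only the relevant ranges of data. For example:
SELECT count(), sum(value)
FROM test_db.events
WHERE event_type = 'purchase'
  AND event_time >= '2025-01-15 00:00:00';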
Core configuration tweaks for production safety
Default ClickHouse settings work for development but require adjustment for production. These changes prevent common issues like data loss and performance degradation, and there are specific optimization steps to achieve peak performance.
1. Enable asynchronous inserts
Asynchronous inserts batch multiple small inserts together before writing to disk, which significantly improves write throughput. Because these are query-level settings, they belong in a settings profile, for example /etc/clickhouse-server/users.d/async_insert.xml:
<clickhouse>
    <profiles>
        <default>
            <async_insert>1</async_insert>
            <wait_for_async_insert>1</wait_for_async_insert>
            <async_insert_max_data_size>10485760</async_insert_max_data_size>
        </default>
    </profiles>
</clickhouse>
The wait_for_async_insert setting ensures clients receive confirmation after data is written to disk, not just buffered in memory.
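You can also enable these settings on an individual insert, which is useful for testing before changing the server-wide profile:
INSERT INTO test_db.events
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES ('evt_004', 'user_789', 'click', '2025-01-15 10:33:00', 1.0);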
2. Tune MergeTree part size
ClickHouse stores data in parts that are periodically merged together. Too many small parts slow down queries because ClickHouse reads from multiple files. Configure merge settings in /etc/clickhouse-server/config.d/merge_settings.xml:
<clickhouse>
    <merge_tree>
        <max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
    </merge_tree>
</clickhouse>
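To check whether merges are keeping up, count the active parts per table in system.parts; a steadily growing count usually means inserts are too small or too frequent:
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC;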
3. Raise max open files
ClickHouse opens many files simultaneously during query execution. Increase system limits to prevent "too many open files" errors. Edit /etc/security/limits.conf:
clickhouse soft nofile 262144
clickhouse hard nofile 262144
If running ClickHouse via systemd, also edit the service file to include LimitNOFILE=262144.
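A systemd drop-in is the cleanest way to raise the limit for the service because it survives package upgrades:
sudo systemctl edit clickhouse-server
# In the editor that opens, add:
#   [Service]
#   LimitNOFILE=262144
sudo systemctl daemon-reload
sudo systemctl restart clickhouse-server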
4. Disable swap and enforce fsync
Swap degrades ClickHouse performance because the database expects data in RAM for fast access. Disable swap entirely on database servers with sudo swapoff -a and remove swap entries from /etc/fstab. For data protection, enable fsync_after_insert to ensure writes are flushed to disk before acknowledging success, though this reduces insert throughput.
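Since fsync_after_insert is a MergeTree-level setting, you can enable it per table rather than globally; for example, on the sample table created earlier:
ALTER TABLE test_db.events MODIFY SETTING fsync_after_insert = 1;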
Adding replication and high availability
Single-node ClickHouse works for development but creates a single point of failure in production. Replication distributes data across multiple nodes so queries continue working even if one server fails.
1. Set up ClickHouse Keeper or ZooKeeper
ClickHouse uses a coordination service to manage replication metadata and ensure consistency across nodes. ClickHouse Keeper is the modern, built-in option that's easier to operate than ZooKeeper. Add this to /etc/clickhouse-server/config.d/keeper.xml on three separate servers:
<clickhouse>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>keeper1.example.com</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>keeper2.example.com</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>keeper3.example.com</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
Set server_id to 2 and 3 on the other two servers; the raft_configuration block, which lists all three Keeper nodes, stays identical on every server. Three nodes tolerate the failure of any one node.
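Each ClickHouse server also needs to be told where the Keeper ensemble lives. This uses the standard zookeeper configuration section, which works with ClickHouse Keeper as well; the keeper2 and keeper3 hostnames follow the example above, and the file name is only a suggestion:
<!-- e.g. /etc/clickhouse-server/config.d/keeper_client.xml -->
<clickhouse>
    <zookeeper>
        <node>
            <host>keeper1.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper2.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper3.example.com</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>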
2. Create replicated tables
Convert tables to replicated versions using the ReplicatedMergeTree engine:
CREATE TABLE test_db.events_replicated (
    event_id String,
    user_id String,
    event_type String,
    event_time DateTime,
    value Float64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (event_type, event_time);
The first argument is the path in Keeper (or ZooKeeper) where replication metadata is stored, and the second identifies this specific replica.
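The {shard} and {replica} placeholders are filled in from the macros section of each server's configuration, which is what lets the same CREATE TABLE statement run on every node. A minimal example for one node (the shard and replica values here are illustrative and differ per server) could go in a file such as /etc/clickhouse-server/config.d/macros.xml:
<clickhouse>
    <macros>
        <shard>01</shard>
        <replica>replica_1</replica>
    </macros>
</clickhouse>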
3. Test node failover
Stop the ClickHouse service on one node with sudo systemctl stop clickhouse-server and run queries against the remaining nodes. Queries should continue to work without errors, though write operations may pause briefly while the cluster detects the failure.
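You can also check replication health from SQL at any point; the system.replicas table shows whether a replica has dropped to read-only mode and how far behind it is:
SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas;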
Monitoring and alerting with Prometheus and Grafana
Observability helps catch performance issues and capacity problems before they affect users. ClickHouse exposes metrics that Prometheus can scrape, and Grafana provides visualization dashboards.
1. Expose ClickHouse metrics
ClickHouse includes a built-in metrics endpoint on port 9363. Enable it by adding this to /etc/clickhouse-server/config.d/prometheus.xml:
<clickhouse>
    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>9363</port>
        <metrics>true</metrics>
    </prometheus>
</clickhouse>
Configure Prometheus to scrape this endpoint by adding a job to prometheus.yml:
scrape_configs:
  - job_name: 'clickhouse'
    static_configs:
      - targets: ['clickhouse-server:9363']
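After restarting ClickHouse, confirm the endpoint responds before wiring up dashboards:
curl -s http://localhost:9363/metrics | head -n 20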
2. Import community dashboards
The ClickHouse community maintains Grafana dashboards that visualize key metrics. Import dashboard ID 14192 from Grafana's dashboard repository for a comprehensive overview. Watch for queries per second, query duration percentiles, and memory usage.
3. Set latency and disk alerts
Create Prometheus alert rules for conditions that indicate problems:
- alert: HighQueryLatency
  # Average query time in microseconds over the last 5 minutes; 5000000 = 5 seconds
  expr: rate(ClickHouseProfileEvents_QueryTimeMicroseconds[5m]) / rate(ClickHouseProfileEvents_Query[5m]) > 5000000
  for: 5m
  annotations:
    summary: "Average query latency above 5 seconds"
Backup and restore strategies
Regular backups protect against data loss from hardware failure, software bugs, or operator errors. ClickHouse's backup tools support both full and incremental backups.
1. Snapshot with clickhouse-backup
The clickhouse-backup tool creates consistent snapshots of ClickHouse tables without stopping the server. Configure it by creating /etc/clickhouse-backup/config.yml:
general:
  remote_storage: s3
  backups_to_keep_local: 3
  backups_to_keep_remote: 30
s3:
  bucket: clickhouse-backups
  region: us-east-1
Create a backup with clickhouse-backup create backup_name.
2. Ship snapshots to object storage
Storing backups on the same server as the database doesn't protect against hardware failure. Upload backups to S3 with clickhouse-backup upload backup_name. Schedule regular backups using cron:
0 2 * * * /usr/bin/clickhouse-backup create && /usr/bin/clickhouse-backup upload "$(/usr/bin/clickhouse-backup list local | tail -n 1 | awk '{print $1}')"
3. Restore to a new node
Test your restore process regularly to verify backups work. Download a backup with clickhouse-backup download backup_name and restore it with clickhouse-backup restore backup_name. The restore process recreates tables and copies data files into place.
When NOT to self-host
Self-hosting makes sense when you have the team and expertise to manage infrastructure, but many teams find the operational overhead outweighs the benefits. If you're spending more time managing ClickHouse than building features, a managed service might be a better fit.
Tinybird provides managed ClickHouse with developer-focused features that self-hosted deployments require significant engineering effort to replicate. The platform handles infrastructure scaling, backup management, and monitoring automatically. Tinybird's API layer lets you expose ClickHouse queries as REST endpoints without building your own API server, which accelerates development for teams integrating analytics into applications.
For teams without dedicated database administrators or those prioritizing speed over infrastructure control, managed services eliminate complexity while maintaining ClickHouse's performance. Sign up for a free Tinybird account to explore managed ClickHouse without the operational overhead of self-hosting.
Frequently asked questions about self-hosting ClickHouse
Can I run ClickHouse on ARM processors?
Yes, ClickHouse supports the ARM64 architecture, including Apple Silicon and AWS Graviton processors. ARM builds can deliver roughly 25% higher QPS than comparable x86 instances for many workloads, and ARM instances often cost less than equivalent x86 instances in cloud environments.
How does ClickHouse storage cost compare to cloud warehouses?
Self-hosted ClickHouse typically costs less for storage because you control the infrastructure and can choose cheaper storage tiers. ClickHouse's columnar compression often achieves 10-20x reduction over raw data size, further reducing storage costs.
Does ClickHouse support ACID transactions?
ClickHouse provides atomic inserts and consistent reads but not full ACID transactions across multiple tables. It's designed for analytical workloads where data is inserted in batches rather than transactional systems that require row-level locking and rollback capabilities.