Setting up a ClickHouse server means editing XML configuration files that control everything from network ports to storage policies. The config.xml file sits at the heart of every ClickHouse deployment, and getting it wrong can prevent your server from starting or cause performance problems that only show up under load.
This guide walks through the structure of ClickHouse configuration files, explains the most important settings, and shows you how to test and deploy changes safely.
What is config.xml and where does ClickHouse look for it
The config.xml file is ClickHouse's main server configuration file, and it lives at /etc/clickhouse-server/config.xml by default. This file tells ClickHouse where to store data, which ports to listen on, how to write logs, and how to connect to other servers in a cluster.
ClickHouse looks for configuration files in a specific order. First, it reads the main config.xml file. Then it processes any XML or YAML files in the /etc/clickhouse-server/config.d/ directory. Files in config.d/ can override settings from the base configuration, which makes it easier to manage different environments without editing the main file.
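For example, a small drop-in file can change a single setting without touching the base config (the filename here is arbitrary):

<!-- /etc/clickhouse-server/config.d/log-level.xml -->
<clickhouse>
    <logger>
        <level>warning</level>
    </logger>
</clickhouse>

Only the elements present in the drop-in are overridden; everything else keeps its value from config.xml.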
When ClickHouse starts up, it merges all configuration files into one internal representation. You can see what the server actually uses by checking /var/lib/clickhouse/preprocessed_configs/, where ClickHouse writes the final merged configuration.
Folder structure and include hierarchy
Production ClickHouse deployments split settings across multiple files instead of cramming everything into one massive config.xml. This structure makes it easier to track changes in version control and swap out environment-specific settings.
A typical setup looks like this:
/etc/clickhouse-server/
├── config.xml          # Main configuration
├── users.xml           # User accounts and permissions
└── config.d/           # Additional configs
    ├── network.xml     # Ports and interfaces
    ├── storage.xml     # Disk policies
    ├── clusters.xml    # Cluster topology
    └── logging.xml     # Log settings
This modular approach lets you commit base configurations to git while keeping secrets like passwords in separate files that stay out of version control.
<include_from> mechanism
The <include_from> directive pulls in external files for settings you want to keep separate. This works well for credentials and environment-specific values that change between development and production.
<clickhouse>
    <include_from>/etc/clickhouse-server/secrets.xml</include_from>
</clickhouse>
The referenced file holds named substitution elements that the main configuration pulls in through incl attributes. ClickHouse resolves these substitutions at startup, so you can keep database passwords and API keys in a file that doesn't get committed to your repository.
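As a sketch of how the substitution works (the s3_credentials element name and values are made up for illustration), the external file defines a named block and the main config references it by name:

<!-- /etc/clickhouse-server/secrets.xml -->
<clickhouse>
    <s3_credentials>
        <access_key_id>AKIAEXAMPLE</access_key_id>
        <secret_access_key>example-secret</secret_access_key>
    </s3_credentials>
</clickhouse>

<!-- In config.xml: the element's contents are replaced by the named block -->
<s3 incl="s3_credentials"/>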
Precedence between XML and YAML
ClickHouse reads both XML and YAML configuration files, and you can mix them in the same deployment. When both formats define the same setting, the file processed last wins, based on alphabetical order in the config.d/ directory.
If you have both settings.xml and settings.yaml in config.d/, the YAML file overrides conflicting settings from the XML file, because files are merged in alphabetical order and 'y' sorts after 'x'. Most production setups stick to one format for consistency.
preprocessed_configs explained
The preprocessed_configs directory at /var/lib/clickhouse/preprocessed_configs/ holds the final merged configuration that ClickHouse actually runs with. When troubleshooting configuration problems, this directory shows you exactly what settings the server applied after processing all includes and overrides.
This directory updates every time ClickHouse starts or reloads its configuration, writing out a preprocessed version of each file with all substitutions and overrides applied. If your changes don't seem to take effect, comparing the preprocessed files to your source files reveals syntax errors or incorrect include paths.
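A quick way to inspect the result, assuming default paths:

# View the merged configuration the server actually loaded
sudo less /var/lib/clickhouse/preprocessed_configs/config.xml

# Check whether a specific setting survived the merge
sudo grep -r listen_host /var/lib/clickhouse/preprocessed_configs/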
The smallest working config.xml example
A minimal ClickHouse configuration only needs a few elements to start the server. Here's the absolute minimum for a working installation:
<clickhouse>
    <logger>
        <level>information</level>
        <console>1</console>
    </logger>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <path>/var/lib/clickhouse/</path>
</clickhouse>
This sets up logging to console at the information level, opens the HTTP interface on port 8123 and native TCP on port 9000, and stores data in /var/lib/clickhouse/. Production deployments add more settings for security, performance, and clustering.
Example with comments
Here's the minimal configuration with annotations explaining what each part does:
<clickhouse>
    <!-- Logging: where and how verbose -->
    <logger>
        <!-- Options: trace, debug, information, warning, error -->
        <level>information</level>
        <!-- Write to console (1) or file (0) -->
        <console>1</console>
    </logger>
    <!-- HTTP interface for queries and monitoring -->
    <http_port>8123</http_port>
    <!-- Native TCP protocol for clickhouse-client -->
    <tcp_port>9000</tcp_port>
    <!-- Base directory for data storage -->
    <path>/var/lib/clickhouse/</path>
</clickhouse>
Each setting has defaults, but writing them out explicitly makes server behavior predictable across different environments.
YAML equivalent
The same minimal configuration in YAML format looks cleaner to some people:
logger:
    level: information
    console: 1
http_port: 8123
tcp_port: 9000
path: /var/lib/clickhouse/
YAML's indentation-based structure eliminates closing tags, and YAML config files omit the clickhouse root element, which is implicit. XML remains more common in ClickHouse deployments, especially where configuration management tools already use XML.
Core sections you must know
Production ClickHouse setups configure several key sections beyond the minimal example. These sections control logging, networking, clustering, storage, compression, and replication.
1. <logger>
The logger section controls where ClickHouse writes logs and how much detail to capture. Log levels range from trace (most verbose) to error (least verbose), with information being a solid default for production.
<logger>
    <level>information</level>
    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
    <size>100M</size>
    <count>10</count>
</logger>
The size and count settings control log rotation, keeping your 10 most recent log files at 100MB each. For debugging performance problems, temporarily switching to debug or trace provides more detail, though these levels generate significantly more log data.
2. <listen_host> and network ports
Network configuration determines which interfaces ClickHouse binds to and which ports it listens on. By default, ClickHouse only accepts connections from localhost, protecting against accidental exposure.
<listen_host>0.0.0.0</listen_host>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<interserver_http_port>9009</interserver_http_port>
Setting listen_host to 0.0.0.0 allows connections from any network interface. In production, combine this with firewall rules or security groups to restrict access. The interserver_http_port handles communication between ClickHouse servers in a cluster for replication and distributed queries.
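Because ClickHouse accepts multiple listen_host elements, you can bind to specific interfaces instead of all of them; a sketch (the private address is illustrative):

<!-- Bind only to loopback and one private interface -->
<listen_host>127.0.0.1</listen_host>
<listen_host>10.0.0.5</listen_host>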
3. <remote_servers> or <clusters>
The remote_servers section defines cluster topology for distributed queries and replicated tables. Each cluster contains one or more shards, and each shard can have multiple replicas for high availability.
<remote_servers>
    <production_cluster>
        <shard>
            <replica>
                <host>clickhouse-01.example.com</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse-02.example.com</host>
                <port>9000</port>
            </replica>
        </shard>
    </production_cluster>
</remote_servers>
This creates a cluster named production_cluster with a single shard containing two replicas. Queries using the Distributed table engine or ON CLUSTER syntax reference this cluster name to execute across multiple servers. Operating large-scale ClickHouse clusters requires careful attention to shard distribution and replica configuration.
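To illustrate how the cluster name is used (the table and database names here are hypothetical), a local table can be created on every node with ON CLUSTER and then wrapped in a Distributed table:

-- Create the same local table on every node in the cluster
CREATE TABLE events_local ON CLUSTER production_cluster
(
    event_date Date,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Distributed table that fans queries out across shards;
-- arguments: cluster, database, underlying table, sharding key
CREATE TABLE events ON CLUSTER production_cluster AS events_local
ENGINE = Distributed(production_cluster, default, events_local, rand());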
4. <storage_configuration>
Storage configuration defines disks, volumes, and policies that control where ClickHouse stores data. This becomes important when you want different storage types for hot and cold data or tiered storage strategies.
<storage_configuration>
    <disks>
        <default>
            <path>/var/lib/clickhouse/</path>
        </default>
        <s3_disk>
            <type>s3</type>
            <endpoint>https://s3.amazonaws.com/my-bucket/clickhouse/</endpoint>
            <access_key_id from_env="AWS_ACCESS_KEY_ID"/>
            <secret_access_key from_env="AWS_SECRET_ACCESS_KEY"/>
        </s3_disk>
    </disks>
    <policies>
        <tiered>
            <volumes>
                <hot>
                    <disk>default</disk>
                </hot>
                <cold>
                    <disk>s3_disk</disk>
                </cold>
            </volumes>
        </tiered>
    </policies>
</storage_configuration>
This example defines a tiered storage policy with local disk for hot data and S3 for cold data. Tables can reference the tiered policy to automatically move older data to cheaper storage as it ages.
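A table opts into the policy through its settings; a sketch (the metrics table is illustrative) that moves parts to the cold volume after 30 days:

CREATE TABLE metrics
(
    ts DateTime,
    value Float64
)
ENGINE = MergeTree
ORDER BY ts
TTL ts + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'tiered';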
5. <compression>
Compression settings control which algorithms ClickHouse uses for different data types. Proper compression configuration reduces storage costs and improves query performance by reducing I/O.
<compression>
    <case>
        <method>lz4</method>
    </case>
    <case>
        <min_part_size>10000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>zstd</method>
        <level>3</level>
    </case>
</compression>
LZ4 offers the fastest decompression speed, making it good for frequently accessed data. ZSTD provides better compression ratios at the cost of slightly slower decompression, which works well for larger data parts queried less often.
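These server-wide rules can also be overridden per column with codecs declared in the table schema; a brief sketch (the table and columns are illustrative):

CREATE TABLE logs
(
    -- Delta encoding before ZSTD works well for monotonic timestamps
    ts DateTime CODEC(Delta, ZSTD(3)),
    -- A higher ZSTD level trades CPU for a better ratio on text
    message String CODEC(ZSTD(5))
)
ENGINE = MergeTree
ORDER BY ts;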
6. <zookeeper> for replication
The zookeeper section configures coordination services for replicated tables. ClickHouse uses ZooKeeper or ClickHouse Keeper to maintain consistency across replicas and coordinate distributed operations.
<zookeeper>
    <node>
        <host>keeper-01.example.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>keeper-02.example.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>keeper-03.example.com</host>
        <port>9181</port>
    </node>
</zookeeper>
A ZooKeeper ensemble typically consists of three or five nodes for fault tolerance. ClickHouse Keeper, the recommended replacement for ZooKeeper in new deployments, uses the same configuration format and integrates more tightly with ClickHouse-specific operations.
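With the section in place, connectivity can be verified from a client by reading the system.zookeeper table:

-- Returns the root znodes if ClickHouse can reach the ensemble
SELECT name FROM system.zookeeper WHERE path = '/';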
Settings that can be reloaded without restart
Some configuration changes take effect immediately when you run SYSTEM RELOAD CONFIG, while others require a full server restart. Knowing which settings support hot reloading helps you make changes without downtime.
Settings that can be reloaded include:
- Logger configuration: Log levels and file paths
- Compression settings: Algorithm and level changes
- User settings: Permissions and quotas from users.xml
- Dictionary definitions: External dictionary configurations
- Storage policies: New disks and volumes (with limitations)
Changes to network ports, memory limits, and core server paths always require a restart. When in doubt, check the ClickHouse documentation for your specific version, as hot reload capabilities have expanded over time.
Live reload command
The SYSTEM RELOAD CONFIG command tells ClickHouse to reprocess its configuration files and apply changes that support hot reloading:
SYSTEM RELOAD CONFIG;
After running this command, check the server logs to confirm which settings were updated. If your changes don't appear to take effect, the setting likely requires a restart, or there's a syntax error preventing the configuration from loading.
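Assuming the default log location, the reload activity can be spotted in the server log; ConfigReloader is the component name that typically appears in these messages:

# Show the most recent configuration reload messages
grep ConfigReloader /var/log/clickhouse-server/clickhouse-server.log | tail -n 20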
Sections requiring full restart
Several important configuration sections cannot be changed without restarting the ClickHouse server. These include network settings, memory limits, and storage paths.
Plan configuration changes to these sections during maintenance windows:
- Network configuration: Changes to ports and listen addresses
- Memory limits: Max memory usage and buffer sizes
- Storage paths: Data directory and metadata locations
- Interserver communication: Cluster and replication ports
- Core server settings: Thread pools and background task limits
Testing changes in a development environment first helps catch problems before they affect production systems.
Common issues and how to avoid them
Configuration mistakes can prevent ClickHouse from starting or cause subtle performance problems that only appear under load. Here are the most frequent issues that trip up both new and experienced users.
Wrong path to users.xml
The users.xml file defines user accounts, passwords, and access permissions. By default, ClickHouse looks for this file in the same directory as config.xml, but you can override this location:
<users_config>/etc/clickhouse-server/users.xml</users_config>
If ClickHouse can't find the users configuration, it won't start and will log an error about missing user definitions. Using absolute paths rather than relative paths prevents problems when ClickHouse starts from different working directories.
Too-low max_open_files
ClickHouse can open thousands of files simultaneously when processing queries across many data parts. Operating system limits on open file descriptors often default to values too low for production workloads.
<max_open_files>262144</max_open_files>
You'll also need to increase the system limit using ulimit or systemd configuration. If ClickHouse hits the file descriptor limit, queries fail with "Too many open files" errors, which can be difficult to diagnose without checking system logs.
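For servers managed by systemd, a drop-in override is one way to raise the limit; a sketch assuming the stock clickhouse-server unit name:

# /etc/systemd/system/clickhouse-server.service.d/limits.conf
[Service]
LimitNOFILE=262144

After adding the file, run sudo systemctl daemon-reload and restart the service so the new limit takes effect.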
Misconfigured disks or volumes
Storage configuration errors often prevent ClickHouse from starting or cause data to be written to unexpected locations. Common mistakes include incorrect paths, missing directories, and insufficient permissions.
<disks>
    <default>
        <path>/var/lib/clickhouse/</path>
        <keep_free_space_bytes>10737418240</keep_free_space_bytes>
    </default>
</disks>
The keep_free_space_bytes setting reserves disk space for system operations and prevents ClickHouse from completely filling the disk. The ClickHouse process needs write permissions to all configured disk paths, which you can verify before starting the server.
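A quick pre-flight check, assuming the server runs as the clickhouse user:

# Verify the service account can write to each configured disk path
sudo -u clickhouse test -w /var/lib/clickhouse/ && echo writable || echo not-writable
ls -ld /var/lib/clickhouse/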
Step-by-step workflow to edit, test, and deploy config changes
Changing ClickHouse configuration in production requires a careful process to avoid downtime and data loss. This workflow balances safety with the ability to iterate quickly on configuration improvements.
1. Back up current config
Before making any changes, copy your existing configuration files to a backup location with a timestamp:
sudo cp -r /etc/clickhouse-server /etc/clickhouse-server.backup.$(date +%Y%m%d_%H%M%S)
This creates a dated backup you can restore if the new configuration causes problems. Store backups outside the configuration directory to prevent them from being processed as active configuration files.
2. Validate locally with clickhouse-server --config-file
Test your configuration changes in a development environment before deploying to production. One quick check is to start the server in the foreground against the candidate files and watch for configuration errors as it boots:

clickhouse-server --config-file=/etc/clickhouse-server/config.xml

Watch the output for startup errors or warnings about deprecated settings. When configuration changes affect table structures, you'll need strategies for handling schema migrations in production without disrupting data ingestion. If the server starts successfully, run a few test queries to verify the configuration behaves as expected under load.
3. Commit to git and trigger CI
Version control for configuration files provides an audit trail of changes and enables automated testing. Store your ClickHouse configurations in a git repository with a CI pipeline that validates syntax:
# Example GitHub Actions workflow step
- name: Validate ClickHouse config
  run: |
    docker run --rm -v $(pwd):/config clickhouse/clickhouse-server \
      clickhouse-server --config-file=/config/config.xml --dry-run
Automated validation catches syntax errors before they reach production. Some teams also run integration tests that start a ClickHouse container with the new configuration and execute representative queries.
4. Roll out and monitor metrics
Deploy configuration changes incrementally, starting with a single server in your cluster if possible. Monitor key metrics like query latency, memory usage, and error rates for at least 30 minutes before proceeding to additional servers.
If you're using a load balancer, you can take servers out of rotation one at a time, update their configuration, restart them, verify they're healthy, and then add them back to the pool. This rolling deployment approach prevents total service interruption if a configuration change causes unexpected problems.
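One way to watch those metrics from within ClickHouse itself is the system.query_log table; a sketch of a health check to run on the reconfigured node:

-- Error rate and p95 latency over the last 30 minutes
SELECT
    countIf(type = 'QueryFinish') AS succeeded,
    countIf(type IN ('ExceptionBeforeStart', 'ExceptionWhileProcessing')) AS failed,
    quantile(0.95)(query_duration_ms) AS p95_ms
FROM system.query_log
WHERE event_time > now() - INTERVAL 30 MINUTE
  AND type != 'QueryStart';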
Avoid config.xml concerns and try Tinybird
Managing ClickHouse configuration becomes increasingly complex as your deployment grows, leading many teams to consider managed ClickHouse services instead.
Self-hosting ClickHouse means you're responsible for choosing compression algorithms, configuring storage tiers, tuning memory limits, and adjusting thread pools. These decisions require deep knowledge of ClickHouse internals and ongoing attention as your data volume and query patterns change, which is why understanding the differences between self-hosted and managed ClickHouse solutions becomes important.
Tinybird eliminates this configuration burden by providing a fully managed ClickHouse service with pre-optimized settings. The platform handles cluster configuration, storage policies, and performance tuning automatically, letting you focus on building features instead of managing infrastructure. Sign up for a free Tinybird account to see how quickly you can start querying data without touching a single configuration file.
Frequently asked questions about ClickHouse config.xml
Can I store ClickHouse secrets in environment variables?
Yes, use substitution syntax like <password from_env="DB_PASSWORD"/> in your config.xml file. This keeps sensitive data out of configuration files and works well with container orchestration systems that inject environment variables at runtime.
How do I migrate my existing config.xml to a managed service?
Export your current settings using SHOW CREATE TABLE and SELECT * FROM system.settings queries. Most managed services handle configuration automatically, so you typically only need to migrate your data and table schemas rather than recreating your entire config.xml.
Is it safe to symlink config files in Kubernetes?
Yes, but use ConfigMaps and mounted volumes instead of symlinks for better container orchestration. This approach provides better version control and rollback capabilities in Kubernetes environments, and it integrates more cleanly with tools like Helm and Kustomize.
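A minimal sketch of that pattern (the resource name and drop-in filename are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhouse-overrides
data:
  network.xml: |
    <clickhouse>
        <listen_host>0.0.0.0</listen_host>
    </clickhouse>

Mount the ConfigMap as a volume at /etc/clickhouse-server/config.d/ and ClickHouse picks the file up like any other drop-in override.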