Setting up a ClickHouse server means editing XML configuration files that control everything from network ports to storage policies. The config.xml file sits at the heart of every ClickHouse deployment, and getting it wrong can prevent your server from starting or cause performance problems that only show up under load.
This guide walks through the structure of ClickHouse configuration files, explains the most important settings, and shows you how to test and deploy changes safely.
What is config.xml and where does ClickHouse look for it
The config.xml file is ClickHouse's main server configuration file, and it lives at /etc/clickhouse-server/config.xml by default. This file tells ClickHouse where to store data, which ports to listen on, how to write logs, and how to connect to other servers in a cluster.
ClickHouse looks for configuration files in a specific order. First, it reads the main config.xml file. Then it processes any XML or YAML files in the /etc/clickhouse-server/config.d/ directory. Files in config.d/ can override settings from the base configuration, which makes it easier to manage different environments without editing the main file.
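For example, a small drop-in file can change a single setting without touching the base config (the filename here is arbitrary):

<!-- /etc/clickhouse-server/config.d/log-level.xml -->
<clickhouse>
    <logger>
        <level>warning</level>
    </logger>
</clickhouse>

Only the elements present in the drop-in are overridden; everything else keeps its value from config.xml.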
When ClickHouse starts up, it merges all configuration files into one internal representation. You can see what the server actually uses by checking /var/lib/clickhouse/preprocessed_configs/, where ClickHouse writes the final merged configuration.
Folder structure and include hierarchy
Production ClickHouse deployments split settings across multiple files instead of cramming everything into one massive config.xml. This structure makes it easier to track changes in version control and swap out environment-specific settings.
A typical setup looks like this:
/etc/clickhouse-server/
├── config.xml          # Main configuration
├── users.xml           # User accounts and permissions
└── config.d/           # Additional configs
    ├── network.xml     # Ports and interfaces
    ├── storage.xml     # Disk policies
    ├── clusters.xml    # Cluster topology
    └── logging.xml     # Log settings
This modular approach lets you commit base configurations to git while keeping secrets like passwords in separate files that stay out of version control.
<include_from> mechanism
The <include_from> directive pulls in external files for settings you want to keep separate. This works well for credentials and environment-specific values that change between development and production.
<clickhouse>
    <include_from>/etc/clickhouse-server/secrets.xml</include_from>
</clickhouse>
The referenced file holds named substitution elements that the main configuration pulls in through incl attributes. ClickHouse resolves these substitutions at startup, so you can keep database passwords and API keys in a file that doesn't get committed to your repository.
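As a sketch of how the substitution works (the s3_credentials element name and values are made up for illustration), the external file defines a named block and the main config references it by name:

<!-- /etc/clickhouse-server/secrets.xml -->
<clickhouse>
    <s3_credentials>
        <access_key_id>AKIAEXAMPLE</access_key_id>
        <secret_access_key>example-secret</secret_access_key>
    </s3_credentials>
</clickhouse>

<!-- In config.xml: the element's contents are replaced by the named block -->
<s3 incl="s3_credentials"/>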
Precedence between XML and YAML
ClickHouse reads both XML and YAML configuration files, and you can mix them in the same deployment. When both formats define the same setting, the file processed last wins, based on alphabetical order in the config.d/ directory.
If you have both settings.xml and settings.yaml in config.d/, the YAML file overrides conflicting settings from the XML file, because files are merged in alphabetical order and 'y' sorts after 'x'. Most production setups stick to one format for consistency.
preprocessed_configs explained
The preprocessed_configs directory at /var/lib/clickhouse/preprocessed_configs/ holds the final merged configuration that ClickHouse actually runs with. When troubleshooting configuration problems, this directory shows you exactly what settings the server applied after processing all includes and overrides.
This directory updates every time ClickHouse starts or reloads its configuration, writing out a preprocessed version of each file with all substitutions and overrides applied. If your changes don't seem to take effect, comparing the preprocessed files to your source files reveals syntax errors or incorrect include paths.
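A quick way to inspect the result, assuming default paths:

# View the merged configuration the server actually loaded
sudo less /var/lib/clickhouse/preprocessed_configs/config.xml

# Check whether a specific setting survived the merge
sudo grep -r listen_host /var/lib/clickhouse/preprocessed_configs/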
The smallest working config.xml example
A minimal ClickHouse configuration only needs a few elements to start the server. Here's the absolute minimum for a working installation:
<clickhouse>
    <logger>
        <level>information</level>
        <console>1</console>
    </logger>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <path>/var/lib/clickhouse/</path>
</clickhouse>
This sets up logging to console at the information level, opens the HTTP interface on port 8123 and native TCP on port 9000, and stores data in /var/lib/clickhouse/. Production deployments add more settings for security, performance, and clustering.
Example with comments
Here's the minimal configuration with annotations explaining what each part does:
<clickhouse>
    <!-- Logging: where and how verbose -->
    <logger>
        <!-- Options: trace, debug, information, warning, error -->
        <level>information</level>
        <!-- Write to console (1) or file (0) -->
        <console>1</console>
    </logger>
    <!-- HTTP interface for queries and monitoring -->
    <http_port>8123</http_port>
    <!-- Native TCP protocol for clickhouse-client -->
    <tcp_port>9000</tcp_port>
    <!-- Base directory for data storage -->
    <path>/var/lib/clickhouse/</path>
</clickhouse>
Each setting has defaults, but writing them out explicitly makes server behavior predictable across different environments.
YAML equivalent
The same minimal configuration in YAML format looks cleaner to some people:
logger:
    level: information
    console: 1
http_port: 8123
tcp_port: 9000
path: /var/lib/clickhouse/
YAML's indentation-based structure eliminates closing tags, and YAML config files omit the clickhouse root element, which is implicit. XML remains more common in ClickHouse deployments, especially where configuration management tools already use XML.
Core sections you must know
Production ClickHouse setups configure several key sections beyond the minimal example. These sections control logging, networking, clustering, storage, compression, and replication.
1. <logger>
The logger section controls where ClickHouse writes logs and how much detail to capture. Log levels range from trace (most verbose) to error (least verbose), with information being a solid default for production.
<logger>
    <level>information</level>
    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
    <size>100M</size>
    <count>10</count>
</logger>
The size and count settings control log rotation, keeping your 10 most recent log files at 100MB each. For debugging performance problems, temporarily switching to debug or trace provides more detail, though these levels generate significantly more log data.
2. <listen_host> and network ports
Network configuration determines which interfaces ClickHouse binds to and which ports it listens on. By default, ClickHouse only accepts connections from localhost, protecting against accidental exposure.
<listen_host>0.0.0.0</listen_host>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<interserver_http_port>9009</interserver_http_port>
Setting listen_host to 0.0.0.0 allows connections from any network interface. In production, combine this with firewall rules or security groups to restrict access. The interserver_http_port handles communication between ClickHouse servers in a cluster for replication and distributed queries.
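Because ClickHouse accepts multiple listen_host elements, you can bind to specific interfaces instead of all of them; a sketch (the private address is illustrative):

<!-- Bind only to loopback and one private interface -->
<listen_host>127.0.0.1</listen_host>
<listen_host>10.0.0.5</listen_host>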
3. <remote_servers> or <clusters>
The remote_servers section defines cluster topology for distributed queries and replicated tables. Each cluster contains one or more shards, and each shard can have multiple replicas for high availability.
<remote_servers>
    <production_cluster>
        <shard>
            <replica>
                <host>clickhouse-01.example.com</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse-02.example.com</host>
                <port>9000</port>
            </replica>
        </shard>
    </production_cluster>
</remote_servers>
This creates a cluster named production_cluster with a single shard containing two replicas. Queries using the Distributed table engine or ON CLUSTER syntax reference this cluster name to execute across multiple servers. Operating large-scale ClickHouse clusters requires careful attention to shard distribution and replica configuration.
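To illustrate how the cluster name is used (the table and database names here are hypothetical), a local table can be created on every node with ON CLUSTER and then wrapped in a Distributed table:

-- Create the same local table on every node in the cluster
CREATE TABLE events_local ON CLUSTER production_cluster
(
    event_date Date,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Distributed table that fans queries out across shards;
-- arguments: cluster, database, underlying table, sharding key
CREATE TABLE events ON CLUSTER production_cluster AS events_local
ENGINE = Distributed(production_cluster, default, events_local, rand());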
4. <storage_configuration>
Storage configuration defines disks, volumes, and policies that control where ClickHouse stores data. This becomes important when you want different storage types for hot and cold data or tiered storage strategies.
<storage_configuration>
    <disks>
        <default>
            <path>/var/lib/clickhouse/</path>
        </default>
        <s3_disk>
            <type>s3</type>
            <endpoint>https://s3.amazonaws.com/my-bucket/clickhouse/</endpoint>
            <access_key_id from_env="AWS_ACCESS_KEY_ID"/>
            <secret_access_key from_env="AWS_SECRET_ACCESS_KEY"/>
        </s3_disk>
    </disks>
    <policies>
        <tiered>
            <volumes>
                <hot>
                    <disk>default</disk>
                </hot>
                <cold>
                    <disk>s3_disk</disk>
                </cold>
            </volumes>
        </tiered>
    </policies>
</storage_configuration>
This example defines a tiered storage policy with local disk for hot data and S3 for cold data. Tables can reference the tiered policy to automatically move older data to cheaper storage as it ages.
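A table opts into the policy through its settings; a sketch (the metrics table is illustrative) that moves parts to the cold volume after 30 days:

CREATE TABLE metrics
(
    ts DateTime,
    value Float64
)
ENGINE = MergeTree
ORDER BY ts
TTL ts + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'tiered';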
5. <compression>
Compression settings control which algorithms ClickHouse uses for different data types. Proper compression configuration reduces storage costs and improves query performance by reducing I/O.
<compression>
    <case>
        <method>lz4</method>
    </case>
    <case>
        <min_part_size>10000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>zstd</method>
        <level>3</level>
    </case>
</compression>
LZ4 offers the fastest decompression speed, making it good for frequently accessed data. ZSTD provides better compression ratios at the cost of slightly slower decompression, which works well for larger data parts queried less often.
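These server-wide rules can also be overridden per column with codecs declared in the table schema; a brief sketch (the table and columns are illustrative):

CREATE TABLE logs
(
    -- Delta encoding before ZSTD works well for monotonic timestamps
    ts DateTime CODEC(Delta, ZSTD(3)),
    -- A higher ZSTD level trades CPU for a better ratio on text
    message String CODEC(ZSTD(5))
)
ENGINE = MergeTree
ORDER BY ts;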
6. <zookeeper> for replication
The zookeeper section configures coordination services for replicated tables. ClickHouse uses ZooKeeper or ClickHouse Keeper to maintain consistency across replicas and coordinate distributed operations.
<zookeeper>
    <node>
        <host>keeper-01.example.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>keeper-02.example.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>keeper-03.example.com</host>
        <port>9181</port>
    </node>
</zookeeper>
A ZooKeeper ensemble typically consists of three or five nodes for fault tolerance. ClickHouse Keeper, the recommended replacement for ZooKeeper in new deployments, uses the same configuration format and integrates more tightly with ClickHouse-specific operations.
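With the section in place, connectivity can be verified from a client by reading the system.zookeeper table:

-- Returns the root znodes if ClickHouse can reach the ensemble
SELECT name FROM system.zookeeper WHERE path = '/';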
Settings that can be reloaded without restart
Some configuration changes take effect immediately when you run SYSTEM RELOAD CONFIG, while others require a full server restart. Knowing which settings support hot reloading helps you make changes without downtime.
Settings that can be reloaded include:
- Logger configuration: Log levels and file paths
- Compression settings: Algorithm and level changes
- User settings: Permissions and quotas from users.xml
- Dictionary definitions: External dictionary configurations
- Storage policies: New disks and volumes (with limitations)
Changes to network ports, memory limits, and core server paths always require a restart. When in doubt, check the ClickHouse documentation for your specific version, as hot reload capabilities have expanded over time.
Live reload command
The SYSTEM RELOAD CONFIG command tells ClickHouse to reprocess its configuration files and apply changes that support hot reloading:
SYSTEM RELOAD CONFIG;
After running this command, check the server logs to confirm which settings were updated. If your changes don't appear to take effect, the setting likely requires a restart, or there's a syntax error preventing the configuration from loading.
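Assuming the default log location, the reload activity can be spotted in the server log; ConfigReloader is the component name that typically appears in these messages:

# Show the most recent configuration reload messages
grep ConfigReloader /var/log/clickhouse-server/clickhouse-server.log | tail -n 20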
Sections requiring full restart
Several important configuration sections cannot be changed without restarting the ClickHouse server. These include network settings, memory limits, and storage paths.
Plan configuration changes to these sections during maintenance windows:
- Network configuration: Changes to ports and listen addresses
- Memory limits: Max memory usage and buffer sizes
- Storage paths: Data directory and metadata locations
- Interserver communication: Cluster and replication ports
- Core server settings: Thread pools and background task limits
Testing changes in a development environment first helps catch problems before they affect production systems.
Common issues and how to avoid them
Configuration mistakes can prevent ClickHouse from starting or cause subtle performance problems that only appear under load. Here are the most frequent issues that trip up both new and experienced users.
Wrong path to users.xml
The users.xml file defines user accounts, passwords, and access permissions. By default, ClickHouse looks for this file in the same directory as config.xml, but you can override this location:
<users_config>/etc/clickhouse-server/users.xml</users_config>
If ClickHouse can't find the users configuration, it won't start and will log an error about missing user definitions. Using absolute paths rather than relative paths prevents problems when ClickHouse starts from different working directories.
Too-low max_open_files
ClickHouse can open thousands of files simultaneously when processing queries across many data parts. Operating system limits on open file descriptors often default to values too low for production workloads.
<max_open_files>262144</max_open_files>
You'll also need to increase the system limit using ulimit or systemd configuration. If ClickHouse hits the file descriptor limit, queries fail with "Too many open files" errors, which can be difficult to diagnose without checking system logs.
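For servers managed by systemd, a drop-in override is one way to raise the limit; a sketch assuming the stock clickhouse-server unit name:

# /etc/systemd/system/clickhouse-server.service.d/limits.conf
[Service]
LimitNOFILE=262144

After adding the file, run sudo systemctl daemon-reload and restart the service so the new limit takes effect.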
Misconfigured disks or volumes
Storage configuration errors often prevent ClickHouse from starting or cause data to be written to unexpected locations. Common mistakes include incorrect paths, missing directories, and insufficient permissions.
<disks>
    <default>
        <path>/var/lib/clickhouse/</path>
        <keep_free_space_bytes>10737418240</keep_free_space_bytes>
    </default>
</disks>
The keep_free_space_bytes setting reserves disk space for system operations and prevents ClickHouse from completely filling the disk. The ClickHouse process needs write permissions to all configured disk paths, which you can verify before starting the server.
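A quick pre-flight check, assuming the server runs as the clickhouse user:

# Verify the service account can write to each configured disk path
sudo -u clickhouse test -w /var/lib/clickhouse/ && echo writable || echo not-writable
ls -ld /var/lib/clickhouse/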
Step-by-step workflow to edit, test, and deploy config changes
Changing ClickHouse configuration in production requires a careful process to avoid downtime and data loss. This workflow balances safety with the ability to iterate quickly on configuration improvements.
1. Back up current config
Before making any changes, copy your existing configuration files to a backup location with a timestamp:
sudo cp -r /etc/clickhouse-server /etc/clickhouse-server.backup.$(date +%Y%m%d_%H%M%S)
This creates a dated backup you can restore if the new configuration causes problems. Store backups outside the configuration directory to prevent them from being processed as active configuration files.
2. Validate locally with clickhouse-server --config-file
Test your configuration changes in a development environment before deploying to production. One quick check is to start the server in the foreground against the candidate files and watch for configuration errors as it boots:

clickhouse-server --config-file=/etc/clickhouse-server/config.xml

Watch the output for startup errors or warnings about deprecated settings. When configuration changes affect table structures, you'll need strategies for handling schema migrations in production without disrupting data ingestion. If the server starts successfully, run a few test queries to verify the configuration behaves as expected under load.
3. Commit to git and trigger CI
Version control for configuration files provides an audit trail of changes and enables automated testing. Store your ClickHouse configurations in a git repository with a CI pipeline that validates syntax:
# Example GitHub Actions workflow step
- name: Validate ClickHouse config
  run: |
    docker run --rm -v $(pwd):/config clickhouse/clickhouse-server \
      clickhouse-server --config-file=/config/config.xml --dry-run
Automated validation catches syntax errors before they reach production. Some teams also run integration tests that start a ClickHouse container with the new configuration and execute representative queries.
4. Roll out and monitor metrics
Deploy configuration changes incrementally, starting with a single server in your cluster if possible. Monitor key metrics like query latency, memory usage, and error rates for at least 30 minutes before proceeding to additional servers.
If you're using a load balancer, you can take servers out of rotation one at a time, update their configuration, restart them, verify they're healthy, and then add them back to the pool. This rolling deployment approach prevents total service interruption if a configuration change causes unexpected problems.
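One way to watch those metrics from within ClickHouse itself is the system.query_log table; a sketch of a health check to run on the reconfigured node:

-- Error rate and p95 latency over the last 30 minutes
SELECT
    countIf(type = 'QueryFinish') AS succeeded,
    countIf(type IN ('ExceptionBeforeStart', 'ExceptionWhileProcessing')) AS failed,
    quantile(0.95)(query_duration_ms) AS p95_ms
FROM system.query_log
WHERE event_time > now() - INTERVAL 30 MINUTE
  AND type != 'QueryStart';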
Avoid config.xml concerns and try Tinybird
Managing ClickHouse configuration becomes increasingly complex as your deployment grows, leading many teams to consider managed ClickHouse services instead.
Self-hosting ClickHouse means you're responsible for choosing compression algorithms, configuring storage tiers, tuning memory limits, and adjusting thread pools. These decisions require deep knowledge of ClickHouse internals and ongoing attention as your data volume and query patterns change, which is why understanding the differences between self-hosted and managed ClickHouse solutions becomes important.
Tinybird eliminates this configuration burden by providing a fully managed ClickHouse service with pre-optimized settings. The platform handles cluster configuration, storage policies, and performance tuning automatically, letting you focus on building features instead of managing infrastructure. Sign up for a free Tinybird account to see how quickly you can start querying data without touching a single configuration file.
Frequently asked questions about ClickHouse config.xml
Can I store ClickHouse secrets in environment variables?
Yes, use substitution syntax like <password from_env="DB_PASSWORD"/> in your config.xml file. This keeps sensitive data out of configuration files and works well with container orchestration systems that inject environment variables at runtime.
How do I migrate my existing config.xml to a managed service?
Export your current settings using SHOW CREATE TABLE and SELECT * FROM system.settings queries. Most managed services handle configuration automatically, so you typically only need to migrate your data and table schemas rather than recreating your entire config.xml.
Is it safe to symlink config files in Kubernetes?
Yes, but use ConfigMaps and mounted volumes instead of symlinks for better container orchestration. This approach provides better version control and rollback capabilities in Kubernetes environments, and it integrates more cleanly with tools like Helm and Kustomize.
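A minimal sketch of that pattern (the resource name and drop-in filename are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhouse-overrides
data:
  network.xml: |
    <clickhouse>
        <listen_host>0.0.0.0</listen_host>
    </clickhouse>

Mount the ConfigMap as a volume at /etc/clickhouse-server/config.d/ and ClickHouse picks the file up like any other drop-in override.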