ClickHouse stores and analyzes massive datasets at high speed. Accessing that data from your Python application requires a client library that can handle the connection protocol and query execution. Two main Python clients exist for ClickHouse: the official clickhouse-connect
library and the community-maintained clickhouse-driver
, each with different protocols, features, and tradeoffs.
Installing a Python driver for ClickHouse
To talk to ClickHouse from Python, you need a client library that handles the connection and sends your queries to the database. Two main options exist: clickhouse-connect
, the official client from ClickHouse, Inc., and clickhouse-driver
, a community-built alternative that uses a different protocol.
Both libraries let you connect to databases, run queries, and insert data. Your choice depends on which protocol you prefer and what features you need. If you're working with Java instead, see our complete ClickHouse Java integration guide with JDBC examples.
pip install clickhouse-connect
The official ClickHouse client installs with this command:
pip install clickhouse-connect
This library talks to ClickHouse over HTTP and works with Python 3.8 or newer. ClickHouse, Inc. maintains it directly, so updates align with new ClickHouse releases.
The community-maintained client installs similarly:
pip install clickhouse-driver
This library uses ClickHouse's native TCP protocol and supports Python 3.7 or newer. The community has maintained it for several years with stable releases, demonstrating the highest repository issue activity among similar OLAP projects.
Comparing clickhouse-connect and clickhouse-driver
The two clients differ in how they communicate with ClickHouse and what features they include. Here's what sets them apart:
Feature | clickhouse-connect | clickhouse-driver |
---|---|---|
Protocol | HTTP/HTTPS | Native TCP |
Maintainer | ClickHouse, Inc. | Community |
Pandas support | Built-in | Manual conversion |
Async support | Included | Separate package |
Firewall friendly | Yes (port 8123/8443) | Requires port 9000/9440 |
Protocol support and performance differences
clickhouse-connect
uses HTTP, which means it works over standard web ports. This makes it easier to use when you're behind a firewall or connecting through a proxy. clickhouse-driver
uses ClickHouse's native TCP protocol, which can be faster for large result sets because it uses a more compact binary format.
Connection pooling works automatically in clickhouse-connect
, while clickhouse-driver
requires manual setup if you want pooling.
Feature gaps and ecosystem maturity
Both clients handle the core operations you'll use daily, but they differ in convenience features. clickhouse-connect
includes pandas DataFrame support through its query_df()
method. With clickhouse-driver
, you convert query results to DataFrames manually using pd.DataFrame()
.
For async operations, clickhouse-connect
includes everything in the main package. clickhouse-driver
requires installing the separate clickhouse-driver[asyncio]
package.
Creating a connection with Python
Connecting to ClickHouse involves three steps: importing the client library, creating a client with your connection details, and optionally testing that the connection works.
1. Import the client library
For clickhouse-connect
, import the module and call get_client()
:
import clickhouse_connect
client = clickhouse_connect.get_client(host='localhost', username='default', password='')
For clickhouse-driver
, import the Client
class:
from clickhouse_driver import Client
client = Client(host='localhost', user='default', password='')
2. Instantiate the client with host, port, and auth
When connecting to a local ClickHouse instance, use localhost
with the default ports. For cloud deployments, use the hostname provided by your cloud service and enable secure connections.
Here's a local connection with clickhouse-connect
:
client = clickhouse_connect.get_client(
host='localhost',
port=8123,
username='default',
password='',
database='default'
)
For Tinybird with clickhouse-connect
:
client = clickhouse_connect.get_client(
host='clickhouse.us-east-1.aws.tinybird.co',
port=7182,
username='default',
password='TINYBIRD_TOKEN',
secure=True
)
With clickhouse-driver
, a cloud connection looks like this:
client = Client(
host='clickhouse.us-east-1.aws.tinybird.co',
port=7182,
username='default',
password='TINYBIRD_TOKEN',
secure=True
)
3. Validate the connection with a ping
Before running queries, test that your connection works. With clickhouse-connect
:
if client.ping():
print("Connection successful")
With clickhouse-driver
, run a simple query:
result = client.execute('SELECT 1')
print(f"Connection successful: {result}")
If the connection fails, you'll see an error message showing what went wrong, like authentication problems, network timeouts, or incorrect hostnames.
Running your first select query
Once connected, you can run SQL queries and get results back. The format of the results depends on which client you're using and which method you call.
Simple select returning tuples
Both clients return query results as lists of tuples by default. Here's a basic query with clickhouse-connect
:
result = client.query('SELECT number, number * 2 FROM system.numbers LIMIT 5')
for row in result.result_rows:
print(row)
This prints tuples like (0, 0)
, (1, 2)
, (2, 4)
, and so on.
With clickhouse-driver
, the syntax changes slightly:
result = client.execute('SELECT number, number * 2 FROM system.numbers LIMIT 5')
for row in result:
print(row)
Parameterized select returning a pandas DataFrame
Parameterized queries prevent SQL injection by separating your query structure from user input. With clickhouse-connect
, you can pass parameters and get results as a pandas DataFrame:
result = client.query_df(
'SELECT number, number * {multiplier:Int32} FROM system.numbers WHERE number < {limit:Int32}',
parameters={'multiplier': 3, 'limit': 5}
)
print(result)
The parameter syntax uses curly braces with the parameter name and its ClickHouse type. For clickhouse-driver
, you pass parameters as a dictionary:
import pandas as pd
result = client.execute(
'SELECT number, number * %(multiplier)s FROM system.numbers WHERE number < %(limit)s',
{'multiplier': 3, 'limit': 5}
)
df = pd.DataFrame(result, columns=['number', 'result'])
Inserting data in bulk from Python
ClickHouse performs best with bulk inserts rather than single-row operations. Both clients support batch insertion, though the syntax differs.
1. Prepare sample rows or a DataFrame
First, create your table structure and prepare your data. Here's a simple table:
client.command('''
CREATE TABLE IF NOT EXISTS events (
event_id String,
user_id String,
timestamp DateTime,
event_type String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_id)
''')
Prepare data as a list of tuples:
data = [
('evt_1', 'user_100', '2024-01-15 10:30:00', 'login'),
('evt_2', 'user_101', '2024-01-15 10:31:00', 'pageview'),
('evt_3', 'user_100', '2024-01-15 10:35:00', 'logout')
]
2. Single insert for quick tests
For testing or tiny datasets, you can insert one row. With clickhouse-connect
:
client.insert('events', [['evt_4', 'user_102', '2024-01-15 11:00:00', 'signup']])
This approach works but creates overhead in ClickHouse's storage engine, so it's not efficient for production.
3. Batch insert with compression for speed
For production use, insert data in batches. With clickhouse-connect
:
client.insert('events', data, column_names=['event_id', 'user_id', 'timestamp', 'event_type'])
The client automatically compresses data before sending it to ClickHouse. For larger batches (10,000+ rows), this can cut insert time significantly.
With clickhouse-driver
, batch inserts look like this:
client.execute(
'INSERT INTO events (event_id, user_id, timestamp, event_type) VALUES',
data
)
A good batch size for most cases is between 10,000 and 100,000 rows, though ClickHouse can handle much more extreme ingestion rates with proper optimization.
Using async and threaded clients for high concurrency
When your application handles multiple concurrent requests, async or threaded database clients prevent blocking operations from slowing the application down.
Async example with clickhouse-connect
The async version of clickhouse-connect
allows non-blocking database operations:
import asyncio
import clickhouse_connect
async def fetch_data():
client = await clickhouse_connect.get_async_client(
host='localhost',
username='default',
password=''
)
result = await client.query('SELECT count() FROM events')
print(result.result_rows[0][0])
await client.close()
asyncio.run(fetch_data())
Thread pool example with clickhouse-driver
For thread-based concurrency, create a connection pool that manages multiple database connections:
from clickhouse_driver import Client
from concurrent.futures import ThreadPoolExecutor
def execute_query(query):
client = Client(host='localhost')
result = client.execute(query)
client.disconnect()
return result
with ThreadPoolExecutor(max_workers=5) as executor:
queries = ['SELECT count() FROM events WHERE event_type = "login"' for _ in range(10)]
results = executor.map(execute_query, queries)
Each thread gets its own connection, preventing concurrent access issues.
Handling auth, TLS, and network timeouts
Production deployments require proper security configuration and timeout handling to prevent connection failures and protect data.
Setting credentials with environment variables
Hardcoding credentials in your code creates security risks. Store them in environment variables instead:
import os
import clickhouse_connect
client = clickhouse_connect.get_client(
host=os.environ.get('CLICKHOUSE_HOST', 'localhost'),
port=int(os.environ.get('CLICKHOUSE_PORT', 8123)),
username=os.environ.get('CLICKHOUSE_USER', 'default'),
password=os.environ.get('CLICKHOUSE_PASSWORD', '')
)
Enabling TLS to cloud endpoints
Cloud ClickHouse deployments typically require encrypted connections. With clickhouse-connect
, enable TLS by setting secure=True
:
client = clickhouse_connect.get_client(
host='your-instance.clickhouse.cloud',
port=8443,
username='default',
password='your_password',
secure=True
)
For clickhouse-driver
:
client = Client(
host='your-instance.clickhouse.cloud',
port=9440,
user='default',
password='your_password',
secure=True
)
Tuning query and socket timeouts
Long-running queries or network issues can cause your application to hang. Set timeouts to prevent this:
client = clickhouse_connect.get_client(
host='localhost',
connect_timeout=10,
send_receive_timeout=300
)
For analytical queries that might take longer, increase the send_receive_timeout
. For transactional workloads, keep timeouts shorter to fail fast.
Common ClickHouse Python errors and fixes
Certain errors appear frequently when working with ClickHouse from Python. Recognizing these patterns helps you debug faster.
Authentication code 516
This error appears when credentials are incorrect or the user lacks permissions:
Code: 516. DB::Exception: default: Authentication failed
Check that your username and password are correct. If connecting to ClickHouse Cloud, verify you're using the cloud credentials, not local defaults.
TLS handshake failures
When connecting to secure endpoints, you might see SSL certificate errors:
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED]
This typically means the client can't verify the server's certificate. Update your SSL certificates:
pip install --upgrade certifi
Date parsing errors during insert
ClickHouse is strict about date and datetime formats. If you see parsing errors:
Code: 41. DB::Exception: Cannot parse DateTime
Ensure your datetime strings match ClickHouse's expected format: YYYY-MM-DD HH:MM:SS
. When inserting from pandas, convert datetime columns explicitly:
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%Y-%m-%d %H:%M:%S')
client.insert('events', df.values.tolist(), column_names=df.columns.tolist())
Using Tinybird for an API-first ClickHouse Python backend
Setting up and maintaining a ClickHouse cluster requires expertise in distributed systems, query optimization, and infrastructure management. Managed services handle this complexity, letting you focus on building application features.
Tinybird provides a managed ClickHouse platform designed for developers integrating ClickHouse into application backends. The platform includes built-in data ingestion, query optimization, and API generation.
With Tinybird, you can deploy ClickHouse-backed APIs in minutes rather than weeks. The platform handles cluster scaling, monitoring, backup management, and security configuration automatically. Tinybird also includes data pipelines as code, local development environments, and CI/CD integration.
Sign up for a free Tinybird plan to try managed ClickHouse without infrastructure work.
Frequently asked questions about ClickHouse Python
Can I use pandas with ClickHouse?
Yes, both major Python clients support pandas DataFrames. With clickhouse-connect
, use the query_df()
method to get results directly as a DataFrame, or use insert()
with DataFrame values. clickhouse-driver
requires manual conversion between query results and DataFrames using pd.DataFrame()
.
Does ClickHouse support prepared statements in Python?
ClickHouse Python clients use parameterized queries instead of traditional prepared statements. This approach provides similar protection against SQL injection by separating query structure from user input. Parameters are sent separately from the query text and properly escaped by the client library.
How do I stream data continuously from Python to ClickHouse?
Rather than streaming individual rows, collect data in batches and insert periodically. ClickHouse is optimized for bulk operations, so inserting 10,000 rows every few seconds performs better than inserting rows one at a time. You can use a buffer in your application to collect events and flush them to ClickHouse when the buffer reaches a size threshold or time limit.