---
title: CANNOT_READ_ALL_DATA ClickHouse error
meta:
  description: Learn how to fix the CANNOT_READ_ALL_DATA error in ClickHouse and Tinybird. Understand what causes it and see examples of how to resolve it.
---

# CANNOT_READ_ALL_DATA ClickHouse error

{% callout %}
This error occurs when ClickHouse cannot read all the data from a source during import or processing operations. It's common with corrupted data sources, incomplete transfers, or data format issues.
{% /callout %}

The `CANNOT_READ_ALL_DATA` error in ClickHouse (and Tinybird) is raised when the system expects more bytes from an input than it can actually read during an import, processing, or data transfer operation. It typically points to a truncated or corrupted source, a malformed data format, or a stream that was cut off mid-transfer.

## What causes this error

You'll typically see it when:

* The data source is corrupted or incomplete
* A data transfer is interrupted partway through
* The data format is malformed
* Permissions to read the data source are insufficient
* The data source becomes unavailable while it's being read
* The network drops during a transfer
* The file system has issues
* Memory constraints interrupt data processing
* The data source format changes unexpectedly

{% callout type="tip" %}
This error often indicates data source issues rather than ClickHouse problems. Check the data source integrity and connectivity.
{% /callout %}
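
To confirm how often this error is occurring on your server, you can check the `system.errors` table. A minimal sketch using the clickhouse-connect Python client; the connection details are placeholders:

```python {% title="Count recent occurrences" %}
import clickhouse_connect

# Placeholder connection details: replace with your own host and credentials
client = clickhouse_connect.get_client(host='localhost', username='default')

result = client.query(
    "SELECT name, value, last_error_time "
    "FROM system.errors "
    "WHERE name = 'CANNOT_READ_ALL_DATA'"
)
for name, value, last_error_time in result.result_rows:
    print(f"{name}: seen {value} times, last at {last_error_time}")
```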

## Example errors

```sql {% title="Fails: corrupted data source" %}
INSERT INTO events FROM INFILE '/path/to/corrupted_file.csv'
-- Error: CANNOT_READ_ALL_DATA
```

```sql {% title="Fails: incomplete data transfer" %}
-- When data source becomes unavailable
INSERT INTO users FROM INFILE '/path/to/remote_data.csv'
-- Error: CANNOT_READ_ALL_DATA
```

```sql {% title="Fails: malformed data format" %}
INSERT INTO metrics FROM INFILE '/path/to/malformed_data.json'
-- Error: CANNOT_READ_ALL_DATA
```

```sql {% title="Fails: interrupted data stream" %}
-- When streaming data is interrupted
INSERT INTO logs FROM INFILE '/path/to/streaming_data.csv'
-- Error: CANNOT_READ_ALL_DATA
```

## How to fix it

### Check data source integrity

Verify the data source is complete and accessible:

```sql {% title="Check data source" %}
-- Verify data source integrity
-- Example for Linux:
-- ls -la /path/to/your_data_file
-- file /path/to/your_data_file
-- md5sum /path/to/your_data_file
--
-- Example for Python:
-- import os
-- file_path = '/path/to/your_data_file'
-- if os.path.exists(file_path):
--     file_size = os.path.getsize(file_path)
--     print(f"File exists, size: {file_size} bytes")
-- else:
--     print("File does not exist")
```

### Verify data format

Check if the data format is correct:

```sql {% title="Verify format" %}
-- Validate data format before import
-- Example for CSV:
-- import csv
-- try:
--     with open('/path/to/your_file.csv', 'r') as f:
--         reader = csv.reader(f)
--         for i, row in enumerate(reader):
--             if i == 0:  # Check header
--                 print(f"Header: {row}")
--             if i < 5:   # Check first few rows
--                 print(f"Row {i}: {row}")
-- except Exception as e:
--     print(f"Format error: {e}")
```

### Check permissions

Verify access permissions to the data source:

```sql {% title="Check permissions" %}
-- Check file permissions
-- Example for Linux:
-- ls -la /path/to/your_data_file
--
-- Example for Python:
-- import os
-- file_path = '/path/to/your_data_file'
-- if os.access(file_path, os.R_OK):
--     print("File is readable")
-- else:
--     print("File is not readable")
```

### Retry the operation

Attempt the data reading operation again:

```sql {% title="Retry operation" %}
-- Try reading the data again
INSERT INTO events FROM INFILE '/path/to/your_data_file.csv'
FORMAT CSV
SETTINGS
    input_format_allow_errors_num = 10,
    input_format_allow_errors_ratio = 0.1
```
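
If the failure is transient (a network blip, a source that briefly disappears), wrap the import in retry logic rather than retrying by hand. A minimal sketch; `run_import` is a hypothetical placeholder for whatever client call performs your insert:

```python {% title="Retry with backoff" %}
import time

def retry_with_backoff(operation, max_attempts=3, base_delay=1.0):
    """Retry a flaky operation, doubling the delay between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as e:  # narrow this to your client's error type
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({e}), retrying in {delay:.0f}s")
            time.sleep(delay)

# Usage: retry_with_backoff(lambda: run_import('/path/to/your_data_file.csv'))
```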

## Common patterns and solutions

### Data source validation

Implement validation before processing:

```sql {% title="Data validation" %}
-- Validate data source before processing
-- Example pseudo-code:
--
-- def validate_data_source(file_path):
--     try:
--         # Check file exists
--         if not os.path.exists(file_path):
--             return False, "File does not exist"
--
--         # Check file size
--         file_size = os.path.getsize(file_path)
--         if file_size == 0:
--             return False, "File is empty"
--
--         # Check file permissions
--         if not os.access(file_path, os.R_OK):
--             return False, "File is not readable"
--
--         # Check file format
--         validate_file_format(file_path)
--
--         return True, "Data source is valid"
--     except Exception as e:
--         return False, f"Validation error: {e}"
```

### Error-tolerant import

Use error-tolerant import settings:

```sql {% title="Error-tolerant import" %}
-- Import with error tolerance
INSERT INTO events FROM INFILE '/path/to/your_data_file.csv'
FORMAT CSV
SETTINGS
    input_format_allow_errors_num = 100,        -- Allow up to 100 errors
    input_format_allow_errors_ratio = 0.05,     -- Allow up to 5% errors
    input_format_skip_unknown_fields = 1,       -- Skip unknown fields
    input_format_null_as_default = 1            -- Use defaults for NULL values
```
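
Error tolerance trades completeness for availability: rows that fail to parse are dropped, so keep the thresholds low enough that a systematically broken file still fails loudly instead of silently losing data.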

### Incremental processing

Process data in smaller chunks:

```sql {% title="Incremental processing" %}
-- Process data in smaller chunks
-- Example pseudo-code:
--
-- def process_in_chunks(file_path, chunk_size=10000):
--     with open(file_path, 'r') as f:
--         header = f.readline()  # Skip header
--
--         chunk = []
--         for i, line in enumerate(f):
--             chunk.append(line)
--
--             if len(chunk) >= chunk_size:
--                 # Process chunk
--                 process_chunk(chunk)
--                 chunk = []
--
--         # Process remaining lines
--         if chunk:
--             process_chunk(chunk)
```

### Data source monitoring

Monitor data source health:

```sql {% title="Data source monitoring" %}
-- Monitor data source health
-- Example pseudo-code:
--
-- class DataSourceMonitor:
--     def __init__(self, data_source_path):
--         self.data_source_path = data_source_path
--         self.last_check = None
--         self.last_size = 0
--
--     def check_health(self):
--         try:
--             if not os.path.exists(self.data_source_path):
--                 return False, "Data source not found"
--
--             current_size = os.path.getsize(self.data_source_path)
--             if current_size == 0:
--                 return False, "Data source is empty"
--
--             if self.last_size > 0 and current_size < self.last_size:
--                 return False, "Data source size decreased"
--
--             self.last_size = current_size
--             self.last_check = time.time()
--
--             return True, "Data source is healthy"
--         except Exception as e:
--             return False, f"Health check failed: {e}"
```

## Tinybird-specific notes

In Tinybird, CANNOT_READ_ALL_DATA errors often occur when:

* Data Source imports are interrupted
* External data sources become unavailable
* Pipe transformations encounter data issues
* Data streaming operations are interrupted
* External API endpoints fail

To debug in Tinybird:

1. Check the Data Source import status (a query sketch follows this list)
2. Verify external data source connectivity
3. Review Pipe transformation logs
4. Check external API endpoint health
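
To inspect recent import failures programmatically, query the `tinybird.datasources_ops_log` service Data Source through the Query API. A minimal sketch in Python; the host and Token values are placeholders you'll need to replace:

```python {% title="Check recent import errors in Tinybird" %}
import requests

# Placeholder connection details: use your region's API host and a read Token
TINYBIRD_HOST = 'https://api.tinybird.co'
TOKEN = '<your-read-token>'

query = """
SELECT timestamp, datasource_name, event_type, result, error
FROM tinybird.datasources_ops_log
WHERE result = 'error'
ORDER BY timestamp DESC
LIMIT 10
FORMAT JSON
"""

response = requests.get(
    f'{TINYBIRD_HOST}/v0/sql',
    params={'q': query},
    headers={'Authorization': f'Bearer {TOKEN}'},
)
response.raise_for_status()
for row in response.json()['data']:
    print(row['timestamp'], row['datasource_name'], row['error'])
```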

{% callout type="tip" %}
In Tinybird, use the Data Source preview to validate data before processing it in Pipes.
{% /callout %}

## Best practices

### Data source management

* Validate data sources before processing
* Implement data source health monitoring
* Use error-tolerant import settings
* Monitor data source availability

### Error handling

* Implement retry logic for failed operations
* Log data source issues for debugging
* Provide clear error messages
* Handle partial data failures gracefully

### Data quality

* Validate data format before processing
* Check for data corruption indicators
* Implement data quality checks
* Monitor import success rates

## Configuration options

### Import settings

```sql {% title="Import configuration" %}
-- Configure import behavior
SET input_format_allow_errors_num = 100;
SET input_format_allow_errors_ratio = 0.05;
SET input_format_skip_unknown_fields = 1;
SET input_format_null_as_default = 1;
```

### Data processing settings

```sql {% title="Data processing configuration" %}
-- Configure data processing
SET max_insert_block_size = 1000000;
SET min_insert_block_size_rows = 1000;
SET min_insert_block_size_bytes = 1000000;
```

### Error handling settings

```sql {% title="Error handling configuration" %}
-- Configure error handling
SET max_errors_to_log = 1000;
SET log_error_rate_prob = 1.0;
SET log_errors = 1;
```

## Alternative solutions

### Use streaming import

Import data in streaming mode:

```sql {% title="Streaming import" %}
-- Use streaming import for large data sources
INSERT INTO events FROM INFILE '/path/to/your_data_file.csv'
FORMAT CSV
SETTINGS
    input_format_parallel_parsing = 1,
    max_insert_block_size = 100000,
    min_insert_block_size_rows = 1000
```
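
Smaller insert blocks mean an interruption loses less progress, while parallel parsing keeps throughput up despite the smaller block size.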

### Implement data source fallback

Use fallback data sources:

```sql {% title="Data source fallback" %}
-- Implement fallback data sources
-- Example pseudo-code:
--
-- def get_data_with_fallback(primary_source, fallback_sources):
--     try:
--         return read_data_source(primary_source)
--     except CannotReadAllData:
--         for fallback_source in fallback_sources:
--             try:
--                 return read_data_source(fallback_source)
--             except CannotReadAllData:
--                 continue
--
--         raise Exception("All data sources failed")
```

### Use data validation

Implement data validation before processing:

```sql {% title="Data validation" %}
-- Validate data before processing
-- Example pseudo-code:
--
-- def validate_and_process_data(file_path):
--     # Validate data source
--     is_valid, message = validate_data_source(file_path)
--     if not is_valid:
--         logger.error(f"Data source validation failed: {message}")
--         return False
--
--     # Process data
--     try:
--         process_data_file(file_path)
--         return True
--     except CannotReadAllData as e:
--         logger.error(f"Data processing failed: {e}")
--         return False
```

## Monitoring and prevention

### Data source monitoring

```sql {% title="Data source tracking" %}
-- Monitor data source operations
-- Example pseudo-code:
--
-- def track_data_source_operation(operation, data_source, status, error=None):
--     logger.info(f"Data source operation: {operation} on {data_source}")
--     logger.info(f"Status: {status}")
--
--     if error:
--         logger.error(f"Data source error: {error}")
--
--     # Track operation metrics
--     increment_counter('data_source_operations', {
--         'operation': operation,
--         'data_source': data_source,
--         'status': status,
--         'error': error
--     })
```

### Data quality monitoring

```sql {% title="Data quality tracking" %}
-- Monitor data quality metrics
-- Example pseudo-code:
--
-- class DataQualityMonitor:
--     def __init__(self):
--         self.quality_checks = []
--
--     def check_data_quality(self, data_source):
--         try:
--             # Check data completeness
--             completeness = self.check_completeness(data_source)
--
--             # Check data consistency
--             consistency = self.check_consistency(data_source)
--
--             # Check data validity
--             validity = self.check_validity(data_source)
--
--             quality_score = (completeness + consistency + validity) / 3
--
--             quality_status = {
--                 'data_source': data_source,
--                 'completeness': completeness,
--                 'consistency': consistency,
--                 'validity': validity,
--                 'quality_score': quality_score,
--                 'timestamp': time.time()
--             }
--
--             self.quality_checks.append(quality_status)
--             return quality_status
--
--         except Exception as e:
--             quality_status = {
--                 'data_source': data_source,
--                 'error': str(e),
--                 'timestamp': time.time()
--             }
--             self.quality_checks.append(quality_status)
--             return quality_status
```

### Proactive monitoring

```sql {% title="Proactive monitoring" %}
-- Implement proactive data source monitoring
-- Example pseudo-code:
--
-- def monitor_data_sources():
--     data_sources = get_data_source_list()
--
--     for data_source in data_sources:
--         try:
--             # Check data source health
--             is_healthy, message = check_data_source_health(data_source)
--
--             if not is_healthy:
--                 send_alert(f"Data source {data_source} is unhealthy: {message}")
--
--             # Check data source size
--             current_size = get_data_source_size(data_source)
--             if current_size == 0:
--                 send_alert(f"Data source {data_source} is empty")
--
--         except Exception as e:
--             send_alert(f"Failed to monitor data source {data_source}: {e}")
```

## See also

* [Error Handling](/forward/dev-reference/list-of-errors)
* [Data Import](/forward/get-data-in/data-sources)
* [Test and deploy](/forward/test-and-deploy)
