Data ingestion

Common problems and solutions when ingesting data into ClickHouse and Tinybird.

Overview

This section covers troubleshooting for data ingestion, including format problems, parsing errors, and data validation failures.

Ingestion categories

JSON formatting

Common JSON ingestion issues:

  • CANNOT_PARSE_TEXT - Malformed JSON data
  • UNKNOWN_TYPE - ClickHouse can't infer types from JSON
  • Nested JSON objects - Complex nested structure handling
  • Array handling - JSON arrays causing parsing issues

View JSON formatting troubleshooting →
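As a minimal sketch of catching these errors before they reach ClickHouse, the snippet below pre-validates newline-delimited JSON with Python's standard `json` module and flattens one level of nesting so nested objects map to flat columns. The function names and the `"."` key separator are illustrative, not part of any Tinybird or ClickHouse API.

```python
import json

def validate_ndjson(lines):
    """Split newline-delimited JSON into parseable rows and rejects.

    A row that fails to parse here would otherwise surface as
    CANNOT_PARSE_TEXT inside ClickHouse and can fail the whole batch.
    """
    good, bad = [], []
    for n, line in enumerate(lines, start=1):
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError as err:
            bad.append((n, str(err)))  # keep the line number for debugging
    return good, bad

def flatten(row, sep="."):
    """Flatten one level of nested objects: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for sub, v in value.items():
                flat[f"{key}{sep}{sub}"] = v
        else:
            flat[key] = value
    return flat

good, bad = validate_ndjson(['{"id": 1, "meta": {"src": "api"}}', '{broken'])
```

Rejected lines carry their line number, so malformed rows can be logged and re-sent instead of aborting the ingestion.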

CSV type issues

Common CSV ingestion issues:

  • CANNOT_PARSE_TEXT - Malformed CSV data or wrong delimiter
  • TYPE_MISMATCH - Non-numeric data in numeric columns
  • Missing headers - CSV files without column headers
  • Inconsistent delimiters - Mixed delimiters in CSV files

View CSV type issues troubleshooting →
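The delimiter and type problems above can be screened for with Python's standard `csv` module before loading. This is a sketch, not a Tinybird utility: `csv.Sniffer` guesses the delimiter, and per-column casters catch non-numeric values in numeric columns; the sample data and column names are made up.

```python
import csv
import io

def load_csv(text, expected_types):
    """Sniff the delimiter, read headers, and coerce column types.

    Wrong delimiters or non-numeric values in numeric columns are
    reported per row instead of failing the whole file.
    """
    dialect = csv.Sniffer().sniff(text)
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    rows, errors = [], []
    for n, row in enumerate(reader, start=2):  # line 1 is the header
        coerced = {}
        for col, caster in expected_types.items():
            try:
                coerced[col] = caster(row[col])
            except (TypeError, ValueError, KeyError):
                errors.append((n, col, repr(row.get(col))))
                break
        else:
            rows.append(coerced)
    return rows, errors

rows, errors = load_csv("id;price\n1;9.99\n2;n/a\n", {"id": int, "price": float})
```

Here the semicolon delimiter is detected automatically, the first row is coerced cleanly, and the `n/a` value in a numeric column is reported with its line number rather than raising TYPE_MISMATCH downstream.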

Cannot parse date

Common date parsing issues:

  • CANNOT_PARSE_DATE - Date strings in unexpected format
  • CANNOT_PARSE_DATETIME - DateTime strings in unexpected format
  • Timezone issues - Dates with timezone information
  • Mixed date formats - Same column with different date formats

View cannot parse date troubleshooting →
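When a column mixes date layouts, one workable strategy is to try a list of candidate formats and normalize everything to the `YYYY-MM-DD hh:mm:ss` text form that ClickHouse's `DateTime` accepts. The sketch below uses only Python's standard `datetime`; the format list is an assumption to extend for your own sources.

```python
from datetime import datetime, timezone

# Candidate formats; order matters, the first match wins.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",    # ClickHouse's default DateTime text form
    "%Y-%m-%dT%H:%M:%S%z",  # ISO 8601 with a timezone offset
    "%d/%m/%Y",             # day-first dates
]

def parse_any(value):
    """Try each known format; raise ValueError if none match."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"no known format matches {value!r}")

def to_clickhouse(value):
    """Normalize to a naive UTC 'YYYY-MM-DD hh:mm:ss' string."""
    dt = parse_any(value)
    if dt.tzinfo is not None:
        # Convert timezone-aware values to UTC, then drop the offset.
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime("%Y-%m-%d %H:%M:%S")
```

Normalizing to UTC before ingestion sidesteps both CANNOT_PARSE_DATETIME errors and silent timezone shifts once the data is queried.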

Unexpected null

Common null value issues:

  • Missing values in source data - Fields absent or empty upstream
  • Schema inference issues - ClickHouse inferring Nullable types from sample data
  • Data validation - Detecting unexpected nulls before they reach queries
  • Handling strategies - Using COALESCE and CASE statements

View unexpected null troubleshooting →
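One way to combine these strategies is to apply per-column defaults before loading and to count nulls as a quick quality check. This Python sketch is illustrative; the column names and defaults are invented, and the comments show the equivalent ClickHouse query-time expressions.

```python
# Defaults applied per column when the source value is missing or null.
# The query-time ClickHouse counterparts would be, e.g.:
#   COALESCE(country, 'unknown')
#   CASE WHEN amount IS NULL THEN 0 ELSE amount END
DEFAULTS = {"country": "unknown", "amount": 0}

def fill_nulls(row, defaults=DEFAULTS):
    """Return a copy of row with missing/None values replaced by defaults."""
    out = dict(row)
    for col, default in defaults.items():
        if out.get(col) is None:
            out[col] = default
    return out

def null_report(rows):
    """Count nulls per column: a quick data-quality check before loading."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            if value is None:
                counts[col] = counts.get(col, 0) + 1
    return counts
```

Filling defaults at ingestion time keeps columns non-Nullable in the schema, which is generally cheaper in ClickHouse than wrapping every query in COALESCE.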

Common patterns

Data validation

Strategies for validating ingested data:

  1. Check data formats - Validate JSON, CSV, and other formats
  2. Verify data types - Ensure data matches expected types
  3. Handle missing data - Provide defaults for missing values
  4. Monitor ingestion - Track ingestion success and failure rates
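Step 4 above, monitoring ingestion, can be as simple as tallying per-batch outcomes so parse-failure rates become visible. A minimal sketch, with an invented class name:

```python
from collections import Counter

class IngestMonitor:
    """Tally per-batch row outcomes to expose the parse-failure rate."""

    def __init__(self):
        self.counts = Counter()

    def record(self, ok_rows, failed_rows):
        self.counts["ok"] += ok_rows
        self.counts["failed"] += failed_rows

    def failure_rate(self):
        total = self.counts["ok"] + self.counts["failed"]
        return self.counts["failed"] / total if total else 0.0
```

Alerting when the failure rate crosses a threshold catches upstream format changes early, before a quiet trickle of rejected rows becomes a data gap.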

Schema handling

Best practices for schema design during ingestion:

  1. Use explicit schemas - Don't rely on automatic inference
  2. Handle mixed types - Use string types for mixed data
  3. Provide defaults - Use default values for missing data
  4. Validate early - Check data quality during ingestion
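The four practices above can be combined in a single conforming step: declare the schema explicitly instead of trusting inference, keep mixed-type columns as strings, and apply defaults while validating. The schema below is a hypothetical example in plain Python, not a Tinybird schema definition.

```python
# An explicit schema: each column maps to (caster, default). Automatic
# inference from a sample can pick the wrong type; declaring types up
# front surfaces failures at ingestion time instead.
SCHEMA = {
    "user_id": (int, 0),
    "tags": (str, ""),     # mixed-type column kept as a string
    "score": (float, 0.0),
}

def conform(row, schema=SCHEMA):
    """Coerce a raw row to the schema, applying defaults for gaps."""
    out = {}
    for col, (caster, default) in schema.items():
        value = row.get(col)
        out[col] = default if value is None else caster(value)
    return out
```

A row with a numeric `tags` value or a missing `score` still conforms, while a genuinely unparseable value raises immediately, which is the "validate early" behavior the list recommends.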

Best practices

  1. Validate data formats - Check JSON, CSV, and other formats before ingestion
  2. Use explicit schemas - Specify expected types in schema
  3. Handle missing data - Provide appropriate defaults
  4. Monitor ingestion quality - Track parsing errors and data quality
  5. Document data sources - Keep track of data source characteristics