Suppose you have a very large CSV file, and youâd like to extract some insights from it. You could use CLI tools like csvkit, clickhouse-local or q, but maybe you donât want to install another program to run a simple query, or you want a tool that is more visual or interactive.
Using something like Excel is also not an option, as the experience becomes painful once your CSV grows to tens of thousands of rows or more. Some websites will let you upload a larger CSV, but they can be clunky and slow and likely wonât support the upload at a certain file size.
Tinybird lets youâŠ
- Upload CSVs in seconds from your computer or a remote URL (e.g. S3 bucket, Dropbox, Google Cloud Storage, etc...).
- Write SQL queries in your browser and performing compute-intensive operations like joins and aggregations on the data in milliseconds (it uses ClickHouse under the hood).
- And share the results with other people via snapshots or dynamic API endpoints.
Querying a CSV file online with SQL
Letâs take, for example, this repo containing Crunchbase data about startup investments from 2015 available in a nice and clean CSV format. Particularly, we'll use the investments.csv file and make some queries on it.
As you can see from the above screencast, it takes less than a minute to upload and query a CSV using SQL in Tinybird.
Want to try this yourself? Follow these steps:
- Create a free Tinybird account here.
- Navigate to your Tinybird dashboard and click "Add Data Source".
- Choose "Remote URL" as your Data Source (or if you're using a local file, select "File Upload".
- Copy the URL of the CSV file and paste it into the Tinybird UI. Click "Add".
- Tinybird will analyze the file and automatically infer the column data types. Modify your schema if you'd like and click âCreate Data Sourceâ. Tinybird will ingest the CSV file into a Data Source table.
- Click âCreate Pipeâ too start building your SQL queries. You can use Tinybird Pipes to break your queries into smaller, simpler nodes. Each subsequent node can query over prior nodes. This keeps things clean and simple and avoids nested CTEs and the like.
Hereâs the query I made in the video in case you want to use it as a starting point.
Joining two CSV files
The previous dataset doesnât contain specific data about countries. So if you wanted to know, for example, the ratio of startup funding received for a country versus the population of of that country, youâd have to join the data from the file you just uploaded with another one like this.
Here we get the latest data available for each country:
And here we join the investments data with the country data:
Creating API endpoints from CSV data
Tinybird makes it easy to publish the results of your SQL queries as HTTP endpoints in a click, with options for returning the data as CSV, JSON, NDSJON, or Parquet. Just click âCreate API Endpointâ, select the node you want to publish, and youâll have your endpoint.
If you're new to Tinybird, you can sign up for free. The Tinybird Build Plan is free forever, with no time limit and no credit card required. You can store up to 10 GB of data and query your CSVs as much as you want. Once you publish APIs from your queries, the Build Plan includes up to 1,000 requests per day free of charge.