Querying large CSVs online with SQL


There are times when you have a CSV file and you’d like to extract some insights from it. You could use some CLI tools like csvkit, clickhouse-local or q, but maybe you don’t want to install another program to run a simple query, or you want a tool that is more visual or interactive. Using something like Excel is also discarded as it will be a painful experience once you have some tens of thousands of rows to work with. Some websites let you do upload your CSV, but they’re kind of clunky and slow, and won’t support your file if it’s too large.
Tinybird lets you…
- Upload CSVs in seconds from your computer or an URL
- Make fast SQL queries (is uses ClickHouse underneath), joins and transformations on the data, in your browser
- And share the results via snapshots or dynamic endpoints with other people
Making SQL queries on a CSV file
Let’s take, for instance, this repo containing Crunchbase data about startup investments from 2015, in a nice and clean CSV format. Particularly, let’s take the investments.csv file and make some queries on it.
As you can see, it takes less than 1 minute to upload and query a CSV using SQL in Tinybird
To do it yourself, create an account here, go to your dashboard, click on the “Add Data Source” button and paste the URL of the CSV. The types of the columns will be inferred automatically and clicking “Continue”, the data will be imported. Then, by clicking on the “Create Pipe” button you can start making SQL queries on it
Here’s the query made on the video, in case you want to copy it directly to your account
Joining two CSVs
The previous dataset doesn’t contain data about countries. So if you wanted to get, for example, a ratio between the amount of funding the startups of a country receive VS the population of that country, you’d have to join the previous dataset with another one like, like this other one.
Here, we get the latest data available for each country
And here we join the investments CSV with the country data:
Note that you have to give an alias to the nodes you’re joining or you’ll get an error.
Creating API endpoints from your results
Tinybird also lets you expose the results as CSV or JSON endpoints. Just click on the green “Create API Endpoint” and you’ll be good to go.