
Querying large CSVs online with SQL

Tinybird lets you query CSVs with hundreds of millions of rows with SQL right from your browser
Xoel López
Founder at TheirStack
Jan 19, 2021

There are times when you have a CSV file and you’d like to extract some insights from it. You could use CLI tools like csvkit, clickhouse-local or q, but maybe you don’t want to install another program to run a simple query, or you want a tool that is more visual or interactive. Excel is also ruled out, as it becomes painful once you have tens of thousands of rows to work with. Some websites let you upload your CSV, but they’re clunky and slow, and they won’t accept files that are too large.

Tinybird lets you…

  • Upload CSVs in seconds from your computer or a URL
  • Run fast SQL queries (it uses ClickHouse underneath), joins and transformations on the data, in your browser
  • Share the results with other people via snapshots or dynamic endpoints

Making SQL queries on a CSV file

Let’s take, for instance, this repo containing Crunchbase data about startup investments from 2015, in a nice and clean CSV format. In particular, let’s take the investments.csv file and run some queries on it.

As you can see, it takes less than a minute to upload a CSV and query it with SQL in Tinybird.

To do it yourself, create an account here, go to your dashboard, click the “Add Data Source” button and paste the URL of the CSV. The column types are inferred automatically, and after you click “Continue” the data is imported. Then click the “Create Pipe” button to start running SQL queries on it.

Here’s the query from the video, in case you want to copy it directly into your account:
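If you want to try the same idea locally, here is a minimal sketch that uses Python’s built-in csv and sqlite3 modules instead of Tinybird, so the SQL dialect is SQLite’s rather than ClickHouse’s. It loads CSV text into an in-memory table, guessing each column’s type with a naive helper (infer_sql_type is hypothetical and is not how Tinybird infers types), then runs an aggregate query. The sample rows are made up, and the column names company_country_code and raised_amount_usd are assumptions about the investments.csv header; adjust them to match the real file.

```python
import csv
import io
import sqlite3

# Made-up sample rows; column names are assumptions about investments.csv.
SAMPLE_CSV = """company_name,company_country_code,raised_amount_usd
Acme,USA,1000000
Globex,GBR,250000
Initech,USA,500000
Hooli,GBR,
"""


def infer_sql_type(values):
    """Hypothetical naive inference: INTEGER if all non-empty values parse as int,
    else REAL if they all parse as float, else TEXT."""
    non_empty = [v for v in values if v != ""]
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue  # some value doesn't fit this type, try the next one
        return sql_type
    return "TEXT"


def load_csv(conn, table, text):
    """Create `table` from CSV text and insert its rows (empty cells become NULL)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    types = [infer_sql_type([r[i] for r in data]) for i in range(len(header))]
    cols = ", ".join(f'"{name}" {t}' for name, t in zip(header, types))
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    # Column affinity converts numeric-looking strings to INTEGER/REAL on insert.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        ([v if v != "" else None for v in r] for r in data),
    )


conn = sqlite3.connect(":memory:")
load_csv(conn, "investments", SAMPLE_CSV)

# Total funding per country, largest first (NULL amounts are ignored by SUM).
rows = conn.execute("""
    SELECT company_country_code, SUM(raised_amount_usd) AS total_usd
    FROM investments
    GROUP BY company_country_code
    ORDER BY total_usd DESC
""").fetchall()
print(rows)
```

To run it on the real file, read it with `open("investments.csv", newline="")` and pass its contents to `load_csv`. For files with hundreds of millions of rows this in-memory approach won’t scale, which is where a columnar engine like ClickHouse helps.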

Joining two CSVs

The previous dataset doesn’t contain data about countries. So if you wanted to get, for example, the ratio between the amount of funding a country’s startups receive and that country’s population, you’d have to join the previous dataset with another one, like this one.

Here, we get the latest data available for each country:


And here we join the investments CSV with the country data:

Note that you have to give an alias to the nodes you’re joining or you’ll get an error.
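The two steps above can be sketched locally, again with SQLite standing in for Tinybird. In this sketch each Tinybird node is mimicked by a CTE: `latest` keeps each country’s most recent population row, `funding` sums investments per country, and the final query joins them through the aliases `f` and `l`. The population CSV layout (country_code, year, population) and every number here are made up for illustration. In ClickHouse, `argMax(population, year)` grouped by country is a common alternative to the correlated subquery used here.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for the two CSVs (column names are assumptions).
INVESTMENTS_CSV = """company_name,company_country_code,raised_amount_usd
Acme,USA,1000000
Initech,USA,500000
Globex,GBR,500000
"""
POPULATION_CSV = """country_code,year,population
USA,2013,250000000
USA,2014,300000000
GBR,2014,50000000
"""


def load_csv(conn, table, text, schema):
    """Create `table` with the given column definitions and insert the CSV rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    conn.execute(f'CREATE TABLE "{table}" ({schema})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


conn = sqlite3.connect(":memory:")
load_csv(conn, "investments", INVESTMENTS_CSV,
         "company_name TEXT, company_country_code TEXT, raised_amount_usd INTEGER")
load_csv(conn, "population", POPULATION_CSV,
         "country_code TEXT, year INTEGER, population INTEGER")

result = conn.execute("""
    WITH latest AS (          -- most recent population figure per country
        SELECT p.country_code, p.population
        FROM population AS p
        WHERE p.year = (SELECT MAX(p2.year) FROM population AS p2
                        WHERE p2.country_code = p.country_code)
    ),
    funding AS (              -- total funding per country
        SELECT company_country_code AS country_code,
               SUM(raised_amount_usd) AS total_usd
        FROM investments
        GROUP BY company_country_code
    )
    SELECT f.country_code,
           1.0 * f.total_usd / l.population AS usd_per_person  -- 1.0 avoids integer division
    FROM funding AS f
    JOIN latest AS l ON l.country_code = f.country_code
    ORDER BY usd_per_person DESC
""").fetchall()
print(result)
```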

Creating API endpoints from your results

Tinybird also lets you expose the results as CSV or JSON endpoints. Just click the green “Create API Endpoint” button and you’ll be good to go.
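Once the endpoint exists, any HTTP client can read it. The sketch below assumes the URL pattern `https://api.tinybird.co/v0/pipes/<pipe>.json?token=<token>` and that the JSON response lists result rows under a `data` key; both are assumptions, so copy the exact URL Tinybird shows for your endpoint. It only uses Python’s standard library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL for published pipe endpoints; verify it against your dashboard.
API_BASE = "https://api.tinybird.co/v0/pipes"


def endpoint_url(pipe_name: str, token: str, fmt: str = "json") -> str:
    """Build the endpoint URL for a pipe; fmt is "json" or "csv"."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/{urllib.parse.quote(pipe_name)}.{fmt}?{query}"


def parse_rows(payload: str) -> list:
    """Extract result rows, assuming they live under the "data" key."""
    return json.loads(payload)["data"]


def fetch_rows(pipe_name: str, token: str) -> list:
    """Download and parse the JSON endpoint (performs a network request)."""
    with urllib.request.urlopen(endpoint_url(pipe_name, token)) as resp:
        return parse_rows(resp.read().decode("utf-8"))
```

Usage would look like `fetch_rows("investments_by_country", "<YOUR_READ_TOKEN>")`, where both the pipe name and the token are hypothetical placeholders. Keeping URL building and parsing separate from the network call makes the pieces easy to check without hitting the API.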
