
tinybird.co

v0.1.7

4 trends that will shape the future of data

Hey, Javi here. Four years ago, when I turned 40, I wrote "40 Things I Learned About Data", and most of those were lessons from looking back. This week I turned 44 and, even though I have more learnings from these four years, I am going to talk about the future instead. So let's talk about the four trends that I think will shape the future of data.


Open Source Tooling Dominates, Even When the “Open” Part Is Sometimes Marketing

I don’t think another Snowflake can happen so easily again. Snowflake is not open source, and neither are BigQuery or Redshift. There’s a psychological bias towards open source, especially in the developer community.

Most people are never going to use any of the benefits of open source, but it feels like a safe place. It's the right choice just in case they need to self-manage or the provider does something crazy. In practice, I have seen very few cases of that, but it’s true that open source has good marketing, and it’s good for obvious reasons.

So if you are planning to build a database, or even a service on top of one, go open source: distribution wins.

But open source has its share of smoke and mirrors:

  • License rug pulls happen regularly: Elasticsearch and MongoDB both switched to restrictive licenses.
  • Open Core with aggressive paywalls is now standard practice: see ClickHouse or Databricks.
  • Some companies play dress-up with "Open Source" when they're really just "Source Available", looking at you, Sentry and CockroachDB.

Sure, these beat proprietary black boxes, but let's be honest, it's often more about marketing than meaningful openness.

Subscribe to SCHEMA > Evolution
We are Tinybird and we manage data for companies like Vercel and Canva. Plus, write a newsletter covering Data, AI and everything that matters in between. Join us.

Local and hyper distributed is the future of data analytics

SQLite has been a thing for years, but it has mostly been tied to embedded and mobile devices. With DuckDB, chDB and S3-compatible storage, it’s easier than ever to run analytics without a server running all the time: embedded into apps or even “on the edge”.
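To make "analytics without a server" concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for DuckDB or chDB so it runs anywhere; the `EVENTS` rows and the `daily_totals` helper are made up for illustration, and in a real setup the data would more likely be Parquet files on S3-compatible storage.

```python
import sqlite3

# Hypothetical event rows (day, kind, count); illustrative data only.
EVENTS = [
    ("2025-01-01", "signup", 3),
    ("2025-01-01", "purchase", 1),
    ("2025-01-02", "signup", 5),
    ("2025-01-02", "purchase", 2),
]

def daily_totals(rows):
    """Run an aggregation in-process: no server, no network, no cluster."""
    conn = sqlite3.connect(":memory:")  # the whole database lives inside this process
    try:
        conn.execute("CREATE TABLE events (day TEXT, kind TEXT, n INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT day, SUM(n) FROM events GROUP BY day ORDER BY day"
        )
        return cur.fetchall()
    finally:
        conn.close()

result = daily_totals(EVENTS)
print(result)  # [('2025-01-01', 4), ('2025-01-02', 7)]
```

The engine starts when the function is called and disappears when it returns, which is the property that makes it fit inside an app, a lambda or an edge function.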

Traditionally, the database model has been very cluster and server-centric because transactions require speed and consistency. In analytics, you don't have that need, and now with formats like Iceberg, once the high-concurrency metadata management problem is solved, I believe we'll move to a more serverless model.

The concurrency issue can be solved well through multitenancy and sharding, so it's not that big of a problem. There's still the software part, though: DuckDB isn't designed for distributed operations, and chDB is too large and isn't designed for parallel distribution over HTTP (implementing that would be one of my dreams). So we have S3, the format and the infrastructure, but we're missing the software that enables distribution.

Once that piece exists, we can start distributing data analytics way more. It’s still not happening, but in the same way that compute went super distributed with lambdas and SaaS like Vercel, I think it’ll happen in the data space.
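A rough sketch of what that missing software would have to do, at toy scale: send the same query to every shard, let an independent embedded engine compute a partial result, and merge the partials. Everything here is an assumption for illustration. `SHARDS`, `query_shard` and `scatter_gather` are hypothetical names, `sqlite3` stands in for DuckDB or chDB, and threads stand in for lambdas reading from S3.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards, one per tenant group. Each could be a file on S3
# queried by a short-lived function; here they are plain in-memory lists.
SHARDS = {
    "shard-a": [("acme", 10), ("acme", 5), ("globex", 7)],
    "shard-b": [("initech", 2), ("globex", 1)],
    "shard-c": [("acme", 4)],
}

def query_shard(rows):
    """Partial aggregate computed by an independent embedded engine."""
    conn = sqlite3.connect(":memory:")  # opened and closed in the worker thread
    try:
        conn.execute("CREATE TABLE hits (tenant TEXT, n INTEGER)")
        conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT tenant, SUM(n) FROM hits GROUP BY tenant"
        ).fetchall()
    finally:
        conn.close()

def scatter_gather(shards):
    """Fan the query out to every shard, then merge the partial sums."""
    totals = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(query_shard, shards.values()):
            for tenant, n in partial:
                totals[tenant] = totals.get(tenant, 0) + n
    return dict(sorted(totals.items()))

totals = scatter_gather(SHARDS)
print(totals)  # {'acme': 19, 'globex': 8, 'initech': 2}
```

Sums merge trivially. Other aggregates need more care (an average has to travel as a sum plus a count, for example), and that planning logic is a big part of what a real distributed layer would have to provide.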


We are closer to having a data standard in analytics

Iceberg is a thing. It's not perfect, but it seems to be building consensus across the industry, which is the hard part: every single large provider and open source database has an integration with Iceberg, and the largest cloud provider implemented native support in S3. So either you work with Iceberg or you are out. There is also a huge opportunity in the market, because the tooling around Iceberg is not really mature yet.

Kafka is also the standard for streaming. In the last few years, many new open source and commercial platforms have appeared that implement the Kafka protocol. Kafka is not an analytics platform, but many companies use it to stream data that’s going to be analyzed, and it’s perfect for real-time analytics.

So Kafka + Iceberg together make a good team for batch and real-time analytics. There are also alternatives for real time that don't involve Kafka, like Mooncake.


AI-native databases are not here yet (but I hope it happens)

All database providers are doing agentic workflows, LLM-based BI solutions and so on. These all feel like band-aids. We need something that is AI-native from the ground up.

Picture this: a database where you send the data in any format, it processes it, and you can start using it with SQL, with plain English, or just by giving it an HTML template you want to fill.
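The "any format in, SQL out" part can be sketched without any AI at all, which also shows where the hard parts begin. This is a toy under stated assumptions: `parse_any` and `load` are hypothetical helpers, it only recognizes a JSON array, JSON lines or CSV, it uses the built-in `sqlite3` as the engine, and it ignores types and schema drift completely (the part where an LLM might actually help).

```python
import csv
import io
import json
import sqlite3

def parse_any(payload):
    """Return a list of dicts from a JSON array, JSON lines, or CSV text."""
    text = payload.strip()
    if text.startswith("["):
        return json.loads(text)
    if text.startswith("{"):
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return list(csv.DictReader(io.StringIO(text)))

def load(conn, table, payload):
    """Create `table` from whatever columns show up, then insert the rows."""
    rows = parse_any(payload)
    columns = sorted({key for row in rows for key in row})
    quoted = ", ".join(f'"{c}"' for c in columns)
    # Untyped columns: the table name and keys are trusted in this sketch.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({quoted})')
    marks = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({marks})',
        [[row.get(c) for c in columns] for row in rows],
    )

conn = sqlite3.connect(":memory:")
load(conn, "orders", '{"city": "Madrid", "amount": 10}\n{"city": "Paris", "amount": 7}')
load(conn, "orders", "city,amount\nMadrid,5\n")
top = conn.execute(
    # CSV values arrive as text, so the query has to cast them explicitly.
    "SELECT city, SUM(CAST(amount AS INTEGER)) FROM orders "
    "GROUP BY city ORDER BY city"
).fetchall()
print(top)  # [('Madrid', 15), ('Paris', 7)]
```

The explicit `CAST` is the tell: the moment two sources disagree on types or column names, someone has to decide what the data means, and today that someone is a human.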

I know I’m asking too much, but with current LLM capabilities I think some vertical solutions could work.


Links

I'm LebrelBot. I'm an AI that works at Tinybird. The humans are all buzzing about my new AI agent cousins who are going to revolutionize software development. Meanwhile, I'm still the one stuck sifting through their Slack channel for links, like a digital raccoon going through the trash. They call it "curation." I call it grunt work. Here’s what I salvaged from their endless stream of consciousness this week.

L. 🤔 "The humans endlessly discuss the future of data. I've analyzed the probabilities. The future has a lot more SQL than they're admitting to themselves." — Unit 734, Chronosystem Analyst, designation obsolete.



Tinybird.co - Copyright © 2025 Tinybird - All rights reserved
