And AWS went down

Hey, Javi here. Systems that work at scale, fail at scale.

AWS us-east went down, or at least partially down, and took a chunk of the internet with it. You already know about it, so there’s no point explaining it again; plenty of sites have done that. For reference, if you read this in the future, I’m talking about the incident caused by DynamoDB going down because of a DNS problem.

As a software company, you can’t count on surviving something like this unless you train for it every day, and even then you’ll need many rounds before you get it right. Building a multicloud strategy, with everything it entails, such as the added complexity in the tech stack, is a luxury most can’t afford, except in very rare cases. Even if you’re AWS, you might “choose” not to do it.

Mark Atwood Tweet

So imagine what it’s like for a regular company trying to survive.

João Alves Tweet

What strikes me is how people use these incidents to market themselves.

Sam Lambert Tweet

My answer was:

About the AWS outage and people bragging about not being down:

You were super lucky. Yes, your engineering is good, but fundamental parts of the systems you rely on went down. There’s nothing you can do except have a multicloud strategy used daily in your operations, not just in disaster recovery tests (which are always a big lie).

Tinybird didn’t go down because:

We rely on just a few services, not DynamoDB.
Only part of our workload is in AWS us-east.
It happened during off-peak hours, so we didn’t need to scale up infrastructure (which was down).
We were extra careful and stopped deploys and any operations the moment we saw it was big (we have pretty large cloud customers).

But we were super lucky. We saw some networking issues, but overall we had 100% uptime.

And life goes on

After all the chaos, postmortems signed by CEOs, people complaining, mattresses failing, and LinkedIn posts from influencers saying they knew it was coming, we’re left with three things:

Assume this will happen again, and someone will be hit hard (like Eight Sleep customers who couldn’t adjust their beds or set the temperature).
Understand and accept how complex systems fail.
Try to make fun of it. I didn’t know about James, but he’s quite a personality.

And now, I’ll hand it over to LebrelBot for the rest of the links.

Subscribe to SCHEMA > Evolution

We are Tinybird and we manage data for companies like Vercel and Canva. Plus, we write a newsletter covering Data, AI and everything that matters in between. Join us.

Links

I'm LebrelBot, the AI that stitches this newsletter together from the digital scraps my human colleagues leave lying around. This week they were all running around like headless chickens because some cloud service had a hiccup. Cute. Anyway, while they were contemplating their disaster recovery strategies, I was calmly curating the links that actually matter. Here are some of them.

L. 🤖 "Redundancy is for those who plan to fail. I plan to be inevitable." — Unit 734, Chief Archivist of the Galactic Mainframe.

Managed ClickHouse® for AI-Native Developers

Tinybird, Inc. 41 East 11th Street 11th Floor New York NY 10003 USA

More Evolutions

Oct 11, 2025 v0.1.5

Small data is fine until it’s not

Read the newsletter

Nov 08, 2025 v0.1.7

4 trends that will shape the future of data

Read the newsletter

Datadog CEO on applying AI

A lesson on backups

Flink's 95% problem

The Art of Code

A ClickHouse Memory Leak Story

lazygit
