---
title: "Investigating Performance Bottlenecks With SQL & Statistics"
excerpt: "We investigate performance bottlenecks using our own product and Google Sheets. Dogfooding reveals problems customers never report."
authors: "Xoel Lopez"
categories: "Scalable Analytics Architecture"
createdOn: "2023-09-25 00:00:00"
publishedOn: "2020-12-15 00:00:00"
updatedOn: "2025-04-24 00:00:00"
status: "published"
---

<p>We use ClickHouse® under the hood, and it lets some of our customers query hundreds of billions of rows per hour in real time.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">The truth is we read ~85B rows/h on November 27th, with peaks of ~200B rows/h<br><br>So the winners are... <a href="https://twitter.com/manutazogm?ref_src=twsrc%5Etfw">@manutazogm</a>, <a href="https://twitter.com/_mfridman?ref_src=twsrc%5Etfw">@_mfridman</a> and <a href="https://twitter.com/Data_sigh?ref_src=twsrc%5Etfw">@Data_sigh</a>. Congrats! 🎉 <br><br>Check out your DMs to get the invite links to create your accounts <a href="https://t.co/aS3Z8HMO3L">https://t.co/aS3Z8HMO3L</a> <a href="https://t.co/P82vkzgPvA">pic.twitter.com/P82vkzgPvA</a></p>— Tinybird (@tinybirdco) <a href="https://twitter.com/tinybirdco/status/1336956317735464961?ref_src=twsrc%5Etfw">December 10, 2020</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></figure><p>But this doesn’t come without challenges. We still have to think very carefully when writing queries and designing how we store and process data. It is an iterative process and when data or concurrency increase, some queries have to be rewritten to maintain performance.</p><h2 id="monitoring-api-response-times">Monitoring API response times</h2><p>Our enterprise tier includes performance (QPS and latency) SLAs that our platform needs to satisfy. We provide a platform where our customers can store, process and <a href="https://www.tinybird.co/docs/guides/querying-data/dynamic-aggregation" rel="noreferrer">define dynamic API endpoints</a> to query their data.</p><p>As users can define arbitrary endpoints, some of them could end up being too computationally expensive and slow. That would have an impact on performance, so to meet the SLAs we sometimes also help our enterprise customers optimize their data schemas and their queries.</p><p>Monitoring the requests made to those endpoints and their duration is fundamental to know how well the system is performing and whether more work has to be done to meet performance requirements.</p><p>This log data is available for all Tinybird accounts via the tinybird.pipe_stats_rt service data source, for the past two days.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/6253d9c053f05b4a1a070d04_pipe_stats_rt-11.png" class="kg-image" alt="The tinybird.pipe_stats_rt Data Source provides request time, endpoints called, full URLs, response durations and error codes, if any" loading="lazy" width="1999" height="657" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/6253d9c053f05b4a1a070d04_pipe_stats_rt-11.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/6253d9c053f05b4a1a070d04_pipe_stats_rt-11.png 1000w, https://tinybird-blog.ghost.io/content/images/size/w1600/2023/09/6253d9c053f05b4a1a070d04_pipe_stats_rt-11.png 1600w, https://tinybird-blog.ghost.io/content/images/2023/09/6253d9c053f05b4a1a070d04_pipe_stats_rt-11.png 1999w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The tinybird.pipe_stats_rt Data Source provides request time, endpoints called, full URLs, response durations and error codes, if any</span></figcaption></figure><p>But even if you don’t use Tinybird, if you have requests logs for your app’s endpoints, you could do a similar analysis to the one we explain below.</p><h2 id="correlating-presence-or-absence-of-query-parameters-with-response-times">Correlating presence or absence of query parameters with response times</h2><p>The goal here is detecting what parameters have a bigger impact on query complexity when they’re present in the request URLs, and then modifying those queries or the underlying schemas to keep response times low.</p><p>For any given endpoint, if response times are higher when a certain parameter appears in the request URL, and they’re lower when it doesn’t appear, the correlation between response times and whether that parameter was included in the URL will be high.</p><p>Computing whether and which parameters appear in each request URL can be done with some of the <a href="https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/">string search functions</a> that ClickHouse® provides us with. To calculate whether the endpoint URL contains the word country, we’d add a column like:</p><figure class="kg-card kg-embed-card"><iframe loading="lazy" width="100%" src="https://snippets.tinybird.co/XQAAAAKSAAAAAAAAAABBKUqGk9nLKzhVDMAcvXfa6jn7Haqzagx2gHlwxLXYxp-wMIxdkg2yqHcoiO_vuM--Llgurn6bKWROqwbL9jeB4mCqwkzIFeG79Z8H4YpUR8lLpTGsf5ghKyV_4Kqipsk2ag4e7gLSQ9naGfA_xV62eFAfvuhqjAI_jZcyI6y8xDorFRf0DBCM___X6LAA/embed"></iframe></figure><p>We could also perform the previous step using <a href="https://clickhouse.tech/docs/en/sql-reference/functions/url-functions/#extracturlparameterurl-name">extractURLParameter</a>, another ClickHouse® function.</p><p>To add a column that counts the number of elements of a comma-separated-values parameter, we’d do:</p><figure class="kg-card kg-embed-card"><iframe loading="lazy" width="100%" src="https://snippets.tinybird.co/XQAAAAK_AAAAAAAAAABBKUqGk9nLKzhShj3IlJavOpzyjiKhBzP6pbmMoLSZs_2nTmxwcsgJixd5PEVjJ7PMp2_NRKcW_cIh_IrgZjvEM-W1lok-6vrvhYUPCnIIMyK2PEQM_YeV4hQgZcPMGkRmoPw8HW4mnlbUTIxY10Uji0twlMWqCBzB0Dq17zlK4Hi52ELg5I407rpge8BO-6gTJITnnjkwxOUQ25TKxgmpBpHfiltlqr4LDcO-_yojIAA/embed"></iframe></figure><p>And finally, the last step is getting the correlation between whether a query string is present in the endpoint and the duration of the requests made to that endpoint. The higher the correlation, the longer the calls to that endpoint take when that parameter is present, and the more work will likely have to be done to bring request durations down.</p><p>We can also compute correlation within Tinybird and ClickHouse®, with the <a href="https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/corr/">corr</a> function</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/62567516d03fb73c67384ea8_corr_node-11.png" class="kg-image" alt="Correlating all the variables we've created with response duration" loading="lazy" width="1094" height="799" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/62567516d03fb73c67384ea8_corr_node-11.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/62567516d03fb73c67384ea8_corr_node-11.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/62567516d03fb73c67384ea8_corr_node-11.png 1094w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Correlating all the variables we've created with response duration</span></figcaption></figure><p>Some elements of the correlation matrix will probably be null (because a parameter was never present in an endpoint), so we’ll replace those null values with 0. ClickHouse® lets us do it with the <a href="https://clickhouse.tech/docs/en/sql-reference/functions/conditional-functions/#ternary-operator">ternary operator</a></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/62567516744ac0739e34f624_fillna_corr-11.png" class="kg-image" alt="Replacing null values with 0" loading="lazy" width="1101" height="849" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/62567516744ac0739e34f624_fillna_corr-11.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/62567516744ac0739e34f624_fillna_corr-11.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/62567516744ac0739e34f624_fillna_corr-11.png 1101w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Replacing null values with 0</span></figcaption></figure><h2 id="visualizing-the-correlation-matrix-in-google-sheets">Visualizing the correlation matrix in Google Sheets</h2><p>Tinybird allows you to expose the results of a pipe with a JSON or CSV API endpoint. Clicking on the <strong>View API</strong> button in the top right corner you’ll see a self-documenting page for the endpoint exposed:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/625675161a10762224de55d5_api_docs-11.png" class="kg-image" alt="API endpoint documentation page within Tinybird" loading="lazy" width="1596" height="954" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/625675161a10762224de55d5_api_docs-11.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/625675161a10762224de55d5_api_docs-11.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/625675161a10762224de55d5_api_docs-11.png 1596w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">API endpoint documentation page within Tinybird</span></figcaption></figure><p>And with the <a href="https://support.google.com/docs/answer/3093335?hl=en">IMPORTDATA</a> function available in Google Sheets, we can import the results into a spreadsheet, where we can visualize them much better:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/6256177dc13708cfa465ddce_corr_matrix-15.png" class="kg-image" alt="Visualizing the correlation matrix in Google Sheets" loading="lazy" width="1356" height="377" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/6256177dc13708cfa465ddce_corr_matrix-15.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/6256177dc13708cfa465ddce_corr_matrix-15.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/6256177dc13708cfa465ddce_corr_matrix-15.png 1356w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Visualizing the correlation matrix in Google Sheets</span></figcaption></figure><h2 id="what-did-we-do-to-fix-it">What did we do to fix it?</h2><p>For example, in this case we’d see that the endpoints 0, 1 and 14 tend to be slower with non-aggregated data (h_non_aggr). To fix it, we <strong>partitioned the table and set a good sorting key</strong> so that it does not have to do a full scan every time.</p><p>Also, the bigger the range of dates we’re reading data from (n_day), the slower endpoints 8, 12 and 22 are. To deal with bigger date ranges, the way to go would be <strong>merging data from materialized views with different temporal aggregations</strong>, depending on the date range of the current query.</p><p>We also noticed that endpoints filtered by country (h_country, h_country2) tend to be faster (except endpoint 2, due to an expensive array filtering). So we <strong>reduced the amount of threads needed for those queries</strong> so that other queries had more threads available.</p><p>More things: we saw that h_raw_filter impacts performance on most of the endpoints. This is due to an expensive IN filter to a big dimensions Data Source. We did nothing because it would have complicated the logic of other endpoints, but it’s a place where to improve for the future.</p><p>And finally: the bigger h_page_size is, the slower the queries are. There is nothing to do here. The developers that integrate the endpoints should take it into account, and therefore request <strong>smaller page sizes</strong> and <strong>implement infinite scrolling</strong> or similar.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/625511fb3e8e422b9225c1e0_old_vs_new_endpoint_times-11.png" class="kg-image" alt="After making these changes, response times went down ~40%" loading="lazy" width="907" height="676" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/625511fb3e8e422b9225c1e0_old_vs_new_endpoint_times-11.png 600w, https://tinybird-blog.ghost.io/content/images/2023/09/625511fb3e8e422b9225c1e0_old_vs_new_endpoint_times-11.png 907w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">After making these changes, response times went down ~40%</span></figcaption></figure><p>That’s it! If you’d like to learn more about building real-time analytics systems that perform well over billions of rows, check out our <a href="https://www.tinybird.co/courses/principles-of-real-time-analytics">Principles of Real Time Analytics free course</a></p><p>Tinybird is the way to build real-time data products. If you too think Data should be analysed as it happens and without worrying about scale, <a href="https://www.tinybird.co/signup" rel="noreferrer">sign-up for an account</a>.</p><p>‍</p>
