Have you ever suffered from heavy dashboards or timed-out queries? The issue may not come from your visualization tool or your data warehouse but from your SQL query itself.
As a heavy data consumer in my past roles, I spent a significant amount of time querying thousands of terabytes of analytical data to make the right decisions. Complex queries need to be built with a lot of care. And whether interacting with a data warehouse, a data lake, or a streaming service, it comes down to applying the same principles to ensure performance in executions.
Below are some of the tricks I’ve collected over the years and narrowed down to 5 main principles:
To avoid mobilizing unnecessary system resources, start by scoping the data you need for your use case, and leverage all the SLQ tricks to get there:
Despite the apparent overlap, every SQL function is designed for a specific need, and using the right one is essential to prevent inefficiencies. Here is a non-exhaustive list of tips where even a small adjustment can have a significant performance impact:
Combining columns from different tables is very recurrent. In some cases, you are joining every row from the first table to every row from the second table, and the query might not even finish. Be wise by:
Source: Sifflet
Common Table Expressions (CTE) is a temporary table that can be defined once and used many times within the same query. CTEs are very helpful in improving your code readability, reducing its complexity, and optimizing its execution resources.
CTEs can be defined at the beginning of your queries with the instructions WITH table_name AS
Caring about your query format is essential. These simple best practices can make your queries readable, easily debuggable, and shareable:
With new technologies shaping the modern data stack, data is becoming more and more accessible to users regardless of their level of technical expertise. Democratizing data is great but ensuring the operational efficiency and health of the data platform is a real challenge. Although there is no magic manual or one-size-fits-all approach, implementing some best practices as the ones covered in this article can play an integral role in ensuring the sustainability of your data operations.
Get in touch if you'd like to go in depth on any of the topics. wissem@siffletdata.com; contact@siffletdata.com