Data-driven decision-making has placed increased importance on collecting and analyzing data within companies. Acting on data-derived information is crucial for organizations; however, data teams spend too much time debating which numbers from different sources are the right ones to use. The concept of a Single Source of Truth (SSOT) promises to solve this issue by ensuring that everyone in an organization bases their decision-making on the same data. This is achieved by aggregating data from many sources into a single location. In other words, a Single Source of Truth is a centralized repository for all the data within an organization.
It is crucial for companies that want to extract the most value from their data to ensure that everyone speaks the same language and bases their decision-making on the same information. However, what does Single Source of Truth mean practically? And most importantly, is it achievable within organizations? We sat down with Edouard Flouriot from Aircall and Alexandre Cortyl from Dashlane to answer these questions and provide more clarity on this widely discussed topic.
The concept of a Single Source of Truth gets thrown around in discussions on companies’ data management. However, its definition varies depending on whom you ask.
When asked about this, Edouard explained that the Single Source of Truth is a highly complex and largely unattainable concept. He argued that it could only work if a single application were used to run everything within the company; the reality calls for more nuance. Alexandre agreed that the Single Source of Truth is an ideal that is often hard to achieve in practice because data can be seen from many different perspectives. For him, each tool may have its own measurements, specificities, or best practices, which inevitably leads to multiple sources of truth, or at least different perspectives. However, if organizations want to get more value out of their data, they cannot allow silos, and the data team plays a crucial role in preventing them in two main ways:
It is key to ensure that the main stakeholders have easy access to one specific source rather than a multitude of them, so as not to undermine their trust in the data you are showcasing. This is critical to avoid the worst-case scenario: stakeholders seeing different numbers in different dashboards. When asked how often this happens, Alexandre explained that it depends on many factors, but mainly on how mature the organization is in providing complete access to information. He then explained that it is essential to avoid this situation because stakeholders will start challenging the data quality, and their trust in the information they are provided with will diminish. This affects how often they leverage data and how much they rely on it for their decision-making. So, this is something data teams must master: ensuring that the organization provides stakeholders with consistent information over time and across perspectives. To do this, data teams need to ensure that reports and analyses are produced repeatably and consistently.
Edouard added that while stakeholders might not actively participate in data teams' data quality management (DQM) roadmaps, they care deeply about them. Stakeholders use data to improve their activities and reach their targets. To do so, they want the maximum level of trust in the data they use, and they rely on the data team to monitor its quality. For Edouard, it is critical to put data at the center of the strategy and ensure that the data team centralizes and controls the most significant data interactions from stakeholders; this also makes it possible to improve data quality at scale. Several best practices were put in place within his organization to implement the needed change. First, creating a vertical organization within the data team ensured that data people stayed close to stakeholders and to how they interact with the data. Practically, this means creating different teams dedicated to different sides of the organization: one team works with product managers, another with customer-facing teams, and so on. Second, focusing on culture and data literacy is instrumental. In the past, most dashboards and monitoring came from the data team, but today more and more decentralization is needed. For Edouard, decentralization is possible when technical barriers are lowered and when everyone uses the same metric definitions. The key to enabling this change is to focus on quality and governance, which can only be achieved by changing the culture.
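To make the idea of shared metric definitions concrete, here is a minimal sketch of what it could look like in a Python-based stack: a single module that every team imports, so a metric like "weekly active users" means the same thing no matter who builds the dashboard. The module, metric, and table names are purely illustrative assumptions, not Aircall's or Dashlane's actual setup.

```python
# Hypothetical metrics.py: a single shared module that every team imports,
# so a metric like "weekly active users" is computed the same way in every
# report. Names, schema, and SQL are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str
    sql: str  # the canonical query, reused by every dashboard and analysis


WEEKLY_ACTIVE_USERS = MetricDefinition(
    name="weekly_active_users",
    description="Distinct users with at least one session in the last 7 days.",
    sql="""
        SELECT COUNT(DISTINCT user_id)
        FROM analytics.sessions
        WHERE session_start >= CURRENT_DATE - INTERVAL '7 days'
    """,
)

if __name__ == "__main__":
    print(WEEKLY_ACTIVE_USERS.name, "->", WEEKLY_ACTIVE_USERS.description)
```

Whether the canonical definition lives in a Python module, a transformation model, or a semantic layer matters less than the fact that there is exactly one place to define and change it.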
When asked about this, Alexandre said that the earlier, the better. For him, it is also essential to implement processes robustly: CI/CD, code review, and so on. Additionally, it is crucial to align on clear definitions and documentation of the data, with a specific emphasis on expectations. What do you expect from the data? When do you expect to receive it, and in what format? This is critical to avoid diverging interpretations of what works and what doesn't. On top of this, you should also apply automated tests, checks, and monitoring. For Edouard, many best practices from the software engineering world can be applied to data. Data teams are mostly focused on delivering value and impact, so software engineering basics are not always obvious to them; however, they are vital. He recalled an episode within his organization in which the CEO shared his concerns about the reliability of the data he was using. To solve this issue, they took the time to audit the systems and understand how they could mitigate the risk of failure. The first reliability issue they found was the stack, so they decided to review it. Second, they focused on strengthening reliability by enlarging the team so they could take more care in how new changes and fixes were released into the pipelines.
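As a rough illustration of the kind of automated checks Alexandre describes, the sketch below encodes a few expectations about an incoming dataset (schema, non-null keys, freshness) as plain pandas-based checks that could run as a CI step before a pipeline change is merged. The table name, columns, and thresholds are assumptions made for the example, not anyone's actual configuration.

```python
# A minimal sketch of automated data expectations, assuming a pandas-based
# pipeline. The table name, columns, and freshness threshold are
# illustrative assumptions for this example.
from datetime import datetime, timedelta, timezone

import pandas as pd


def check_orders_extract(df: pd.DataFrame) -> list[str]:
    """Return human-readable expectation failures (an empty list means pass)."""
    expected_columns = {"order_id", "customer_id", "amount", "updated_at"}
    missing = expected_columns - set(df.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    failures = []
    if df["order_id"].isna().any():
        failures.append("order_id contains nulls")

    # Freshness expectation: the newest record should be under 24 hours old.
    newest = pd.to_datetime(df["updated_at"], utc=True).max()
    if newest < datetime.now(timezone.utc) - timedelta(hours=24):
        failures.append(f"data is stale, newest record is {newest}")

    return failures


if __name__ == "__main__":
    sample = pd.DataFrame({
        "order_id": [1, 2],
        "customer_id": [10, 11],
        "amount": [19.9, 42.0],
        "updated_at": [datetime.now(timezone.utc)] * 2,
    })
    problems = check_orders_extract(sample)
    if problems:
        raise SystemExit("Expectation failures: " + "; ".join(problems))
    print("All expectations met.")
```

Running a script like this in CI, alongside code review, is one lightweight way to turn the "what do you expect from the data, when, and in what format" conversation into something enforceable.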
Every organization wants to establish a Single Source of Truth. However, its actual implementation is often a topic for debate. Luckily, it comes down to people and tooling.
Gartner defines Data Literacy as "the ability to read, write and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case application and resulting value."
It is key to ensure that everyone is aligned and speaks the same language when it comes to data in an organization, and it starts with the following key steps.
While we can all agree that achieving a Single Source of Truth starts with people and organization, choosing the right tools is key to successful adoption. Data Observability tools help ensure the reliability of the data and increase trust in data assets. They do so by bringing complete visibility into data assets, allowing data teams to proactively monitor the health of the data platform. Including Data Observability in your data stack (or on top of your stack) can help your organization establish a Single Source of Truth.
Observability is a concept that came from the Software Development world, or more specifically from DevOps. In DevOps, the notion of Observability is centered around three main pillars: traces, logs, and metrics, which together represent the Golden Triangle of any infrastructure and application monitoring framework. Solutions like Datadog, New Relic, Splunk, and others have paved the way for what has become standard practice in software development, and it only makes sense that some of their best practices would translate into Data Engineering, a subset of Software Engineering. In data, testing the code or monitoring the infrastructure that produces it simply isn't enough, particularly as external sources and internal data-producing processes grow exponentially. Essentially, data observability covers an umbrella of processes, such as automated monitoring, alerting, and triaging, that allow data producers and consumers to know when data breaks and resolve the issues in near real time. Most importantly, data observability provides enough information for data users to resolve problems quickly and prevent those types of errors from recurring.
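As a simplified illustration of that monitoring-and-alerting loop, the sketch below compares today's row count for a table against a rolling baseline and raises an alert when the deviation is too large. The history, thresholds, and alert channel are assumptions made for the example and do not reflect any specific vendor's implementation.

```python
# A simplified sketch of a volume monitor: compare today's row count for a
# table against a rolling baseline and alert when it deviates too far.
# History, thresholds, and the alert channel are illustrative assumptions.
import statistics


def detect_volume_anomaly(history: list[int], today_count: int,
                          z_threshold: float = 3.0) -> str | None:
    """Return an alert message if today's volume is anomalous, else None."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return None  # no historical variance to compare against
    z_score = (today_count - mean) / stdev
    if abs(z_score) > z_threshold:
        return (f"Volume anomaly: {today_count} rows today vs. a baseline "
                f"of {mean:.0f} (z-score {z_score:.1f})")
    return None


def send_alert(message: str) -> None:
    # Placeholder: a real setup would notify Slack, PagerDuty, email, etc.
    print(f"[ALERT] {message}")


if __name__ == "__main__":
    history = [10_120, 9_980, 10_050, 10_210, 9_890, 10_030, 10_100]
    alert = detect_volume_anomaly(history, today_count=4_300)
    if alert:
        send_alert(alert)
```

Triaging then becomes a matter of routing the alert to whoever owns the affected pipeline, with enough context (lineage, recent changes) to resolve it quickly.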
A proficient Data Observability tool, one that helps your teams not only conduct automated anomaly detection but also resolve issues efficiently while getting ahead of downstream impact, should include the following features:
A solution like Sifflet allows you to unlock Full Data Stack Observability by bringing extensive Field Level Lineage across your whole data pipeline, from ingestion to BI, and by letting you fully automate your Data Quality Monitoring workflow: thanks to the auto-coverage feature, thousands of tables can be covered with fundamental data quality checks in a couple of minutes. Get in touch for a demo or a 15-day free trial: contact@siffletdata.com