Jan 2, 2023
Product

Monitoring Data Deviation with Sifflet

Post by
Wissem Fathallah

Data distribution deviation is a widespread pain point for data consumers. It occurs when the distribution of categorical or numerical data changes over time, whether abruptly or gradually. When these changes go undetected, their downstream impact can be significant.

Let’s dive into a few examples to understand how data distribution changes can affect data users within your organization:

  • For data scientists and machine learning engineers, slow distribution deviation is a top-of-mind concern because it can affect any feature in their models. The result is data drift and, consequently, a progressive loss of model accuracy.
  • For analytics engineers, a new, unforeseen value in a categorical field can lead to data loss, even in logic as simple as a CASE WHEN statement within a pipeline.
  • For data consumers, silent distribution deviation in aggregated business metrics produces wrong business insights and, in turn, inaccurate decisions.
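To make the analytics engineering case concrete, here is a minimal Python sketch of how an unforeseen categorical value can silently drop data. The function mimics a SQL CASE WHEN with no ELSE branch. The column values (`web`, `store`, `mobile_app`) and every function name are hypothetical, chosen only for illustration.

```python
from collections import Counter

def map_channel(channel):
    """Mimic CASE WHEN ... END without an ELSE: unmatched values become None (NULL)."""
    if channel == "web":
        return "online"
    if channel == "store":
        return "offline"
    return None

def revenue_by_group(rows):
    """Sum amounts per mapped group, skipping NULL groups as a downstream filter or join might."""
    totals = Counter()
    for channel, amount in rows:
        group = map_channel(channel)
        if group is not None:
            totals[group] += amount
    return dict(totals)

def unseen_categories(rows, known):
    """Return the categorical values present in the data but not in the known set."""
    return sorted({channel for channel, _ in rows} - set(known))

# "mobile_app" is a new value that the mapping was never written to handle.
rows = [("web", 100), ("store", 50), ("mobile_app", 70)]

print(revenue_by_group(rows))                     # {'online': 100, 'offline': 50}
print(unseen_categories(rows, {"web", "store"}))  # ['mobile_app']
```

The 70 units of revenue from the new channel disappear from the report without raising an error, which is why a check for unseen categories needs to run alongside the pipeline.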

Data teams may rely on manual testing to catch distribution deviation issues, but such tests take heavy manual effort to build and to scale across multiple data assets and, more importantly, a considerable time investment to maintain.

Sifflet now solves this pain. 

Introducing distribution deviation monitoring

We are happy to introduce Sifflet’s distribution deviation monitoring. Sifflet leverages advanced statistical models to automatically detect distribution deviation at the field level, based on either a rolling or a fixed time reference. With this new capability, combined with Sifflet’s auto-coverage, data teams can automatically monitor large numbers of datasets of different sizes with the click of a button, so they know when a significant distribution deviation is happening. On top of these features, data teams using Sifflet can leverage field-level lineage to get to the root cause of these anomalies and troubleshoot them efficiently.
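For readers who want a feel for what a field-level distribution check can look like, the sketch below computes the Population Stability Index (PSI) between a reference window and a current window of a categorical field. This is a generic, widely used technique shown purely for illustration; it is not a description of Sifflet’s statistical models. The function names, the sample data, and the 0.2 alert threshold (a common rule of thumb) are all assumptions.

```python
import math
from collections import Counter

def category_shares(values, categories, eps=1e-6):
    """Share of each category in values, floored at eps so the log is always defined."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = len(values)
    return {c: max(counts[c] / total, eps) for c in categories}

def psi(reference, current):
    """Population Stability Index between two samples of a categorical field."""
    categories = set(reference) | set(current)
    ref = category_shares(reference, categories)
    cur = category_shares(current, categories)
    return sum((cur[c] - ref[c]) * math.log(cur[c] / ref[c]) for c in categories)

def has_deviated(reference, current, threshold=0.2):
    """Flag a deviation when PSI exceeds the (assumed) rule-of-thumb threshold."""
    return psi(reference, current) > threshold

# Reference window: a balanced field. Current window: heavily skewed toward "a".
reference = ["a"] * 50 + ["b"] * 50
current = ["a"] * 90 + ["b"] * 10

print(has_deviated(reference, reference))  # False
print(has_deviated(reference, current))    # True
```

In this sketch, a fixed time reference corresponds to always passing the same baseline sample as `reference`, while a rolling reference corresponds to passing the previous window each time the check runs.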

The distribution deviation monitoring feature is available for all of our customers. 

 

Video from Sifflet

This feature comes from our continuous efforts to deliver comprehensive, automated, accurate, and actionable data quality monitoring. Want to learn more about our monitoring capabilities? Reach out for a demo.
