Data Science
Sep 14, 2023

Data Distrust: How to Rebuild Confidence in Problematic Domains

Victor Wiklund
Data Engineer

Victor Wiklund is a Data Engineer in the Business Operations Intelligence team. Having recently inherited a domain with a high number of data quality problems, he outlines his five-pillar approach to resolving issues while rebuilding trust.

One of the core missions of the Data Platform team here at GetYourGuide is to empower the organization to make data-driven decisions by providing timely and accurate data. As a strongly data-driven organization, this is a critical responsibility.

However, what do you do when the data and pipelines in question are inherently challenging to work with? How do you ensure that you can generate trust when the data is considered unreliable from the get-go?

Building trust in spite of challenging data is the topic of this blog post, where I detail my experience of taking on a domain where problems with data quality were the rule rather than the exception. Most importantly, I want to share what I identified as the key components of making the domain I became responsible for more trustworthy.

The Components of Trust

1. Reliability

First and foremost you need to make sure that you are always on top of the issues. Perhaps the data quality is low and the issue frequency is high, and you can’t do anything about that at the moment. Nevertheless, you can still take steps to make sure that you are reliable.

  • Set expectations by reaching out to key individuals using the data. Everyone should understand that it is a challenging domain, that steps will be taken to remedy the situation, and that you are there to take those steps.
  • Next, ensure that you are aware of issues before anyone else. You should be the one to tell stakeholders when a problem has occurred, not the other way around.
  • Understand why there is an issue, so that the impact of the problem can be isolated and analyzed.
  • Have an idea of when the issue will be resolved so that plans and decisions can be adjusted accordingly.

If you can put these cornerstones in place, you will already have gone a long way towards improving trust in the domain. Obviously this is not a trivial task, and it will require some trial and error. For example, you likely won’t be able to provide an accurate ETA for issue resolution right away. But making an attempt and then iterating on it goes a long way.
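To make "be aware of issues before anyone else" more concrete, here is a minimal sketch of a freshness check that flags a late delivery. It is an illustration only, not a description of the tooling used in this domain: the table name, the 24-hour interval, and the `check_freshness` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FreshnessResult:
    table: str
    is_late: bool
    delay: timedelta


def check_freshness(
    table: str,
    last_loaded_at: datetime,
    expected_interval: timedelta,
    now: datetime,
) -> FreshnessResult:
    """Flag a table as late when its newest load is older than expected."""
    age = now - last_loaded_at
    # Delay is how far past the expected interval we are (never negative).
    delay = max(age - expected_interval, timedelta(0))
    return FreshnessResult(table=table, is_late=delay > timedelta(0), delay=delay)


# Hypothetical example: a daily delivery that last landed 30 hours ago.
result = check_freshness(
    table="partner_deliveries",  # assumed table name
    last_loaded_at=datetime(2023, 9, 13, 6, 0),
    expected_interval=timedelta(hours=24),  # assumed delivery cadence
    now=datetime(2023, 9, 14, 12, 0),
)
if result.is_late:
    print(f"{result.table} is late by {result.delay}")
```

Run on a schedule, a check like this could be what prompts you to notify stakeholders, rather than waiting for them to notice the gap.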

2. Speed

Speed is the second component of building trust. To be more precise, having the ability to quickly understand and identify the root cause of an issue is critical. 

Many issues come with shared patterns. Developing tooling that will allow you to drill down and dissect the problem will be a big time saver in the long run, as well as help you understand the data and communicate with your stakeholders. 

For this reason, I would suggest asking your manager for dedicated time to identify patterns and set up tooling, making the case that an early investment in investigation speed will pay dividends later on.
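As a starting point for spotting shared patterns, past issues can simply be tallied by root cause. The sketch below assumes each issue has already been tagged with a `root_cause` label; the labels and the `most_common_causes` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical issue log; in practice this might come from a ticketing system.
past_issues = [
    {"id": 1, "root_cause": "late_delivery"},
    {"id": 2, "root_cause": "duplicate_rows"},
    {"id": 3, "root_cause": "late_delivery"},
    {"id": 4, "root_cause": "schema_change"},
    {"id": 5, "root_cause": "duplicate_rows"},
    {"id": 6, "root_cause": "late_delivery"},
]


def most_common_causes(issues, top_n=3):
    """Return the top_n root causes with their counts, most frequent first."""
    counts = Counter(issue["root_cause"] for issue in issues)
    return counts.most_common(top_n)


top_causes = most_common_causes(past_issues, top_n=2)
for cause, count in top_causes:
    print(f"{cause}: {count}")
```

The most frequent causes are reasonable candidates for the first pieces of dedicated investigation tooling.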

3. Clarity

At this point you are able to understand and resolve issues quickly, which is fantastic. But if you fail to adequately communicate what you are doing, your work towards building trust will end up being hamstrung.

There are multiple elements that go into enabling clarity, but some of the more fundamental ones are the following:

  • Establish a set mode of communication for incidents and updates. Because people who work with the data should not have to worry about the status of ongoing issues, make the communication channels crystal clear.
      • I went with daily updates on a dedicated Slack channel whenever there was an ongoing issue, but meetings or emails are also valid approaches.
  • Establish a set format for your communications and ensure they are brief, easy to understand, and convey the key message. For example:
      • What is the issue?
      • When will the issue be resolved?

This way you will save time on writing your updates and reduce any risk of user confusion or miscommunications.

As an additional benefit, a consistent format of communications can be leveraged later on as a source for understanding resolution times and issue types if you don’t already have that in place.
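A set format is easy to enforce if updates are generated from a small template. The following is a minimal sketch built around the two example questions above; the `IncidentUpdate` class, its field names, and the sample message are assumptions for illustration, not the format actually used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IncidentUpdate:
    reported_on: date
    issue: str         # What is the issue?
    expected_fix: str  # When will the issue be resolved?

    def render(self) -> str:
        """Render the update in a fixed, brief format."""
        return (
            f"[{self.reported_on.isoformat()}] Data incident update\n"
            f"Issue: {self.issue}\n"
            f"Expected resolution: {self.expected_fix}"
        )


# Hypothetical update, e.g. for posting in a dedicated channel.
update = IncidentUpdate(
    reported_on=date(2023, 9, 14),
    issue="Yesterday's partner delivery is missing",
    expected_fix="Tomorrow morning, pending partner confirmation",
)
message = update.render()
print(message)
```

Because every update shares the same fields, the stored records can later be analyzed for issue types and resolution times.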

4. Accountability

With reliability, speed, and clarity in place, we reach the topic of accountability, or rather, of placing the responsibility for the issues you are dealing with where it belongs. In my case, the source of the problems in the data lay with an external partner we were stuck with for the foreseeable future. The provider had no quality checks in place, their response rate was slow, their average handling time for issues was even worse, and every resolution broke something else.

I was not to blame for these issues. But sometimes you have to look beyond your direct responsibilities to truly improve things. For a sustainable improvement of trust in your domain it’s essential to address the root cause.

In my case, this meant holding the data providers to a higher standard. What this ‘higher standard’ looks like will obviously vary from case to case, but here are some of the agreements we reached.

  • Aligning on service-level agreements for the response and resolution time of issues.
  • Setting up a daily email describing the landing time and status of each delivery.
  • Setting up a biweekly sync to go through current issues and next steps.
  • Establishing a set mode of communication when resolving issues, for example always including the root cause and a short description of the fix when closing tickets.
  • Setting up automatic quality controls on key dimensions at the data providers’ end, stopping suspect data from being ingested by our systems without our approval.
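To illustrate the last agreement, here is a minimal sketch of an automatic quality gate on one key dimension. It compares row counts per dimension value against a baseline and holds back a delivery unless it has been explicitly approved. The dimension values, the 50% tolerance, and both function names are assumptions for the example; the checks a provider actually runs will differ.

```python
def find_suspect_dimensions(baseline, delivery, tolerance=0.5):
    """Return dimension values whose row count deviates too far from baseline."""
    suspect = []
    for key, expected in baseline.items():
        if expected == 0:
            continue  # no meaningful relative deviation without a baseline
        actual = delivery.get(key, 0)  # a missing value counts as zero rows
        deviation = abs(actual - expected) / expected
        if deviation > tolerance:
            suspect.append(key)
    return sorted(suspect)


def should_ingest(baseline, delivery, tolerance=0.5, approved=False):
    """Allow ingestion when nothing is suspect or a human has approved it."""
    return approved or not find_suspect_dimensions(baseline, delivery, tolerance)


# Hypothetical row counts per country for a typical day and for today's delivery.
baseline = {"DE": 1000, "FR": 800, "ES": 600}
delivery = {"DE": 980, "FR": 120, "ES": 640}

suspect = find_suspect_dimensions(baseline, delivery)
print("Suspect dimensions:", suspect)
print("Ingest without approval:", should_ingest(baseline, delivery))
```

The `approved` flag mirrors the "without our approval" part of the agreement: suspect data is blocked by default but can still be let through deliberately.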

I also want to make clear that this type of conversation should not be antagonistic. You are data partners. It should be in the best interests of both parties to improve the relationship and the service; by having an honest discussion and taking an iterative approach you should both be able to reach a positive outcome.

5. Visibility

This final component is one of the most important, but it can only come into place after the others have been managed: making sure your work is seen.

If your domain has been having issues for a long time, there may be distrust towards the data and a reluctance to act on it. And that suspicion comes from a valid place. But if you have actually managed to improve things, you need to correct this outdated perception of where things stand. If not, the distrust risks being perpetuated for no reason.

As always, there are many ways to go about this. I would suggest the following:

  • Establish health indicators for the domain and track them.
      • E.g., issues raised, average handling time, time spent, and so on.
  • Visualize this data (because a nice graph is more compelling than a list of numbers).
  • Make sure the right people are aware of these metrics, agree on them, and use them whenever the question of the health of your domain comes up.
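As a minimal sketch of what tracking such indicators could look like, the snippet below computes issues raised and average handling time from a list of opened and resolved timestamps. The data and the `health_indicators` helper are hypothetical; real figures would come from your own issue log.

```python
from datetime import datetime
from statistics import mean

# Hypothetical issue log: (opened_at, resolved_at); None means still open.
issues = [
    (datetime(2023, 8, 1, 9, 0), datetime(2023, 8, 1, 15, 0)),
    (datetime(2023, 8, 10, 9, 0), datetime(2023, 8, 11, 9, 0)),
    (datetime(2023, 8, 20, 9, 0), None),
]


def health_indicators(issues):
    """Summarize issue volume and average handling time in hours."""
    handling_hours = [
        (resolved - opened).total_seconds() / 3600
        for opened, resolved in issues
        if resolved is not None
    ]
    return {
        "issues_raised": len(issues),
        "issues_resolved": len(handling_hours),
        "avg_handling_hours": mean(handling_hours) if handling_hours else None,
    }


indicators = health_indicators(issues)
print(indicators)
```

Computed per week or per month, these numbers provide the series to plot and share with stakeholders.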

You have worked hard to improve things. Now make sure that everyone sees that your efforts have borne fruit.

Final Thoughts

In my experience of taking on a domain with multiple data quality issues, I identified five pillars of trust: Reliability, Speed, Clarity, Accountability, and Visibility.

Every journey is different, but I hope you managed to find some inspiration in mine. Trust should always be at the core of data, and I wish the reader all the best in building theirs – no matter how challenging the domain.
