Victor Wiklund is a Data Engineer in the Business Operations Intelligence team. Having recently inherited a domain with a high level of data quality problems, he outlines his five-pillar approach to resolving issues while rebuilding trust.
One of the core missions of the Data Platform team here at GetYourGuide is to empower the organization to make data-driven decisions by providing timely and accurate data. In a strongly data-driven organization, this is a critical responsibility.
However, what do you do when the data and pipelines in question are inherently challenging to work with? How do you ensure that you can generate trust when the data is considered unreliable from the get-go?
Building trust in spite of challenging data is the topic of this blog post, where I detail my experience of taking on a domain where problems with data quality were the rule rather than the exception. Most importantly, I want to share what I identified as the key components of making the domain I became responsible for more trustworthy.
First and foremost you need to make sure that you are always on top of the issues. Perhaps the data quality is low and the issue frequency is high, and you can’t do anything about that at the moment. Nevertheless, you can still take steps to make sure that you are reliable.
If you can put these cornerstones in place, you will already have gone a long way toward improving trust in the domain. This is not a trivial task, and it will require some trial and error: you likely won’t be able to provide an accurate ETA for issue resolution right away, for example. But making an attempt and then iterating on it makes a real difference.
Speed is the second component of building trust. To be more precise, having the ability to quickly understand and identify the root cause of an issue is critical.
Many issues come with shared patterns. Developing tooling that will allow you to drill down and dissect the problem will be a big time saver in the long run, as well as help you understand the data and communicate with your stakeholders.
For this reason I would suggest syncing with your manager to obtain time for identifying patterns and setting up tooling, arguing for how early investment into improving your investigation speed will pay dividends later on.
At this point you are able to understand and resolve issues quickly, which is fantastic. But if you fail to adequately communicate what you are doing, your work towards building trust will end up being hamstrung.
Multiple elements go into enabling clarity, but some of the more fundamental ones are the following:
This way you will save time on writing your updates and reduce any risk of user confusion or miscommunications.
As an additional benefit, a consistent format of communications can be leveraged later on as a source for understanding resolution times and issue types if you don’t already have that in place.
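To illustrate how a consistent format can double as a data source, here is a minimal sketch in Python. Everything in it is hypothetical: the `IssueUpdate` fields, the status labels, and the `mean_resolution_hours` helper are assumptions made for illustration, not the template or tooling we actually use.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Optional


@dataclass
class IssueUpdate:
    """One stakeholder update in a fixed format (hypothetical fields)."""
    issue_id: str
    issue_type: str
    reported_at: datetime
    resolved_at: Optional[datetime] = None  # None while the issue is open
    summary: str = ""

    def render(self) -> str:
        # Every update is rendered the same way, so readers always know
        # where to find the status, the issue type, and the summary.
        status = "RESOLVED" if self.resolved_at else "INVESTIGATING"
        return f"[{status}] {self.issue_id} ({self.issue_type}): {self.summary}"


def mean_resolution_hours(updates: Iterable[IssueUpdate]) -> Dict[str, float]:
    """Average time to resolution in hours, per issue type.

    Issues that are still open are skipped.
    """
    durations = defaultdict(list)
    for update in updates:
        if update.resolved_at is None:
            continue
        elapsed = update.resolved_at - update.reported_at
        durations[update.issue_type].append(elapsed.total_seconds() / 3600)
    return {issue_type: mean(hours) for issue_type, hours in durations.items()}
```

Because every update carries the same fields, the log of past updates can be passed straight to `mean_resolution_hours` to see which issue types take the longest to resolve.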
With reliability, speed, and clarity in place, we reach the topic of accountability, or rather, of placing the responsibility for the issues you are dealing with where it belongs. In my case, the source of the problems in the data lay with an external partner we were stuck with for the foreseeable future. The provider of the data had no quality checks in place, their response rate was slow, their average handling time for issues was even worse, and every resolution broke something else.
I was not to blame for these issues. But sometimes you have to look beyond your direct responsibilities to truly improve things. For a sustainable improvement of trust in your domain it’s essential to address the root cause.
In my case, this meant holding the data providers to a higher standard. What this ‘higher standard’ looks like will obviously vary from case to case, but here are some of the agreements we reached.
I also want to make clear that this type of conversation should not be antagonistic. You are data partners. It should be in the best interests of both parties to improve the relationship and the service; by having an honest discussion and taking an iterative approach you should both be able to reach a positive outcome.
This final component is one of the most important, but it can only come into place after the others have been managed: making sure your work is seen.
If your domain has been having issues for a long time, there may be distrust of the data and a reluctance to act on it. That suspicion comes from a valid place. But if you have actually managed to improve things, you need to correct this outdated perception of where things stand. If not, the distrust risks being perpetuated for no reason.
As always, there are many ways to go about this. I would suggest the following:
You have worked hard to improve things. Now make sure that everyone sees that your efforts have borne fruit.
In my experience of taking on a domain with multiple data quality issues, I identified five pillars of trust: Reliability, Speed, Clarity, Accountability, and Visibility.
Every journey is different, but I hope you managed to find some inspiration in mine. Trust should always be at the core of data, and I wish the reader all the best in building theirs – no matter how challenging the domain.