Luis Rosales Carrera is a Software Engineer at GetYourGuide on the Partner Tech team. He explains how his team is migrating the public API away from the monolith.
I’m Luis, and I drive the migration of our public API: extracting it from the monolith into a new, separate service. Every large business at some point exposes its data to customers in order to facilitate integrations. For example, the Google Maps API lets you embed maps directly on your own site.
At GetYourGuide, our public API provides functionality along the complete booking process, including search, pricing, checkout, and booking.
In contrast to a private API, which is used only within the company, a public API is available for all developers to access. It allows developers outside an organization's workforce to access backend data that they can then use to enhance their own applications.
Historically, our public API started within a monolith. A monolithic application is a single-tiered software application in which the user interface and data-access code are combined into a single program. Initially, our monolith simplified development, reducing the operational costs of debugging, testing, and deploying the application.
Fast-forward to today: as more and more engineers join GetYourGuide, new mission teams are being created. The upshot is that what previously made it easy to create, extend, and maintain our public API no longer holds, because the application has become too large and complex for multiple teams to fully understand. For this reason, we had to look for an alternative: scaling up our public API outside the monolith.
Broadly speaking, this migration has been divided into two steps:

1. Decoupling: replacing every internal monolith dependency of the public API with dedicated components and calls to external services.
2. Extraction: moving the decoupled public API out of the monolith into a new, separate service.
In this blog post, I’ll explain how we’re carrying out the first phase: building dedicated components outside the monolith that cover the same functionality our public API previously got from within it.
As a first step in the decoupling phase, we focused on detecting all domains (such as Search, Pricing, Checkout, and Booking) inside the API that depended on internal monolith functionality. Take the activity search functionality: prior to decoupling, our API accessed the database directly (monolith functionality) instead of retrieving this data through the corresponding upstream service.
It was crucial to find all dependencies for each existing domain. Dependencies included data sources, data models, and utility classes that our API used inside the monolith and that needed to be replaced with new functionality in the API bundle.
Once we detected all the external dependencies with their use cases for each domain, we analyzed those domains to prioritize them within this migration. The priority assigned to each domain depended on different factors such as development complexity, team capacity, and availability of the required functionality within external services.
Each mission team owns one or more of these functionalities. We immediately contacted the relevant teams to confirm whether they already provided the complete functionality required, so we could assign each domain the correct priority in our migration roadmap.
We realized that parts of the needed functionality were still under development and couldn’t be integrated immediately. We communicated our requirements and agreed on due dates so that both parties could integrate with as little friction as possible.
The domain migration was divided into three steps:

1. Split the domain into small tasks and replace each monolith dependency with a call to the service that owns the functionality.
2. Deploy the old and new implementations side by side, gated by a staged rollout flag.
3. Gradually shift traffic to the new implementation, monitor it, and clean up the old code.
To facilitate the continuous delivery of the full functionality of the API, we split each domain into straightforward tasks to be tested as separate pieces of functionality.
Take the search domain, for example, which we split into the migration of four endpoints. For each endpoint, we replaced the code that accessed the database directly with a REST client that calls the search service for the activities requested by the user.
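A minimal sketch of that replacement, assuming a hypothetical search-service URL and response shape (the `urlopen` parameter is injectable purely so the function can be exercised without a live service):

```python
import json
from urllib import request

# Hypothetical internal URL of the upstream search service; the real
# service name and API path at GetYourGuide are not public.
SEARCH_SERVICE_URL = "http://search-service.internal/v1/activities"

def search_activities(params: dict, urlopen=request.urlopen) -> list:
    """Fetch activities via the upstream search service instead of
    querying the monolith's database directly."""
    query = "&".join(f"{key}={value}" for key, value in params.items())
    req = request.Request(f"{SEARCH_SERVICE_URL}?{query}")
    with urlopen(req) as resp:
        # Assumed response shape: {"activities": [...]}
        return json.loads(resp.read())["activities"]
```

Injecting the transport keeps the endpoint logic testable in isolation, which matters when old and new implementations must be verified side by side.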
Since this migration had to happen while the system was running, downtime on our API was not an option. For this reason, we introduced a rollout pattern widely used within GetYourGuide that lets us deploy code to production while serving a configurable percentage of users the pre-migration code and the rest the post-migration code.
I’ll continue referring to the migration of the search domain and its four search endpoints to explain our approach here.
Since these endpoints have a high volume of requests per hour, we had to ensure that only a minimum number of users would be impacted by unexpected errors caused by the migration.
Given the large number of partners using these endpoints (~2,000 active partners), and to avoid broad communications about API downtime, this migration needed to be transparent to our users.
We used what we at GetYourGuide call a staged rollout, which is part of our experimentation framework. It consists of a flag at the infrastructure level that allows us to identify all incoming requests and decide what percentage of them will target the code affected by the migration.
We used this flag to wrap each endpoint’s code and decide what percentage of requests would execute the pre-migration code and what percentage the post-migration code. This gave us an extra degree of protection and coverage when deploying our changes.
When deploying to production, each endpoint had its old and new implementations coexisting side by side, with the execution flow split by the staged rollout flag.
Initially, this flag had to be configured so that only the flow of the old implementation was executed. This ensured that at deployment time, everything still worked correctly since the same code was still running.
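The real flag lives at GetYourGuide’s infrastructure level, but the gating logic can be sketched roughly like this; the hashing scheme, function names, and 0% default are assumptions for illustration:

```python
import hashlib

def in_rollout(partner_id: str, rollout_percent: int) -> bool:
    """Hash a stable request attribute (here, the partner ID) into a
    0-99 bucket, so each partner consistently hits the same code path."""
    bucket = int(hashlib.sha256(partner_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent

def search_endpoint(partner_id: str, rollout_percent: int = 0):
    # Old and new implementations coexist; the flag splits the flow.
    # Default 0% means only the legacy path runs at deployment time.
    if in_rollout(partner_id, rollout_percent):
        return new_search()   # migrated: calls the search service
    return old_search()       # legacy: direct database access

def old_search():
    return "old implementation"

def new_search():
    return "new implementation"
```

Hashing a stable identifier (rather than picking randomly per request) keeps a given partner on one implementation, which makes errors easier to reproduce and attribute.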
Once both implementations of an endpoint were deployed, we monitored our systems to make sure the old path was still running correctly and that having both implementations coexist introduced no errors.
If there were no errors, we started balancing our requests so that the new code could start being executed.
At first, we directed one percent of requests to the new code and monitored the endpoint’s behavior. Whenever we found bugs, we would cut off the traffic going to the migrated part, fix them, and redeploy to production.
We repeated this until the migration caused no more errors, then progressively increased the percentage of requests going to the migrated code. We stepped up the rollout (5% > 10% > 25% > 50% > 100%) over two days until all requests went to the migrated part of the endpoint.
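The ramp-up loop above can be expressed as a small state machine; the stage values come from the post, while the error threshold and function are illustrative assumptions:

```python
# Percent of traffic on the migrated code at each ramp stage (from the post).
RAMP_STAGES = [1, 5, 10, 25, 50, 100]

def next_rollout_step(current_percent: int, error_rate: float,
                      threshold: float = 0.001) -> int:
    """Advance to the next ramp stage only while the migrated code is
    healthy; cut traffic back to 0% as soon as errors appear, so the
    bug can be fixed and redeployed before ramping up again."""
    if error_rate > threshold:
        return 0  # kill switch: route everything to the old code
    higher = [p for p in RAMP_STAGES if p > current_percent]
    return higher[0] if higher else 100  # hold at 100% once fully ramped
```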
After all traffic had been directed to the migrated part of the endpoint and about a week had passed without errors, we cleaned up the old part of the code (by then, dead code).
In addition, we removed the staged rollout wrapper added to the endpoint. We now had a fully functional endpoint without any dependency on the monolith.
We’re currently applying this robust mechanism to continue migrating functionality within our API transparently to our users. Once the first phase of the migration is complete and there are no internal dependencies left in the monolith, we can proceed to decouple the public API from the monolith into a new service.
Carrying out a migration of this magnitude is a significant endeavor. It demands dedicated resources both inside and outside the team, a high degree of cross-team communication, and a well-established plan with milestones and goals.
One should also remember that extracting from the monolith to a separate service is not a sprint but a marathon. It requires a lot of patience, and good architectural decisions have to be made.
My advice to anyone embarking on a similar migration is this: take your time in the planning phase; contact all relevant stakeholders; make sure that all necessary services have the full functionality required; and divide the whole migration into stages with set deadlines. This last point makes it much easier to keep the overall picture in view and stay clear about where you are and where you are going.
Thanks to other mission teams continually migrating more domains out of the monolith into their own services, we’re able to continue the migration of our public API in a seamless manner, always providing our users with a robust and fully functional API.
If you’re interested in joining our Engineering team, check out the open roles.