Android Release Automation - Our Journey to Fully Automated Weekly Releases

Uncover GetYourGuide's revolutionary journey from manual biweekly releases to fully automated weekly app deployments. This in-depth blog post by our Senior Mobile Engineer, Marjana Karzek, reveals key strategies for implementing robust automated testing, optimizing CI pipelines, and streamlining release automation. Learn how to reduce feature rollout time by more than 57%, enhance release reliability, and empower your mobile development team with cutting-edge tools and practices.

Marjana Karzek

Senior Android Engineer

Key takeaways:

Marjana Karzek, Senior Mobile Engineer, gives you a behind-the-scenes look at how GetYourGuide went from manual biweekly Android app releases to automated, reliable weekly ones.

Abstract

In 2023, our release process involved many manual steps, leading to delays and inconsistencies. By automating End-to-End tests with Kaspresso and Jetpack Compose UI Test and integrating these into our CI, we reduced feature rollout time by more than 57%. We implemented custom scripts for automated incremental rollout and introduced a fast-track feature for rapid 100% availability. We now have a streamlined process, minimal failures, and a consistent, efficient release schedule.

Starting Point

In 2023, our release process involved several manual steps. Every second Sunday night, the release was built on CI, manually validated from Monday to Wednesday by the whole team, and gradually rolled out starting Thursday. Engineers ran the test cases, rotating assignments to avoid bias, while a Release Captain ensured that all tests were performed and reported. The Release Captain is a rotating position within the guild and acts as the first responder when something goes wrong. However, the test cases were outdated and inconsistent, which made execution unreliable: during busy periods, complex or less relevant tests were often skipped to prioritize feature work.

The Release Captain also manually checked release quality, managed the rollout (for example, incrementing the rollout percentage), and notified stakeholders. If issues arose, they halted the release, notified the team, and coordinated hotfixes. A bug found on a Tuesday afternoon typically delayed the release by four days, because the fix, feature validation, and Google Play review pushed the rollout to the following Monday. Since we have no dedicated QA team, vacations and public holidays stretched the process further. CI build failures were also frequent, caused by problems such as a broken main branch, flaky UI tests, or Google Play Store API downtime. Overall, it took 21 to 25 days to fully roll out a feature from merge to 100% availability. This delayed crucial product experimentation and put engineers under stress, particularly before code freeze, which often led to more bugs and broken features.

Testing Strategy

We aimed to match the confidence level of our manual release tests with automated End-to-End (E2E) tests. Our previous UI tests with mocked backend responses were too flaky, so we prioritized stability in our tool selection.

We chose Kaspresso with Jetpack Compose UI Tests for their built-in flakiness protection and declarative, human-readable tests. Kaspresso's step-based approach makes logs easy to understand, even for engineers unfamiliar with the domain. Migrating from Espresso to Kaspresso was straightforward, and Jetpack Compose UI Tests provide fast, reliable testing.

Our E2E tests run in parallel on Firebase Test Lab using Flank, which lets us configure test environments, parallel runs, and retries for flaky tests. The suite takes about 7 minutes, at roughly 15 tests per minute. To manage the heavy testing load, we recommend combining individual test cases into flows. We use the test output to notify code owners about failing tests so they can fix them promptly.
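If the 15-tests-per-minute figure is read as overall throughput, a 7-minute run covers about 105 tests. Flank keeps wall-clock time low by splitting tests into shards that run on separate devices, ideally balanced by how long each test takes. The sketch below illustrates that balancing idea with a greedy longest-first heuristic. It is a hypothetical illustration, not Flank's implementation, and the test names and timings are made up.

```python
"""Hypothetical sketch: balance E2E tests across shards by historical duration.

Illustrates the idea behind duration-based sharding; this is not Flank's
actual algorithm, and all test names and timings are invented.
"""
import heapq


def shard_tests(durations: dict[str, float], num_shards: int) -> list[list[str]]:
    """Greedy longest-first assignment: each test goes to the least-loaded shard."""
    # Heap of (total_seconds, shard_index); the least-loaded shard is always on top.
    heap = [(0.0, i) for i in range(num_shards)]
    shards: list[list[str]] = [[] for _ in range(num_shards)]
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return shards


def shard_seconds(durations: dict[str, float], shards: list[list[str]]) -> list[float]:
    """Total expected runtime of each shard."""
    return [sum(durations[test] for test in shard) for shard in shards]


if __name__ == "__main__":
    # Hypothetical timings in seconds for a few flow-style tests.
    timings = {"booking_flow": 120, "search_flow": 90, "login_flow": 60,
               "wishlist_flow": 45, "checkout_flow": 100, "profile_flow": 30}
    shards = shard_tests(timings, num_shards=2)
    print(shards, shard_seconds(timings, shards))
```

Because the slowest shard determines total runtime, balancing matters more than the raw shard count; combining small tests into longer flows also reduces per-test setup overhead on each device.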

For UI verification, we use Paparazzi screenshot tests, which run as unit tests without an emulator. During development, our open-source Android Studio plugin, Paparazzi Plugin, previews these tests. We test various UI states, including light and dark mode, different font sizes, and right-to-left layouts.
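Covering several UI states multiplies quickly: each component is rendered once per combination of theme, font size, and layout direction. The hypothetical sketch below enumerates that matrix and derives a stable snapshot name for each variant. The font scales and naming scheme are assumptions; in the real tests these dimensions would be configured through Paparazzi's device configuration in Kotlin.

```python
"""Hypothetical sketch: enumerate the UI-state matrix for screenshot tests.

The state dimensions mirror the ones named in the text (light/dark mode,
font size, layout direction); the font scales and the snapshot naming
scheme are assumptions for illustration and are not Paparazzi's API.
"""
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class UiState:
    night_mode: bool
    font_scale: float
    rtl: bool

    def snapshot_name(self, component: str) -> str:
        theme = "dark" if self.night_mode else "light"
        direction = "rtl" if self.rtl else "ltr"
        # ":g" drops trailing zeros, so 1.0 renders as "1" and 1.5 as "1.5".
        return f"{component}_{theme}_font{self.font_scale:g}_{direction}"


def ui_states(font_scales: tuple[float, ...] = (1.0, 1.5, 2.0)) -> list[UiState]:
    """Cartesian product of theme x font scale x layout direction."""
    return [
        UiState(night, scale, rtl)
        for night, scale, rtl in product((False, True), font_scales, (False, True))
    ]


if __name__ == "__main__":
    states = ui_states()
    print(len(states))  # 2 themes x 3 font scales x 2 directions = 12
    print(states[0].snapshot_name("ActivityCard"))  # ActivityCard_light_font1_ltr
```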

For each PR, CI compares the generated images against a source of truth. Any difference causes a test failure, and PRs automatically include screenshots on GitHub for design review. We use Git Large File Storage (LFS) to manage the screenshots.
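Conceptually, the CI check measures how much a freshly rendered screenshot deviates from its stored golden image and fails when the deviation exceeds a tolerance. Paparazzi performs this comparison itself through its verify Gradle task (for example `verifyPaparazziDebug`). The dependency-free sketch below only illustrates the idea: images are modeled as 2D lists of RGB tuples, and the zero default tolerance is an assumption.

```python
"""Hypothetical sketch: compare a rendered screenshot with its golden image.

Images are modeled as equally sized 2D lists of RGB tuples to keep the
example dependency-free; real tooling works on PNG files.
"""
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def percent_difference(golden: Image, actual: Image) -> float:
    """Percentage of pixels that differ between the two images."""
    if len(golden) != len(actual) or any(len(g) != len(a) for g, a in zip(golden, actual)):
        raise ValueError("screenshots have different dimensions")
    total = sum(len(row) for row in golden)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for g_row, a_row in zip(golden, actual)
        for g_px, a_px in zip(g_row, a_row)
        if g_px != a_px
    )
    return 100.0 * differing / total


def verify(golden: Image, actual: Image, max_percent_difference: float = 0.0) -> bool:
    """True if the screenshot matches within tolerance; False fails the PR check."""
    return percent_difference(golden, actual) <= max_percent_difference


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    golden = [[white, white], [white, white]]
    actual = [[white, black], [white, white]]
    print(percent_difference(golden, actual))  # 25.0
    print(verify(golden, actual))              # False: any change fails by default
```

A strict zero tolerance catches every visual change, so an intended UI change is accepted by re-recording the golden image in the same PR, which is also where the design review happens.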

Some tests, such as social authentication, can only be partially automated. To cover that gap, we use feature toggles to quickly turn off a failing feature when our monitoring alerts us, which limits the impact on users until a hotfix or a later release ships.
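The kill-switch pattern described above can be reduced to a lookup in which a remote value overrides the default shipped in the app. The sketch below is hypothetical: the `FeatureToggles` class and the `google_sign_in` flag are invented for illustration, and an in-memory dict stands in for a real remote-configuration service.

```python
"""Hypothetical sketch: a remote kill switch for a feature that cannot be
fully covered by automated tests (for example, social sign-in).

In production the remote values would come from a remote-configuration
service; an in-memory dict stands in for it here.
"""
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    # Defaults shipped with the app, used until a remote value is known.
    defaults: dict[str, bool]
    # Remote overrides, e.g. flipped by an engineer after a monitoring alert.
    remote: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Remote value wins; unknown features are treated as disabled.
        return self.remote.get(name, self.defaults.get(name, False))

    def kill(self, name: str) -> None:
        """Turn a misbehaving feature off without shipping a new build."""
        self.remote[name] = False


if __name__ == "__main__":
    toggles = FeatureToggles(defaults={"google_sign_in": True})
    print(toggles.is_enabled("google_sign_in"))  # True
    toggles.kill("google_sign_in")               # alert fired: switch it off
    print(toggles.is_enabled("google_sign_in"))  # False
```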

This strategy achieved higher confidence levels than manual testing, and we removed the manual testing step from our release process.

Optimizing CI

Our main challenge was discovering CI issues, such as a broken main branch build or failing tests, early enough. We needed a faster feedback loop. After migrating the manual tests, we added them to our pull request CI. For each PR, we run unit, screenshot, UI, and E2E tests only for the affected modules, giving engineers feedback before the PR is reviewed.
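Selecting "affected modules" means taking the modules a PR touches and adding every module that depends on them, directly or transitively, since those can break too. The sketch below shows that reverse-dependency walk. The one-module-per-top-level-directory layout and the dict-based graph are assumptions for illustration; a real setup would read both from the Gradle build.

```python
"""Hypothetical sketch: select the modules a pull request affects.

Assumptions: each top-level directory is one Gradle module, and the
dependency graph is given as a dict; a real setup would derive both
from the Gradle build.
"""
from collections import deque


def module_of(path: str) -> str:
    # "feature-search/src/main/Search.kt" -> "feature-search"
    return path.split("/", 1)[0]


def affected_modules(changed_files: list[str], dependencies: dict[str, set[str]]) -> set[str]:
    """Changed modules plus every module that depends on them, transitively.

    `dependencies` maps each module to the modules it depends on.
    """
    # Invert the graph: for each module, which modules depend on it?
    dependents: dict[str, set[str]] = {}
    for module, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    affected: set[str] = set()
    for path in changed_files:
        module = module_of(path)
        if module not in dependencies:
            # A file outside any known module (e.g. a root build script):
            # play it safe and test everything.
            return set(dependencies)
        affected.add(module)

    # Breadth-first walk up the reverse dependency edges.
    queue = deque(affected)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


if __name__ == "__main__":
    graph = {
        "app": {"feature-search", "feature-booking"},
        "feature-search": {"core-ui"},
        "feature-booking": {"core-ui", "core-network"},
        "core-ui": set(),
        "core-network": set(),
    }
    print(sorted(affected_modules(["core-network/src/main/Api.kt"], graph)))
    # ['app', 'core-network', 'feature-booking']
```

A change deep in a core module therefore still triggers a wide test run, while a change in a leaf feature module tests only that module and the app that includes it.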

We also use GitHub’s merge queue to run all tests for queued changes, ensuring the main branch is always buildable and releasable. This eliminates the need for UI or E2E tests during release builds and significantly reduces release build failures.
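The reason a merge queue keeps main releasable is that each PR is tested against main plus the PRs queued ahead of it, so two changes that pass individually but conflict together never both land. The sketch below models that behavior sequentially with a stand-in `passes` function. It is a simplification: GitHub's merge queue can build several groups in parallel, and in GitHub Actions these runs are triggered by the `merge_group` event.

```python
"""Hypothetical sketch of merge-queue semantics.

Each queued PR is tested on top of main plus the PRs accepted ahead of
it, so the commit that lands on main is exactly the one that was tested.
`passes` stands in for the full CI run. Processing is sequential here for
clarity; GitHub's merge queue can build several groups in parallel.
"""
from typing import Callable


def process_queue(
    main: list[str],
    queue: list[str],
    passes: Callable[[list[str]], bool],
) -> tuple[list[str], list[str]]:
    """Return (new main history, rejected PRs)."""
    merged = list(main)
    rejected: list[str] = []
    for pr in queue:
        candidate = merged + [pr]  # main + accepted PRs + this PR
        if passes(candidate):
            merged = candidate     # main advances to the tested state
        else:
            rejected.append(pr)    # dropped from the queue; main never sees it
    return merged, rejected


if __name__ == "__main__":
    # pr-2 and pr-3 each pass on their own but break the build together.
    def ci_passes(commits: list[str]) -> bool:
        return not {"pr-2", "pr-3"} <= set(commits)

    print(process_queue(["base"], ["pr-1", "pr-2", "pr-3"], ci_passes))
    # (['base', 'pr-1', 'pr-2'], ['pr-3'])
```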

With a fast feedback loop and reliable release pipeline, we now submit release builds to Google on Saturdays, allowing more time for review and ensuring a smooth rollout the week after.

Rolling out Automatically

To automate the Release Captain's tasks, we first needed good visibility into each release. We used Datadog dashboards and Crashlytics to build solid monitors and SLOs, and we are now alerted to issues within an hour.
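An SLO-based monitor boils down to comparing a reliability metric for a time window against a target, while refusing to judge windows with too little data. The sketch below evaluates one hourly window of a crash-free user rate. The 99.5% target, the 500-user minimum, and the counts are hypothetical; the source does not state GetYourGuide's actual SLO values.

```python
"""Hypothetical sketch: an hourly SLO check on a release's crash-free user rate.

The SLO target, the minimum-sample guard, and the counts are assumptions;
real numbers would come from crash reporting and monitoring tools.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlyStats:
    users: int          # distinct users on the release during the window
    crashed_users: int  # distinct users who hit at least one crash


def crash_free_rate(stats: HourlyStats) -> float:
    return 1.0 - stats.crashed_users / stats.users


def evaluate(stats: HourlyStats, slo: float = 0.995, min_users: int = 500) -> str:
    """Return "insufficient_data", "ok", or "alert" for one hourly window."""
    if stats.users < min_users:
        return "insufficient_data"  # too few users to judge the release fairly
    return "ok" if crash_free_rate(stats) >= slo else "alert"


if __name__ == "__main__":
    print(evaluate(HourlyStats(users=10_000, crashed_users=30)))  # ok (99.7%)
    print(evaluate(HourlyStats(users=10_000, crashed_users=80)))  # alert (99.2%)
    print(evaluate(HourlyStats(users=100, crashed_users=5)))      # insufficient_data
```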

We automated rollout increments with a custom script that runs on GitHub Actions and queries Google BigQuery daily to check the crash-free rate. If the release is stable, the script increases the rollout via the Google Play API. Starting at 1% on Saturdays, we increase the rollout to 15%, 25%, and finally 100% as confidence grows. An hourly script also evaluates metrics; in an emergency, it halts the release and informs stakeholders.
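The daily step can be modeled as a small state machine over the rollout stages: advance to the next stage when the release is stable, halt otherwise. The hypothetical sketch below builds the request body for that decision. `is_stable` stands in for the BigQuery check; the dict follows the shape of a track in the Google Play Developer (Android Publisher) API (`userFraction`, and the `inProgress`, `halted`, and `completed` statuses), but the code makes no network calls.

```python
"""Hypothetical sketch of the daily staged-rollout step: 1% -> 15% -> 25% -> 100%.

`is_stable` stands in for the BigQuery crash-free-rate check. The returned
dict follows the shape of a track resource in the Google Play Developer
(Android Publisher) API, but this code makes no network calls.
"""
STAGES = (0.01, 0.15, 0.25, 1.0)


def next_fraction(current: float) -> float:
    """The next stage above `current`; stays at 1.0 once fully rolled out."""
    for stage in STAGES:
        if stage > current:
            return stage
    return 1.0


def track_update(version_code: int, current: float, is_stable: bool) -> dict:
    """Build the production-track body for today's rollout decision."""
    codes = [str(version_code)]  # int64 version codes are sent as strings in the API's JSON
    if not is_stable:
        if not 0.0 < current < 1.0:
            raise ValueError("this sketch only halts staged releases (0 < current < 1)")
        # Emergency path: freeze the rollout at its current fraction.
        release = {"versionCodes": codes, "userFraction": current, "status": "halted"}
    else:
        fraction = next_fraction(current)
        if fraction >= 1.0:
            # A full rollout is marked "completed" and carries no userFraction.
            release = {"versionCodes": codes, "status": "completed"}
        else:
            release = {"versionCodes": codes, "userFraction": fraction, "status": "inProgress"}
    return {"track": "production", "releases": [release]}


if __name__ == "__main__":
    print(track_update(1234, 0.01, is_stable=True))
    # {'track': 'production', 'releases': [{'versionCodes': ['1234'], 'userFraction': 0.15, 'status': 'inProgress'}]}
```

Applying such a body goes through the API's edit workflow: open an edit (`edits.insert`), update the track (`edits.tracks.update`), then commit (`edits.commit`).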

Confident in our releases, we also implemented a fast-track feature: on Wednesday mornings, a script evaluates the margin of error, and if the release meets our SLOs, it is set to 100% availability immediately.
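A margin-of-error check makes the fast-track decision sample-size aware: a high crash-free rate measured on few users is less trustworthy than the same rate on many users. The sketch below assumes the metric is a crash-free rate treated as a binomial proportion, uses the normal-approximation (Wald) interval, and fast-tracks only when the interval's lower bound still meets a hypothetical 99.5% SLO. The source does not specify which metric or formula is used. For rates very close to 100%, a Wilson interval would be more robust.

```python
"""Hypothetical sketch: a margin-of-error gate for fast-tracking a release to 100%.

Assumption: the metric is a crash-free rate treated as a binomial
proportion, and a normal-approximation (Wald) interval is used. The
SLO and z value are hypothetical choices.
"""
import math


def margin_of_error(rate: float, n: int, z: float = 1.96) -> float:
    """Half-width of the ~95% normal-approximation confidence interval."""
    return z * math.sqrt(rate * (1.0 - rate) / n)


def can_fast_track(crash_free_users: int, total_users: int, slo: float = 0.995) -> bool:
    """Fast-track only if even the pessimistic end of the interval meets the SLO."""
    rate = crash_free_users / total_users
    lower_bound = rate - margin_of_error(rate, total_users)
    return lower_bound >= slo


if __name__ == "__main__":
    print(can_fast_track(99_800, 100_000))  # True: 99.8% measured on many users
    print(can_fast_track(997, 1_000))       # False: 99.7%, but too few users to be sure
```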

Conclusion

Our journey to fully automated weekly releases has transformed our process, reducing the time it takes to roll out a feature to 100% of our users from 21-25 days to just nine days, without engineers getting involved. Thanks to robust E2E tests with Kaspresso and Jetpack Compose UI Test, plus Paparazzi screenshot tests for UI verification, we achieved high confidence in our testing.
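The "more than 57%" figure follows directly from these numbers: measured against the 21-day lower bound, going to nine days is a 57.1% reduction, and against the 25-day upper bound it is 64%. The short calculation below makes that explicit.

```python
def reduction(before_days: float, after_days: float) -> float:
    """Relative reduction in rollout time, as a percentage."""
    return 100.0 * (before_days - after_days) / before_days


if __name__ == "__main__":
    for before in (21, 25):
        print(f"{before} -> 9 days: {reduction(before, 9):.1f}% reduction")
    # 21 -> 9 days: 57.1% reduction
    # 25 -> 9 days: 64.0% reduction
```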

Automating the Release Captain's tasks and building a fast feedback loop into our CI process minimized the risk of failures. The merge queue ensures the main branch is always buildable and releasable, while our fast-track feature allows rapid 100% availability when conditions are met.

These improvements enable a consistent release schedule, quick issue response, and a more manageable workload for our engineering team. Our automated release process enhances efficiency and delivers high-quality features swiftly to our users.

For a deeper dive into the evolution of our release processes and how they align with industry practices, read our detailed account in "The Long and Winding Road to Short and Smooth Releases" and revisit the insights we shared at DroidCon.