As a leading travel marketplace, the continuous creation of new activities is pivotal for us to ensure that we offer a wide variety of experiences for our travelers. To set up our activity providers for success, we continuously focus on making this activity creation process as seamless and efficient as possible, all while maintaining high standards of content quality. This has carryover effects on the demand side of our marketplace, as travelers interact and engage directly with the activities our activity providers create.
In this two-part article series, we share key learnings and challenges that come from launching a Gen AI product feature that can significantly impact both the demand and supply sides of a business.
These highly impactful features are usually the product of multiple teams working together over an extended period of time. We will discuss how perseverance in your convictions, a data-driven mindset, and effective communication and collaboration can turn curveballs into big wins.
{{divider}}
Our activity onboarding process involves going through a 16-step product creation wizard where activity providers add various information such as descriptions, photos, availability, pricing, and locations. Based on feedback and research done on the experiences of activity providers, we learnt that this process was long and arduous - activity providers were sometimes spending up to an hour manually creating a new product with our wizard.
Additionally, while we provide instructions, tips, and examples, the content our activity providers create can contain missing or contradictory information and might not always have our traveler-friendly tone of voice. This leads to traveler confusion, negatively impacting conversion rates for new products, and a high ratio of travelers needing to contact our care department to clarify information before making a booking.
We strongly believed that introducing generative AI into this product creation process would not only speed it up and make it easier for activity providers but also enhance content quality by generating engaging, user-friendly descriptions. This would ensure consistency across experiences and lead to more trust from customers.
Today, our full-scale rollout of the feature enables activity providers to paste their existing content (such as from their own website) into a designated input box. We then do some AI magic to generate the longer free-text content sections, like the full description, and to fill out many of the structured fields, like which kind of transport is used and which locations should be tagged. With AI, product onboarding becomes faster and more accurate: the AI auto-completes 8 key steps, allowing activity providers to review content in minutes rather than spending that time on manual data entry.
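To give a flavor of the general pattern behind the scenes, here is a deliberately simplified sketch of turning pasted provider text into a draft with both free-text and structured fields. The prompt, field names, and the `call_llm` helper are illustrative placeholders, not our actual implementation.

```python
import json

# Hypothetical illustration only: these field names and prompts are not our
# production setup; they sketch the "paste text in, get a reviewable draft out" idea.
SYSTEM_PROMPT = """You draft tour content for a travel marketplace.
Given raw text pasted by an activity provider, return JSON with:
- "full_description": an engaging, traveler-friendly description
- "highlights": a list of 3-5 short bullet points
- "transport_type": one of ["none", "bus", "boat", "walking", "mixed"]
- "locations": place names mentioned in the text
Only use information present in the input; leave fields empty if unsure."""

def draft_activity(pasted_text: str, call_llm) -> dict:
    """Ask an LLM to pre-fill free-text and structured fields from pasted content.

    `call_llm` is a placeholder for whatever chat-completion client is used;
    it takes (system_prompt, user_prompt) and returns the model's text output.
    """
    raw = call_llm(SYSTEM_PROMPT, pasted_text)
    try:
        draft = json.loads(raw)
    except json.JSONDecodeError:
        draft = {}  # fall back to an empty draft; the provider fills the fields manually
    # The activity provider always reviews and edits the draft before submitting.
    return draft
```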
Reaching this stage wasn't straightforward; it involved overcoming significant challenges along the way. So, how did we transition from the ideation stage to a failed experiment, and ultimately to a successful full-scale rollout?
This was a complex cross-functional project that spanned multiple quarters and involved contributors from several teams: the new Supply Data Products team, which did the heavy lifting of setting up the AI model; the catalog tech team, responsible for our product inventory infrastructure; the content management team, responsible for overseeing the content quality of our inventory; and the Analytics team, responsible for setting up the experiment and measuring its success.
One of these challenges was evaluating the experiment’s success. While we have a very capable in-house experimentation platform, its functionality was not fully available for experiments run on activity providers. And although we regularly test with our activity providers, this was the largest such test in both impact and scope. This meant our primarily traveler-focused A/B experimentation framework could not easily be relied upon: even though activity providers were exposed to the A and B variants, travelers could not be separately assigned to these variants. After all, a product created through AI could not also have a non-AI-created version (since that is not what the supplier opted for when creating the product).
Additionally, evaluating AI outputs is challenging because these systems often act as black boxes, making it hard to discern how decisions are made and what to tweak in the final product. The correctness and suitability of AI-generated results depend on factors like input data quality and algorithm design; outputs can be technically correct and still not fit what our platform supports. One common pitfall is scope creep: underestimating the limitations of AI and over-promising what it can realistically achieve in the marketplace. It’s crucial to balance ambitious goals with practical constraints. The scope must be adaptable to rapid advancements yet precise enough to guide development effectively.
Faced with these challenges, we dotted the i’s, crossed the t’s and then launched our first full-scale experiment.
For our first experiment, 75% of our user base was assigned to the treatment group, where they could opt to use the AI feature, while the other 25% formed our control group and was not exposed to the feature. We chose a larger treatment group because the feature was opt-in: activity providers could still choose not to use it. From smaller tests done previously, we expected an adoption rate of around 60% for the AI feature, meaning that roughly 45% of all newly created activities (60% of the exposed 75%) would be created via AI.
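As a quick back-of-the-envelope check of that expected exposure (a sketch using the split and adoption figures above, not production code):

```python
# Expected share of new activities created via AI in the first experiment.
treatment_share = 0.75   # share of activity providers who could opt in to the AI flow
adoption_rate = 0.60     # expected opt-in rate, based on earlier smaller tests

ai_created_share = treatment_share * adoption_rate
print(f"Expected share of new activities created via AI: {ai_created_share:.0%}")  # ~45%
```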
In the early days of the experiment, we noticed that our primary success metric, which measures the percentage of activities that get submitted out of the ones that start the product creation wizard, was significantly lower than expected. Our treatment group lagged behind the control group.
A significant pitfall during AI feature launches is user confusion and a lack of trust. In our case, activity providers didn’t understand how the AI tool fit into the onboarding process, leading to higher drop-off rates and confusion about how the AI feature worked.
Above, you can see a screenshot of the page where activity providers started this AI-assisted process. They would start creating an activity, land on this page, and then think they were in the wrong place. So they would go to create a new activity again. We realized that the UI didn’t really explain what the automatic content creator was or what it was helping them do.
Our second finding was that activity providers in the treatment group were spending longer on the pages that were not filled out by AI. We learned that these users were frustrated about having to fill out some of the pages manually. Different prompts for different experiences impacted the way activity providers behaved during the activity creation process.
Lastly, we encountered a measurement problem: our planned standard A/B analysis approach would not hold under the given circumstances. Experiment groups A and B showed significant differences in both traveler- and supplier-side metrics even before the experiment began, and some activity providers could significantly skew the results towards A or B depending on which group they were assigned to. This meant we had to think of a different approach to analyzing the success of the experiment.
With our primary success metric showing a decrease, we stopped the experiment, rolled everyone back to A (the non-AI flow), and did not launch the feature. We collected what we had learned, went back to the drawing board, and decided to conduct a second test.
Ahead of the second test, we made a few UX improvements based on these learnings. We made it clear that the AI page was a step within the normal product creation wizard by keeping the left-side menu/progress bar visible, so activity providers knew they were in the right place. We improved the visual design and microcopy of the AI input page to better set expectations about how the tool works and what it would and wouldn’t do. We also refined the LLM to improve content quality and to automatically fill out more sections.
The UI improvements reduced user drop-off from the AI input page to the following page in product creation by 5 percentage points (tackling the high bounce rate described earlier). Treatment-group users also now spent a similar amount of time on the non-AI-assisted pages of product creation as the control group, so the extra time on these manual pages that we saw in the first experiment disappeared (users had been informed up front about what was coming and knew what they would have to do themselves).
We also validated the success of the AI tool with qualitative feedback via Hotjar surveys. One supplier wrote: “it was very very convenient to use and very helpful… it was my first time posting an ad and it went by with a breeze. Thank you for this”. We found instances where supply partners progressed through the entire product creation process in just 14 minutes, start to finish.
A significant achievement also came in the form of building an experimentation framework that allowed us to measure both the supply-side and demand-side impact of the tool. This framework accounted for the potential skew introduced by certain activity providers and recognized that our metrics for A and B were not directly comparable even before the experiment began, meaning any uplift during the experiment would have been hard to attribute to the feature itself without correction.
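To give a flavor of the approach (this is a deliberately simplified sketch rather than our production framework), one way to handle both issues is to compare each group against its own pre-experiment baseline, a difference-in-differences style comparison, while capping the influence of extreme providers. The column names and the 99th-percentile cap below are illustrative choices.

```python
import pandas as pd

def estimate_uplift(df: pd.DataFrame, metric: str = "conversion_rate") -> float:
    """Illustrative sketch: df has one row per activity provider with columns
    'group' ('A' or 'B'), plus pre- and post-experiment values of the metric."""
    df = df.copy()

    # Cap (winsorize) extreme providers so a handful of large accounts can't skew a group.
    for col in (f"{metric}_pre", f"{metric}_post"):
        upper = df[col].quantile(0.99)
        df[col] = df[col].clip(upper=upper)

    # Change per group relative to its own pre-experiment baseline...
    change = df.groupby("group").apply(
        lambda g: g[f"{metric}_post"].mean() - g[f"{metric}_pre"].mean()
    )
    # ...and the estimated treatment effect is the difference of those changes.
    return change["B"] - change["A"]
```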
The end result of our efforts was an increase in all our success metrics. With AI assistance, activity providers were more likely to complete onboarding for their activities, and the higher-quality content translated into a solid increase in performance for activities onboarded through AI. We closed the experiment and launched the feature to 100% of our user base.
In the interest of brevity, we have omitted many of the events that took place during both tests, details of how the project developed, and the countless decisions we had to make along the way. However, our key learnings from this multi-quarter feature launch, which impacted both the supply and demand sides of the marketplace and involved four collaborating teams, are summarized below:
While AI has undoubtedly been the talk of the town recently, integrating it into a product is far from straightforward. It requires meticulous planning, a data-driven approach, streamlined cross-functional collaboration, and a willingness to embrace complexity and overcome hurdles. We hope the insights and learnings shared here prove valuable to readers in their own endeavors.
Special thanks to Agus for being our sparring partner, and Raslam and Konrad for reviewing this article.
Want to help us revolutionize the experience industry? Check out our open roles and join our journey.