Data Science

The good, the bad, and the blurry: replacing intuition with data in visual selection at GetYourGuide

Discover how GetYourGuide leverages AI-powered image ranking to boost traveler trust and engagement. Learn how data-driven models assess aesthetics, quality, and relevance to display high-impact visuals, enhance user experience, and drive bookings across our global travel marketplace.

Anna Braidotti, Yasamin Klingler and Ákos Horváth

Great imagery plays a crucial role in helping travelers build trust in our products and activities. Images are processed almost instantly by the human brain, often faster than text, which makes them critical to user decision-making.

However, not all images are created equal. While some inspire and engage, others fail to resonate or, worse, detract from the user experience. This blog post examines how we use data to rank our images and then apply that ranking across our platform to enhance user engagement.

Why focus on image rankings?

Imagine you’re browsing for a spa or massage experience. You find two tours that appear to have similar written content, but one features a stunning image of a serene spa setting, while the other shows a blurry photo of someone’s apartment. Which would you choose? 

The answer is obvious, highlighting the importance of displaying the right images to users. Images are the first thing users notice, and travelers often make decisions based on them before reading any text. This is why we aim to surface high-quality, relevant images across all touchpoints of our marketplace — from activity cards to landing pages, destination pages, swim lanes, and marketing ads. 

Our goal is to replace blurry, uninspiring visuals with images that engage our users, using AI-powered models designed to detect and address these pain points. So let’s dive into the methodology.

The evolution of image ranking

The old model

Until late 2024, we used various models trained on images from our own photo shoots. The training data consisted of images that we deemed ‘good’ internally, and we relied on a third-party service to surface similar images. While this worked to some extent, it had limitations that caused roadblocks:

  • Scalability: Our photo shoot images numbered in the thousands — not nearly enough to build a robust model.
  • Subjectivity: The model was based on what we thought was good, not what users preferred.
  • Compliance: Security concerns necessitated the discontinuation of the third-party service.

The new model

Rather than simply replacing the old model, we decided to rethink the problem entirely. Our new approach focuses on what users consider ‘good’, aiming for scalability and personalization. The ultimate goal is to show each user the images they are most likely to appreciate.

Breaking down the problem

Images are complex. To build a robust ranking system, we needed to break down what makes an image ‘good.’ This led us to identify multiple dimensions, or questions, that can be asked about an image:

  • Aesthetics: Is the image beautiful?
  • Quality: Is the image clear and unpixelated?
  • Relevance: Does it represent the destination or activity accurately?
  • Context: What time of day is depicted? Are there threats or distractions?

Each dimension can be scored, weighted, and combined to create rankings tailored to different use cases. For example, an image that scores high in aesthetics but low in relevance might work well for marketing but not for activity cards.
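A minimal sketch of how such a weighted combination could look, assuming every dimension has already been scored on a 0 to 1 scale. The class, the use-case names, and the weights are hypothetical and chosen only for illustration; they are not the values used in production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageScores:
    """Per-dimension scores for one image, assumed normalized to [0, 1]."""
    image_id: str
    aesthetics: float
    quality: float
    relevance: float


# Hypothetical weights per use case (illustrative values only).
USE_CASE_WEIGHTS = {
    "marketing": {"aesthetics": 0.6, "quality": 0.3, "relevance": 0.1},
    "activity_card": {"aesthetics": 0.2, "quality": 0.3, "relevance": 0.5},
}


def combined_score(image: ImageScores, weights: dict) -> float:
    """Weighted average of the dimension scores."""
    total_weight = sum(weights.values())
    weighted_sum = sum(getattr(image, dim) * w for dim, w in weights.items())
    return weighted_sum / total_weight


def rank_images(images: list, use_case: str) -> list:
    """Sort images from best to worst for the given use case."""
    weights = USE_CASE_WEIGHTS[use_case]
    return sorted(images, key=lambda img: combined_score(img, weights), reverse=True)


images = [
    ImageScores("sunset_generic", aesthetics=0.9, quality=0.8, relevance=0.2),
    ImageScores("tour_group_at_site", aesthetics=0.5, quality=0.7, relevance=0.9),
]

for use_case in USE_CASE_WEIGHTS:
    ranked = rank_images(images, use_case)
    print(use_case, [img.image_id for img in ranked])
```

With these illustrative weights, the beautiful but generic image ranks first for marketing, while the more relevant image ranks first for activity cards.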

Building the foundation of our model

Aesthetics and quality

Aesthetics answers the question, “How beautiful is this image?” While beauty is subjective, our approach focused on translating large-scale human ratings into numerical scores. Additionally, we believe that certain qualities, such as sharpness, composition, and lighting, are universally valued. We experimented with various state-of-the-art approaches from the fields of Image Aesthetic Assessment (IAA) and Image Quality Assessment (IQA).

Our models assign a ‘quality’ and an ‘aesthetic’ score to all our images, enabling better validation of the uploaded images for GetYourGuide, as well as a richer image ranking.

Fig. I. Sample of low-quality and low-aesthetic images detected by our model
Fig. II. Sample of high-quality and high-aesthetic images detected by our model
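To give a feel for what a numerical ‘quality’ signal can look like, here is a deliberately simple sketch of one classic sharpness heuristic, the variance of the Laplacian. This is not one of the learned IAA or IQA models mentioned above; it is a hand-crafted baseline, and the function names are ours for illustration. It assumes a grayscale image given as a 2D NumPy array.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbor discrete Laplacian; higher usually means sharper."""
    g = gray.astype(np.float64)
    laplacian = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(laplacian.var())


def box_blur(gray: np.ndarray) -> np.ndarray:
    """3x3 mean filter over interior pixels, used here only to simulate a blurry photo."""
    g = gray.astype(np.float64)
    height, width = g.shape
    shifted = [
        g[i:i + height - 2, j:j + width - 2]
        for i in range(3)
        for j in range(3)
    ]
    return sum(shifted) / 9.0


rng = np.random.default_rng(seed=0)
sharp = rng.random((64, 64))   # synthetic stand-in for a detailed image
blurry = box_blur(sharp)       # the same content with fine detail smoothed away

print("sharp: ", laplacian_variance(sharp))
print("blurry:", laplacian_variance(blurry))
```

The blurred version yields a much lower value than the original. A heuristic like this only captures sharpness; composition and lighting need learned models trained on human ratings.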

Relevance

Relevance measures how well an image represents the destination or activity. For example, a picture of the Eiffel Tower is more relevant for Paris than a generic cityscape. Relevance ensures that travelers see images that align with their expectations. 

We describe the relevance of an image to a location from various aspects, such as the presence of key landmarks or the cultural aspects of that destination. Some of those categories are:

  • Extremely relevant: A significant representation of the location is visible, for example, the Eiffel Tower in Paris or the Colosseum in Rome. Users can name the location without additional research.
  • Marginally relevant: The image captures some aspects of the location, but is not identifiable without knowing the destination very well.
  • Irrelevant: The picture is not taken in that location, or does not represent the destination at all.

Thanks to LLMs, it is simpler than ever to understand the context of an image. For this project, we used OpenAI models to achieve this goal.
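Once a model returns a relevance category as text, it has to be turned into something a ranking can consume. The sketch below shows one way to do that, assuming the model is asked to answer with one of the three category names listed above. The numeric scores and the parsing helper are hypothetical, and no real API call is made here.

```python
from enum import Enum


class Relevance(Enum):
    EXTREMELY_RELEVANT = "extremely relevant"
    MARGINALLY_RELEVANT = "marginally relevant"
    IRRELEVANT = "irrelevant"


# Hypothetical mapping from category to a numeric relevance score.
RELEVANCE_SCORE = {
    Relevance.EXTREMELY_RELEVANT: 1.0,
    Relevance.MARGINALLY_RELEVANT: 0.5,
    Relevance.IRRELEVANT: 0.0,
}


def parse_relevance(raw_label: str) -> Relevance:
    """Normalize a free-text label and map it to a category.

    Raises ValueError for anything outside the known categories, so
    unexpected model output is surfaced instead of silently scored.
    """
    normalized = " ".join(raw_label.strip().lower().replace("_", " ").split())
    normalized = normalized.rstrip(".")
    return Relevance(normalized)


# Example labels as a model might return them (made up for illustration).
for raw in ["Extremely relevant.", "marginally_relevant", "  IRRELEVANT\n"]:
    category = parse_relevance(raw)
    print(repr(raw), "->", category.name, RELEVANCE_SCORE[category])
```

Failing loudly on unknown labels is a deliberate choice in this sketch: it keeps malformed model responses from quietly affecting the ranking.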

Working towards personalization

While a good baseline ranking is essential, our long-term vision is personalization. Each user is different, and their preferences should be reflected in the images they see. To achieve this, we’ll first narrow down the rankings based on factors such as destination, region, or marketing channel. For example, users from certain regions might prefer images that reflect their cultural preferences. Then, we’ll eventually use anonymized user data — such as past searches or bookings — to surface images tailored to their unique tastes.

Challenges and experimentation

Building a scalable, personalized image ranking system is no small feat. It requires:

  • Experimentation: Testing different dimensions and weights to find the optimal combination.
  • Data collection: Gathering enough data to train models that reflect user preferences.
  • Continuous improvement: Iterating on the model to adapt to changing user behaviors and preferences.

Conclusion

Replacing intuition with data in visual selection is a complex but consistently rewarding journey. By focusing on aesthetics, quality, and relevance, we’ve built a strong foundation for image rankings. As we move towards personalization, we aim to create a system that surfaces beautiful, relevant images aligned with individual user preferences, actively encouraging trust and conversion. 

By leveraging data, we can ensure that every image sparks interest, tells the right story, and draws in travelers from all over the world.

Ready to pioneer your own path in the travel experience industry? Find out more about our tech community here, and find your dream job on our open roles page.
