Revolutionizing Food Delivery: The Role of AI in Image Generation at Delivery Hero

Table of Contents

  1. Key Highlights:
  2. Introduction
  3. The Importance of Image Quality in Food Delivery
  4. Building the Minimum Viable Product (MVP)
  5. The Functionality of Stable Diffusion
  6. Scaling the System for High Volume Production
  7. Quality Control and Evaluation
  8. Cost Efficiency and Model Optimization
  9. Challenges in Fine-Tuning for Local Dishes
  10. AI Safety System Implementation
  11. Quantifiable Results and Impact

Key Highlights:

  • Delivery Hero has enhanced its food delivery platform by implementing generative AI for image generation, leading to improved customer engagement and conversion rates.
  • The company utilized Stable Diffusion and a range of AI models to create high-quality, appealing images for menu items, resulting in a notable 6-8% increase in conversion rates from menu views to cart additions.
  • Strategic challenges such as cross-cloud integration and optimization of computational costs were addressed, demonstrating a comprehensive approach to deploying AI-driven solutions.

Introduction

In an era where digital consumer experiences heavily influence purchasing decisions, the visual appeal of a product is paramount. Delivery Hero, a global leader in food delivery, has turned to artificial intelligence (AI) not just to optimize delivery logistics but to enhance the aesthetic representation of its offerings. By integrating generative AI for image creation, the company aims to improve user engagement and ultimately, sales conversion rates. This article delves into the intricate processes behind Delivery Hero’s AI-driven image generation project, exploring its objectives, methodologies, challenges, and measurable impacts.

The Importance of Image Quality in Food Delivery

The foundation of this ambitious AI project is the premise that high-quality menu content enhances conversion rates. Initial data analysis revealed a gap: many menu items had no image, and items without images accounted for only 14% of products sold. This signaled a clear opportunity to use generative AI to create compelling visuals that attract customer interest and convert visits into sales.

The hypothesis was straightforward—if appealing images increase sales, then a systematic approach to generating images via AI could have a tangible impact. This guided the decision to develop a generative AI project aimed specifically at image creation, laying the groundwork for significant financial growth.

Building the Minimum Viable Product (MVP)

The development process began with designing a Minimum Viable Product (MVP) using Google Cloud Platform (GCP). The technology stack included Cloud Run for backend processing, PostgreSQL for database management, Google Cloud Pub/Sub for messaging, and Google Cloud Storage (GCS) for storing generated images.

A standout feature of the MVP was Vertex AI Pipelines, which facilitated the creation of streamlined data processing workflows. These pipelines were essential in orchestrating machine learning tasks and ensuring efficient model training and inference. Starting with OpenAI’s DALL·E model, the team established a pipeline that extracted data from the warehouse, ran AI-driven image generation, and delivered the results to local content teams for quality assurance.

The Functionality of Stable Diffusion

At the heart of Delivery Hero’s image generation capabilities lies the Stable Diffusion model, which operates through a series of well-defined processes. This architectural design encompasses several key components:

  1. Variational AutoEncoder (VAE): Encodes images into a compact latent vector space and decodes latents back into images.
  2. U-Net: Serves as the denoising function, predicting the noise to remove from the latent at each step.
  3. OpenAI’s CLIP: Maps text and images into the same embedding space, so that text descriptions can guide the generated image.

The model relies on two diffusion processes: the forward process, which gradually adds Gaussian noise to the (latent) image, and the reverse process, in which the U-Net removes that noise step by step. This separation enables the generation of new images from user input, with conditioning on textual descriptions to keep the result relevant to the prompt.
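
The noising step described above can be sketched in a few lines. This is a minimal illustration using the standard closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, not Delivery Hero's code: the linear beta schedule, the step count of 1,000, and the names `make_alpha_bars` and `add_noise` are assumptions chosen for the example, and a flat list of floats stands in for a VAE latent.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0: scale the signal down, mix Gaussian noise in."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + sigma * e for x, e in zip(x0, eps)]
    return x_t, eps

rng = random.Random(0)
alpha_bars = make_alpha_bars()
latent = [1.0] * 2000                      # stand-in for a flattened latent
slightly_noisy, _ = add_noise(latent, 0, alpha_bars, rng)
almost_pure_noise, _ = add_noise(latent, 999, alpha_bars, rng)
```

At an early step the sample is still close to the original; at the last step almost none of the original signal remains. During training, the U-Net learns to predict `eps` from `x_t`, which is what allows the reverse process to denoise.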

Overcoming Language Barriers in Generated Content

Despite its multilingual training data, the initial implementation of CLIP struggled with non-Latin scripts, notably product names written in Mandarin and Arabic. The practical solution was to translate all non-Latin product names into English via Google API before image generation, which improved the accuracy of the resulting images. This swift adaptation illustrated the agility required in AI implementation, balancing model capabilities against operational demands.
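
A minimal sketch of that pre-processing step, assuming a simple script check based on Unicode character names. The functions `needs_translation` and `to_prompt_text` are hypothetical, and the `translate` argument is a stand-in for whatever translation service is used; no real translation API is called here.

```python
import unicodedata

def needs_translation(name):
    """Return True if any letter in the name is outside the Latin script."""
    for ch in name:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return True
    return False

def to_prompt_text(name, translate):
    """Translate only names that contain non-Latin letters; pass others through."""
    return translate(name) if needs_translation(name) else name

# Stand-in translator for illustration only (a real system would call a service).
FAKE_TRANSLATIONS = {"宫保鸡丁": "Kung Pao chicken", "شاورما": "Shawarma"}

def fake_translate(name):
    return FAKE_TRANSLATIONS.get(name, name)

prompts = [to_prompt_text(n, fake_translate)
           for n in ["Crème brûlée", "宫保鸡丁", "شاورما"]]
```

Accented Latin names such as "Crème brûlée" pass through unchanged, so only names the text encoder is likely to mishandle incur a translation call.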

Scaling the System for High Volume Production

As the MVP demonstrated promise, the next challenge was system scalability, leading to a strategy that involved migrating parts of the infrastructure to AWS cloud services. This decision, however, presented unforeseen complications with cross-cloud integrations that hindered progress.

Nonetheless, the final architecture met its production goals. Continuous Integration/Continuous Deployment (CI/CD) processes were refined using GitHub Actions, and image generation requests were managed and routed efficiently across separate models hosted on different cloud infrastructures. This configuration supported an output of up to 100,000 AI-generated images daily.

Quality Control and Evaluation

Quality assurance was critical as Delivery Hero rolled out images for use. With no definitive ground truth against which to gauge generative outcomes, a creative benchmarking framework was required. Product quality guidelines defined the standards for image composition, positioning, and aesthetic attributes.

Using a dataset of 1,000 product descriptions, images generated by the models were assessed against these criteria to yield a quality score. This innovative approach provided quantifiable metrics to track improvements through different iterations of the model, ensuring that only the highest quality images were selected for production.
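
One way such a score could be computed is sketched below. The three criterion names mirror the guideline areas mentioned above, but the pass/fail representation, the equal weighting, and the function names are assumptions for illustration; the article does not describe the actual scoring formula.

```python
from statistics import mean

# Assumed criteria, one boolean check per guideline area.
CRITERIA = ("composition", "positioning", "aesthetics")

def image_score(checks):
    """Fraction of guideline criteria a single image passes (0.0 to 1.0)."""
    return sum(bool(checks[c]) for c in CRITERIA) / len(CRITERIA)

def model_score(all_checks):
    """Average image score over the whole benchmark set."""
    return mean(image_score(checks) for checks in all_checks)

# Two hypothetical benchmark results for one model version.
results = [
    {"composition": True, "positioning": True, "aesthetics": True},
    {"composition": False, "positioning": False, "aesthetics": False},
]
score = model_score(results)
```

Running the same fixed set of descriptions through each model version and comparing `model_score` values gives a single number to track across iterations.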

Cost Efficiency and Model Optimization

AI model optimization aimed to enhance both speed and cost-effectiveness, especially as production scaled. Various GPU configurations were tested, and the L4 GPU emerged as the most cost-efficient option, bringing the cost of image generation down to 1.6 cents per image, substantially lower than the cost of the external services used previously.
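
To put the per-image figure in context, the arithmetic below combines it with the peak volume of 100,000 images per day quoted earlier in the article. The helper `cost_per_image` and its example inputs (GPU hourly price, seconds per image) are hypothetical and only show how a per-image cost is typically derived; they are not Delivery Hero's numbers.

```python
from decimal import Decimal

COST_PER_IMAGE_USD = Decimal("0.016")  # 1.6 cents per image, from the article
PEAK_IMAGES_PER_DAY = 100_000          # "up to" daily volume, from the article

def daily_cost(images, cost_per_image_usd=COST_PER_IMAGE_USD):
    """Total generation cost in USD for a given number of images."""
    return images * cost_per_image_usd

def cost_per_image(gpu_hourly_usd, seconds_per_image):
    """Per-image cost on a GPU billed by the hour (hypothetical inputs)."""
    return Decimal(gpu_hourly_usd) / 3600 * seconds_per_image

peak_daily_cost = daily_cost(PEAK_IMAGES_PER_DAY)  # 1,600 USD at peak volume
```

Because the per-image cost scales linearly with generation time, any reduction in seconds per image translates directly into a lower cost at the same GPU price.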

Strategies for reducing processing time included lowering numerical precision from float32 to float16, replacing the vanilla VAE with a lightweight version, and tuning parameters such as the number of diffusion steps. Together, these adjustments cut image generation time by 85%, offering significant savings as output volumes soared.
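
The effect of the precision change and the size of the time saving can be illustrated with standard-library Python. This is only a sketch of the underlying arithmetic: it uses the `struct` module's half-precision format to show storage size and rounding error, and does not reproduce the actual model configuration.

```python
import struct

# Bytes needed to store one value at each precision.
fp32_bytes = struct.calcsize("<f")  # single precision
fp16_bytes = struct.calcsize("<e")  # half precision

def round_trip_half(x):
    """Store x as float16 and read it back, exposing the rounding error."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

rounding_error = abs(round_trip_half(0.1) - 0.1)

# An 85% reduction in time per image means each image takes 15% as long,
# so the same hardware produces roughly 6.7 times as many images.
time_reduction = 0.85
speedup = 1 / (1 - time_reduction)
```

Half precision halves the memory needed per weight and activation at the price of a small rounding error, which is usually acceptable for inference.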

Challenges in Fine-Tuning for Local Dishes

Diverse menu offerings presented another realm of complexity. Regional specialties such as stuffed pigeon or ant egg salad exposed gaps in the model’s training data. Fine-tuning models to reproduce these local dishes accurately meant managing thousands of variations, a task that was impractical with LoRA fine-tuning.

Instead, leveraging full model fine-tuning in a controlled environment became the method of choice. By utilizing OneTrainer, Delivery Hero could strategically implement necessary adjustments to improve performance in generating localized dishes while maintaining the quality of well-established items.

AI Safety System Implementation

With enhancements in image generation came the responsibility of ensuring content safety. Instances of poorly generated images could jeopardize customer perceptions and brand integrity. Thus, an AI Safety System was established featuring multiple detection mechanisms to identify problematic outputs, including object detection for food items, text recognition for malformed or nonsensical titles, and evaluation of image quality regarding color consistency and contrast.

This proactive measure minimized the occurrence of flawed images making it to the final selection pool, further refining the quality assurance process and reinforcing trust in the AI-driven offering.
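
A minimal sketch of how the outputs of such detectors might be combined into an accept/reject decision. The field names, the contrast threshold, and the use of luminance standard deviation as a contrast measure are all assumptions for illustration; the object detector and text-recognition models themselves are not shown.

```python
from dataclasses import dataclass
from statistics import pstdev

MIN_CONTRAST = 20.0  # assumed threshold on the std dev of 0-255 luminance

@dataclass
class DetectorOutputs:
    food_detected: bool       # result of an object detector (not shown)
    garbled_text_found: bool  # result of a text-recognition step (not shown)
    luminance: list           # grayscale pixel values in the range 0-255

def contrast(luminance):
    """Simple contrast proxy: spread of pixel brightness."""
    return pstdev(luminance)

def rejection_reasons(out):
    """Return the list of failed checks; an empty list means the image passes."""
    reasons = []
    if not out.food_detected:
        reasons.append("no food item detected")
    if out.garbled_text_found:
        reasons.append("garbled text in image")
    if contrast(out.luminance) < MIN_CONTRAST:
        reasons.append("low contrast")
    return reasons

good = DetectorOutputs(True, False, [0, 255] * 50)
bad = DetectorOutputs(False, True, [128] * 100)
```

Returning the reasons rather than a bare boolean makes it easier to see which check rejects the most images and to tune thresholds accordingly.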

Quantifiable Results and Impact

The culmination of these efforts led to the generation of over one million images and coverage of more than 100,000 menu items. An analysis of the impact revealed a compelling 6-8% increase in conversion rates following the addition of images to menu items, validating the initial hypothesis and demonstrating the effectiveness of the generative AI project.

Lessons Learned

Several key insights have emerged from this transformative journey:

  1. Avoiding Cross-Cloud Complexities: Integrating multiple cloud services introduced challenges that could be circumvented with a more straightforward infrastructure.
  2. Return on Optimization Investments: Dedicating time to model refinement and process optimization yielded substantial cost savings and operational efficiencies.
  3. Automated Quality Measurement Practices: Establishing automated systems for evaluating output contributed significantly to maintaining high standards and facilitating informed development of the generative model.

FAQ

What types of images can be generated with AI?

AI can generate a wide variety of images, including product images for food delivery, artistic renderings, and images based on textual descriptions across various industries.

How does generative AI affect conversion rates in e-commerce?

Improving the quality of product images through generative AI can significantly enhance consumer interest, leading to higher conversion rates as customers are more likely to purchase products they find visually appealing.

What are the primary challenges of using AI for image generation?

Challenges include ensuring image quality, addressing language and cultural nuances in product representations, optimizing system scalability and computational costs, and implementing safeguards against poor-quality outputs.

How does Delivery Hero ensure the quality of AI-generated images?

The company employs a robust quality evaluation framework, utilizing predefined product guidelines to assess the images produced by the AI, ensuring compliance with aesthetic and content standards.

Is the generated content legally compliant?

Delivery Hero marks AI-generated images as such to comply with regulations in various jurisdictions, signifying that they are for reference purposes, which helps mitigate legal risks surrounding customer expectations.