Real-Time Bidding in Recommendation Systems with Contextual-Bandit Framework

In an article recently posted to the Meta Research website, researchers introduced a contextual-bandit (CB) framework for optimizing real-time bidding mechanisms in internet applications. The framework aims to enhance user engagement by jointly optimizing bid prices and item rankings in recommendation systems. Using reinforcement learning with deep neural networks, the proposed method improved user-engagement metrics in online experiments on Facebook services.

Study: Real-Time Bidding in Recommendation Systems with Contextual-Bandit Framework. Image credit: Apichatn21/Shutterstock

Background

In the realm of modern recommendation systems, providing users with a diverse and personalized selection of content has become pivotal for enhancing user experiences and preventing content fatigue. Noteworthy platforms like Pinterest and Facebook have adopted real-time bidding (RTB) mechanisms to optimize content presentation during auction sessions.
Previous studies in RTB, particularly in advertising and sponsored search domains, have extensively explored utility prediction, bid price optimization, and budget allocation. However, the specific problem of bid price optimization over multiple candidates with the added requirement of ranking has been underexplored.

Traditional approaches often involve human-designed value formulas with adjustable coefficients, limiting their ability to represent complex environments effectively. While existing research in RTB has primarily focused on scenarios where each service bids for presenting a single content candidate, this paper addressed a more generalized challenge – the joint decision-making of bid prices and ranking for multiple content pieces in a slot.

The researchers introduced a framework named bidding and ranking together (BART) that leverages CB algorithms to learn policies for RTB scenarios. The approach reduces the need for manual parameter tuning and allows more sophisticated policies to be derived from sub-optimal demonstrations. By applying the algorithm to major services in the Facebook home feed, the authors demonstrated better performance than hand-tuned baselines in online experiments. The work addresses a gap in the literature by treating bid price optimization and ranking in a more general setting than earlier RTB studies.

Model Formulation

The BART problem was formulated as a CB setup. Each user session triggered an auction session in which services bid for the opportunity to present content. The service selected a set of candidates, and their features, including utility predictions, formed the contextual state. The goal was to learn a policy that determined a score for each candidate, and these scores influenced both the bid price and the rank order.

The bid price function was defined as a weighted sum of sorted scores, proportional to the empirical conversion rate. If the service's bid was the highest, top-ranked items were sequentially shown to the user, deducting the bid price from the budget. The CB setup involved states (user and candidate information), actions (candidate scores), and rewards, which included bid costs and potential benefits from user interactions with displayed items. The reward function considered bid losses, bid costs, and engagement metrics, aiming to optimize bidding and ranking strategies. This approach provided a comprehensive framework for learning effective policies in real-time recommendation scenarios.
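The bid price and reward described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `position_weights`, `loss_penalty`, and `engagement_weight`, and the exact form of the reward, are assumptions chosen for illustration only.

```python
from typing import Sequence


def bid_price(scores: Sequence[float], position_weights: Sequence[float]) -> float:
    """Weighted sum of the candidate scores sorted in descending order.

    position_weights is a hypothetical stand-in for the per-position
    weights (for example, values tied to empirical conversion rates).
    """
    ranked = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(position_weights, ranked))


def reward(won: bool, bid: float, engagement: float,
           loss_penalty: float = 1.0, engagement_weight: float = 1.0) -> float:
    """Assumed reward shape: a penalty for losing the auction, otherwise
    weighted engagement minus the bid cost."""
    if not won:
        return -loss_penalty
    return engagement_weight * engagement - bid


scores = [0.2, 0.9, 0.5]          # policy scores for three candidates
weights = [0.5, 0.3, 0.2]         # hypothetical position weights
price = bid_price(scores, weights)
print(round(price, 2))            # 0.5*0.9 + 0.3*0.5 + 0.2*0.2
print(round(reward(True, price, engagement=1.0), 2))
```

Because the scores are sorted before weighting, the same scores set both the rank order and the bid price, which is the coupling the formulation relies on.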

Policy Optimization

The authors described the policy optimization process for BART, which centered on a top-K Gaussian policy. The policy defined a probability distribution over actions given the state. To address the issue of irrelevant randomness, the top-K Gaussian policy focused on the top candidates, which are the ones that contribute to bid prices and user engagement. The batch learning objective was to maximize the expected reward, with parameters optimized through offline training. The top-K Gaussian policy was used in the training objective to address variance issues and improve stability.
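One plausible reading of a top-K Gaussian policy is sketched below in Python. It assumes that Gaussian noise is applied only to the K candidates with the highest mean scores and that the log-probability is computed over those K entries alone; the paper's exact formulation may differ, and `sample_top_k_gaussian`, `sigma`, and `k` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def gaussian_log_pdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of a univariate normal distribution."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def sample_top_k_gaussian(means: Sequence[float], k: int, sigma: float,
                          rng: random.Random) -> Tuple[List[float], float]:
    """Perturb only the k highest-mean candidates (assumed behaviour).

    Returns the sampled scores and the log-probability of the sample,
    summed over the perturbed entries only.
    """
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    top = set(order[:k])
    actions = [m + rng.gauss(0.0, sigma) if i in top else m
               for i, m in enumerate(means)]
    log_prob = sum(gaussian_log_pdf(actions[i], means[i], sigma) for i in top)
    return actions, log_prob


rng = random.Random(0)
means = [0.1, 0.8, 0.4, 0.6]      # hypothetical policy outputs
actions, log_prob = sample_top_k_gaussian(means, k=2, sigma=0.05, rng=rng)
```

Under this assumption, candidates outside the top K keep their deterministic scores, so they add no noise to a gradient estimate that uses `log_prob`.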

The researchers also presented a reward-shaping algorithm to determine hyperparameters in the reward function involving the bid loss and engagement reward. The algorithm involved inferring these parameters through a simpler policy tuned in online experiments. This approach simplified the computation compared to traditional inverse reinforcement learning methods, providing accurate reward settings for effective policy optimization in the BART framework.

Experimental Evaluation

The experiments evaluated the BART method on two Facebook home feed services: "Groups you should join" (GYSJ) and "Friend requests" (FR). BART competed for the same content slot, displaying the top 20 items to users. In GYSJ, the existing linear formula combined the expected click-through rate (eCTR) and post-click conversion rate (eCVR) to maximize user engagement, while FR aimed to encourage users to accept friend requests based on a probability model. The GYSJ experiment ran for 22 days and the FR experiment for 30 days.
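To make the hand-tuned baseline concrete, the Python sketch below ranks candidates with a linear value formula over eCTR and eCVR and keeps the top items. The exact formula used in production is not given in the article, so the weights `w_click` and `w_conv`, the function names, and the sample data are all assumptions.

```python
from typing import List, Tuple


def linear_value(ectr: float, ecvr: float,
                 w_click: float = 1.0, w_conv: float = 1.0) -> float:
    """Hypothetical hand-tuned value: a linear combination of eCTR and eCVR."""
    return w_click * ectr + w_conv * ecvr


def rank_top_n(candidates: List[Tuple[str, float, float]], n: int = 20) -> List[str]:
    """Sort (name, eCTR, eCVR) tuples by value and keep the n best names."""
    scored = sorted(candidates,
                    key=lambda c: linear_value(c[1], c[2]),
                    reverse=True)
    return [name for name, _, _ in scored[:n]]


# Made-up candidates for illustration only.
candidates = [
    ("group_a", 0.10, 0.30),
    ("group_b", 0.25, 0.05),
    ("group_c", 0.05, 0.50),
]
top = rank_top_n(candidates, n=2)
print(top)
```

In a formula like this, the coefficients have to be adjusted by hand, which is the manual tuning that a learned policy is meant to reduce.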

For GYSJ, the BART policy outperformed the hand-tuned value formula, increasing engagement metrics by 0.44%, with a 9.8% rise in impressions and a 14.7% increase in group joins. The BART policy adjusted bid prices more aggressively than the value formula. In FR, the BART model increased accepted friend requests by 7.0%, whereas the logging policy showed an 11.3% drop. Both services experienced improvements in sessions and viewed friend requests.

The BART models were deployed into Facebook production, yielding statistically significant improvements in daily and monthly active users. The backtest results aligned with pretests, confirming the efficacy of BART in enhancing user engagement metrics across the evaluated services. The approach's ability to combine bidding and ranking strategies proved beneficial in different service contexts.

Conclusion

In conclusion, the authors framed the BART problem in a free-market recommendation system as a CB. Using top-K Gaussian policies and a lightweight reward-shaping algorithm, they removed noise in offline stochastic gradients. Their approach, validated in online experiments on Facebook services, significantly improved top-line user engagement metrics. Future work aims to improve the understanding of policy uncertainty and to explore joint optimization across multiple services.

Journal reference:
Soham Nandi

Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine Learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2024, January 17). Real-Time Bidding in Recommendation Systems with Contextual-Bandit Framework. AZoAi. Retrieved on January 15, 2025 from https://www.azoai.com/news/20240117/Real-Time-Bidding-in-Recommendation-Systems-with-Contextual-Bandit-Framework.aspx.

  • MLA

    Nandi, Soham. "Real-Time Bidding in Recommendation Systems with Contextual-Bandit Framework". AZoAi. 15 January 2025. <https://www.azoai.com/news/20240117/Real-Time-Bidding-in-Recommendation-Systems-with-Contextual-Bandit-Framework.aspx>.

  • Chicago

    Nandi, Soham. "Real-Time Bidding in Recommendation Systems with Contextual-Bandit Framework". AZoAi. https://www.azoai.com/news/20240117/Real-Time-Bidding-in-Recommendation-Systems-with-Contextual-Bandit-Framework.aspx. (accessed January 15, 2025).

  • Harvard

    Nandi, Soham. 2024. Real-Time Bidding in Recommendation Systems with Contextual-Bandit Framework. AZoAi, viewed 15 January 2025, https://www.azoai.com/news/20240117/Real-Time-Bidding-in-Recommendation-Systems-with-Contextual-Bandit-Framework.aspx.
