Introduction

Taboola is the world’s leading native advertising platform, serving thousands of websites with a diverse array of products, including the Taboola Feed, mid-article ads, and homepage personalization, generating hundreds of millions of impressions daily. Despite our relentless efforts to optimize performance through sophisticated features, advanced models, and robust infrastructure, navigating such intricate ecosystems inevitably unveils areas of sub-optimization.

While some widgets exclusively feature ads, most exhibit a blend of organic items and advertisements (some also contain eCommerce and paywall content), as mixing organic content helps greatly in avoiding ad blindness and generating page views.

Taboola currently uses static slot allocation for organic content and ads, which fails to adapt to user preferences and inventory opportunities. In this blog post, I’ll delve into our solution designed to address this challenge, outlining the hurdles encountered and the adjustments made to overcome them.

Current Setup

At the core of our strategy lies a nuanced understanding of our publishers, to whom we deliver two main values:

  1. Direct monetization through ad clicks.
  2. Increased page views (PVs) fueled by organic clicks.

Direct revenue through ad clicks is straightforward to evaluate for both Taboola and the publishers. Organic clicks, by contrast, generate substantial revenue for the publisher through alternative monetization methods, but that value does not translate into revenue for Taboola in the same way.

The balance between these two values (Revenue/PVs) is a unique attribute tailored for each publisher. Ideally, it should correlate with the ratio between the revenue accrued from Taboola and other revenue streams. For instance, if a publisher derives much greater value per PV from other monetization forms, such as subscriptions, than from a Taboola ad click, the inclination would be to showcase more organic content and minimize ad slots to get more subscribers. Thus, organic content would be prioritized over ads in Taboola’s widgets.

Figure 1 – Current Schema

For sponsored items, we have two types of models: one for predicting the eCTR (estimated click-through rate) and another for predicting the eCVR (estimated conversion rate) and setting the ad’s CPC (cost per click). For each item, we calculate eCPM = eCTR * CPC (the usual factor of 1,000 is omitted because it does not change the order) and rank the items accordingly. For organic items, we use only one model to predict the eCTR and use its output to rank them.

Unified List Solution for Ad Space Mixing

Two main principles guide the solution:

  1. PVs derived from clicks on Taboola’s organic recommendations generate value for the publisher and thus for Taboola too, so we should be able to put a price on that value.
  2. In order to find the optimal allocation for our ads and organic items, we must rank them together.

Putting these two principles together, we came up with the idea of using an eCPC for organic items (organic eCPC) and using this organic eCPC for calculating the organic eCPM (estimated cost-per-mille), allowing us to rank both organic items and ads at once.

But how can we determine this organic eCPC?

Implementation

Our solution includes three steps:

  1. Determining the organic eCPC for each publisher.
  2. Estimating eCPM for both organic items and ads.
  3. Ranking the items accordingly.
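The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than production code: the `Item` fields, the `ecpm` and `rank_unified` helpers, and the example numbers are all hypothetical. The factor of 1,000 only scales the score to a per-mille value and does not affect the ordering.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    is_sponsored: bool
    ectr: float        # predicted click-through rate
    cpc: float = 0.0   # advertiser CPC; ignored for organic items


def ecpm(item: Item, organic_ecpc: float) -> float:
    """Estimated value per 1,000 impressions for an ad or an organic item."""
    price = item.cpc if item.is_sponsored else organic_ecpc
    return item.ectr * price * 1000


def rank_unified(items, organic_ecpc, num_slots):
    """Rank ads and organic items in one list and keep the top slots."""
    ranked = sorted(items, key=lambda it: ecpm(it, organic_ecpc), reverse=True)
    return ranked[:num_slots]


# Hypothetical candidates for a three-slot widget.
items = [
    Item("ad_a", True, ectr=0.010, cpc=0.50),
    Item("ad_b", True, ectr=0.020, cpc=0.10),
    Item("org_a", False, ectr=0.030),
    Item("org_b", False, ectr=0.005),
]
top = rank_unified(items, organic_ecpc=0.10, num_slots=3)
print([it.item_id for it in top])
```

With an organic eCPC of 0.10, the strongest organic item lands between the two ads; raising or lowering the organic eCPC shifts that mix without any fixed slot assignment.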

Organic eCPC

The main challenge revolves around determining the right organic eCPC to meet our specified goals, whether it’s achieving targeted revenue per day or maintaining a specific ratio between revenue and engagement.

As a preliminary step, we decided to simulate the ranking we would have achieved using a single organic eCPC for all the organic items on one of our international websites. Upon analyzing the results, we found that even though the overall share of organic and sponsored items seemed reasonable, some cohorts of users were receiving only organic items. This wasn’t initially alarming, but we thought it wise to check if these users had something in common. To our surprise, they did: the vast majority of these users were from developing countries.

This, of course, makes sense, as most campaigns set up in Taboola have location-based targeting. Users from different countries had different inventories of ads to choose from, and the ads available to users in developing countries had mostly low CPCs, so our organic items were beating them almost every time.

To solve this problem, we simply calculated the average eCPM for each country and normalized our eCPC accordingly, allowing us to set the eCPC once for each country.
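One way to express this normalization, as a rough sketch: scale a base organic eCPC by the ratio between each country’s average ad eCPM and a reference eCPM. The function name, the choice of reference value, and the numbers below are assumptions made for illustration; the production formula may differ.

```python
def country_organic_ecpc(base_ecpc, avg_ad_ecpm_by_country, reference_ecpm):
    """Scale the base organic eCPC by each country's relative ad eCPM.

    reference_ecpm is the average ad eCPM that base_ecpc was tuned against
    (a hypothetical choice, for example the site-wide average).
    """
    if reference_ecpm <= 0:
        raise ValueError("reference_ecpm must be positive")
    return {
        country: base_ecpc * avg_ecpm / reference_ecpm
        for country, avg_ecpm in avg_ad_ecpm_by_country.items()
    }


# Hypothetical averages: country "A" has rich ad inventory, "B" a thin one.
per_country = country_organic_ecpc(
    base_ecpc=0.10,
    avg_ad_ecpm_by_country={"A": 4.0, "B": 1.0},
    reference_ecpm=2.0,
)
print(per_country)
```

Under this scheme, organic items compete against the local ad inventory instead of a global price level, so a country with low CPCs also gets a proportionally lower organic eCPC.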

Determining the Initial Organic eCPC

As mentioned above, to gauge the potential impact of our solution, I conducted an analysis using actual requests from different placements in several major publishers. We checked what the DCG (discounted cumulative gain) would be for both revenue and organic clicks under different candidate organic eCPCs (shadow bids), ending up with a Pareto frontier depicting the trade-off between revenue and page views generated. Points on this frontier surpassed our current static allocation by hundreds of percent for both organic clicks and revenue.

But how can we find the optimal organic eCPC for this website? Unfortunately, there is no definitive answer (unless you know what the average RPM is from all income sources for the website after a click on an organic item). In most cases, websites choose to withhold this information from Taboola.

We had to use arbitrary proxy goals, whether it was aiming for the current revenue/PVs ratio, maintaining the same number of page views generated as in the static setting while maximizing revenue, etc. Regardless of the goal, we used the eCPC that got us this value in the Pareto analysis as an estimate to build upon.
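As an illustration of one such proxy goal (keep at least the organic clicks of the static setting while maximizing revenue), the selection could look like the sketch below. `SweepPoint`, `pareto_frontier`, `pick_initial_ecpc`, and the sweep values are hypothetical; the real analysis ran on logged requests.

```python
from typing import NamedTuple


class SweepPoint(NamedTuple):
    organic_ecpc: float
    revenue: float          # simulated (discounted) revenue
    organic_clicks: float   # simulated (discounted) organic clicks


def pareto_frontier(points):
    """Keep points that no other point beats on both revenue and clicks."""
    frontier = []
    for p in points:
        dominated = any(
            q.revenue >= p.revenue
            and q.organic_clicks >= p.organic_clicks
            and (q.revenue > p.revenue or q.organic_clicks > p.organic_clicks)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier


def pick_initial_ecpc(points, baseline_organic_clicks):
    """Highest-revenue frontier point that keeps the baseline organic clicks."""
    eligible = [
        p for p in pareto_frontier(points)
        if p.organic_clicks >= baseline_organic_clicks
    ]
    return max(eligible, key=lambda p: p.revenue) if eligible else None


# Hypothetical sweep results for four candidate organic eCPCs.
sweep = [
    SweepPoint(0.05, revenue=120.0, organic_clicks=80.0),
    SweepPoint(0.10, revenue=110.0, organic_clicks=100.0),
    SweepPoint(0.15, revenue=85.0, organic_clicks=120.0),
    SweepPoint(0.20, revenue=90.0, organic_clicks=130.0),
]
choice = pick_initial_ecpc(sweep, baseline_organic_clicks=100.0)
print(choice)
```

Other proxy goals, such as matching the current revenue/PVs ratio, would only change the selection rule in `pick_initial_ecpc`.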

Figure 2 – Offline Analysis Using NDCG

Controller

The open web is an ever-changing environment, and one aspect of it is that the average CPM isn’t static either. This means we must continuously update our organic eCPC for each publisher. To do so, we implemented a simple “controller,” updating the eCPC once a day based on a single week’s data, adjusting it according to the optimization goal. For example, if we aim for a specific number of clicks per user and are exceeding it by 5%, our calculation would be (1 – 0.05*learning_rate) * current organic eCPC.
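The update rule from the example can be written as a small function. This is a sketch under assumptions: the name `update_organic_ecpc`, the default learning rate, and the floor at zero are illustrative, and the sign of the correction assumes a goal that grows with the organic eCPC, such as organic clicks per user.

```python
def update_organic_ecpc(current_ecpc, observed, target, learning_rate=0.5):
    """One daily controller step toward the optimization goal.

    observed and target are the same metric (for example organic clicks
    per user over the last week). Overshooting the target lowers the
    organic eCPC; undershooting raises it.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    relative_error = (observed - target) / target
    factor = max(0.0, 1.0 - relative_error * learning_rate)
    return current_ecpc * factor


# Exceeding the target by 5% with a learning rate of 0.5 shrinks the
# organic eCPC by 2.5%.
new_ecpc = update_organic_ecpc(0.10, observed=1.05, target=1.00, learning_rate=0.5)
print(round(new_ecpc, 4))
```

A smaller learning rate makes the controller slower but less sensitive to day-to-day noise in the weekly data.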

In most cases, using the initial organic eCPC from the offline analysis, it takes a week or two to match our optimization goal.

Calibration Challenges

Another significant challenge we encountered was calibration. Ensuring the calibration of both our organic and ad models is paramount. Each placement we support must have accurately calibrated predictions. For instance, if our models predict a 10% chance of an item being clicked, this prediction should align closely with real-world outcomes for both organic and sponsored items. If it doesn’t, we will end up with suboptimal rankings.

To address this challenge, we employed two key strategies. Firstly, we revamped the way we stratify the data for our model, transitioning from per-publisher stratification to per-widget stratification, ensuring our model can make more accurate corrections. Secondly, we implemented isotonic regression [1] techniques to ensure alignment between predicted probabilities and observed outcomes for all our models.
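To make the second strategy concrete, here is a minimal, dependency-free sketch of isotonic regression using the pool-adjacent-violators algorithm. It fits a non-decreasing step function from raw model scores to observed click rates. The function names and toy data are hypothetical, and a production setup would more likely rely on a library implementation fitted per widget.

```python
import bisect


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function mapping scores to click rates."""
    # Each block holds [sum of labels, count, highest score in the block].
    blocks = []
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while scores tie or monotonicity is violated.
        while len(blocks) > 1 and (
            blocks[-2][2] == blocks[-1][2]
            or blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(score, uppers, values):
    """Look up the calibrated probability for a raw model score."""
    idx = bisect.bisect_left(uppers, score)
    return values[min(idx, len(values) - 1)]


# Toy data: raw scores and whether the item was clicked.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
clicked = [0, 1, 0, 1, 1, 1]
uppers, values = fit_isotonic(raw_scores, clicked)
print(calibrate(0.25, uppers, values))
```

Because the mapping is monotone, calibration changes the predicted probabilities without reordering items within a single model; what it fixes is the comparison between organic and sponsored predictions.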

Figure 3 – Suggested Schema

Online Results

Measuring Impact Online

Traditional metrics like RPM and CTR are inadequate in our case, as they pit engagement against revenue. Instead, we focus on clicks and revenue per user. Moreover, filtering out super users and bots eliminates noise from the data. We evaluate the effect on both affected widgets and entire page views to gain comprehensive insights.
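A per-user evaluation along these lines might be computed as in the sketch below. The event format, the `per_user_metrics` name, and the impression cap used to drop super users and bots are assumptions for illustration only.

```python
from collections import defaultdict


def per_user_metrics(events, max_impressions_per_user):
    """Return (clicks per user, revenue per user) after filtering heavy users.

    events: iterable of (user_id, clicked, revenue), one entry per impression.
    Users above the impression cap are treated as super users or bots.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # impressions, clicks, revenue
    for user_id, clicked, revenue in events:
        entry = totals[user_id]
        entry[0] += 1
        entry[1] += int(clicked)
        entry[2] += revenue
    kept = [t for t in totals.values() if t[0] <= max_impressions_per_user]
    if not kept:
        return 0.0, 0.0
    users = len(kept)
    return sum(t[1] for t in kept) / users, sum(t[2] for t in kept) / users


# Hypothetical log: two regular users and one suspiciously active one.
events = [
    ("u1", True, 0.5),
    ("u1", False, 0.0),
    ("u2", False, 0.0),
] + [("bot", True, 0.1)] * 5
clicks_per_user, revenue_per_user = per_user_metrics(events, max_impressions_per_user=3)
print(clicks_per_user, revenue_per_user)
```

Running the same computation for the test and control groups, once on the affected widgets and once on whole page views, gives the lifts to compare.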

Initial findings indicate promising lifts of 8.7% in RC clicks per user and 10.5% in revenue per user. Additionally, there’s a discernible decrease in ad density, reducing the number of sponsored items by 18%, signaling a more balanced and optimized ad space.

Conclusion

In conclusion, our unified approach not only enhances revenue potential but also fosters a more tailored and engaging user experience, reinforcing Taboola’s position as a leader in native advertising optimization.

Key Takeaways

  • The unified ranking approach performs very well; it optimizes ad space allocation successfully, yielding significant lifts in both engagement and revenue.
  • Calibration is crucial; we need to account for the numerous different widgets on different sites with different layouts, as well as users from different countries with different average CPCs and ad inventories.
  • Choosing the right metrics is essential: focusing on metrics like RC clicks per user and revenue per user allows for accurate evaluations.

Work to be Done

  • Address weekly seasonality in organic eCPC: Users behave differently on weekends, which changes the average CPM.
  • Incorporate the gap effect described in LinkedIn’s ads allocation work [2].
  • Analyze long-term effects.
  • Apply this approach to different types of content, such as eCommerce and paywall pieces.

Supplementary Information

Offline Analysis

Simply put, in the offline analysis, I used our data records to simulate what the ranking would have been given the deeper and smarter bid predictions, and summed up the estimated amounts of revenue and organic clicks (organic items were ranked by eCTR * oCPC, where oCPC is the organic eCPC).

There are two caveats that might differentiate the analysis from reality:

  1. I used the DCG method of dividing the estimate by log(1+slot index); this is not accurate, and it would have been best to use an accurate calibration factor for each slot.
  2. Some items that should have been blocked might have been included. I removed the items that were already marked as blocked, but this might not be enough.
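The position discount from the first caveat can be sketched as follows. The tuple layout and the function name are hypothetical, and the base-2 logarithm is an assumption (it is the common DCG convention, and it leaves the first slot undiscounted).

```python
import math


def discounted_totals(ranked_items):
    """Sum position-discounted revenue and organic clicks for one ranking.

    ranked_items: list of (is_sponsored, ectr, cpc) tuples in slot order.
    Slot indices start at 1, so the first slot is divided by log2(2) = 1.
    """
    revenue = 0.0
    organic_clicks = 0.0
    for slot, (is_sponsored, ectr, cpc) in enumerate(ranked_items, start=1):
        discount = math.log2(1 + slot)
        if is_sponsored:
            revenue += ectr * cpc / discount
        else:
            organic_clicks += ectr / discount
    return revenue, organic_clicks


# Hypothetical ranking: ad, organic item, ad.
ranking = [(True, 0.01, 0.5), (False, 0.03, 0.0), (True, 0.02, 0.1)]
dcg_revenue, dcg_clicks = discounted_totals(ranking)
print(dcg_revenue, dcg_clicks)
```

Replacing `discount` with a measured per-slot calibration factor would address the inaccuracy noted in the first caveat.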

I’ll add that before running the analysis, I checked the calibration for both organic and sponsored CTR predictions. They both seemed very accurate (post-multiplication), with about a 3% error on average, allowing us to trust the cumulative metrics.

References

[1] Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017, July). On calibration of modern neural networks. In International Conference on Machine Learning (pp. 1321-1330). PMLR.

[2] Yan, J., Xu, Z., Tiwana, B., & Chatterjee, S. (2020, August). Ads allocation in feed via constrained optimization. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3386-3394).
