Read this article to learn more about what conversions are, how Taboola handles billions of daily events at scale, and how it all comes together to present meaningful data to customers.
Something strange happened while I was working with Kafka. While adding a new Kafka consumer to one of our services, the service stopped consuming from ALL of its other existing consumers. In my job at Taboola, as a team leader on a production team in the Infrastructure group, we’re supposed to remove bottlenecks, not create them. This post will describe how I investigated the issue, explain what I discovered, and share my insights into the whole situation. Some background: Before I get into the rest of the story, here’s some background on how we use Kafka in Taboola’s event-handling pipeline and why it’s critical to our infrastructure. Taboola’s recommendations appear on tens of thousands of web pages and mobile apps every second. As users engage with the content, multiple events are fired to signal that recommendations are rendered, opened, clicked, and so on. Each event triggers one or more Kafka messages, […]
Find out the secrets to how Taboola deploys and manages the thousands of servers that bring you recommendations every day.
Taboola is one of the largest content recommendation companies in the world. We maintain hundreds of servers in multiple data centers around the world, while being bound by strict SLAs. Thus, you might understand why our engineers would appreciate a little heads-up when the system gets overloaded. Like most companies today, we use metrics to visualize our services’ health, and our challenge is to create an automatic system that detects issues across multiple metrics as soon as possible, without any performance impact. A real-life example: Wouldn’t it be nice if we could predict the impact on our response-time metric when major events are about to happen? For example, “Black Friday”, “Cyber Monday”, or even the Kobe Bryant tragedy on 26/1/2020, as can be seen below. Figure 1: Kobe Bryant downtime, 26/1/20. And yes, the gap with no metrics around 26/1 is the downtime […]
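To illustrate the kind of detection involved (a minimal sketch, not the production system described in the post), a rolling z-score over a single metric flags points that stray far from their recent history:

```python
# Hypothetical sketch: flag anomalies in one metric with a rolling z-score.
# Window size and threshold are illustrative, not production values.
import pandas as pd

def detect_anomalies(metric: pd.Series, window: int = 60, threshold: float = 3.0) -> pd.Series:
    """Return a boolean Series marking points that deviate strongly from recent history."""
    rolling_mean = metric.rolling(window, min_periods=window).mean()
    rolling_std = metric.rolling(window, min_periods=window).std()
    z_score = (metric - rolling_mean) / rolling_std
    return z_score.abs() > threshold

# Example usage on a per-minute response-time series indexed by timestamp:
# anomalies = detect_anomalies(response_time_ms)
# print(response_time_ms[anomalies])
```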
At Taboola, we work daily on improving our Deep-Learning-based content-recommendation model. We use it to suggest personalized news articles and ads to hundreds of millions of users a day, so naturally we must stick to state-of-the-art deep learning modeling methods. But our job doesn’t end there: analyzing our results is a must too, and then we sometimes return to our data science roots and apply some very basic techniques. Let’s lay such a problem out. We are investigating a deep model that behaves rather strangely: it wins over our default model for what looks like a random group of advertisers, and loses for another group. This behavior is stable from day to day, so it looks like there might be some inherent advertiser qualities (what we’ll call campaign features) to blame for this. You can see typical model behavior for 4 campaigns below. So we hypothesize that […]
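One simple way to probe such a hypothesis (a sketch under assumptions, not necessarily the analysis from the post; the DataFrame and column names are hypothetical) is to test whether a given campaign feature is associated with winning or losing against the default model:

```python
# Illustrative sketch: is a categorical campaign feature associated with
# winning/losing against the default model? Column names are made up.
import pandas as pd
from scipy.stats import chi2_contingency

def feature_vs_outcome(df: pd.DataFrame, feature: str) -> float:
    """Return the chi-squared p-value for feature vs. the win/lose label."""
    contingency = pd.crosstab(df[feature], df["model_wins"])
    _, p_value, _, _ = chi2_contingency(contingency)
    return p_value

# df holds one row per campaign with its features and whether the deep model won:
# for feature in ["vertical", "platform", "targeting_type"]:
#     print(feature, feature_vs_outcome(df, feature))
```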
Our core business at Taboola is to provide the surfers-of-the-web with personalized content recommendations wherever they might surf. We do so using state-of-the-art Deep Learning methods, which learn what to display to each user from our growing pool of articles and advertisements. But as we challenge ourselves to build better models and better predictions, we also find ourselves constantly facing another issue: how do we not listen to our models? Or in other words: how do we explore better? As I’ve just mentioned, our pool of articles is growing, meaning more and more items are added each minute, and from an AI perspective this is a major issue we must tackle, because by the time we finish training a new model and push it to production, it will already have to deal with items that never existed in its training data. In a previous post, I’ve […]
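To make the exploration idea concrete, here is a minimal epsilon-greedy sketch (illustrative only, not the serving logic described in the post): with a small probability we show a brand-new item the model has never scored, otherwise we trust the model’s top pick.

```python
# Minimal epsilon-greedy sketch; item pools and the score attribute are hypothetical.
import random

def pick_recommendation(scored_items, fresh_items, epsilon=0.05):
    """With probability epsilon explore a brand-new item, otherwise exploit the model."""
    if fresh_items and random.random() < epsilon:
        return random.choice(fresh_items)                    # explore: item unseen in training
    return max(scored_items, key=lambda item: item.score)    # exploit: model's top pick
```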
Introduction: Newsrooms are under constant pressure to deliver the most up-to-date, relevant, and engaging information possible. At Taboola, we are building tools to make this faster, easier, and now, predictable. As soon as an article is published, the team has a critical eye on engagement data. Garnering insight on article performance as soon as possible is critical for guiding content strategy. Some articles receive wide attention immediately, drawing hundreds of thousands of page views within minutes; others may only see their first page view after a few hours. Taboola aims to narrow this gap even further by leveraging machine learning models to predict article performance the moment after it becomes available to the reader. Read on for details on our latest research and fascinating discoveries around predicting article performance! Article Data: Taboola Newsroom is a real-time optimization technology that empowers editorial teams with actionable data around what stories, headlines, […]
At Taboola, our goal is to predict whether users will click on the ads we present to them. Our models use all kinds of features, yet the most interesting ones tend to be related to the users’ history. Understanding how to use these features well can have a huge impact on the model’s personalization capabilities, due to the user-specific knowledge they hold. User history features vary strongly between different users; for example, one popular feature is user categories – the topics a user has previously read. An example of such a list might look like this: {“sports”, “business”, “news”}. Each value in these lists is categorical, and a list can hold multiple entries, so we name them Multi-Categorical features. Multi-Categorical lists can have any number of values per user – which means our model must handle both very long lists and completely empty lists (for new users). Supplying inputs of unknown length […]
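One common way to feed such variable-length lists to a model (a sketch under assumptions, not necessarily the exact implementation from the post) is to embed each category and average the embeddings, which handles long lists and empty ones alike:

```python
# Sketch: averaging category embeddings for variable-length (possibly empty) lists.
# Vocabulary, embedding size, and tensor names are illustrative.
import tensorflow as tf

vocab = ["sports", "business", "news", "tech"]
lookup = tf.keras.layers.StringLookup(vocabulary=vocab)        # id 0 is reserved for OOV
embedding_matrix = tf.Variable(tf.random.normal([len(vocab) + 1, 8]))

def encode_user_categories(categories: tf.RaggedTensor) -> tf.Tensor:
    """Map a ragged batch of category lists to fixed-size 8-dim vectors."""
    ids = lookup(categories)                                   # ragged int ids, one row per user
    vectors = tf.gather(embedding_matrix, ids)                 # ragged [batch, None, 8]
    summed = tf.reduce_sum(vectors, axis=1)                    # [batch, 8]
    counts = tf.cast(categories.row_lengths(), tf.float32)[:, tf.newaxis]
    return summed / tf.maximum(counts, 1.0)                    # empty lists map to the zero vector

# Example: one user with three categories, one brand-new user with none.
batch = tf.ragged.constant([["sports", "business", "news"], []])
print(encode_user_categories(batch).shape)                     # (2, 8)
```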
For the past year, my team and I have been working on a personalized user experience in the Taboola feed. We used Multi-Task Learning (MTL) to predict multiple Key Performance Indicators (KPIs) on the same set of input features, and implemented a Deep Learning (DL) model in TensorFlow to do so. Back when we started, MTL seemed way more complicated to us than it does now, so I wanted to share some of the lessons learned. There are already quite a few posts about implementing MTL in a DL model (1, 2, 3). In this post I will share some specific points to consider when implementing MTL in a Neural Network (NN). I will also present simple TensorFlow solutions to overcome the discussed issues. Sharing is caring: We wanted to start with the basic approach of hard parameter sharing. Hard sharing means we have a shared subnet, followed by […]
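For reference, a minimal hard-sharing setup in TensorFlow/Keras could look like the sketch below (layer sizes and task names are made up for illustration, not the model from the post): a shared subnet feeds one task-specific head per KPI.

```python
# Minimal hard parameter sharing sketch in Keras; sizes and task names are illustrative.
import tensorflow as tf

inputs = tf.keras.Input(shape=(128,), name="features")

# Shared subnet: every task trains these weights.
shared = tf.keras.layers.Dense(64, activation="relu")(inputs)
shared = tf.keras.layers.Dense(32, activation="relu")(shared)

# Task-specific heads, one per KPI.
ctr_head = tf.keras.layers.Dense(1, activation="sigmoid", name="ctr")(shared)
cvr_head = tf.keras.layers.Dense(1, activation="sigmoid", name="cvr")(shared)

model = tf.keras.Model(inputs=inputs, outputs=[ctr_head, cvr_head])
model.compile(
    optimizer="adam",
    loss={"ctr": "binary_crossentropy", "cvr": "binary_crossentropy"},
    loss_weights={"ctr": 1.0, "cvr": 1.0},   # per-task weighting of the combined loss
)
```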
A joint post with Ofri Mann. We went to ICLR to present our work on debugging ML models using uncertainty and attention. Between cocktail parties and jazz shows in the wonderful New Orleans (can we do all conferences in NOLA, please?) we also saw a lot of interesting talks and posters. Below are our main takeaways from the conference. Main themes: A good summary of the themes was in Ian Goodfellow’s talk, in which he said that until around 2013 the ML community was focused on making ML work. Now that it works on many different applications given enough data, the focus has shifted towards adding more capabilities to our models: we want them to comply with fairness, accountability, and transparency constraints, to be robust, use labels efficiently, adapt to different domains, and so on. A slide on ML topics, from Ian Goodfellow’s talk. We noticed a […]