We wanted to see if there was a way we could sync our Kubernetes NetworkPolicies dynamically with tools we already use, like Consul and Calico.
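To make the goal concrete, here is a minimal sketch, using the official Kubernetes Python client, of pushing a NetworkPolicy built from a dynamic list of allowed CIDRs (as one might fetch from a service catalog like Consul). The namespace, labels, and CIDRs are hypothetical placeholders, not our actual setup:

```python
# Minimal sketch: build a NetworkPolicy from a dynamic list of allowed CIDRs
# (e.g., fetched from Consul) and apply it. Names and CIDRs are hypothetical.
from kubernetes import client, config

def sync_network_policy(namespace, app_label, allowed_cidrs):
    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    policy = client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name=f"{app_label}-ingress", namespace=namespace),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(match_labels={"app": app_label}),
            ingress=[client.V1NetworkPolicyIngressRule(
                _from=[client.V1NetworkPolicyPeer(ip_block=client.V1IPBlock(cidr=c))
                       for c in allowed_cidrs],
            )],
            policy_types=["Ingress"],
        ),
    )
    client.NetworkingV1Api().create_namespaced_network_policy(namespace, policy)

sync_network_policy("production", "my-service", ["10.0.0.0/24", "10.0.1.0/24"])
```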
Read this article to learn more about what conversions are, how Taboola handles billions of daily events at scale, and how it all presents meaningful data to customers.
Something strange happened while I was working with Kafka. When I added a new Kafka consumer to one of our services, ALL of the service's existing consumers stopped consuming. As a team leader on a production team in Taboola's Infrastructure group, I'm supposed to remove bottlenecks, not create them. This post will describe how I investigated the issue, explain what I discovered, and share my insights into the whole situation.

Some background

Before I get into the rest of the story, here's some background on how we use Kafka in Taboola's event-handling pipeline and why it's critical to our infrastructure. Taboola's recommendations appear on tens of thousands of web pages and mobile apps every second. As users engage with the content, multiple events are fired to signal that recommendations were rendered, opened, clicked, and so on. Each event triggers one or more Kafka messages, […]
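For context, wiring up such a consumer is typically only a few lines; here is a minimal sketch using the kafka-python client, with a hypothetical topic, group id, and broker address (not our actual configuration):

```python
# Minimal sketch of adding a Kafka consumer to a service (kafka-python).
# Topic, group id, and broker address are hypothetical placeholders.
from kafka import KafkaConsumer

def handle(payload):
    print(payload)  # stand-in for the service's real processing logic

consumer = KafkaConsumer(
    "recommendation-events",        # hypothetical topic name
    bootstrap_servers="kafka:9092",
    group_id="my-service",          # consumers in the same group split partitions
    auto_offset_reset="earliest",
)

for message in consumer:
    handle(message.value)
```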
Find out the secrets behind how Taboola deploys and manages the thousands of servers that bring you recommendations every day.
Taboola is one of the largest content recommendation companies in the world. We maintain hundreds of servers in multiple data centers around the world, while being bound by strict SLAs. Thus, you might understand why our engineers would appreciate a little heads-up when the system gets overloaded. Like most companies today, we use metrics to visualize our services' health, and our challenge is to create an automatic system that will detect issues across multiple metrics as soon as possible, without any performance impact.

A real-life example

Wouldn't it be nice if we could predict the impact on our response-time metric when major events are about to happen? For example "Black Friday", "Cyber Monday", or even the Kobe Bryant tragedy on 26/1/2020, as can be seen below:

Figure 1: Kobe Bryant downtime, 26/1/20

And yes – the gap with no metrics around 26/1 is the downtime […]
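To illustrate the kind of detection we mean (a toy baseline, not the system this post goes on to describe), a rolling z-score over a single metric already flags points that deviate sharply from recent behavior:

```python
# Toy baseline: flag anomalies in one metric with a rolling z-score.
import numpy as np

def rolling_zscore_anomalies(values, window=30, threshold=3.0):
    values = np.asarray(values, dtype=float)
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean, std = history.mean(), history.std()
        if std > 0 and abs(values[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies

# Example: a noisy response-time series with a sudden spike at index 80.
rng = np.random.default_rng(0)
series = 200 + rng.normal(0, 5, size=100)
series[80] += 300
print(rolling_zscore_anomalies(series))  # typically prints [80]
```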
Taboola is responsible for billions of daily recommendations, and we are doing everything we can to make those recommendations fit each viewer's personal taste and interests. We do so by updating our Deep-Learning-based models, increasing our computational resources, improving our exploration techniques, and more. All those things, though, have one thing in common: we need to understand whether a change is for the better or not, and we need to do so while allowing many tests to run in parallel. We can think of many KPIs for new algorithmic modifications – system latency, diversity of recommendations, or user interaction, to name a few – but at the end of the day, the one metric that matters most for us at Taboola is RPM (revenue per mille, or revenue per 1,000 recommendations), which indicates how much money and value we create for our customers on both sides – the […]
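For reference, the RPM computation itself is simple arithmetic:

```python
# RPM (revenue per mille): revenue generated per 1,000 recommendations served.
def rpm(revenue, recommendations):
    return revenue / recommendations * 1000

# Made-up example: $500 of revenue over 2,000,000 recommendations.
print(rpm(500.0, 2_000_000))  # 0.25
```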
At Taboola, we work daily on improving our Deep-Learning-based content-recommendation model. We use it to suggest personalized news articles and ads to hundreds of millions of users a day, so naturally we must stick to state-of-the-art deep-learning modeling methods. But our job doesn't end there – analyzing our results is a must too, and then we sometimes return to our data-science roots and apply some very basic techniques. Let's lay such a problem out. We are investigating a deep model that behaves rather strangely: it wins over our default model for what looks like a random group of advertisers, and loses for another group. This behavior is stable from day to day, so it looks like there might be some inherent advertiser qualities (what we'll call campaign features) to blame for this. You can see typical model behavior for 4 campaigns below. So we hypothesize that […]
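As a sketch of the kind of basic technique this calls for (hypothetical feature names, made-up data), one can fit a simple classifier on campaign features against a per-campaign win/lose label and inspect which features carry signal:

```python
# Sketch: test whether campaign features explain where the new model wins.
# Feature names and data are hypothetical illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 500
is_video = rng.integers(0, 2, n)         # made-up binary campaign feature
budget = rng.normal(0, 1, n)             # made-up continuous campaign feature
wins = (is_video + rng.normal(0, 0.5, n) > 0.5).astype(int)  # "deep model won"

X = np.column_stack([is_video, budget])
clf = LogisticRegression().fit(X, wins)
print(dict(zip(["is_video", "budget"], clf.coef_[0])))
# A large coefficient on is_video suggests that feature separates the two groups.
```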
Introduction

Newsrooms are under constant pressure to deliver the most up-to-date, relevant, and engaging information possible. At Taboola, we are building tools to make this faster, easier, and, now, predictable. As soon as an article is published, the team has a critical eye on engagement data. Garnering insight into article performance as soon as possible is critical for guiding content strategy. Some articles receive wide attention immediately, drawing hundreds of thousands of page views within minutes; others may only see their first page view after a few hours. Taboola aims to narrow this gap even further by leveraging machine-learning models to predict article performance the moment after it becomes available to the reader. Read on for details on our latest research and fascinating discoveries around predicting article performance!

Article Data

Taboola Newsroom is a real-time optimization technology that empowers editorial teams with actionable data around what stories, headlines, […]
At Taboola, our goal is to predict whether users will click on the ads we present to them. Our models use all kinds of features, yet the most interesting ones tend to be related to the users' history. Understanding how to use these features well can have a huge impact on the model's personalization capabilities, due to the user-specific knowledge they hold. User-history features vary strongly between different users; for example, one popular feature is user categories – the topics a user has previously read. An example of such a list might look like this: {"sports", "business", "news"}. Each value in these lists is categorical, and each list can hold multiple entries, so we name them Multi-Categorical features. Multi-Categorical lists can have any number of values per user – which means our model must handle both very long lists and completely empty lists (for new users). Supplying inputs of unknown length […]
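To illustrate one common way of handling such variable-length lists (a generic sketch, not necessarily the exact mechanism in our models), each category can be mapped to an embedding and the embeddings averaged, so any list length, including an empty one, yields a fixed-size vector:

```python
# Sketch: turn a variable-length category list into a fixed-size vector by
# averaging per-category embeddings. Vocabulary and sizes are hypothetical.
import numpy as np

vocab = {"sports": 0, "business": 1, "news": 2, "tech": 3}
emb_dim = 4
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), emb_dim))  # learned in practice

def encode(categories):
    ids = [vocab[c] for c in categories if c in vocab]
    if not ids:                       # new user with an empty history
        return np.zeros(emb_dim)
    return embeddings[ids].mean(axis=0)

print(encode(["sports", "business", "news"]))  # fixed size for any list length
print(encode([]))                              # zero vector for empty history
```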
About 8 months ago, my team and I were facing the challenge of building our first Deep Learning infrastructure. One of my team members (a brilliant data scientist) was working on a prototype for our first deep model. The time arrived to move it to production, and I was honored to lead this effort. Our achievements: we built an infrastructure that ranks over 600K items/sec, and our deep models have beaten the previous models by a large margin. This pioneering project has led the way for the subsequent Deep Learning projects at Taboola. So the prototype was ready, and I was wondering: how do you go from a messy script to a production-ready framework? In other words, if you are setting up a deep-model pipeline, this post is for you. This blog post focuses on the training infrastructure, not the inference infrastructure.

Prerequisites

Assume you have basic knowledge of: Python […]