- Taboola Blog
- Java
Read this article to learn more about what conversions are, how Taboola handle billions of daily events at scale, and how it all presents meaningful data to customers.
Something strange happened while I worked with Kafka. While adding a new consumer from Kafka to one of our services, the service stopped consuming from ALL other existing consumers. As part of my job at Taboola as a team leader on a production team in the Infrastructure group, we’re supposed to remove bottlenecks, not create them. This post will describe how I investigated the issue, explain what I discovered, and share my insights into the whole situation. Some background Before I get into the rest of the story, here’s some background on how we use Kafka at Taboola’s events handling pipeline and why it’s critical to our infrastructure. Taboola’s recommendations appear on tens of thousands of web pages and mobile apps every second. As users engage with the content, multiple events are fired to signal that recommendations are rendered, opened, clicked, and so on. Each event triggers one or more Kafka messages, […]
Find out the secrets to how Taboola deploys and manages the thousands of servers that bring you recommendations every day.
In this article you will learn what Samplex is and how it is used to make processing of large raw datasets more efficient.
This post is not about K8S – nor is it about AWS. It is not about containers – nor is it about some new, “cool” technology for managing large-scale applications. Rather, this post is about how we deploy a highly sophisticated Java service, a heavy service that is very actively developed on a daily basis, to 1000s of servers across our 7 data centers around the world. So what’s the problem? Isn’t it enough to take a list of servers, get the version to deploy and run it with an automation tool like ansible? Well, it’s not as simple as it might seem. This service serves Taboola’s recommendations and responds to hundreds of thousands requests per second. The service has to be fast – so fast that its p95 should be below 500 milliseconds per request. Which means we can’t have any downtime at all, or even afford slower […]
The story starts with metrics. Every mature software company needs to have a metric system to monitor resource utilisation. At some point, we noticed under-utilization of spark executors and thier CPUs. Usually, dynamic allocation is used instead of static resource allocation in order to improve CPU utilisation through sharing. In this blog post, we’ll define the problem, share the goals we worked towards and highlight many technical peculiarities regarding dynamic allocation usage along the way. At Taboola, we use Grafana, Prometheus with a Kafka-based pipeline to collect metrics from several data-centers around the world. Metrics at scale is a very interesting topic and involves multiple problems in itself and we have previously covered these in our blog and meetup presentations. Our data platform comprises several services that compute data projections and, importantly, those are long-running processes with long-living spark context. Periodically, when triggered, these services process new chunks of data, […]
Introduction Newsrooms are under constant pressure to deliver the most up to date, relevant, and engaging information possible. At Taboola, we are building tools to make this faster, easier, and now–predictable. As soon as an article is published the team has a critical eye on engagement data. Garnering insight on article performance as soon as possible is critical for guiding content strategy. Some articles receive wide attention immediately, drawing hundreds of thousands of page views within minutes, others may only see the first page view after a few hours. Taboola aims to narrow this gap even further by leveraging Machine Learning Models to predict article performance the moment after it becomes available to the reader. Read on for details on our latest research and fascinating discoveries around predicting article performance! Article Data Taboola Newsroom is a real-time optimization technology that empowers editorial teams with actionable data around what stories, headlines, […]
As a content discovery product, we need to be able to pace the campaign through its life on real time – spending the budget entirely without overspending. The team I am leading is responsible for serving Taboola’s video content. Our main goal is to enable growth of our business. Owning the entire serving process of the video content can be crucial to this end. Depending on 3rd parties serving systems with the core process would leave us vulnerable to rising prices, compromised features, serving latency, reduced performance and so on. In order to serve the video on our own we had to come up with a way to pace the amount of times we want to display the video and prevent instances in which the entire budget of the video would drain in a few seconds – common scenario in Taboola’s scale when the video is not targeted aggressively. We […]
A couple of months ago my team had its first experience working with Java fibers, we needed to make our main application work asynchronously. In this 3 part series, I will share my team’s experience and how we deploy and implement Java fibers in production. In Part 1 we talked about what fibers are in high level, how they compare to threads and why we started to explore them. In Part 2 we went further in-depth about how fibers differ from threads, how to create fibers, how to work with them and the basic concepts of how they work. In this part, we’ll discuss what’s going on under the hood in fibers and deep dive into the implementation of how fibers work and what lessons we learnt during our journey working with them. We will also see how this magic happens… Under the hood Fibers are implemented by instrumenting […]
A couple of months ago my team had its first experience working with Java fibers, we needed to make our main application work asynchronously. In this 3 part series, I will share my team’s experience and how we deploy and implement Java fibers in production. In the previous part (Part 1), we talked about what fibers are in high level, how they compare to threads and why we started to explore them. In this part we’ll focus further in-depth about fibers and how they differ from threads, we’ll see how to create fibers, how to work with them, and the basic concepts of how they work. Threads vs. Fibers We searched for a reason why not to stay with threads. We researched the costs and performance penalties of working with threads vs. fibers. We wanted to find proof that fibers can work better than threads, or at least shine in […]