We wanted to see if there was a way we could sync our Kubernetes NetworkPolicies dynamically with tools we already use, like Consul and Calico.
Read this article to learn more about what conversions are, how Taboola handles billions of daily events at scale, and how it all presents meaningful data to customers.
Kafka is an open-source distributed event streaming platform, and something went wrong while we were working with it. Let’s see how the problem was investigated and resolved.
Find out the secrets to how Taboola deploys and manages the thousands of servers that bring you recommendations every day.
During the pandemic, most companies quickly adapted and moved to a work-from-home model, as a sudden necessity of the lockdown restrictions introduced by efforts to combat the spread of COVID-19.
You wrote your code. You even tested it. And now, you are eager to git push it. But how can you verify that it really works? At Taboola, we test our code in production! In this article, you will see how every software engineer, even on their first day at the company, can test in production – all thanks to a dedicated Jenkins pipeline job and lots of metrics. How hard is it to test in production? Quite hard. You probably already knew that. Everybody fears the moment when they need to test changes in production. The main reason is that not everyone has the required IT skills. Moreover, people have to repeat error-prone, manual tasks – which might result in downtime and revenue loss. For our release engineers, it was also an unmanageable headache – a “thundering herd” of developers eager to test their features in production. […]
Taboola is responsible for billions of daily recommendations, and we are doing everything we can to make those recommendations fit each viewer’s personal taste and interests. We do so by updating our deep-learning-based models, increasing our computational resources, improving our exploration techniques, and more. All those things, though, have one thing in common – we need to understand whether a change is for the better, and we need to do so while allowing many tests to run in parallel. We can think of many KPIs for new algorithmic modifications – system latency, diversity of recommendations, or user interaction, to name a few – but at the end of the day, the one metric that matters most for us at Taboola is RPM (revenue per mille, or revenue per 1,000 recommendations), which indicates how much money and value we create for our customers on both sides – the […]
This post is not about K8S – nor is it about AWS. It is not about containers – nor is it about some new, “cool” technology for managing large-scale applications. Rather, this post is about how we deploy a highly sophisticated Java service, a heavy service that is very actively developed on a daily basis, to thousands of servers across our 7 data centers around the world. So what’s the problem? Isn’t it enough to take a list of servers, get the version to deploy and run it with an automation tool like Ansible? Well, it’s not as simple as it might seem. This service serves Taboola’s recommendations and responds to hundreds of thousands of requests per second. The service has to be fast – so fast that its p95 should be below 500 milliseconds per request. That means we can’t have any downtime at all, or even afford slower […]
Optimizing Spark Executor Utilization: Harnessing Dynamic Allocation and Resource Management for Efficient Workload Processing.
At Taboola, we deal with scale, huge scale. A small issue might turn into a disaster in a matter of hours. Rewriting and replacing an existing service with a new one is a real challenge, and doing it without causing downtime is SCARY. Reading logs is not an option. Logs are gigantic, unwieldy, and span many machines. It would take hours to combine and analyze them. In this post I will share with you three graphs in Grafana that I think are a must for observing new code. Let’s start… Did I break production? You write your shiny code, you (even) test it, but how would you verify that you didn’t break the production environment? Luckily, we use Grafana, and this actually makes a big difference. My plan was to compare old code vs. new in Grafana, but where to start? You have Grafana… let’s use it! Frankly, I […]