- Taboola Blog
- System
Find out the secrets to how Taboola deploys and manages the thousands of servers that bring you recommendations every day.
You wrote your code. You even tested it. And now, you are eager to git push it. But how can you verify that it really works? In Taboola, we test our code in production! In this article, you will see how every software engineer, even on the first day in the company, can test in production – all thanks to a dedicated Jenkins pipeline job and lots of metrics. How hard is it to test in production? Quite hard. You probably already knew that. Everybody fears that moment when they need to test changes in production. The main reason is that not everyone has the required IT skills. Moreover, people have to repeat error-prone, manual tasks – which might result in downtime and revenue loss. For our release engineers, it was also an unmanageable headache – a “thundering herd” of developers eager to test their features in production. […]
By Ariel Pisetzky and Tarek Shama Taken for Granted Taken for granted. That’s the way most users and even techies think of DNS. Or more precisely, they just don’t think of it at all. DNS is one of those things that for most users is a solved problem. You have a server with very reliable and stable software that can run for a very long time with little maintenance. The resolvers even have a nice built in failover mechanism for a secondary server. So, what more is there to say about this subject? Performance. Performance with DNS services has been seen as a geographical issue for years now. Yes, there have been paid DNS services that are faster at the DNS search level itself, especially if you are talking about complex records that have logic attached to them. Yet, the popular discussion is mostly around the global DNS providers. In […]
If you fall, fall right – a tale of SRE critical incident management By Yehuda Levi, Tal Valani, Ariel Pisetzky & Eli Azulai Imagine this scenario – your data center is down. 1500 servers are down. Each server needs to be handled and monitored and the responsibility for each should be divided between all teammates. Each team is looking after the status of their services. Client facing services are impacted. New information keeps flowing in from different channels and the status of the outage and servers keep changing. How to get the list of the server affected? How to put it all in one place? How to assign responsibility for each? What is the status of each server? How can the internal clients receive ongoing status updates? What happens if a server was intentionally down before the incident? What happens if a more complex issue occurs and the time to […]
Failure. I need to talk about failure, and not any failure, my failure. I need to share it with everyone in the production group, everyone in R&D. My team, my peers, my managers. The meeting will start in just a few minutes and I am under fire to explain what went wrong, how I failed the organization and how we need to be better. General George S. Patton Jr. said “The test of success is not what you do when you are on top. Success is how high you bounce when you hit the bottom.” There is a lot to learn from that saying, and not only for people. Successful systems need to bounce back from a failure and do it well. IT systems need to be able to endure a catastrophic event and just dust it off. This is what we expect of our production systems in Taboola, […]
So, the firefighters are in your data center, there is no electricity, and the pager is more like a DDoS attack on your phone than anything informative. You look at your watch, multiple thoughts running through your head. Why me? Why now? What was the last DR test result? How do you pull the team out and through this IT catastrophe and survive to write about it? This is my story, my personal fight with the IT “Murphy laws” and how we can all benefit from it. It was a Friday, one you know you need to be extra careful with. It’s always the end of the work week or smack in the middle of the night. (No IT catastrophe ever happens when it’s convenient to you, now does it? They always cluster and bunch around the most difficult times.) Anyway, it’s the end of the day Friday and multiple […]
Optimized MySQL slave replication for faster WAN connections. Explore key strategies and configurations for enhanced MySQL replication performance.
Discover the journey of creating synchronized analog clocks using microcontrollers. Learn about power-efficient design, NTP synchronization, and more.
As a content discovery product, we need to be able to pace the campaign through its life on real time – spending the budget entirely without overspending. The team I am leading is responsible for serving Taboola’s video content. Our main goal is to enable growth of our business. Owning the entire serving process of the video content can be crucial to this end. Depending on 3rd parties serving systems with the core process would leave us vulnerable to rising prices, compromised features, serving latency, reduced performance and so on. In order to serve the video on our own we had to come up with a way to pace the amount of times we want to display the video and prevent instances in which the entire budget of the video would drain in a few seconds – common scenario in Taboola’s scale when the video is not targeted aggressively. We […]
What is the connection between kernel system calls and database performance, and how can we improve performance by reducing the number of system calls? Performance of any database system depends on four main system resources: CPU Memory Disk I/O Network Performance will increase while tuning or scaling each resource – this blog will cover the CPU resource.It’s important to note that whenever we release a bottleneck in the system, we might just encounter another one. For example, when improving CPU performance the database load shifts to IO, so unless our storage is capable of delivering more IOPS, we might not actually see the improvement we hoped for. But don’t be discouraged, performance tuning is sometimes a game of whack-a-mole… We all know that the more processing power available for your server, the better the overall system is likely to perform. Especially when the CPU spends the majority of its time […]