Rbox, our recommendation product, is a third-party service embedded in publisher sites.
This post is not about K8S – nor is it about AWS. It is not about containers – nor is it about some new, “cool” technology for managing large-scale applications. Rather, this post is about how we deploy a highly sophisticated Java service, a heavy service that is actively developed on a daily basis, to thousands of servers across our 7 data centers around the world. So what’s the problem? Isn’t it enough to take a list of servers, get the version to deploy, and run it with an automation tool like Ansible? Well, it’s not as simple as it might seem. This service serves Taboola’s recommendations and responds to hundreds of thousands of requests per second. The service has to be fast – so fast that its p95 response time should be below 500 milliseconds. This means we can’t have any downtime at all, or even afford slower […]
Writing features as chunks added to one ever-growing bulk of code is unorganized and messy. Over time, the task of testing new behaviors becomes harder and harder. Why is that? Chunky code is hard to navigate. When working with developers on a new feature, we have to identify what we need to test. As the code grows, it becomes increasingly difficult to find the parts that were modified and should be tested. If a feature becomes problematic or irrelevant, reverting it becomes more difficult, since we need to go back to every line of code we changed. This endangers the production environment and affects our end users. Last but not least, “legacy” code that has to be reverse engineered before we can work out how to control it is irritating, if not impossible, to manage. Coming to compare a buggy behavior with its intended fix or testing a new […]
What is the connection between kernel system calls and database performance, and how can we improve performance by reducing the number of system calls? The performance of any database system depends on four main system resources: CPU, memory, disk I/O, and network. Performance will increase when tuning or scaling each resource – this blog will cover the CPU resource. It’s important to note that whenever we relieve a bottleneck in the system, we might just encounter another one. For example, when improving CPU performance the database load shifts to I/O, so unless our storage is capable of delivering more IOPS, we might not actually see the improvement we hoped for. But don’t be discouraged, performance tuning is sometimes a game of whack-a-mole… We all know that the more processing power available to your server, the better the overall system is likely to perform. Especially when the CPU spends the majority of its time […]
If you are using web cookies to operate your online business, you probably already know that, just like in real life, cookies do not last long. This is especially well known to anyone who uses online cookies to store unique user IDs. Most online marketing companies rely on cookies for that purpose, but when cookies disappear, it becomes harder for them to get persistent user data. Interested to know how long a cookie really lasts? In this post I’ll try to provide some answers. Who is eating web cookies? Cookies can disappear for various reasons, such as: the user clearing the browser’s historical data; setting the browser to reject third-party cookies; using tools that clean up your device and free up storage space; and the use of VPNs, ad blockers, and more. One very common reason cookies disappear is the use of private browsing modes such as Incognito […]
The web is full of third-party scripts. Sites use them for ads, analytics, retargeting, and more. But this isn’t always the whole story. Scripts are unpredictable: they execute code, but you don’t know what this code actually does. With Taboola’s advertising video player solution, we struggle with third-party scripts daily. Working with different advertisers has exposed us to a variety of malicious behavior: sound violations, auto-scrolling, and changes to the page DOM are just some examples. In this post we’ll take a closer look at how we detect sound violations. Steer clear of third-party script risks. A sound violation is a state in which the video plays sound without user interaction. Several advertisers do this to ensure that the user will notice their ads. We struggled with this often and received lots of complaints from publishers. Sound violations should be prevented by the video player, but […]
Imagine you’re walking down the street and you see a nice car you’re thinking of buying. Just by pointing your phone camera, you can see relevant content about that car. How cool is that?! That was the idea that won our team first place in the recent Taboola R&D hackathon, aptly named Taboola Zoom! Every year Taboola holds a global R&D hackathon for its 350+ engineers, aimed at creating ideas for cool potential products or just some fun experiments in general. This year, 33 teams worked for 36 hours to come up with ideas that are both awesome and helpful to Taboola. Some of the highlights included a tool that can accurately predict a user’s gender based on their browsing activity and an integration with social networks for Taboola Feed. Our team decided to create an AR (Augmented Reality) application that allows a user to get content recommendations, […]
One pleasant morning I got to work, thinking this day couldn’t get any better. But as Murphy would have it, there was my boss walking frantically toward me. It turned out that almost overnight one of the main data pipeline systems had become a major bottleneck for the company, and a solution was needed, fast! In a startup, let alone a company moving as fast as Taboola, these things can occur on a weekly basis. I needed to find some quick wins to relieve some of the bottlenecks inside the system. Luckily, Re2 came to the rescue. In this post I will share how to find the bottlenecks using Gprof2dot’s beautiful image rendering, and of course, what Re2 is and how to use it. * Note that this article addresses a pain I had in a Python framework, but because there are Re2 implementations for all […]
This is a tale of heroism, of overcoming obstacles and hardships. This is a tale of ingenuity, of originality and thinking outside the box. This is a tale… of how I was too lazy to go and look if someone was already playing table tennis in the game room. Hiking across the office is a drag. Taboola’s Israeli office in Tel Aviv houses about 350 people, spread over five large floors. The game room, however, is smack dab in the middle of them. In smaller companies, if you wanted to know if the game room was available, all you had to do was look slightly over your monitor and you would have your answer. Here, it takes 60 seconds and 110 steps, including one flight of stairs, to get from my workstation all the way to the game room – believe me, I counted. Unfortunately, due […]
Hello Git user. In this blog post I will discuss a technique for calculating a unique version for every Git commit. You may ask why we need this; after all, every commit in Git is identified by a unique SHA-1 hash. That’s right, so let’s take 2 commits, 4bd92c9 and f5fc029, use their SHA-1 hashes as versions, and perform a simple A/B test. The test showed that 4bd92c9 is preferred to f5fc029. If this is the case, how can we tell which version is newer? Whether 4bd92c9 is included in f5fc029, or vice versa? What branch they were built from? It seems we need an alternative. The common standard for versioning is the SemVer scheme. We will use its parts as follows: Major – manual increment; Minor – every released feature will increment the minor; Patch – will always be 0. Now let’s take a look at our […]
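To illustrate why an ordered version helps where a hash does not, here is a minimal Python sketch of the Major/Minor/Patch rules listed in the excerpt above. It is not the post's actual implementation: the `Version` class, the `release_feature` helper, and the starting value `1.4.0` are all hypothetical names and data chosen for illustration.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A SemVer-style version; tuples compare field by field, so ordering is free."""
    major: int
    minor: int
    patch: int = 0

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Expects exactly three dot-separated integers, e.g. "1.4.0".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def release_feature(current: Version) -> Version:
    """Hypothetical helper: each released feature bumps minor; patch stays 0."""
    return Version(current.major, current.minor + 1, 0)


older = Version.parse("1.4.0")   # assumed starting version, for illustration only
newer = release_feature(older)

print(newer)          # 1.5.0
print(newer > older)  # True
```

Unlike the abbreviated hashes 4bd92c9 and f5fc029, which carry no information about which commit came later, two `Version` values can be compared directly to tell which one is newer.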