One pleasant morning I got to work, thinking this day couldn’t get any better. But as Murphy would have it, there was my boss, walking frantically toward me. It turned out that almost overnight one of the main data pipeline systems had become a major bottleneck for the company, and a solution was needed, fast! In a startup, let alone a company moving as fast as Taboola, these things can occur on a weekly basis. I needed to find some quick wins to relieve some of the bottlenecks inside the system. Luckily, Re2 came to the rescue. In this post I will share how to find the bottlenecks using Gprof2dot’s beautiful image rendering and, of course, what Re2 is and how to use it. * Note that this article addresses a pain I had in a Python framework, but because there are Re2 implementations to all […]
This is a tale of heroism, of overcoming obstacles and hardships. This is a tale of ingenuity, of originality and thinking outside the box. This is a tale… of how I was too lazy to go and check whether someone was already playing table tennis in the game room.

Hiking Across the Office is a Drag

Taboola’s Israeli office in Tel Aviv houses about 350 people, spread over five large floors. The game room, however, is smack dab in the middle of them. In smaller companies, if you wanted to know whether the game room was available, all you had to do was look slightly over your monitor and you would have your answer. Here, it takes 60 seconds and 110 steps, including one flight of stairs, to get from my workstation all the way to the game room – believe me, I counted. Unfortunately, due […]
Hello Git user. In this blog post I will discuss a technique for calculating a unique version for every Git commit. You may ask why we need this; after all, every commit in Git is already identified by a unique SHA-1 hash. That’s right. Let’s take 2 commits, 4bd92c9 and f5fc029, use their SHA-1 hashes as versions and perform a simple A/B test. Suppose the test shows that 4bd92c9 is preferred to f5fc029. If this is the case, how can we tell:

- Which version is newer?
- Whether 4bd92c9 is included in f5fc029, or vice versa?
- What branch they were built from?

It seems we need an alternative. The common standard for versioning is the SemVer scheme. We will use its parts as follows:

- Major – manual increment
- Minor – every released feature will increment the minor
- Patch – will always be 0

Now let’s take a look at our […]
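To make the contrast with raw hashes concrete, here is a minimal sketch of why a SemVer-style version answers the "which is newer?" question. It is not the post's actual implementation: the `Version` class, its `parse` and `bump_minor` helpers, and the sample version strings are all hypothetical names chosen for illustration. The only things taken from the excerpt are the three parts and the rule that a released feature increments the minor while the patch stays at 0.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical SemVer-style version; tuples compare field by field."""

    major: int
    minor: int
    patch: int = 0  # the scheme in the excerpt keeps patch at 0

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Split "MAJOR.MINOR.PATCH" and convert each part to an integer,
        # so that 1.10.0 sorts after 1.4.0 (string comparison would not).
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump_minor(self) -> "Version":
        # Assumed rule from the excerpt: each released feature increments minor.
        return Version(self.major, self.minor + 1, 0)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


older = Version.parse("1.4.0")
newer = Version.parse("1.10.0")
print(older < newer)        # ordering is well defined, unlike for commit hashes
print(older.bump_minor())   # version after one more released feature
```

Comparing the parts as integers rather than as strings is the design choice that matters here: it gives a total order on versions, which two abbreviated hashes such as 4bd92c9 and f5fc029 cannot provide on their own.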
As part of the optimization team at Taboola, we are constantly working with publishers and conducting A/B tests to find the best user interface (UI) for our widget. In doing so, we improve sponsored content (SC) revenue per mille (RPM). When we dealt with a very large news site in India, as part of an ongoing optimization, we needed to get a bit creative. We were already implementing our best practices for our widget on the publisher homepage, so we came up with a different solution. We tested a native UI, one that would help blend the widget into the design of the page.

User Behaviour on the Homepage

Unlike article pages, where users come to read an article and then sometimes leave after they finish reading it, a user who enters a homepage directly will sometimes scan it from top to bottom to find something to read. When they […]
The existing CI processes in Taboola are quite demanding – a full product build includes 150 Maven modules accounting for about 20,000 unit tests. Besides running builds from the master branch, there are also builds for all feature branches and patch releases. All in all, this adds up to more than a hundred builds per day. Some of the executed tests are pretty heavy on CPU and memory, performing resource-intensive data crunching. As you can imagine, all this requires substantial CI infrastructure horsepower, which we definitely have: Taboola’s Jenkins cluster currently has 35+ Jenkins slaves, each with 20 to 40 CPU cores and at least 100 GB of memory. Each slave runs 5 to 10 executors. All in all – a powerful CI/CD factory. But even with all this power, there are limitations. On the day of the weekly release the volume of builds peaks and we sometimes find ourselves starving […]
Synthetic monitoring is something that we all do. It’s almost something that you don’t think about. You set up a monitor and it just tells you if the service is up or down, most of the time with just a simple GET. There are the giants in this field (lately consolidated under the Keynote brand as part of AppDynamics) and the newcomers like Catchpoint, ThousandEyes, Pingdom (now part of SolarWinds) and WorldPing. All solutions share the same basic concept: pull website information from different agents around the world and give the website operator visibility into uptime, response times and other metrics. But what happens when you have a failure, and no alert? These tools have become so widespread and have such a long usage history that it almost seems pointless to compare them. This is a solved problem, no? Just take the cheapest one out there and you’re done. Here at […]