This post is not about K8S – nor is it about AWS. It is not about containers – nor is it about some new, “cool” technology for managing large-scale applications. Rather, this post is about how we deploy a highly sophisticated Java service, a heavy service that is actively developed on a daily basis, to thousands of servers across our 7 data centers around the world. So what’s the problem? Isn’t it enough to take a list of servers, get the version to deploy and run it with an automation tool like Ansible? Well, it’s not as simple as it might seem. This service serves Taboola’s recommendations and responds to hundreds of thousands of requests per second. The service has to be fast – so fast that its p95 latency should be below 500 milliseconds per request. Which means we can’t have any downtime at all, or even afford slower […]
Optimizing Spark Executor Utilization: Harnessing Dynamic Allocation and Resource Management for Efficient Workload Processing.
At Taboola, we deal with scale, huge scale. A small issue might turn into a disaster in a matter of hours. Rewriting and replacing an existing service with a new one is a real challenge, and doing it without causing downtime is SCARY. Reading logs is not an option. Logs are gigantic, unwieldy and span many machines. It would take hours to combine and analyze them. In this post I will share with you three graphs in Grafana that I think are a must for observing new code. Let’s start… Did I break production? You write your shiny code, you (even) test it, but how would you verify that you didn’t break the production environment? Luckily, we use Grafana, and this actually makes a big difference. My plan was to compare old code vs. new in Grafana, but where to start? You have Grafana… let’s use it! Frankly, I […]
At Taboola, we work daily on improving our Deep-Learning-based content-recommendation model. We use it to suggest personalized news articles and ads to hundreds of millions of users a day, so naturally we must stick to state-of-the-art deep learning modeling methods. But our job doesn’t end there – analyzing our results is a must too, and then we sometimes return to our data science roots and apply some very basic techniques. Let’s lay such a problem out. We are investigating a deep model that behaves rather strangely: it wins over our default model for what looks like a random group of advertisers, and loses for another group. This behavior is stable from day to day, so it looks like there might be some inherent advertiser qualities (what we’ll call campaign features) to blame for this. You can see typical model behavior for 4 campaigns below. So we hypothesize that […]
Sometimes we need to test urgent features fast, within a very short timeframe, when there is not enough time to run a full test plan for that feature. This might occur on different occasions: when QA does not have enough manpower to cover a full test plan for a feature; when an important client makes new special demands right before the release deadline; or when product management needs new adjustments before the developer deploys a new product version. It can also happen when a client, team lead or PM wants a new feature and it should have been done YESTERDAY! It can also be done actively, by running a wide post-production test every once in a while, or by dedicating limited time to a bug hunt. We at the Taboola Video Solution department call it “Search for a Bug Thursday”. This unplanned development might end up launching a “half baked” product. It […]
At Taboola, our goal is to predict whether users will click on the ads we present to them. Our models use all kinds of features, yet the most interesting ones tend to be related to the users’ history. Understanding how to use these features well can have a huge impact on the model’s personalization capabilities, due to the user-specific knowledge they hold. User history features vary strongly between different users; for example, one popular feature is user categories – the topics a user has previously read about. An example of such a list might look like this – {“sports”, “business”, “news”}. Each value in these lists is categorical, and each list can have multiple entries, so we name them Multi-Categorical features. Multi-Categorical lists can have any number of values per user – which means our model must handle both very long lists and completely empty lists (for new users). Supplying inputs of unknown length […]
Discover the journey of creating synchronized analog clocks using microcontrollers. Learn about power-efficient design, NTP synchronization, and more.
A few years ago, one of my friends suggested that I become a cybersecurity teacher in high school once a week as part of a program called Gvahim. I did not expect it to contribute to my professional career, but I have found a lot of analogies to my day-to-day role. I hope you will enjoy a different angle on management 101 guidelines. Program overview The program’s goals were to increase the knowledge of high school students in cybersecurity and increase the number of girls who study computer science. For three years in the program, students studied Assembly, networks and operating systems, with an emphasis on security. Unlike traditional materials learned in high school, the lessons in the program put an emphasis on self-learning. The first two semesters were dedicated to learning the theoretical background using self-reading and small coding exercises. The last semester of the […]
In the following article, I describe how we came up with a way to improve the chances that our SDK library gets smoothly integrated into our customers’ applications and to reduce issues when going to production. The main idea is to take a number of significant clients’ applications and replace the existing SDK code with new code, allowing you to see how the apps perform before you release a new SDK version. Why releasing a reliable SDK is so important Developing an SDK for mobile apps is very different from developing a standalone app. You can think of an SDK as a guest in someone else’s house. You need to behave, you can’t put your legs on the table or wipe your hands on the sofa (well, in most countries you can’t). So what I mean is that you can’t interfere with the app’s normal behavior, break some flows or […]
Delivering a good product to a live environment requires a big effort from R&D. The software development life cycle has 6 basic phases: understanding the requirements, design, coding, testing, deployment (incl. A/B test, if necessary) and maintenance. But how can we measure product quality? By its stability? Scalability? Ease of maintenance? Bug-free code? There are probably many definitions of a good product, but in my opinion, the two foundation stones are product behavior & functionality as defined (aligned with the product manager’s requirements), and zero critical bugs. The product can serve many goals, but if it doesn’t achieve the main one, it might not have a reason to exist. Naturally, customers always expect high quality from the product, so before releasing it to production QA should make sure that critical bugs indeed don’t exist. In order to respect these two, both R&D and QA should […]