- Taboola Blog
- Engineering
The existing CI processes at Taboola are quite demanding: a full product build includes 150 Maven modules accounting for about 20,000 unit tests. Besides builds from the master branch, there are also builds for all feature branches and patch releases, all in all adding up to more than a hundred builds per day. Some of the executed tests are pretty heavy on CPU and memory, performing resource-intensive data crunching. As you can imagine, all this requires substantial CI infra horsepower, which it definitely possesses: Taboola’s Jenkins cluster currently has 35+ Jenkins slaves, each with 20 to 40 CPU cores and at least 100 GB of memory. Each slave runs 5 to 10 executors. All in all, a powerful CI/CD factory. But even with all this power, there are limitations. On the day of the weekly release the volume of builds peaks and we sometimes find ourselves starving […]
Large production pipelines in TensorFlow are quite difficult to pull off. Training small models is easy, and we mostly do this at first, but as soon as we get to the rest of the pipeline, complexity rapidly mounts. One reason is that the “Computation Graph” abstraction used by TensorFlow is a close, but not exact, match for the ML model we expect to train and use. How so? Typically a model will be used in at least three ways:
- Training: finding the correct weights or parameters for the model given some training data. Often done periodically as new data arrives.
- Evaluation: calculating various metrics during training on a different data set, to evaluate training quality or for cross-validation.
- Serving: on-demand prediction for new data.
There could be more modes. For example, we could re-train an existing model or apply the model to a large amount of […]
Synthetic monitoring is something that we all do. It’s almost something that you don’t think about. You set up a monitor and it just tells you if the service is up or down, most times with just a simple GET. There are the giants in this field (lately consolidated under the Keynote brand as part of AppDynamics) and the newcomers like Catchpoint, ThousandEyes, Pingdom (now part of SolarWinds) and WorldPing. All solutions share the same basic concept: pull website information from different agents around the world and give the website operator visibility into uptime, response times and other metrics. But what happens when you have a failure, and no alert? These tools have become so widespread and have such a long usage history that it almost seems pointless to compare them. This is a solved problem, no? Just take the cheapest one out there and you’re done. Here at […]
Regression Testing a Complex UI
Web applications are a mischievous bunch. When they are born they are usually small, clean and orderly, but they may grow up to become complex and error-prone monsters. Each feature added to the mix increases the chance of a new bug appearing, a promise that is usually fulfilled. When developing a large and complex web application, we need to be able to continually check for regressions and verify that everything that worked until now is still working, so that at least we won’t break more than we need to. Taboola is a web company and must deliver at a very rapid pace. We ship many features on a continuous basis. Under such circumstances, no matter how big your QA team is, it will never manage to cover the entire system for every delivered feature. This means an extensive test automation framework is a must. […]