Writing features as chunks added to one ever-growing bulk of code is unorganized and messy. Over time, the task of testing new behaviors becomes harder and harder. Why is that? Chunky Code Is Hard to Navigate. When working with developers on a new feature, we have to identify what we need to test. As the code grows, it becomes harder to find the parts that were modified and should be tested. If a feature becomes problematic or irrelevant, reverting it becomes more difficult, since we need to go back to every line of code we changed. This endangers the production environment and affects our end users. Last but not least, “legacy” code that must be reverse engineered before we know how to control it ranges from irritating to impossible to manage. Coming to compare a buggy behavior with its intended fix or testing a new […]
General. This post describes how to use an Android ContentProvider to initialise your library automatically, which makes the library easier to integrate and gives you control over its flow. While this article mostly demonstrates one use case, you can use the idea in other cases as well. What is it all about? Always Strive To Simplify Integration. Let’s assume you are writing code for a software library that will be used by an Android application. Most common flows require the app using your library to call the library’s initialisation manually and, usually, to provide it with the app’s own Context. This requires your client to write initialisation code of their own. This article suggests using a ContentProvider to allow: completely autonomous initialisation, which frees you from asking the client to call init at all; and no need to ask your client for their Application Context. […]
What is the connection between kernel system calls and database performance, and how can we improve performance by reducing the number of system calls? The performance of any database system depends on four main system resources: CPU, memory, disk I/O, and network. Tuning or scaling any of these resources can improve performance; this post covers the CPU. It’s important to note that whenever we relieve one bottleneck in the system, we might just encounter another. For example, when CPU performance improves, the database load shifts to I/O, so unless our storage is capable of delivering more IOPS, we might not actually see the improvement we hoped for. But don’t be discouraged, performance tuning is sometimes a game of whack-a-mole… We all know that the more processing power available to your server, the better the overall system is likely to perform. Especially when the CPU spends the majority of its time […]
For the past year, my team and I have been working on a personalized user experience in the Taboola feed. We used Multi-Task Learning (MTL) to predict multiple Key Performance Indicators (KPIs) on the same set of input features, and implemented a Deep Learning (DL) model in TensorFlow to do so. Back when we started, MTL seemed way more complicated to us than it does now, so I wanted to share some of the lessons we learned. There are already quite a few posts about implementing MTL in a DL model (1, 2, 3). In this post I will share some specific points to consider when implementing MTL in a Neural Network (NN). I will also present simple TensorFlow solutions to overcome the discussed issues. Sharing is caring. We wanted to start with the basic approach of hard parameter sharing. Hard sharing means we have a shared subnet, followed by […]
If you happen to write code for a living, there’s a pretty good chance you’ve found yourself explaining to yet another interviewer how to reverse a linked list or how to tell if a string contains only digits. Usually, the need for this B.Sc. material ends once a contract is signed, as most of these low-level questions are handled for us under the hood of modern programming languages and external libraries. Still, not long ago we found ourselves facing one such question in real life: find an efficient algorithm for real-time weighted sampling. As simple as it might seem at first sight, we’d like to show you why it actually isn’t, and then walk you through how we solved it, in case you run into something similar. So buckle up, we’ve got some statistics and integrals coming up next! Why Do We Need Weighted Sampling in Production? At Taboola, our core business is to personalize […]
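As a point of reference for the weighted sampling problem named in the excerpt above, here is a minimal baseline sketch in Python. It is not the algorithm the post arrives at (the excerpt does not show it); it is the common textbook approach of precomputing cumulative weights and binary-searching a uniform draw. The class name `WeightedSampler` and its methods are hypothetical, chosen only for illustration.

```python
import bisect
import itertools
import random


class WeightedSampler:
    """Baseline weighted sampler (hypothetical helper, not the post's method).

    Setup is O(n); each draw is O(log n) via binary search over
    cumulative weights.
    """

    def __init__(self, items, weights):
        if not items or len(items) != len(weights):
            raise ValueError("items and weights must be non-empty and equal in length")
        if any(w < 0 for w in weights):
            raise ValueError("weights must be non-negative")
        self.items = list(items)
        # cumulative[i] holds the sum of weights[0..i]
        self.cumulative = list(itertools.accumulate(weights))
        self.total = self.cumulative[-1]
        if self.total <= 0:
            raise ValueError("at least one weight must be positive")

    def sample(self, rng=random):
        # Draw u uniformly from [0, total) and find the first item whose
        # cumulative weight exceeds u; zero-weight items are skipped.
        u = rng.random() * self.total
        index = bisect.bisect_right(self.cumulative, u)
        # The clamp guards against floating-point rounding at the top edge.
        return self.items[min(index, len(self.items) - 1)]


if __name__ == "__main__":
    sampler = WeightedSampler(["a", "b", "c"], [1, 0, 3])
    rng = random.Random(0)
    draws = [sampler.sample(rng) for _ in range(10_000)]
    print({item: draws.count(item) for item in "abc"})
```

The weakness of this baseline is that the cumulative array has to be rebuilt whenever the weights change, which is one reason a real-time setting can call for something more refined.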
My first date with my company – or – how onboarding looks from a freshman’s eye. According to LinkedIn, one in three employees decides to quit their job within the first 6 months(!). I’ve been managing people for over 20 years and I’ve spent a long time trying to crack the code of successful onboarding. It was only recently, when I started working for a new company, that my eyes were opened: I actually felt what it’s like to be a new employee. The lessons I learned surprised me, so I took it upon myself to build an onboarding plan addressing exactly what a new employee needs. We started running this program at Taboola and I’m happy to say it gets great feedback. In this blog post, I’ll shed some light on the psychology of a new employee, give practical ways to deal with it […]
Every candidate we recruit goes through a long evaluation process. Near the end of the process, after we have decided they fit our culture and have the skills we need, we run a reference check. Sometimes we treat it as a formality, just to make sure they’re not a serial killer. In fact, a reference check is one of the more important stages in the process. Let me explain why. Think for a second of a recruiter who is about to hire someone who worked with you. You know more about this person than any process can reveal. If the recruiter could peek inside your head, they would get all the knowledge they need, much more than they got from their process. Now, while you’re still in the position of the referee, think about how you would actually answer a reference check. Most of the time, […]
A joint post with Ofri Mann. We went to ICLR to present our work on debugging ML models using uncertainty and attention. Between cocktail parties and jazz shows in the wonderful New Orleans (can we do all conferences in NOLA please?) we also saw a lot of interesting talks and posters. Below are our main takeaways from the conference. Main themes. A good summary of the themes came in Ian Goodfellow’s talk, in which he said that until around 2013 the ML community was focused on making ML work. Now that it works on many different applications given enough data, the focus has shifted towards adding more capabilities to our models: we want them to comply with fairness, accountability and transparency constraints, to be robust, to use labels efficiently, to adapt to different domains, and so on. (Figure: a slide on ML topics, from Ian Goodfellow’s talk.) We noticed a […]
Explore the challenges of testing with Mockito. A NullPointerException (NPE), initially misdiagnosed because of misleading IntelliJ suggestions, turned out to be caused by a missing ‘public’ keyword in a method declaration.
Intro. At Taboola we use Spark extensively throughout the pipeline. Regularly faced with Spark-related scalability challenges, we look for optimisations in order to squeeze the most out of the library. Often, the problems we encounter are related to shuffles. In this post we will present a technique we discovered which gave us up to an 8x boost in performance for jobs with huge data shuffles. Shuffles. Shuffling is the process of redistributing data across partitions (aka repartitioning), which may or may not move data across JVM processes or even over the wire (between executors on separate machines). Shuffles, despite their drawbacks, are sometimes inevitable. In our case, here are some of the problems we faced: Performance hit – Jobs run longer because shuffles use network and I/O resources intensively. Cluster stability – Heavy shuffles fill the scratch disks of cluster machines. This affects other jobs on the same cluster, since […]
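To make the redistribution described under Shuffles concrete, here is a small sketch in plain Python. It is a toy model of hash repartitioning only: it does not use Spark, it is not Spark's implementation, and it is not the optimisation the post presents. The function names `partition_for` and `shuffle` are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a string key to a target partition with a stable hash.

    zlib.crc32 is used instead of the built-in hash(), which is
    randomized per process for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(partitions, num_partitions):
    """Redistribute (key, value) records so equal keys share a partition.

    Returns the new partitions and the number of records that changed
    partition index, a rough stand-in for data that would have to move.
    """
    out = [[] for _ in range(num_partitions)]
    moved = 0
    for source_index, partition in enumerate(partitions):
        for key, value in partition:
            target = partition_for(key, num_partitions)
            if target != source_index:
                moved += 1
            out[target].append((key, value))
    return out, moved


if __name__ == "__main__":
    before = [
        [("user1", 1), ("user2", 1)],
        [("user1", 2), ("user3", 1)],
        [("user2", 2), ("user3", 2)],
    ]
    after, moved = shuffle(before, num_partitions=3)
    print(after, moved)
```

In this toy model every record whose key hashes to a different partition has to be relocated, which is the part that costs network and I/O when partitions live on separate machines.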