“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” Martin Fowler, 2008. Names are everywhere in our software. Just think of the things we name: packages, classes, methods, variables. In fact, we programmers do so much naming that we should probably know how to do it well. In my opinion, making your code readable is just as important as making it work. In this post I will give you 5 tips and guidelines for choosing names that make your code more readable. 1. Reveal your intent: The name you choose should answer as many questions as possible for the reader, such as why it exists, what it does, and how it is used. Choosing good names takes time but saves more than it takes when the going gets tough, so […]
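To make the "reveal your intent" tip concrete, here is a minimal Python sketch that contrasts a cryptic name with an intention-revealing one. The password-expiry scenario, the 30-day threshold, and every identifier below are hypothetical and chosen only for illustration; they do not come from the post.

```python
from datetime import date

# Unclear: the reader cannot tell what `d` holds or what `chk` checks.
def chk(d):
    return (date.today() - d).days > 30

# Intention-revealing version of the same logic.
# The threshold and all names are hypothetical, for illustration only.
PASSWORD_MAX_AGE_DAYS = 30

def is_password_expired(last_changed, today=None):
    """Return True when the password is older than PASSWORD_MAX_AGE_DAYS."""
    today = today or date.today()
    days_since_change = (today - last_changed).days
    return days_since_change > PASSWORD_MAX_AGE_DAYS
```

The second version answers why the check exists, what it does, and how it is used without the reader having to open the function body.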
Ever thought about presenting your work to others? Talking at a meetup or a conference? In the past I couldn’t even think about it; I thought that it wasn’t for me and that I wouldn’t get any benefit from it at all. In the last year and a half, things have started to change. In the following post I will share how the will for continuous improvement took me out of my comfort zone, and put me in places and scenarios I never imagined. I started my journey in the software development world 8 years ago. I had some knowledge, and almost no experience. I studied industrial engineering and didn’t think I would practice software development. But things changed and I found my first role as a manual QA engineer, then QA automation engineer, automation developer, and for the last 5 years DevOps / Release engineer. I was always […]
We all have these amazing machines in our development and testing labs, and we know that our real users do not share this wonderful world. They experience our products very differently from us. These differences result in two major challenges: we do not know what the users experience, and we cannot debug their machines. For us, as a Video Advertisement Player team, these challenges are multiplied. Why? Our product is a third party script that serves other third party scripts for websites. Your code runs on different platforms. As a third party web product, you do not know which websites your code runs on. Websites have a variety of frameworks, architectures and styles. Frameworks – change the browser’s core behavior, for example by redefining methods, which challenges the product’s basic behavior. Architectures – affect the website’s performance, which impacts the product’s natural flow. Styles – manipulate the product’s look and feel. Running […]
Knowledge sharing is critical for every company that wants to grow and improve. The bigger the company, the harder it gets: inefficiency, a lack of alignment among your peers, difficulty training new workers – you name it. In this post we will take a look at the existing methods for knowledge sharing, how they can’t keep up with growth and fast-paced changes, and why people are your best resource for knowledge. What is Knowledge? In general, there are three main types of knowledge that need to be shared in a software company – Technical Knowledge, Product Knowledge and Business Knowledge. When a new employee begins their job, most companies will help them learn using some of the more traditional methods of sharing knowledge: Learn from others – via frontal training or assigning a mentor; Allow self-learning – via an online course, or from the company’s knowledge center (Atlassian, […]
If you’ve been following our tech blog lately, you might have noticed we’re using a special type of neural network called a Mixture Density Network (MDN). MDNs predict not only the expected value of a target, but also the underlying probability distribution. This blogpost will focus on how to implement such a model using Tensorflow, from the ground up, including explanations, diagrams and a Jupyter notebook with the entire source code. What are MDNs and why are they useful? Real life data is noisy. While quite irritating, that noise is meaningful, as it gives a wider perspective of the origins of the data. The target value can have different levels of noise depending on the input, and this can have a major impact on our understanding of the data. This is better explained with an example. Assume the following quadratic function: Given x as input, we have a deterministic output […]
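As a rough illustration of what "predicting the underlying probability distribution" means, the plain-Python sketch below evaluates a Gaussian mixture from a set of mixture parameters. It is not the post's Tensorflow implementation. It assumes Gaussian components, and it assumes the weights, means and standard deviations have already been produced by some network for a given input. All function names are hypothetical.

```python
import math

def gaussian_pdf(y, mean, sigma):
    """Density of a univariate Gaussian at y."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((y - mean) / sigma) ** 2)

def mixture_pdf(y, weights, means, sigmas):
    """Density of the mixture at y; weights are assumed to sum to 1."""
    return sum(w * gaussian_pdf(y, m, s)
               for w, m, s in zip(weights, means, sigmas))

def mixture_mean(weights, means):
    """Expected value of the target under the mixture."""
    return sum(w * m for w, m in zip(weights, means))

def negative_log_likelihood(y, weights, means, sigmas):
    """Loss for one observed target y: lower means y is more plausible."""
    return -math.log(mixture_pdf(y, weights, means, sigmas))
```

With a single component this reduces to ordinary Gaussian regression; with several components the same input can be assigned a multi-modal or input-dependent noise profile, which the expected value alone would hide.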
As VP of IT at Taboola, my teams and I are overwhelmed with logs, pinned down by their rate and volume. The job of the Production Site Reliability Engineering (SRE) team at Taboola is to keep the technology running smoothly and bring in as many insights as we can from the system, making sure that any and every technical issue (that isn’t self-healing or contained) is dealt with quickly. We also support this torrent of incoming data to make sure that any insights that can be gleaned from this data are found. With over 1B users discovering what’s interesting and new through the Taboola Feed, we can’t drop the ball or stop thinking about our logs, log management, where to store them and how to process them. This is the challenge of processing over one million lines of logs every second. To address this challenge, we engaged […]
Now that more than a year has passed since our first deep learning project emerged, we have had to keep moving forward and delivering the best models we can. Doing so has involved a lot of research and trying out different models, from simple ones such as bag-of-words, LSTM and CNN to more advanced ones such as attention, MDN and multi-task learning. Even the simplest model we tried has many hyperparameters, and in terms of the model’s accuracy, tuning these might be even more important than the actual architecture we ended up using. Although there’s a lot of active research in the field of hyperparameter tuning (see 1, 2, 3), implementing this tuning process has evaded the spotlight. If you go around and ask people how they tune their models, their most likely answer will be “just write a script that does it for you”. Well, that’s easier said than done… Apparently, there […]
Do you know that feeling when you’ve finished working on a feature, pushed the code, but then your CI system refuses to respond? Lagging or slow responsiveness is very common among Jenkins users, and you can find many reported issues on it. Slow CI systems are frustrating: they slow down your development and waste your time. I have worked on several Jenkins systems with various versions (1.x and 2.x), tens of slaves and hundreds of builds per day. I have managed to improve the performance of those systems using a few simple guidelines. In the following post I will share 5 tips that can make your Jenkins better, and put a smile on your developers’ faces. Tip 1: Minimize the number of builds on the master node. The master node is where the application is actually running; this is the brain of your Jenkins and, unlike a slave, it […]
Back in 2012, when neural networks regained popularity, people were excited about the possibility of training models without having to worry about feature engineering. Indeed, most of the earliest breakthroughs were in computer vision, in which raw pixels were used as input for networks. Soon enough it turned out that if you wanted to use textual data, clickstream data, or pretty much any data with categorical features, at some point you’d have to ask yourself – how do I represent my categorical features as vectors that my network can work with? The most popular approach is embedding layers – you add an extra layer to your network, which assigns a vector to each value of the categorical feature. During training the network learns the weights for the different layers, including those embeddings. In this post I will show examples of when this approach will fail, introduce category2vec, an alternative method […]
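The embedding-layer idea described above can be sketched as a simple lookup table that maps each categorical value to a vector. This is a minimal plain-Python illustration, not the post's code: the category names, the vector size and the initialization range are all assumptions, and the training step that would update the vectors is left out.

```python
import random

def build_embedding_table(categories, dim, seed=0):
    """Assign a small random vector to each categorical value.

    In a real network these vectors are weights that get updated during
    training; here they are only initialized (hypothetical range).
    """
    rng = random.Random(seed)
    return {
        category: [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        for category in categories
    }

def embed(table, value):
    """Look up the vector for one categorical value."""
    return table[value]

# Hypothetical categorical feature with three possible values.
table = build_embedding_table(["sports", "news", "tech"], dim=4)
news_vector = embed(table, "news")
```

The network then consumes `news_vector` in place of the raw category, so the quality of the model depends on how well those vectors are learned.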
One pleasant morning I got to work, thinking this day couldn’t get any better. But as Murphy would have it, there was my boss walking frantically toward me. It turned out that almost overnight one of the main data pipeline systems had become a major bottleneck for the company, and a solution was needed, fast! In a startup, let alone a company moving as fast as Taboola, these things can occur on a weekly basis. I needed to find some quick wins to relieve some of the bottlenecks inside the system. Luckily Re2 came to the rescue – in this post I will share how to find the bottlenecks using Gprof2dot’s beautiful image rendering, and of course, what Re2 is and how to use it. * Note that this article addresses a pain I had in a Python framework, but because there are Re2 implementations for all […]