- Taboola Blog
- Engineering
We wanted to see if there was a way we could sync our Kubernetes NetworkPolicies dynamically with tools we already use, like Consul and Calico.
Read this article to learn more about what conversions are, how Taboola handle billions of daily events at scale, and how it all presents meaningful data to customers.
Kafka is an open-source distributed event streaming platform and something went wrong while working with it. Let’s see how it was investigated and resolved.
Find out the secrets to how Taboola deploys and manages the thousands of servers that bring you recommendations every day.
Rbox, our recommendation product, is a 3rd party service embedded in publisher sites
In this article you will learn what Samplex is and how it is used to make processing of large raw datasets more efficient.
Many R&D buzzwords and acronyms can seem like complex jargon — unnecessary shortcuts for concepts that are already pretty basic.
During the pandemic, most companies quickly adapted and moved to a work-from-home model, as a sudden necessity of the lockdown restrictions introduced by efforts to combat the spread of COVID-19.
We announced our plans to acquire Connexity, bringing eCommerce recommendations to the open web. And this is just the beginning.
The world is not flat, it’s highly nested With over 4 billion page views per day and over 100TB of data collected daily, scale at Taboola is no joke. Our primary data pipe deals with masses of data and endless read paths. Could we optimize our schema for all these read paths? Guess not… Our schema is HUGE and highly nested. After digesting the data, we keep it in hourly Parquet files on HDFS, where each hour consists of about 1-1.5TB of compressed data. Our schema roughly looks like this: root |– userSession: struct | |– maskedIp: long | |– geo: struct | | |– country: string | | |– region: string | | |– city: string | |– pageViews: array | | |– element: struct | | | |– url: string | | | |– referrer: string | | | |– widgets: array | | | | |– element: […]