Keren Corsia, Lior Chaga, Tom Sisso, Author at Taboola Blog

Lior Chaga

Lior Chaga is a member of the Data Platform team in the Taboola Infrastructures group. With over 6 years of experience at Taboola, he has already crunched hundreds of petabytes with Kafka, Spark, Cassandra, and other state-of-the-art technologies. Before joining Taboola, Lior risked his fingers at the IDF C4I Corps and worked at a fintech company. He is also a husband, a father of two, and a deprecated tour guide.

All Stories

Engineering

8 August 2022

Surviving Spark Upgrade in Production

Looking to upgrade your data pipeline to Spark 3? Taboola ran into some issues during the upgrade, and we want to share them with you.

Big Data

27 May 2021

ScORe – Schema On Read for Spark SQL

The world is not flat, it’s highly nested

With over 4 billion page views per day and over 100TB of data collected daily, scale at Taboola is no joke. Our primary data pipe deals with masses of data and endless read paths. Could we optimize our schema for all these read paths? Guess not… Our schema is HUGE and highly nested. After digesting the data, we keep it in hourly Parquet files on HDFS, where each hour consists of about 1-1.5TB of compressed data. Our schema roughly looks like this:

root
 |-- userSession: struct
 |    |-- maskedIp: long
 |    |-- geo: struct
 |    |    |-- country: string
 |    |    |-- region: string
 |    |    |-- city: string
 |    |-- pageViews: array
 |    |    |-- element: struct
 |    |    |    |-- url: string
 |    |    |    |-- referrer: string
 |    |    |    |-- widgets: array
 |    |    |    |    |-- element: […]
