Zachary Ennenga

283 Followers


A P-O-X On Both Your Houses: Reverse Engineering a 20 year RF protocol

In 2001 Hasbro began a viral marketing campaign for their new game, P-O-X. They started in the Chicago area, asking kids — in arcades, in schools, at skateparks — who they thought the coolest kid¹ at that location was. They continued up the chain until they found someone who thought…

Sdr

26 min read


May 24, 2021

Second Order Parallelism in Spark-based Data Pipelines

The entire purpose of Spark is to efficiently distribute and parallelize work, and because of this, it can be easy to miss places where applying additional parallelism on top of Spark can increase the efficiency of your application. Spark operations can be broken into Actions and Transformations. Spark Transformations…

Spark

4 min read



Published in The Airbnb Tech Blog · Mar 3, 2020

On Spark, Hive, and Small Files: An In-Depth Look at Spark Partitioning Strategies

One of the most common ways to store results from a Spark job is by writing the results to a Hive table stored on HDFS. While, in theory, managing the output file count from your jobs should be simple, in reality it can be one of the more complex parts of your pipeline. Author: Zachary Ennenga. Background: At Airbnb, our offline data processing ecosystem contains many mission-critical, time-sensitive jobs, so it is essential for us to maximize the stability and efficiency of our data pipeline infrastructure.

Spark

17 min read



Published in The Airbnb Tech Blog · Sep 24, 2019

Scaling a Mature Data Pipeline — Managing Overhead

There is often a hidden performance cost tied to the complexity of data pipelines: the overhead. In this post, we will introduce the concept and examine the techniques we use to avoid it in our data pipelines. Author: Zachary Ennenga. Background: There is often a natural evolution in the tooling, organization, and technical underpinning of data pipelines. Most data teams and data pipelines are born from a monolithic collection of queries. As the pipeline grows in its complexity, it becomes sensible to leverage the Java or Python Spark…

Spark

11 min read



Feb 2, 2019

Please Destroy My Face: Reverse Engineering Scorched Earth’s MTN File Format

Humble Beginnings: Once upon a time, in my youth, I was enamored with a game called Scorched Earth. The premise of the game was simple: given two variables, angle and power, you would launch an increasingly destructive arsenal of missiles and weapons at your opponents in an attempt to destroy…

Programming

23 min read

