Binary Pipeline with Python

Adrian Macal
Nov 13, 2020


Data moves between systems in binary form. The medium may be a network, a file system or something else, but the lowest layer is always a stream of bytes, and you do not need to interpret the data in order to move it. Here is some code I wrote while playing with such a binary pipeline.

The Idea

I would like to have a high-level pipeline abstraction in Python that expresses a workflow as a chain of steps. The example workflow opens a remote file, computes its MD5 and SHA-1, gunzips it, computes the MD5 and SHA-1 of the decompressed data, and writes the output to S3.
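To make the workflow concrete, here is a minimal sketch of the same steps written directly against the standard library, without any pipeline abstraction. It is not the code from this article: the `process` function is a hypothetical name, and in-memory `io.BytesIO` buffers stand in for the remote file and the S3 object so that the example runs anywhere.

```python
import gzip
import hashlib
import io
import zlib


def process(source, target, chunk_size=64 * 1024):
    """Stream gzip bytes from source to target, hashing both sides."""
    compressed = {"md5": hashlib.md5(), "sha1": hashlib.sha1()}
    plain = {"md5": hashlib.md5(), "sha1": hashlib.sha1()}
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    gunzip = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def emit(data):
        # Hash the decompressed bytes and hand them to the target.
        for h in plain.values():
            h.update(data)
        target.write(data)

    while chunk := source.read(chunk_size):
        for h in compressed.values():
            h.update(chunk)
        emit(gunzip.decompress(chunk))
    emit(gunzip.flush())  # whatever the decompressor still holds

    return (
        {name: h.hexdigest() for name, h in compressed.items()},
        {name: h.hexdigest() for name, h in plain.items()},
    )


payload = b"hello binary pipeline\n" * 1000
source = io.BytesIO(gzip.compress(payload))  # stands in for the remote file
target = io.BytesIO()                        # stands in for the S3 object
before, after = process(source, target, chunk_size=256)
print(before["md5"], after["sha1"])
```

Everything is tangled together in one loop here, which is exactly what a pipeline of small components is meant to avoid.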

The whole implementation is here and sample output may look like this:

Component

Each component exposes three important functions.
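The three functions are not listed in this text, so the sketch below assumes they are `bind` (remember the next component), `accept` (receive a chunk of bytes) and `flush` (signal the end of the stream). These names, and the `Hash` and `Collect` classes, are my guesses for illustration and may differ from the original implementation.

```python
import hashlib


class Component:
    """Hypothetical base class; bind, accept and flush are assumed names."""

    def __init__(self):
        self.next = None

    def bind(self, next_component):
        # Remember where data should be pushed.
        self.next = next_component

    def accept(self, data):
        # Default behaviour: pass the bytes through untouched.
        if self.next is not None:
            self.next.accept(data)

    def flush(self):
        # Propagate the end-of-stream signal downstream.
        if self.next is not None:
            self.next.flush()


class Hash(Component):
    """Updates a digest with every chunk and forwards the chunk unchanged."""

    def __init__(self, name):
        super().__init__()
        self.digest = hashlib.new(name)

    def accept(self, data):
        self.digest.update(data)
        super().accept(data)


class Collect(Component):
    """Terminal component that keeps everything it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.closed = False

    def accept(self, data):
        self.chunks.append(data)

    def flush(self):
        self.closed = True


md5 = Hash("md5")
sink = Collect()
md5.bind(sink)
for chunk in (b"abc", b"def"):
    md5.accept(chunk)
md5.flush()
print(md5.digest.hexdigest(), b"".join(sink.chunks))
```

A hashing component never changes the data, which is why the same kind of component can sit both before and after the decompression step.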

Pipeline

The pipeline runs the defined workflow by binding all its members together and starting the first component. Everything is expected to work in push mode: whenever a component receives data from the previous one, it pushes its output to the next one as soon as possible. At the end, each component may flush whatever state it still holds.

Written by Adrian Macal

Software Developer, Data Engineer with solid knowledge of Business Intelligence. Passionate about programming.
