FTP, ungzip, S3 on-the-fly

Unpacking an archive on the command line usually means reading one file and writing another. Is it possible to take the data from FTP, decompress it and write it to S3 on the fly, without an intermediate file? Yes, and here is my code to do it.

Solution

The pipeline has three stages: FTP, ungzip and S3. Between consecutive stages I put a buffer, which ensures there is enough data for the next stage.
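A minimal sketch of such an inter-stage buffer, assuming a simple in-memory FIFO is enough. The class name ThresholdBuffer and its ready() method are hypothetical, not taken from the original code: the upstream stage writes to the end, and the downstream stage only reads from the front once the threshold is reached.

```python
class ThresholdBuffer:
    """FIFO byte buffer placed between two pipeline stages (illustrative sketch).

    The upstream stage calls write(); the downstream stage checks ready()
    before calling read(), so it only runs when `threshold` bytes are queued.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self._data = bytearray()

    def write(self, chunk):
        # Append bytes produced by the upstream stage.
        self._data.extend(chunk)
        return len(chunk)

    def ready(self):
        # True once enough data is queued for the next stage to run safely.
        return len(self._data) >= self.threshold

    def read(self, size=-1):
        # Remove and return up to `size` bytes from the front (everything if size < 0).
        if size is None or size < 0:
            size = len(self._data)
        out = bytes(self._data[:size])
        del self._data[:size]
        return out

    def __len__(self):
        # Bytes queued but not yet consumed downstream.
        return len(self._data)


# Thresholds from the post: 1MB before ungzip, 32MB before an S3 part.
ftp_to_gunzip = ThresholdBuffer(1024 * 1024)
gunzip_to_s3 = ThresholdBuffer(32 * 1024 * 1024)

ftp_to_gunzip.write(b"x" * 600_000)
print(ftp_to_gunzip.ready())  # False
ftp_to_gunzip.write(b"x" * 600_000)
print(ftp_to_gunzip.ready())  # True
```

Because read() also accepts a size, the same object can later serve as the file-like input for the decompression stage.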

The FTP download pushes data into the first buffer, which must hold at least 1MB before decompression starts (to avoid a BadZipFile error when the decompressor runs out of input).
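A minimal sketch of the FTP stage, assuming Python's standard ftplib. FTP.retrbinary() calls a callback with every received block; here each block is appended to a bytearray, and a hypothetical on_enough_data hook (where the ungzip stage would consume data) is called whenever at least 1MB is queued. The names pump_ftp, download and on_enough_data are illustrative, and the anonymous login is an assumption.

```python
import ftplib

MIN_BUFFERED = 1024 * 1024  # start decompressing only after 1MB is buffered


def pump_ftp(ftp, remote_path, buffer, on_enough_data, blocksize=64 * 1024):
    """Push an FTP file into `buffer`, calling `on_enough_data` whenever >= 1MB is queued.

    `buffer` is a bytearray; `on_enough_data` is a hypothetical hook that is
    expected to consume (delete) bytes from the front of it.
    """

    def handle_block(block):
        # ftplib hands us each block of the RETR transfer as it arrives.
        buffer.extend(block)
        if len(buffer) >= MIN_BUFFERED:
            on_enough_data(buffer)

    ftp.retrbinary(f"RETR {remote_path}", handle_block, blocksize=blocksize)


def download(host, remote_path, buffer, on_enough_data):
    # Anonymous login is an assumption; pass user/passwd to login() if needed.
    with ftplib.FTP(host) as ftp:
        ftp.login()
        pump_ftp(ftp, remote_path, buffer, on_enough_data)
```

Splitting the transfer (pump_ftp) from the connection (download) keeps the threshold logic testable with any object that has a retrbinary() method. After retrbinary() returns, whatever is still in the buffer is the part that has to be flushed at the end.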

Once the buffer holds enough data, it serves as the file-like object that GzipFile reads from. We read only 256kB of decompressed data at a time, which is safe because at least 1MB of input that has not been decompressed yet is waiting in the buffer. The decompressed output is pushed to the second buffer.
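A sketch of the ungzip stage, assuming Python's gzip module and a small hypothetical ByteBuffer class as the file-like object (GzipFile only needs its read() method). The generator stream_gunzip and the chunk list stand in for the FTP callbacks. 1MB of compressed input is comfortably more than the decoder needs to produce 256kB of output, so a read never runs off the end of the buffered data; in CPython, running out of input mid-stream raises EOFError.

```python
import gzip
import os


class ByteBuffer:
    """Minimal FIFO byte buffer (hypothetical helper, not from the original post)."""

    def __init__(self):
        self._data = bytearray()

    def write(self, chunk):
        self._data.extend(chunk)

    def read(self, size=-1):
        # GzipFile calls this to pull compressed bytes from the front.
        if size is None or size < 0:
            size = len(self._data)
        out = bytes(self._data[:size])
        del self._data[:size]
        return out

    def __len__(self):
        return len(self._data)


MIN_COMPRESSED = 1024 * 1024  # keep at least 1MB of compressed input queued
READ_SIZE = 256 * 1024        # decompress 256kB per read


def stream_gunzip(chunks):
    """Yield decompressed blocks while compressed chunks keep arriving."""
    compressed = ByteBuffer()
    gz = gzip.GzipFile(fileobj=compressed, mode="rb")
    for chunk in chunks:
        compressed.write(chunk)
        # Read only while the safety margin is there, so the decompressor
        # never reaches the end of the buffered data mid-stream.
        while len(compressed) >= MIN_COMPRESSED:
            yield gz.read(READ_SIZE)
    # Input finished: flush whatever is left in the first buffer.
    while True:
        block = gz.read(READ_SIZE)
        if not block:
            break
        yield block
    gz.close()


payload = os.urandom(3 * 1024 * 1024) + b"abc" * 500_000
compressed_bytes = gzip.compress(payload)
chunks = [compressed_bytes[i:i + 65536] for i in range(0, len(compressed_bytes), 65536)]
restored = b"".join(stream_gunzip(chunks))
print(restored == payload)  # True
```

In the full pipeline, each yielded block would be written to the second buffer instead of being joined in memory.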

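When the second buffer holds at least 32MB (the chunk size used for the multipart upload), we cut a new part and upload it to S3. When the FTP transfer finishes, both buffers have to be flushed: the remaining compressed bytes are decompressed, and whatever is left in the second buffer is uploaded as the last, smaller part.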
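A sketch of the S3 stage, assuming a boto3 S3 client is passed in. create_multipart_upload, upload_part, complete_multipart_upload and abort_multipart_upload are real boto3 client methods; the MultipartWriter class, the bucket and the key are illustrative. S3 requires every part except the last to be at least 5MB, so 32MB parts qualify, and part numbers start at 1.

```python
PART_SIZE = 32 * 1024 * 1024  # 32MB parts, as in the post


class MultipartWriter:
    """Buffers decompressed bytes and uploads them to S3 in fixed-size parts.

    `s3` is expected to be a boto3 S3 client; `bucket` and `key` are
    placeholders supplied by the caller.
    """

    def __init__(self, s3, bucket, key, part_size=PART_SIZE):
        self.s3, self.bucket, self.key = s3, bucket, key
        self.part_size = part_size
        self.buffer = bytearray()
        self.parts = []
        resp = s3.create_multipart_upload(Bucket=bucket, Key=key)
        self.upload_id = resp["UploadId"]

    def write(self, data):
        # Second buffer: cut a part each time a full part_size is queued.
        self.buffer.extend(data)
        while len(self.buffer) >= self.part_size:
            self._upload(bytes(self.buffer[:self.part_size]))
            del self.buffer[:self.part_size]

    def _upload(self, body):
        number = len(self.parts) + 1  # S3 part numbers start at 1
        resp = self.s3.upload_part(
            Bucket=self.bucket, Key=self.key, PartNumber=number,
            UploadId=self.upload_id, Body=body,
        )
        self.parts.append({"ETag": resp["ETag"], "PartNumber": number})

    def close(self):
        # Final flush: the last part may be smaller than part_size.
        # An empty input (no parts at all) is not handled in this sketch.
        if self.buffer:
            self._upload(bytes(self.buffer))
            self.buffer.clear()
        self.s3.complete_multipart_upload(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
            MultipartUpload={"Parts": self.parts},
        )

    def abort(self):
        # Call on failure so S3 does not keep the orphaned parts.
        self.s3.abort_multipart_upload(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id
        )
```

With boto3 installed, the writer would be created as MultipartWriter(boto3.client("s3"), bucket, key), fed every decompressed block through write(), and closed once the FTP transfer and the ungzip flush are done.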

Code
