FTP, ungzip, S3 on-the-fly

Nov 12, 2020

Unpacking an archive on the command line reads a file and writes a file. Is it possible to take the data from FTP, decompress it on the fly and write it to S3? Yes, and here is my code to do it.

Solution

The pipeline has three stages: FTP, ungzip and S3. Between stages I put a buffer that ensures there is enough data for the next stage.

The FTP download pushes data into the first buffer, which must contain at least 1MB before decompression starts (avoiding BadZipFile).
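A minimal sketch of this first stage, not the original code. It relies on `retrbinary` from `ftplib`, which calls a callback for every downloaded block. The names `stream_from_ftp` and `consume` and the `final` flag are hypothetical, introduced only for illustration.

```python
MIN_BUFFERED = 1024 * 1024  # the 1MB threshold mentioned above


def stream_from_ftp(ftp, path, consume, min_buffered=MIN_BUFFERED):
    """Download `path` and hand buffered bytes to `consume`.

    `ftp` is assumed to be a connected ftplib.FTP (anything with a
    compatible retrbinary method works). `consume(pending, final)` is a
    hypothetical callback; it is expected to delete the bytes it has
    processed from the `pending` bytearray.
    """
    pending = bytearray()

    def on_block(block):
        # ftplib calls this once per received block.
        pending.extend(block)
        if len(pending) >= min_buffered:
            consume(pending, False)

    ftp.retrbinary(f"RETR {path}", on_block)
    # Transfer finished: let the consumer flush whatever is left.
    consume(pending, True)


# Hypothetical usage (host and path are placeholders):
#   from ftplib import FTP
#   with FTP("ftp.example.com") as ftp:
#       ftp.login()
#       stream_from_ftp(ftp, "data.gz", my_consumer)
```

The threshold check lives in the callback, so the next stage is never invoked with less than `min_buffered` bytes until the transfer is over.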

Once there is enough data, the buffer acts as the file-like object behind a GzipFile stream. Each read asks for only 256kB, because we know at least 1MB of bytes is still waiting to be decompressed. The decompressed result is pushed to the second buffer.
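The decompression stage could look roughly like the sketch below. `ByteBuffer` and `StreamingGunzip` are hypothetical names, and the thresholds are parameters so the idea can be tried with small inputs. `GzipFile` only needs a `read` method on the object passed as `fileobj`.

```python
import gzip


class ByteBuffer:
    """Minimal file-like FIFO: write() appends, read() removes from the front."""

    def __init__(self):
        self._data = bytearray()

    def __len__(self):
        return len(self._data)

    def write(self, data):
        self._data += data
        return len(data)

    def read(self, size=-1):
        if size is None or size < 0:
            size = len(self._data)
        chunk = bytes(self._data[:size])
        del self._data[:size]
        return chunk


class StreamingGunzip:
    """Decompress gzip data fed in blocks, passing the output to `sink`."""

    def __init__(self, sink, min_buffered=1024 * 1024, read_size=256 * 1024):
        self._source = ByteBuffer()
        self._gzip = gzip.GzipFile(fileobj=self._source, mode="rb")
        self._sink = sink
        self._min_buffered = min_buffered
        self._read_size = read_size

    def feed(self, block):
        self._source.write(block)
        # Read only while plenty of compressed input is waiting, so the
        # decompressor never runs dry in the middle of the stream.
        while len(self._source) >= self._min_buffered:
            piece = self._gzip.read(self._read_size)
            if not piece:
                break
            self._sink(piece)

    def finish(self):
        # All input has arrived: drain until the end of the gzip stream.
        while True:
            piece = self._gzip.read(self._read_size)
            if not piece:
                break
            self._sink(piece)
```

Keeping the read size well below the buffering threshold is what makes the early reads safe: a read that finds no compressed input before the stream ends would fail.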

When the second buffer holds at least 32MB (the chunk size used for the multipart upload), we cut a new part and upload it to S3. At the end, when the FTP transfer finishes, we need to flush both buffers.
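The last stage can be sketched as a small accumulator that cuts fixed-size parts. `PartUploader` and the `upload_part` callback are hypothetical; the callback stands in for the real S3 call so the sketch runs without network access. Cutting parts of exactly `chunk_size` bytes is my assumption, not something the text above specifies.

```python
class PartUploader:
    """Collect bytes and hand them to `upload_part` in fixed-size parts."""

    def __init__(self, upload_part, chunk_size=32 * 1024 * 1024):
        self._upload_part = upload_part  # callback: (part_number, body) -> result
        self._chunk_size = chunk_size
        self._pending = bytearray()
        self._next_part = 1  # S3 part numbers start at 1
        self.parts = []      # (part_number, result) pairs, in upload order

    def write(self, data):
        self._pending += data
        # Upload every complete chunk that has accumulated so far.
        while len(self._pending) >= self._chunk_size:
            self._emit(self._chunk_size)

    def flush(self):
        # Called once at the end: the last part may be smaller than a chunk.
        if self._pending:
            self._emit(len(self._pending))

    def _emit(self, size):
        body = bytes(self._pending[:size])
        del self._pending[:size]
        result = self._upload_part(self._next_part, body)
        self.parts.append((self._next_part, result))
        self._next_part += 1
```

With boto3, the callback could wrap `client.upload_part(Bucket=..., Key=..., PartNumber=..., UploadId=..., Body=...)` and return the `ETag` from the response, which `complete_multipart_upload` needs for every part. S3 requires every part except the last to be at least 5 MiB, so a 32MB chunk size is comfortably within the limit.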

Written by Adrian Macal
