Running AWS Glue Python shell jobs

AWS Glue is a service for all kinds of ETL work. One part of it is jobs, which let you run your own code. Here I am going to show you how to run the simplest kind of job, one written in Python.

Goal

I aim to download a single Wikipedia archive file from FTP directly to S3. The file is around 65 GB and is transferred on the fly by the Python script.

Infrastructure

We need an S3 bucket, a Glue job, an IAM role with the necessary permissions, the Python script the job will execute, and an SSM parameter holding the name of our bucket.
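The original setup is not shown, so here is a minimal boto3 sketch of how these pieces could be wired together. Every name in it (bucket, parameter, region, account, role, FTP host and path, and the `--ssm_param`, `--ftp_host`, `--ftp_path`, `--s3_key` job arguments) is a hypothetical placeholder. It assumes the IAM role already exists with a trust policy for `glue.amazonaws.com` and the `AWSGlueServiceRole` managed policy attached; the inline policy only adds what this particular job needs.

```python
"""Sketch: provision the Glue Python shell job, its role policy and the SSM parameter.

Every name below is a hypothetical placeholder.
"""
import json
from typing import Any, Dict

BUCKET = "my-wiki-dumps"                       # hypothetical bucket
SSM_PARAM = "/wiki-transfer/bucket"            # hypothetical parameter name
REGION, ACCOUNT = "eu-west-1", "123456789012"  # hypothetical region and account
ROLE_NAME = "GlueWikiTransferRole"             # hypothetical, assumed to exist
ROLE_ARN = f"arn:aws:iam::{ACCOUNT}:role/{ROLE_NAME}"
SCRIPT_S3_PATH = f"s3://{BUCKET}/scripts/ftp_to_s3.py"


def build_job_definition(name: str) -> Dict[str, Any]:
    """Keyword arguments for glue.create_job describing a Python shell job."""
    return {
        "Name": name,
        "Role": ROLE_ARN,
        "Command": {
            "Name": "pythonshell",  # Python shell job type, not Spark ("glueetl")
            "ScriptLocation": SCRIPT_S3_PATH,
            "PythonVersion": "3.9",
        },
        "MaxCapacity": 1.0,  # Python shell jobs accept 0.0625 or 1 DPU
        "Timeout": 180,      # minutes
        "DefaultArguments": {  # passed to the script as --key value pairs
            "--ssm_param": SSM_PARAM,
            "--ftp_host": "ftp.example.org",             # hypothetical mirror
            "--ftp_path": "/path/to/wiki-dump.xml.bz2",  # hypothetical path
            "--s3_key": "wiki/wiki-dump.xml.bz2",        # hypothetical key
        },
    }


def build_role_policy() -> Dict[str, Any]:
    """Inline policy added to the job role on top of AWSGlueServiceRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read the job script, write (or abort) the multipart upload
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:AbortMultipartUpload"],
                "Resource": f"arn:aws:s3:::{BUCKET}/*",
            },
            {   # look up the bucket name at runtime
                "Effect": "Allow",
                "Action": "ssm:GetParameter",
                "Resource": f"arn:aws:ssm:{REGION}:{ACCOUNT}:parameter{SSM_PARAM}",
            },
        ],
    }


def provision(job_name: str) -> None:
    """Create the parameter, attach the policy and create the job (needs AWS credentials)."""
    import boto3  # lazy import so the definitions above can be inspected offline

    boto3.client("ssm", region_name=REGION).put_parameter(
        Name=SSM_PARAM, Value=BUCKET, Type="String", Overwrite=True
    )
    boto3.client("iam").put_role_policy(
        RoleName=ROLE_NAME,
        PolicyName="wiki-transfer",
        PolicyDocument=json.dumps(build_role_policy()),
    )
    boto3.client("glue", region_name=REGION).create_job(**build_job_definition(job_name))


if __name__ == "__main__":
    # Print the generated definitions without calling AWS.
    print(json.dumps(build_job_definition("wiki-ftp-to-s3"), indent=2))
    print(json.dumps(build_role_policy(), indent=2))
```

Keeping the bucket name in SSM rather than hard-coding it in the script means the same script can be pointed at another bucket by changing one parameter.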

Python script

The code is essentially a generalized version of samples I wrote before. The main idea is to transfer data from FTP to S3 without storing it locally.
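Since the script itself is not included, this is a minimal sketch of the same idea, assuming `ftplib` from the standard library and boto3's low-level S3 multipart calls; it is not the author's code. It reads the FTP data connection in fixed-size parts and uploads each part as it arrives, so only one part is held in memory and nothing is written to disk. The argument names, host, path and key are hypothetical.

```python
"""Sketch: stream one file from FTP into S3 with a multipart upload, never touching local disk.

Argument names, host, path and key are hypothetical placeholders.
"""
import argparse
import ftplib
import sys
from typing import BinaryIO, Iterator, List

# S3 allows at most 10,000 parts per upload; 64 MiB parts cover files up to ~670 GB.
PART_SIZE = 64 * 1024 * 1024


def iter_parts(stream: BinaryIO, part_size: int = PART_SIZE) -> Iterator[bytes]:
    """Yield consecutive chunks of part_size bytes; only the last one may be shorter."""
    while True:
        buf = bytearray()
        # A single read on a network stream may return fewer bytes than asked for.
        while len(buf) < part_size:
            chunk = stream.read(part_size - len(buf))
            if not chunk:
                break
            buf.extend(chunk)
        if buf:
            yield bytes(buf)
        if len(buf) < part_size:  # end of stream reached
            return


def stream_ftp_to_s3(host: str, path: str, bucket: str, key: str) -> int:
    """Copy ftp://host/path to s3://bucket/key part by part; return the bytes sent."""
    import boto3  # lazy import so iter_parts can be used without AWS libraries

    s3 = boto3.client("s3")
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts: List[dict] = []
    total = 0
    try:
        with ftplib.FTP(host) as ftp:
            ftp.login()            # anonymous login
            ftp.voidcmd("TYPE I")  # binary transfer mode
            with ftp.transfercmd(f"RETR {path}") as conn, conn.makefile("rb") as stream:
                for number, data in enumerate(iter_parts(stream), start=1):
                    resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                          PartNumber=number, Body=data)
                    parts.append({"PartNumber": number, "ETag": resp["ETag"]})
                    total += len(data)
            ftp.voidresp()  # consume the server's "transfer complete" reply
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Drop the already-uploaded parts so they do not linger and incur storage costs.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return total


def bucket_from_ssm(param_name: str) -> str:
    """Resolve the target bucket name stored in an SSM parameter."""
    import boto3

    return boto3.client("ssm").get_parameter(Name=param_name)["Parameter"]["Value"]


def main(argv: List[str]) -> None:
    parser = argparse.ArgumentParser()
    for name in ("--ssm_param", "--ftp_host", "--ftp_path", "--s3_key"):  # hypothetical
        parser.add_argument(name, required=True)
    args, _ = parser.parse_known_args(argv)  # Glue also passes arguments of its own
    bucket = bucket_from_ssm(args.ssm_param)
    sent = stream_ftp_to_s3(args.ftp_host, args.ftp_path, bucket, args.s3_key)
    print(f"Uploaded {sent} bytes to s3://{bucket}/{args.s3_key}")


if __name__ == "__main__" and "--ssm_param" in sys.argv:  # only when launched with job arguments
    main(sys.argv[1:])
```

The part size is a trade-off: larger parts mean fewer S3 requests but more memory per part, and with the 10,000-part limit a ~65 GB file needs parts of at least about 6.5 MB. boto3's higher-level `upload_fileobj` could consume the socket stream directly; the explicit multipart calls here just make the chunking visible.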

Outcome

How long did it take? 49 minutes.
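As a rough sanity check, the average throughput follows from the two numbers above. This assumes "around 65GB" means 65 decimal gigabytes, so the result is only approximate.

```python
# Average throughput of the transfer described above (file size is approximate).
FILE_SIZE_BYTES = 65 * 10**9  # "around 65GB", decimal gigabytes assumed
DURATION_SECONDS = 49 * 60    # 49 minutes

bytes_per_second = FILE_SIZE_BYTES / DURATION_SECONDS
print(f"{bytes_per_second / 1e6:.1f} MB/s")        # → 22.1 MB/s
print(f"{bytes_per_second * 8 / 1e6:.0f} Mbit/s")  # → 177 Mbit/s
```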
