Running Python on AWS ECS Fargate

Adrian Macal
Nov 20, 2020

I have a pure Python script that moves data from FTP to S3: it downloads gzipped files, decompresses them, converts the XML to JSON and writes the result to S3. It runs quite fast on a single c5.2xlarge machine with multiprocessing, but it does not work on AWS Glue Python Shell because of deployment issues. How about moving it to AWS ECS?
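The per-file conversion step can be sketched as below. This assumes each file holds repeated record elements (the record tag name is a hypothetical parameter) and that one JSON object per line is an acceptable output; the post shows neither its XML schema nor its JSON layout. The function streams through the gzipped file with `xml.etree.ElementTree.iterparse`, so a 1 GB file does not have to be decompressed or parsed in memory at once. The FTP download and the S3 upload (for example with boto3's `upload_fileobj`) are left out.

```python
import gzip
import io
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map an XML element to a dict: attributes first, then child elements.

    Leaf children become their stripped text and repeated tags become lists.
    This mapping is an assumption, not the author's actual format.
    """
    node = dict(elem.attrib)
    for child in elem:
        value = element_to_dict(child) if len(child) else (child.text or "").strip()
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]
    return node


def xml_gz_to_json_lines(source, record_tag):
    """Yield one JSON string per <record_tag> element of a gzipped XML source.

    `source` is either raw gzipped bytes or an open binary file object,
    e.g. a temporary file the FTP download was written to.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with gzip.open(source, "rb") as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == record_tag:
                yield json.dumps(element_to_dict(elem))
                elem.clear()  # drop the converted record's children to save memory
```

Because it is a generator, the JSON lines can be written out as they are produced instead of being collected into one large string.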

Now

Currently, my Python script downloads *.xml.gz files from 3 FTP servers. Each file is around 1 GB, and each FTP server supports up to 3 concurrent connections. The code spins up to 9 processes and uses a queue of tokens: a process has to acquire a token before it may connect to an FTP server, so no server ever exceeds its connection quota.
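The token scheme can be sketched as follows, assuming one token per allowed connection: three tokens per server go into a shared queue, and a worker takes a token before connecting and puts it back afterwards. The server names and `fake_download` are hypothetical placeholders (the post does not show its code), and the `active`/`peak` counters exist only to check that no server ever sees more than three connections. It uses the `fork` start method, which is available on Linux, where both the c5.2xlarge and a Fargate container run.

```python
import multiprocessing as mp
import time

# Hypothetical host names; the post does not name its three FTP servers.
SERVERS = ["ftp-a.example.com", "ftp-b.example.com", "ftp-c.example.com"]
CONNECTIONS_PER_SERVER = 3                       # quota stated in the post
WORKERS = len(SERVERS) * CONNECTIONS_PER_SERVER  # 9 processes


def fake_download(server, path):
    """Placeholder for the real FTP download + gunzip + XML-to-JSON work."""
    time.sleep(0.05)
    return f"{server}:{path}"


def worker(work, tokens, results, active, peak, lock):
    while True:
        path = work.get()
        if path is None:                  # sentinel: no more files
            return
        server = tokens.get()             # blocks until a connection slot is free
        i = SERVERS.index(server)
        with lock:                        # bookkeeping only, to verify the quota
            active[i] += 1
            peak[i] = max(peak[i], active[i])
        try:
            results.put(fake_download(server, path))
        finally:
            with lock:
                active[i] -= 1
            tokens.put(server)            # hand the slot back to the pool


def run(paths):
    # "fork" lets children inherit the queues without re-importing this module.
    ctx = mp.get_context("fork")
    work, tokens, results = ctx.Queue(), ctx.Queue(), ctx.Queue()
    lock = ctx.Lock()
    active = ctx.Array("i", len(SERVERS), lock=False)
    peak = ctx.Array("i", len(SERVERS), lock=False)
    procs = [ctx.Process(target=worker, args=(work, tokens, results, active, peak, lock))
             for _ in range(WORKERS)]
    for p in procs:
        p.start()                         # fork before any queue feeder threads exist
    for server in SERVERS:
        for _ in range(CONNECTIONS_PER_SERVER):
            tokens.put(server)            # one token per allowed connection
    for path in paths:
        work.put(path)
    for _ in procs:
        work.put(None)
    done = [results.get() for _ in paths]  # drain results before joining
    for p in procs:
        p.join()
    return done, list(peak)
```

Since the token queue only ever holds nine tokens, at most nine downloads run at once and at most three of them target the same server, regardless of how many files are queued.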

Infrastructure

I need an ECR repository, an ECS cluster, an IAM role and a task definition.
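The four pieces can be created with boto3 roughly as below. This is a sketch, not the author's infrastructure code: the region, the resource name, the choice of one role for both execution and task permissions, and the 4 vCPU / 8 GiB task size are all assumptions. Policies for S3 access are left out, and boto3 is imported inside `create_infrastructure` so the pure helpers can be used without it.

```python
import json

# Hypothetical names; the post does not show its infrastructure code.
REGION = "eu-west-1"
NAME = "ftp-to-s3"

# Trust policy that lets ECS tasks assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def task_definition(image_uri, role_arn, cpu="4096", memory="8192"):
    """Keyword arguments for ecs.register_task_definition (Fargate, awsvpc)."""
    return {
        "family": NAME,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",        # the only network mode Fargate supports
        "cpu": cpu,                     # 4096 = 4 vCPU, the Fargate maximum in 2020
        "memory": memory,               # MiB; must be a valid pairing for the cpu value
        "executionRoleArn": role_arn,   # lets ECS pull the image and write logs
        "taskRoleArn": role_arn,        # permissions the script itself runs with
        "containerDefinitions": [{
            "name": NAME,
            "image": image_uri,
            "essential": True,
        }],
    }


def create_infrastructure():
    import boto3  # lazy import: only needed when actually talking to AWS
    ecr = boto3.client("ecr", region_name=REGION)
    ecs = boto3.client("ecs", region_name=REGION)
    iam = boto3.client("iam")

    repo = ecr.create_repository(repositoryName=NAME)["repository"]
    ecs.create_cluster(clusterName=NAME)
    role = iam.create_role(RoleName=NAME,
                           AssumeRolePolicyDocument=json.dumps(TRUST_POLICY))["Role"]
    # AWS-managed policy for pulling from ECR and writing to CloudWatch Logs.
    iam.attach_role_policy(
        RoleName=NAME,
        PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy")
    image_uri = f"{repo['repositoryUri']}:latest"
    return ecs.register_task_definition(**task_definition(image_uri, role["Arn"]))
```

Keeping the task definition as a plain dict makes it easy to inspect or diff before registering a new revision.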

Dockerfile

The image only needs the two collected artifacts copied into it.

Build

Because I am going to run a Docker container, I need to build the image and push it to ECR first.
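A build-and-push step could look like the sketch below; it is not the author's script. The account ID, region and repository name are hypothetical, and it assumes the AWS CLI and Docker are installed locally. It logs in to ECR with `aws ecr get-login-password`, then builds the image directly under its ECR tag and pushes it.

```python
import subprocess

# Hypothetical values; the post does not show its build script.
ACCOUNT_ID = "123456789012"
REGION = "eu-west-1"
REPOSITORY = "ftp-to-s3"


def registry(account_id=ACCOUNT_ID, region=REGION):
    """ECR registry host for the given account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def build_commands(tag="latest", context="."):
    """Return the docker commands (as argv lists) that build and push the image."""
    image = f"{registry()}/{REPOSITORY}:{tag}"
    return [
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


def build_and_push(tag="latest"):
    # Equivalent of: aws ecr get-login-password | docker login --password-stdin <registry>
    password = subprocess.run(
        ["aws", "ecr", "get-login-password", "--region", REGION],
        check=True, capture_output=True, text=True).stdout
    subprocess.run(
        ["docker", "login", "--username", "AWS", "--password-stdin", registry()],
        input=password, check=True, text=True)
    for argv in build_commands(tag):
        subprocess.run(argv, check=True)  # stop at the first failing command
```

Passing the password through stdin keeps it out of the process list and shell history.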

Execution

I ran it, and it took around 47 minutes, compared to 19 minutes on the c5.2xlarge. As expected, CPU is the bottleneck: a Fargate task could get at most 4 vCPUs at the time, while a c5.2xlarge has 8.
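For reference, a run like this can be started and timed with boto3's `run_task`, sketched below under assumptions: the cluster and task family names are hypothetical, the subnet and security group IDs must come from your own VPC, and `assignPublicIp` is enabled on the assumption that the task sits in a public subnet without a NAT gateway. The waiter is configured to poll for up to an hour, which covers a run of about 47 minutes.

```python
# Hypothetical identifiers; the post does not show how it started the task.
CLUSTER = "ftp-to-s3"
TASK_FAMILY = "ftp-to-s3"


def run_task_args(subnets, security_groups):
    """Keyword arguments for ecs.run_task on Fargate."""
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_FAMILY,  # family only: latest ACTIVE revision
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                # A public IP lets the task reach ECR, S3 and the FTP servers
                # from a public subnet without a NAT gateway.
                "assignPublicIp": "ENABLED",
            }
        },
    }


def run_and_wait(subnets, security_groups):
    import boto3  # lazy import: only needed when actually talking to AWS
    ecs = boto3.client("ecs")
    response = ecs.run_task(**run_task_args(subnets, security_groups))
    if response["failures"]:
        raise RuntimeError(response["failures"])
    task_arn = response["tasks"][0]["taskArn"]
    # 30 s x 120 attempts = up to one hour of polling.
    ecs.get_waiter("tasks_stopped").wait(
        cluster=CLUSTER, tasks=[task_arn],
        WaiterConfig={"Delay": 30, "MaxAttempts": 120})
    task = ecs.describe_tasks(cluster=CLUSTER, tasks=[task_arn])["tasks"][0]
    started = task.get("startedAt")     # missing if the container never started
    return task["stoppedAt"] - started if started else None
```

The returned `timedelta` measures the container's running time, excluding the time Fargate spends provisioning and pulling the image.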

