From multiprocessing to asyncio

The Python’s old school way towards concurrency is multiprocessing module which spawns new processes. The asyncio module is a step to modernity. I will migrate my small data pipeline and compare the performance.

Before changes

The classic multiprocessing can be triggered by the Pool. You do not need to take care of anything just to pass list of tasks you want to run concurrently.

After changes

The modern asyncio brings the same simplicity what multiprocessing. You need to have a Pool of threads and schedule some tasks. The threads are good enough if you are going to run IO intensive tasks.

Comparison

In my case when I scheduled concurrently 20 tasks and each task just starts ECS task and waits for its result. The multiprocessing version created 20 processes and each used about 40–50MB of RAM, which makes around 1GB in total. The asyncio version used only 100–200MB of RAM.

The code is available here.

Software Developer, Data Engineer with solid knowledge of Business Intelligence. Passionate about programming.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store