FTP download and big files in Python
Recently I was playing with downloading some files and even uploading them to S3. It worked pretty well with files of around 300 MB. Today I tried to transfer 3.5 GB from FTP to S3, and the transfer never completed.
Background
FTP uses two types of connections: control and transfer (the FTP specification calls the latter the data connection). You send commands and receive responses over the control connection. When you store or retrieve data, a separate transfer connection carries the file contents.
What happened?
The transfer was initiated over the control connection (the RETR command), and from then on all traffic flowed over the transfer connection. After a few minutes, either the server or the client timed out the control connection because it was sitting idle.
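For illustration, a plain download of the kind that runs into this might look like the sketch below. The post does not show the original code, so the use of `ftplib`'s `retrbinary` and the helper name `naive_download` are assumptions.

```python
import ftplib
from typing import BinaryIO


def naive_download(ftp: ftplib.FTP, remote_path: str, out_file: BinaryIO) -> str:
    """Download remote_path in a single call (hypothetical helper)."""
    # retrbinary sends RETR on the control connection, then reads the data
    # connection until EOF, and only afterwards waits for the server's final
    # reply. Nothing is sent on the control connection in between, so a long
    # transfer leaves it idle for the whole duration.
    return ftp.retrbinary("RETR " + remote_path, out_file.write)
```

With a 300 MB file the idle period is short enough to go unnoticed; with several gigabytes it can outlast whatever idle timeout applies to the control connection.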
Solution
In general, you should avoid timeouts. To do that, you need to keep communicating over the control channel so that the connection stays open. FTP supports the NOOP command, which I send after every 32 MB of received data.
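A minimal sketch of this idea with `ftplib` could look like the following. It is not the exact code behind this post: the function name `download_with_keepalive`, the 1 MB read size, and the parameter names are made up for illustration, and it assumes the server answers NOOP while a transfer is still running. `voidresp` is a long-standing `ftplib.FTP` helper, but it is not listed in the official documentation.

```python
import ftplib
from typing import BinaryIO

NOOP_INTERVAL = 32 * 1024 * 1024  # bytes between keep-alives, as described above
BLOCK_SIZE = 1024 * 1024          # read size; an arbitrary choice for this sketch


def download_with_keepalive(ftp: ftplib.FTP, remote_path: str, out_file: BinaryIO,
                            noop_interval: int = NOOP_INTERVAL,
                            block_size: int = BLOCK_SIZE) -> int:
    """Copy remote_path into out_file, sending NOOP every noop_interval bytes.

    Returns the number of bytes received.
    """
    ftp.voidcmd("TYPE I")  # switch to binary mode, as retrbinary does
    # transfercmd sends RETR on the control connection and returns the
    # socket of the transfer (data) connection.
    data_sock = ftp.transfercmd("RETR " + remote_path)
    total = 0
    since_noop = 0
    try:
        while True:
            block = data_sock.recv(block_size)
            if not block:  # empty read: the server closed the data connection
                break
            out_file.write(block)
            total += len(block)
            since_noop += len(block)
            if since_noop >= noop_interval:
                # Traffic on the control connection keeps it from going idle.
                # Assumption: the server replies to NOOP during the transfer.
                ftp.voidcmd("NOOP")
                since_noop = 0
    finally:
        data_sock.close()
    ftp.voidresp()  # consume the server's final "transfer complete" reply
    return total


# Usage (host, credentials and paths are placeholders):
#
#     ftp = ftplib.FTP("ftp.example.com", timeout=60)
#     ftp.login("user", "password")
#     with open("big-file.bin", "wb") as f:
#         download_with_keepalive(ftp, "big-file.bin", f)
#     ftp.quit()
```

Some servers hold the NOOP reply until the transfer has finished. In that case `voidcmd` would block until the reply arrives or the socket timeout fires, so this sketch would need to send the command without waiting and collect the replies afterwards.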