AWS S3 Last Modified date
File systems tend to expose the creation and modification dates of each entry. AWS S3 is not a file system, but it exposes a “Last Modified” date. That is a bit confusing, because an S3 object cannot be modified; it can only be overwritten.
Goal
The goal of the experiment is to find out how it really behaves, especially in the multipart upload scenario. I have an output file that I am going to write in 128 MB parts over my slow internet connection, so uploading 1 GB is going to take several minutes.
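As a quick sanity check of what to expect, here is a minimal Python sketch that splits an object into 128 MB parts. The helper name `part_sizes` is made up for illustration, and the total size is the 1087590813 bytes that S3 reports for the object in this experiment.

```python
from typing import List

PART_BYTES = 128 * 1024 * 1024  # 134217728 bytes, the part size used in this experiment


def part_sizes(total_bytes: int, part_bytes: int = PART_BYTES) -> List[int]:
    """Return the sizes of consecutive parts covering total_bytes.

    Hypothetical helper, not part of any AWS SDK.
    """
    if total_bytes < 0 or part_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and part_bytes must be > 0")
    full_parts, remainder = divmod(total_bytes, part_bytes)
    # Every part is full-sized except possibly the last one.
    return [part_bytes] * full_parts + ([remainder] if remainder else [])


sizes = part_sizes(1087590813)
print(len(sizes), sizes[0], sizes[-1])  # prints: 9 134217728 13848989
```

So the upload should consist of eight full parts and one short final part.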
Log
The process created the multipart upload at 14:00, the first part completed at 14:01, and the multipart upload was completed at 14:07.
14:00:37 test: download measured 280089294
14:00:37 test: downloading offset 0-33554431
14:00:38 test: upload started current18.json
14:00:55 test: downloading offset 33554432-67108863
14:01:25 test: part 1 completed; 134217728 bytes
14:01:40 test: downloading offset 67108864-100663295
14:02:12 test: part 2 completed; 134217728 bytes
14:02:27 test: downloading offset 100663296-134217727
14:02:58 test: part 3 completed; 134217728 bytes
14:03:12 test: downloading offset 134217728-167772159
14:03:50 test: part 4 completed; 134217728 bytes
14:04:04 test: downloading offset 167772160-201326591
14:04:39 test: part 5 completed; 134217728 bytes
14:04:53 test: downloading offset 201326592-234881023
14:05:26 test: part 6 completed; 134217728 bytes
14:05:39 test: downloading offset 234881024-268435455
14:06:13 test: part 7 completed; 134217728 bytes
14:06:25 test: downloading offset 268435456-280089293
14:06:34 test: download completed current18.xml.gz
14:07:00 test: part 8 completed; 134217728 bytes
14:07:04 test: part 9 completed; 13848989 bytes
14:07:05 test: upload completed current18.json
Inconclusive
When I checked the object with the AWS CLI, I got the following output:
{
"AcceptRanges": "bytes",
"LastModified": "Wed, 25 Nov 2020 13:00:39 GMT",
"ContentLength": 1087590813,
"ETag": "\"f7599503c6df35f460724cf2f15b5099-9\"",
"ContentType": "binary/octet-stream",
"Metadata": {}
}
The value does not exactly match any timestamp in my logs (which appear to be in local time, one hour ahead of GMT), but the logs do not show everything. In particular, they do not record when the first part started being sent.
Again
I changed the code to sleep for an extra 30 seconds between creating the multipart upload and starting to send the first part, and got the following logs and object metadata.
14:33:07 test: download measured 280089294
14:33:07 test: downloading offset 0-33554431
14:33:08 test: upload started current18.json
14:33:55 test: downloading offset 33554432-67108863
14:34:24 test: part 1 completed; 134217728 bytes
{
"AcceptRanges": "bytes",
"LastModified": "Wed, 25 Nov 2020 13:33:09 GMT",
"ContentLength": 1087590813,
"ETag": "\"f7599503c6df35f460724cf2f15b5099-9\"",
"ContentType": "binary/octet-stream",
"Metadata": {}
}
The value is still not an exact match, but given the 30-second sleep I am pretty sure it corresponds to creating the multipart upload rather than to the start of uploading the first part.
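The delayed experiment can be sketched roughly as follows. This is not the code I actually ran, only a minimal Python illustration: the function name `upload_with_delay` and its parameters are made up, and `s3` is assumed to behave like a boto3 S3 client (it is passed in, so the sketch itself does not import boto3).

```python
import time


def upload_with_delay(s3, bucket, key, parts, delay_seconds=30.0, sleep=time.sleep):
    """Create a multipart upload, wait, send the parts, return LastModified.

    Assumption: `s3` exposes the boto3 S3 client methods used below.
    `parts` is an iterable of bytes objects, one per part.
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        # The extra pause between creating the upload and sending part 1.
        sleep(delay_seconds)

        completed = []
        for number, body in enumerate(parts, start=1):
            response = s3.upload_part(
                Bucket=bucket,
                Key=key,
                PartNumber=number,
                UploadId=upload_id,
                Body=body,
            )
            completed.append({"ETag": response["ETag"], "PartNumber": number})

        s3.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": completed},
        )
    except Exception:
        # Do not leave an unfinished upload behind if anything fails.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise

    # The timestamp S3 reports for the finished object.
    return s3.head_object(Bucket=bucket, Key=key)["LastModified"]
```

With boto3 installed it could be called as `upload_with_delay(boto3.client("s3"), "my-bucket", "current18.json", parts)`, where the bucket name is a placeholder. Keep in mind that S3 requires every part except the last one to be at least 5 MB.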
One shot
I am a bit confused, because I do not know what to expect from a single upload done by calling the put object method. I ran another test to verify it.
14:47:08 test: download measured 280089294
14:47:08 test: downloading offset 0-33554431
14:47:25 test: downloading offset 33554432-67108863
14:47:47 test: downloading offset 67108864-100663295
14:48:09 test: downloading offset 100663296-134217727
14:48:31 test: downloading offset 134217728-167772159
14:48:53 test: downloading offset 167772160-201326591
14:49:15 test: downloading offset 201326592-234881023
14:49:37 test: downloading offset 234881024-268435455
14:50:00 test: downloading offset 268435456-280089293
14:50:08 test: download completed current18.xml.gz
14:50:13 test: upload started; 1087590813 bytes
14:53:22 test: upload completed; 1087590813 bytes
{
"AcceptRanges": "bytes",
"LastModified": "Wed, 25 Nov 2020 13:50:16 GMT",
"ContentLength": 1087590813,
"ETag": "\"c7023a9501c72ea77085e91fa06cc0ad\"",
"ContentType": "binary/octet-stream",
"Metadata": {}
}
The outcome suggests that “Last Modified” roughly represents the timestamp at which the upload started. At least that is very consistent with the previous experiments.
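To put numbers on that, here is a small Python sketch that compares the "upload started" log lines with the reported “Last Modified” values from all three experiments. It rests on two assumptions that are not stated in the logs: that the log timestamps are local time one hour ahead of GMT, and that they were taken on 2020-11-25. The function name `seconds_after_start` is made up.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: the log lines are in local time, one hour ahead of GMT.
LOCAL_TZ = timezone(timedelta(hours=1))


def seconds_after_start(log_start: str, last_modified: str, day: str = "2020-11-25") -> float:
    """Return how many seconds LastModified is after the logged upload start.

    Assumption: the log line was written on `day` in LOCAL_TZ.
    """
    started = datetime.strptime(day + " " + log_start, "%Y-%m-%d %H:%M:%S")
    started = started.replace(tzinfo=LOCAL_TZ)
    # LastModified is an HTTP-style date such as "Wed, 25 Nov 2020 13:00:39 GMT".
    modified = parsedate_to_datetime(last_modified)
    return (modified - started).total_seconds()


experiments = [
    ("multipart", "14:00:38", "Wed, 25 Nov 2020 13:00:39 GMT"),
    ("multipart with sleep", "14:33:08", "Wed, 25 Nov 2020 13:33:09 GMT"),
    ("put object", "14:50:13", "Wed, 25 Nov 2020 13:50:16 GMT"),
]
for name, log_start, last_modified in experiments:
    print(name, seconds_after_start(log_start, last_modified))
# prints:
# multipart 1.0
# multipart with sleep 1.0
# put object 3.0
```

Under those assumptions the reported value is always within a few seconds of the logged upload start and minutes away from the upload completion.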
Lesson learned
I now know the meaning of the “Last Modified” value, and I should not use it to assume that one file became available earlier than another. The column only indicates when the upload of the file started.