Intermittent Upload Issues
Incident Report for Files.com
Postmortem

The Incident

On August 22nd, 2019, at 2:43 AM PDT, the Files.com team received reports of intermittent errors when uploading files in the UK region. Retrying the upload completed successfully. The same intermittent error also occurred at random in various other regions across the globe, and in each case a retry completed successfully.

All error logs produced by the failed uploads pointed to a problem with communications to the Amazon Web Services (AWS) S3 storage service. Files.com personnel reviewed the errors and monitored AWS service announcements for any indication of a widespread S3 outage. After 24 hours with no decrease in the intermittent errors, Files.com decided to create, test, and deploy a new method of communicating with AWS S3 storage.

The updated AWS S3 communication method was deployed at 6:56 AM PDT. Files.com staff conducted extensive testing and detected no errors. Starting at 8:57 AM PDT, customers who had submitted trouble tickets were informed that Files.com was fully functional.

We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Success team.

Lessons Learned

Files.com has implemented improved monitoring that produces more detailed logs of the communications between the application and the AWS S3 storage service.

Posted Aug 30, 2019 - 10:43 PDT

Resolved
Posted Aug 22, 2019 - 02:43 PDT