FTP: Elevated error rates in USA region only

Incident Report for Files.com

Postmortem

On November 13, from 7:25am to 8:10am PST, we experienced an issue with our FTP services that resulted in elevated error rates and, for some but not all customers, a complete inability to connect to FTP. This issue did not affect any other network services at Files.com, such as SFTP, WebDAV, AS2, or API. It was specific to FTP in our USA region.

According to our logs, this issue affected approximately half of all traffic connecting to FTP in our USA region during the impacted time window. All other regions were unaffected.

This incident was related to the deployment of a critical software update to our production proxy servers in the USA region. We have 20 proxy servers worldwide, 8 of which are located in our USA region. They are the toughest devices in our fleet to update, and they often need to be updated in place because they route traffic for all of our network services.

In this incident, 4 of the 8 proxy servers failed to restart the FTP proxy service after the critical software update. Because more than one proxy server failed at once, the incident caused a total failure of FTP for some customers.

Most of the impact resulted from our deployment process not detecting the failure of the first proxy server, which led to a larger failure across additional servers. We have revised our deployment process to reduce the chance of this sort of cascade in the future.
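
A minimal sketch of the kind of safeguard described above: a rolling update that checks each server after updating it and halts at the first failure, so that one bad restart cannot spread to the rest of the fleet. This is illustrative only and is not Files.com's actual deployment tooling. The function name `rolling_update` and the `update` and `is_healthy` callbacks are hypothetical.

```python
from typing import Callable, List


def rolling_update(
    servers: List[str],
    update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update servers one at a time and stop at the first failed health check.

    `update` and `is_healthy` are hypothetical callbacks supplied by the
    caller; a real check might, for example, confirm that the FTP proxy
    service restarted and accepts connections.

    Returns the servers that were updated and passed the check.
    Raises RuntimeError naming the first server that failed.
    """
    updated: List[str] = []
    for server in servers:
        update(server)
        if not is_healthy(server):
            # Halt immediately: the remaining servers keep serving traffic.
            untouched = len(servers) - len(updated) - 1
            raise RuntimeError(
                f"{server} failed its post-update health check; "
                f"halting with {untouched} servers untouched"
            )
        updated.append(server)
    return updated
```

With 8 servers, a failure on the second one would stop the rollout with 6 servers still untouched, rather than letting several fail at once.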

We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.

Posted Nov 25, 2024 - 10:17 PST

Resolved

We have resolved the elevated error rates on the FTP service on Files.com in our primary USA region. This incident did not impact other network services such as API, SFTP, WebDAV, and AS2.

This incident occurred between 7:25am PST and 8:12am PST.

If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA region.

We are compiling a Root Cause Analysis for this incident, which we will post here.
Posted Nov 13, 2024 - 08:32 PST

Investigating

FTP only: We are investigating elevated error rates on the FTP service on Files.com in our primary USA region.

This incident does not impact other network services such as API, SFTP, WebDAV, and AS2.

This incident does not impact FTP in any other regions.

If you are affected by this incident and have an urgent need to access Files.com, we recommend using SFTP in lieu of FTP. If you must connect via FTP, you should be able to immediately connect (and access your existing files and account) using the hostname of our Canada region, which is app-ca-central-1.files.com.
Posted Nov 13, 2024 - 07:54 PST
This incident affected: FTP/FTPS and USA Region.