On November 13, from 7:25am to 8:10am PST, we experienced an issue with our FTP service that resulted in elevated error rates and, for some customers, a complete inability to connect to FTP. Not all FTP customers were affected. The issue was specific to FTP in our USA region and did not affect any other network services at Files.com, such as SFTP, WebDAV, AS2, or the API.
According to our logs, this issue affected approximately half of all traffic connecting to FTP in our USA region during the impacted time window. All other regions were unaffected.
This incident was related to the deployment of a critical software update to our production proxy servers in the USA region. We have 20 proxy servers worldwide, 8 of which are located in our USA region. They are the hardest devices in our fleet to update: because they route traffic for all of our network services, they often need to be updated in place.
In this incident, 4 of the 8 proxy servers failed to restart the FTP proxy service after the critical software update. Because more than one proxy server failed at the same time, the incident caused a total failure of FTP for some customers.
The impact of this incident was mainly due to our deployment process not detecting that the FTP proxy service had failed to restart on the first affected server, which allowed the problem to spread to additional servers. We have revised our deployment process to reduce the chance of this sort of cascade in the future.
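For readers who run similar fleets, the general safeguard described above can be sketched as a rolling update that checks each server right after it is updated and stops at the first failure. This is a minimal illustration in Python, not our actual deployment tooling: the names `rolling_update`, `RolloutHalted`, and `ftp_greets` are hypothetical, and the health check assumes that a working FTP service answers new connections on port 21 with a greeting that starts with `220`.

```python
import socket
from typing import Callable, Iterable, List


class RolloutHalted(Exception):
    """Raised when a server fails its health check right after being updated."""


def ftp_greets(host: str, port: int = 21, timeout: float = 5.0) -> bool:
    """Hypothetical health check: does the FTP service greet a new connection?

    Assumes a healthy server sends a reply beginning with "220" on connect.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(4).startswith(b"220")
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


def rolling_update(
    servers: Iterable[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update servers one at a time, stopping at the first unhealthy one.

    Returns the servers that were updated and passed their health check.
    Raises RolloutHalted before any further server is touched.
    """
    updated: List[str] = []
    for server in servers:
        apply_update(server)
        if not is_healthy(server):
            raise RolloutHalted(
                f"{server} failed its health check; "
                f"stopped after {len(updated)} successful update(s)"
            )
        updated.append(server)
    return updated
```

With a check like this in place, a failed restart on the first server would stop the rollout with one server affected instead of several. In practice `apply_update` would be whatever step installs the update and restarts the service, and `is_healthy` could be `ftp_greets` or any equivalent probe.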
We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.