On December 5th, 2023, at 4:06 AM PST, Files.com correlated multiple customer tickets indicating ‘authentication errors when logging into SFTP’, which resulted in an incident being declared. The Incident Management Team (IMT) convened and immediately began investigating.
The ‘authentication errors when logging into SFTP’ issue was resolved on December 5th, 2023 at 6:29 AM PST, returning the platform to full functionality.
In this incident, our SFTP servers became unstable and failed to process requests for certain customers because of a bad configuration file that our automated configuration management system applied to them on December 4th, 2023 at 10:41 PM.
This failure affected only a small number of customers: specifically, those for whom our API must authenticate the provenance of the connecting SFTP user's origin IP address. This includes customers who use IP Whitelisting or IP Geolocation (such as country whitelisting/blacklisting). We use a sophisticated system to cryptographically authenticate the origin IP of the connecting SFTP user when making upstream calls to our internal API, and it was a configuration related to this system that was inadvertently misapplied.
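For readers unfamiliar with this kind of mechanism, the Python sketch below shows one common way to authenticate a forwarded IP address: the SFTP server signs the origin IP with an HMAC using a shared secret, and the API verifies the signature before applying IP rules. This is an assumption made for illustration and is not a description of Files.com's actual system; the secret, function names, and time window are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

# Hypothetical shared secret that configuration management would distribute
# to both the SFTP servers and the internal API. Illustration only.
SHARED_SECRET = b"example-secret-not-real"
MAX_SKEW_SECONDS = 60  # reject signatures older than this to limit replay


def sign_origin_ip(origin_ip: str, timestamp: int, secret: bytes) -> str:
    """SFTP server side: sign the connecting user's IP before calling the API."""
    message = f"{origin_ip}|{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_origin_ip(origin_ip: str, timestamp: int, signature: str,
                     secret: bytes, now: Optional[int] = None) -> bool:
    """API side: trust the reported IP only if the signature is valid and fresh."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or replayed request
    expected = sign_origin_ip(origin_ip, timestamp, secret)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


def ip_allowed(origin_ip: str, timestamp: int, signature: str, secret: bytes,
               allowlist: Set[str], now: Optional[int] = None) -> bool:
    """Hypothetical per-customer IP allowlist check."""
    if not allowlist:
        return True  # no IP restriction: the signature is never consulted
    if not verify_origin_ip(origin_ip, timestamp, signature, secret, now):
        return False  # the IP cannot be trusted, so deny the login
    return origin_ip in allowlist
```

In this sketch, if the SFTP server and the API disagree about the secret, `verify_origin_ip` fails for every connection, yet `ip_allowed` still admits customers with no allowlist. That pattern matches the scope described above: only customers who rely on IP-based restrictions are locked out.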
The bad configuration file was deployed for the following reason:
A separate configuration change was correctly and successfully made to another system (our HTTP servers) via our configuration management system. Due to a logic error in the code of that change, it also targeted our SFTP servers, to which it should never have been deployed.
Internally, Files.com runs SFTP services on several dedicated servers in each service region. Our configuration management system deploys changes to these servers one at a time, verifying correct operation on each server before continuing the rollout.
The contents of this document are for general release and classified PUBLIC
Unfortunately, while this check validated the general operation of SFTP, it did not specifically validate the subsystem that cryptographically authenticates IP addresses.
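The gap can be illustrated with a minimal Python sketch of a one-server-at-a-time rollout. The function names, server names, and check names are hypothetical and do not describe Files.com's actual configuration management tooling. The simulation assumes a change that leaves SFTP logins working but breaks IP authentication: with only a generic login check the change reaches every server, while adding a check specific to the IP authentication subsystem stops the rollout and rolls the change back on the first server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

HealthCheck = Callable[[str], bool]  # server name -> healthy?


@dataclass
class RolloutResult:
    applied: List[str] = field(default_factory=list)
    halted_on: Optional[str] = None
    failed_checks: List[str] = field(default_factory=list)


def rolling_deploy(servers: List[str],
                   apply_change: Callable[[str], None],
                   rollback_change: Callable[[str], None],
                   checks: Dict[str, HealthCheck]) -> RolloutResult:
    """Apply a change one server at a time, halting on the first failed check."""
    result = RolloutResult()
    for server in servers:
        apply_change(server)
        failed = [name for name, check in checks.items() if not check(server)]
        if failed:
            rollback_change(server)  # undo on this server and stop the rollout
            result.halted_on = server
            result.failed_checks = failed
            return result
        result.applied.append(server)
    return result


# Hypothetical simulation: the bad config breaks IP authentication but not logins.
broken = set()
servers = ["sftp-1", "sftp-2", "sftp-3"]


def sftp_login_ok(server: str) -> bool:
    return True  # the generic SFTP check still passes


def ip_auth_ok(server: str) -> bool:
    return server not in broken  # subsystem-specific check


# With only the generic check, the bad change reaches every server.
print(rolling_deploy(servers, broken.add, broken.discard,
                     {"sftp_login": sftp_login_ok}))
broken.clear()
# Adding the subsystem check halts the rollout on the first server.
print(rolling_deploy(servers, broken.add, broken.discard,
                     {"sftp_login": sftp_login_ok, "ip_auth": ip_auth_ok}))
```

The design choice worth noting is that a rollout gate is only as strong as its checks: per-server sequencing limits the blast radius only when some check actually exercises the component that the change breaks.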
Upon discovery of the incident, Files.com reverted the inappropriate configuration change on the SFTP servers.
The root cause of this incident is twofold.
Firstly, Files.com did not automatically monitor or validate the correct operation of the subsystem that cryptographically authenticates IP addresses on SFTP servers. Although an outage of this subsystem does not take SFTP down entirely, it is functionally equivalent to a full outage for customers who require IP Whitelisting or IP Geolocation.
Secondly, Files.com did not alert the engineers who developed and deployed the original configuration change, which was intended for the HTTP servers, that the change would also be applied to the SFTP servers.
Files.com will develop two major process improvements as a result of this incident. First, Files.com will implement additional detection and monitoring for the subsystem that cryptographically authenticates IP addresses on SFTP servers. Second, Files.com will build a system that shows infrastructure engineers exactly which servers a configuration change will affect before that change is approved.
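As a rough illustration of the second improvement, the hypothetical Python sketch below previews which servers a change's target selector matches, grouped by role, and flags any role the change was not meant to touch. The inventory, roles, tags, and selector are invented for the example; the sketch does not reproduce the actual logic error in the original change or the system Files.com plans to build.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Server:
    name: str
    role: str             # e.g. "http" or "sftp" (hypothetical roles)
    tags: FrozenSet[str]  # free-form labels that change selectors match on


def preview_targets(inventory: Iterable[Server],
                    selector: Callable[[Server], bool]) -> Dict[str, List[str]]:
    """Group every server the selector matches by role, for human review."""
    by_role: Dict[str, List[str]] = defaultdict(list)
    for server in sorted(inventory, key=lambda s: s.name):
        if selector(server):
            by_role[server.role].append(server.name)
    return dict(by_role)


def unintended_roles(preview: Dict[str, List[str]],
                     intended_roles: Set[str]) -> Set[str]:
    """Return roles the change would touch but was not meant to."""
    return set(preview) - intended_roles


inventory = [
    Server("http-1", "http", frozenset({"public"})),
    Server("http-2", "http", frozenset({"public"})),
    Server("sftp-1", "sftp", frozenset({"public"})),
]


def too_broad(server: Server) -> bool:
    # Meant for HTTP servers, but keys on a tag that SFTP servers share.
    return "public" in server.tags


preview = preview_targets(inventory, too_broad)
print(preview)  # {'http': ['http-1', 'http-2'], 'sftp': ['sftp-1']}
print(unintended_roles(preview, {"http"}))  # {'sftp'}
```

An approval step could then refuse any change whose preview contains unintended roles, so that an engineer sees the full list of affected servers before the change is deployed.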
Both of these improvements will require substantial engineering work and are not yet complete. We look forward to completing them in the coming quarter. We are hugely disappointed by this downtime, and we will work hard to implement the additional layers of protection needed to avoid similar incidents in the future.
We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.