Web Interface Only: Failures downloading more than 1 file at a time

Incident Report for Files.com

Postmortem

From October 10th at 5:14 PM PT through October 11th at 6:42 AM PT, Files.com customers experienced failures when downloading multiple files at once through the web interface. Downloading more than one file uses a code path that bundles the files into a single ZIP archive, and this ZIP generation process was impacted during this incident.

This incident began when we deployed a configuration change to our production environment that was intended to improve the security of our HTTP headers. Although we tested this change extensively, our testing did not adequately cover the ZIP download function.

After being notified about this incident through our customer support channel, we identified the issue and rolled back the change on October 11th at 6:42 AM PT.

We are disappointed that this issue took so long to resolve, and we'd like to explain in detail the multiple causes of the delay.

Customer Support Hours

This issue began at 5:14 PM PT, while our customer support department was closed. Files.com staffs our customer support department from 6 AM to 5 PM PT, Monday through Friday.

Although multiple customers reported the issue to us immediately, all of these reports were received while our customer support department was closed, and none of them were escalated through our after-hours support services.

As a result, we did not become aware of the issue until our support department reopened the following morning. Once we became aware of it, we fixed it promptly.

Files.com also offers a 24/7 Enterprise Support product that about 100 of our customers subscribe to; however, none of those customers alerted us about this issue. If you rely on Files.com for business-critical needs, please consider subscribing to our Enterprise Support service so that you have access to 24/7 issue resolution. Learn more at https://www.files.com/enterprise-support.

This is the first incident in recent memory in which Files.com's lack of 24/7 support for non-Enterprise customers contributed to the impact of a major incident. We are considering adding after-hours support resources; however, given the still limited impact of this incident, we are not making any official changes at this time. If you have opinions about this topic, we'd love to hear from you.

Monitoring Deficiency

Additionally, this incident exposed a major deficiency in our monitoring of ZIP downloads. Although we do have sophisticated monitoring that covers ZIP downloads, it did not catch this issue because the monitor did not actually inspect the generated ZIP file for correctness.

We have developed an improved monitoring tool that tests the ZIP download function more extensively and would allow us to detect this kind of failure.

We expect to deploy that improved monitoring tool within the next several weeks.
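To illustrate the difference between a monitor that only confirms a ZIP download responds and one that inspects the generated ZIP for correctness, here is a minimal Python sketch. It is not our actual monitoring tool: the function name `zip_download_is_valid` and the approach of comparing the archive against a known set of expected files and contents are assumptions made for illustration only.

```python
import io
import zipfile


def zip_download_is_valid(payload: bytes, expected: dict[str, bytes]) -> bool:
    """Hypothetical check: return True only if `payload` is a readable ZIP
    archive containing exactly the expected file names and contents."""
    try:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # testzip() reads every member and checks its CRC; it returns the
            # name of the first corrupt member, or None if all are intact.
            if archive.testzip() is not None:
                return False
            # The archive must contain exactly the files that were requested.
            if sorted(archive.namelist()) != sorted(expected):
                return False
            # Each file's bytes must match what the monitor uploaded.
            return all(archive.read(name) == data for name, data in expected.items())
    except zipfile.BadZipFile:
        # Truncated responses, error pages, or other non-ZIP bodies land here.
        return False
```

A check like this fails if the response is cut off, is not a ZIP at all, or is missing files, even when the HTTP request itself appears to succeed.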

We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.

Posted Nov 25, 2024 - 10:08 PST

Resolved

We have resolved an incident in which downloads of more than one file at a time via the web interface failed. This incident occurred between 5:14 PM PT on October 10 and 6:42 AM PT on October 11. We are compiling a final root cause analysis for this incident, which we will post here when it is complete.
Posted Oct 10, 2024 - 17:00 PDT