SFTP Service Only: Elevated Error Rates
Incident Report for Files.com
Postmortem

On August 6th, 2024, at 3:05 PM PT, Files.com received multiple monitoring alerts indicating ‘SFTP Service Only: Elevated Error Rates’, and an incident was declared. The Incident Management Team (IMT) convened and immediately began investigating.

The ‘SFTP Service Only: Elevated Error Rates’ issue was resolved on August 6th, 2024, at 4:06 PM PT, returning the platform to full functionality.

From 3:01 PM PT through 4:06 PM PT, Files.com customers experienced elevated error rates when connecting via the SFTP protocol.

Although this incident resembles the one that occurred on August 2, it was a completely distinct situation.

The elevated error rates during this period were caused by a denial-of-service (“DoS”) attack against Files.com’s SFTP service.

Like all large providers of services on the Internet, we are under constant attack from a variety of threat actors.

Files.com uses a variety of sophisticated tools to defend its infrastructure against attacks.

We know of no commercial providers that produce DoS mitigation tools designed specifically for SFTP, so we have had to invest heavily in developing our own protection and mitigation tools for it.

One of our mitigation strategies is to completely block connections from SFTP counterparties who appear to be abusive.

The hard part is correctly determining whether a counterparty is being intentionally abusive or is simply a misconfigured script or automation belonging to an otherwise legitimate customer.

Accidentally blocking a legitimate customer can take down a major workflow for a customer, and we try very hard to never have that happen.

It’s a delicate balance and we spend a lot of time and engineering resources trying to get this right.

About 4 weeks ago, Files.com released an update to our internal security tools that added more logic to the part of our code that tries to identify abusive SFTP connections.

This was done with the hope of making it even less likely for a legitimate customer to ever be blocked inadvertently.

While this improvement was good overall, it turns out that this update introduced a regression that allowed a particular type of malicious counterparty to open up SFTP connections and leave them hanging in an idle state.

That’s what happened on August 6. A malicious counterparty “used up” a number of our connection pool slots by opening them and letting them hang idle, leaving them unavailable for legitimate use.
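The slot-exhaustion mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Files.com's actual implementation: the `SlotPool` class, its `max_slots` and `idle_timeout` parameters, and the connection names are all hypothetical. It shows how idle connections can occupy every slot in a fixed-size pool, and how evicting connections that stay idle too long frees those slots again.

```python
from dataclasses import dataclass, field


@dataclass
class SlotPool:
    """Toy fixed-size connection pool (hypothetical, for illustration only)."""

    max_slots: int
    idle_timeout: float  # seconds a connection may sit idle before eviction
    last_activity: dict = field(default_factory=dict)  # conn_id -> last-seen time

    def open(self, conn_id, now):
        """Try to take a slot. Returns False when every slot is occupied."""
        self.reap_idle(now)
        if len(self.last_activity) >= self.max_slots:
            return False
        self.last_activity[conn_id] = now
        return True

    def touch(self, conn_id, now):
        """Record activity on an open connection so it is not treated as idle."""
        if conn_id in self.last_activity:
            self.last_activity[conn_id] = now

    def reap_idle(self, now):
        """Evict connections idle for longer than idle_timeout; return their ids."""
        stale = [c for c, t in self.last_activity.items() if now - t > self.idle_timeout]
        for conn_id in stale:
            del self.last_activity[conn_id]
        return stale


pool = SlotPool(max_slots=3, idle_timeout=30.0)

# An abusive counterparty opens every slot at t=0 and then goes silent.
for i in range(3):
    pool.open(f"idle-{i}", now=0.0)

blocked = pool.open("customer", now=10.0)   # False: all slots are held idle
admitted = pool.open("customer", now=31.0)  # True: idle holders were evicted
```

In this model, a pool whose idle check never fires (for example, `idle_timeout=float("inf")`) stays exhausted for as long as the idle connections are held open, which is the kind of degradation described here.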

After fixing the logic error in our security software, the malicious counterparty was automatically blocked and full SFTP functionality was restored.

We want to be very clear about two things:

  1. This was not a full outage of SFTP; rather, it was a degradation in which some connection attempts failed. If your SFTP software used retries, it is likely that your connections succeeded on retry.
  2. The *only* thing that this malicious actor was able to do was hold connection pool slots open so that legitimate customers could not use them to connect. That’s what a denial-of-service attack is: they denied service to you, the legitimate customer. There was absolutely no access to our systems at all beyond the denial-of-service.
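For readers whose SFTP automation does not already retry, the pattern mentioned in the first point can be sketched as follows. This is a generic illustration, not a Files.com recommendation for specific settings: `connect_with_retry`, its parameters, and the `connect` callable are hypothetical names, and `connect` stands in for whatever your SFTP client uses to open a session.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between tries.

    connect is a hypothetical zero-argument callable that opens a session and
    raises ConnectionError or TimeoutError on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Wait 1s, 2s, 4s, ... capped at max_delay, before the next try.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

Backing off between attempts, rather than retrying in a tight loop, matters here: a client that reconnects aggressively adds load during a degradation and may itself look like an abusive counterparty.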

Even denial-of-service attacks cause real economic impact, and we work hard to defend against them.

The root cause of this issue was Files.com's incomplete testing of the security software change released 4 weeks earlier. It is hard to produce synthetic testing that simulates everything a malicious actor might do, but we learned from this incident and have updated our testing accordingly.

We promise a system that works perfectly, all of the time, and we are disappointed that you may have experienced issues on August 6 that were caused by a malicious actor.

Defense against the ever-present threat environment is one of the main reasons you chose to use Files.com as opposed to operating your own on-premises server, and it is absolutely our job to prevent these sorts of things from ever affecting your workflows.

We take that mission seriously. If you need additional assistance or continue to experience issues, please contact our Customer Support team.

Posted Aug 09, 2024 - 11:24 PDT

Resolved
We have resolved elevated error rates on the SFTP service on Files.com in all regions. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. This incident occurred between 3:01 PM PT and 4:06 PM PT on August 6th, 2024.
Posted Aug 06, 2024 - 16:15 PDT
Investigating
SFTP only: We are investigating elevated error rates on the SFTP service on Files.com in all regions.

This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others.

If you are experiencing problems with SFTP, we recommend using FTP in lieu of SFTP.
Posted Aug 06, 2024 - 15:20 PDT
This incident affected: SFTP.