Intermittent connection failures, followed by a brief outage of all services for sites using the ExaVault Host Key without a Custom Domain

Incident Report for Files.com

Postmortem

On May 1st, 2024, at 5:23 AM PDT, Files.com correlated multiple customer tickets indicating ‘SFTP Connection failures with the ExaVault SFTP host key and host name’, which resulted in an incident being declared. The Incident Management Team (IMT) convened and immediately began investigating.

The ‘Intermittent connection failures, followed by a brief outage of all services for sites using the ExaVault Host Key without a Custom Domain’ incident was resolved on May 1st, 2024, at 5:51 AM PDT, returning the platform to full functionality.

Files.com released a resolution posting to the Status Page on May 1st, 2024, at 5:51 AM PDT stating:

‘ExaVault is a service that was acquired by Files.com.  A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to Files.com.  If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident did not affect you.

We have resolved an incident that caused intermittent connection failures, followed by a brief outage of all Files.com core and auxiliary services in all regions for sites that use the ExaVault Host Key without a Custom Domain.

Sites with a Custom Domain and those that use the default Files.com Host Key were not affected by this outage.

From 6:30PM Pacific Time on 4/30 until 5:35AM Pacific Time on 5/1, connection attempts failed intermittently on affected sites.  From 5:35AM to 5:51AM, services were entirely down for the affected sites.

All services were restored and operational at 5:51AM.

We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.’

Files.com acquired ExaVault, another MFT service, in 2022. Although all former ExaVault customers have been migrated to the mainline Files.com platform, we still support and maintain connectivity options that use the legacy ExaVault SFTP Host Key.

This allows former ExaVault customers to keep using connections that were established with that host key. We recently deployed a change to expand our support for the ExaVault host key from just our USA region to all 7 of our global Files.com service regions. This was done at customer request and to improve performance for customers outside the US.

As part of this deployment, we replaced several customer-facing systems and we updated our automatic DNS management system to use regional “latency-based” routing for the ExaVault Host Key domain.

Unfortunately, a configuration issue caused incorrect DNS records to be published, but only for customers configured to use the ExaVault SFTP Host Key. As a result, some of the IP addresses returned by DNS no longer pointed to valid, active servers, which caused intermittent connection failures for customers using the ExaVault Host Key.
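For customers who want to check whether a hostname is returning addresses that do not accept connections, the failure mode described above can be observed from the client side. The following is a minimal sketch in Python, not a Files.com tool: the function name `probe_addresses` and the hostname in the usage example are hypothetical, and it only tests whether a TCP connection can be opened, not whether SFTP authentication succeeds.

```python
import socket


def probe_addresses(host, port=22, timeout=3.0):
    """Resolve host and try a TCP connection to each distinct address.

    Returns a dict mapping each resolved IP address to True (connection
    accepted) or False (refused, unreachable, or timed out).
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    results = {}
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip in results:
            continue  # the same address can appear more than once
        try:
            # Open and immediately close a TCP connection to this address.
            with socket.create_connection((ip, port), timeout=timeout):
                results[ip] = True
        except OSError:
            results[ip] = False
    return results


if __name__ == "__main__":
    # "sftp.example.com" is a placeholder; substitute your own SFTP hostname.
    for ip, ok in probe_addresses("sftp.example.com").items():
        print(ip, "reachable" if ok else "NOT reachable")
```

If only some of the returned addresses are unreachable, clients that pick an address at random will fail intermittently, which matches the symptom reported in this incident.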

After becoming aware of this problem, we immediately moved to correct the DNS records manually. This caused a short but complete outage of the ExaVault Host Key domain as we removed the automated entry and repopulated it with the correct IP addresses. We subsequently identified and resolved the configuration issue in the automated DNS management system, then returned the domain to automated management.

While this sort of change is rare, we regret the impact on our customers and we are committed to perfecting our processes.

We have already implemented new monitoring that cross-checks all IPs published to DNS against other internal resources which list active servers, and alerts us to any mismatch.
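The idea behind this kind of cross-check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the monitoring Files.com actually deployed: the function names, the hostname, and the `active` inventory set are all hypothetical, and the addresses come from the reserved documentation range 192.0.2.0/24.

```python
import socket


def resolve_ips(hostname):
    """Return the set of IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def find_stale_ips(published, active):
    """Return, sorted, the published IPs that are missing from the inventory."""
    return sorted(set(published) - set(active))


def check_dns(hostname, active, resolver=resolve_ips):
    """Cross-check DNS answers for hostname against the active-server list.

    The resolver is injectable so the check can be exercised without
    network access. An empty result means every published IP is known.
    """
    return find_stale_ips(resolver(hostname), active)


if __name__ == "__main__":
    # Hypothetical inventory of active servers.
    active = {"192.0.2.10", "192.0.2.11"}
    # Stand-in for real DNS: one published address is not in the inventory.
    fake_dns = lambda name: {"192.0.2.10", "192.0.2.99"}
    stale = check_dns("sftp.example.com", active, resolver=fake_dns)
    if stale:
        print("ALERT: DNS publishes unknown IPs:", stale)
```

One caveat with latency-based routing is that a resolver only sees the answer served to its own location, so a check like this would need to run from each region, or read the records directly from the DNS provider, to cover every published address.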

We’ve also improved logging to make the requested public IP address more visible to customers and our Customer Support team.

Additionally, we have started a project to improve visibility into the DNS management logs, so that similar future bugs will be readily apparent.

We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.

Posted May 09, 2024 - 11:16 PDT

Resolved

ExaVault is a service that was acquired by Files.com. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to Files.com. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident did not affect you.

We have resolved an incident that caused intermittent connection failures, followed by a brief outage of all Files.com core and auxiliary services in all regions for sites that use the ExaVault Host Key without a Custom Domain.

Sites with a Custom Domain and those that use the default Files.com Host Key were not affected by this outage.

From 6:30PM Pacific Time on 4/30 until 5:35AM Pacific Time on 5/1, connection attempts failed intermittently on affected sites. From 5:35AM to 5:51AM, services were entirely down for the affected sites.

All services were restored and operational at 5:51AM.

We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Posted May 01, 2024 - 05:51 PDT

Investigating

ExaVault is a service that was acquired by Files.com. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to Files.com. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident does not affect you.

We are investigating a major outage on Files.com affecting all Files.com core and auxiliary services in all regions for sites that use the ExaVault SFTP Host Key without a Custom Domain.

Sites with a Custom Domain and those that use the default Files.com Host Key are not affected by this outage.

We will provide updates on this situation as they become available. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone.
Posted May 01, 2024 - 05:48 PDT
This incident affected: Core Services / API.