We had an unplanned outage of our TestRail Hosted infrastructure today that affected a significant number of our customers. We were automatically notified as soon as the issue appeared and started working on the problem within 1-2 minutes.
The problem was with one of our database clusters. Because multiple unfortunate circumstances came together, our failover mechanism for the affected database cluster did not work as expected, and getting the cluster back to a working state took longer than planned. We worked with our database vendor’s emergency support today to get this issue resolved as quickly as possible.
All customer instances are back up and running again. We are really sorry about today’s outage and the length of downtime that some customer accounts experienced. We invest a lot of resources into making our infrastructure as reliable as possible, and our infrastructure otherwise has a very high uptime. We know that our customers depend on TestRail, and we review every unplanned outage very critically. We have identified multiple methods and approaches to mitigate similar problems in the future and we plan to implement them as soon as possible.
We also want to thank all customers for their understanding and their offers to help troubleshoot this issue today; it’s really appreciated. If you have any questions about today’s outage, please email us so we can get back to you as soon as possible.