
Noticeable uptick in 503 (Service Unavailable) responses


#1

I am a contributor to and heavy user of the testrail-python bindings, and I have noticed an uptick in 503 (“Service Unavailable”) responses from the TestRail API. Anecdotally, they seem to have become more common over the last month or so. I have added code to automatically handle and retry these 503 responses, but even with multiple retries, delays, and backoff, I still ultimately get a 503.
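One way to see why several retries with backoff can still end in a 503 is to add up the delays. The sketch below assumes a capped exponential backoff; the function name and the settings are hypothetical and not taken from testrail-python. With five retries starting at one second, the client waits only about half a minute in total, which is too short to ride out an outage that lasts a few minutes.

```python
def backoff_delays(base: float, factor: float, retries: int, cap: float) -> list[float]:
    """Return the sleep (in seconds) before each retry of a capped exponential backoff."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


# Hypothetical settings: 5 retries, starting at 1 s, doubling each time, capped at 30 s.
delays = backoff_delays(base=1.0, factor=2.0, retries=5, cap=30.0)
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(delays))  # 31.0
```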

Given that, I have two questions:

  1. Is Gurock aware of an uptick in 503 responses?
  2. Is there a recommended way to handle these? Perhaps a recommended amount of time to wait for the service to become available? The API documentation is fairly vague on this: “Applications and libraries that use TestRail’s API are responsible for handling 5xx errors and are supposed to retry requests later in this case.”

#2

Hi Levi,

Thanks for your post. Do you use a TestRail Cloud instance or a self-hosted installation? We monitor our systems 24/7 and haven’t noticed any increase in 503s. Do you know if the 503s are spread across the day, or is there a small window every day where you see them? When we take a backup of a TestRail Cloud instance, the instance is put into maintenance mode for 1-5 minutes, and this results in 503s for API requests.

Cheers,
Tobias


#3

tgurock, thank you so much for the reply.

To answer your first question, I use a TestRail Cloud instance.

I went back through my logs for the past 30 hours and saw eight 503s, all of which happened between 05:02:00 and 05:03:59 UTC on both 11/16 and 11/17. Based on that, I’m guessing you start your snapshot process at midnight local time and, since I run my tests at the top of every hour, I am hitting the maintenance mode.
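The log check described here (failures recurring at the same UTC minutes on different days point to a scheduled job) can be done mechanically by grouping failure timestamps by time of day and ignoring the date. This is a minimal sketch: the function name, the five-minute bucket size, and the sample timestamps are all hypothetical, and every timestamp is assumed to carry an explicit UTC offset.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_by_time_of_day(timestamps: list[str], bucket_minutes: int = 5) -> Counter:
    """Count ISO 8601 timestamps per UTC time-of-day bucket, ignoring the date.

    Timestamps are assumed to include an offset (e.g. "+00:00"); naive values
    would be interpreted as local time by astimezone().
    """
    counts: Counter = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts).astimezone(timezone.utc)
        minute_of_day = moment.hour * 60 + moment.minute
        start = minute_of_day - minute_of_day % bucket_minutes
        counts[f"{start // 60:02d}:{start % 60:02d}"] += 1
    return counts


# Hypothetical 503 timestamps from two different days.
sample = [
    "2020-03-01T05:02:10+00:00",
    "2020-03-01T05:03:40+00:00",
    "2020-03-02T05:02:55+00:00",
    "2020-03-02T05:03:05+00:00",
]
print(bucket_by_time_of_day(sample).most_common(1))  # [('05:00', 4)]
```

A single dominant bucket across several days suggests a daily scheduled event rather than random load.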

Based on this and your response, I think I can adjust my retry attempts to wait up to about 7 minutes in total, which should hopefully get my tests through the maintenance window.
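The retry policy described here can be expressed as a total time budget rather than a fixed number of attempts. This is a minimal sketch, assuming a hypothetical `send` callable that performs one API request and returns its HTTP status code. The seven-minute budget comes from the post; the 30-second delay between attempts is an arbitrary illustrative choice. The clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def call_with_deadline(
    send: Callable[[], int],
    total_budget: float = 7 * 60,
    delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call send() until it returns a non-503 status or the time budget runs out.

    send is a hypothetical callable returning an HTTP status code.
    The last status seen is returned in either case.
    """
    deadline = clock() + total_budget
    while True:
        status = send()
        if status != 503:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            return status  # budget exhausted; give up with the last 503
        # Never sleep past the deadline; one final attempt happens at the deadline.
        sleep(min(delay, remaining))
```

Bounding the loop by elapsed time keeps the worst case predictable: however the delay is tuned, the caller gives up roughly seven minutes after the first failure.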

Regards,
-Levi


#4

Hi Levi,

Thanks for the additional details. Yes, this would be the maintenance/backup period and waiting for a couple of minutes before retrying would be recommended in this case. The same is true for other 5xx errors, or if you hit the rate limit for API requests (429 Too Many Requests).
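Following this advice, a client can treat 429 and every 5xx status as retryable and wait minutes rather than seconds. This is a minimal sketch: the function name and the 120-second default are illustrative assumptions. Honoring a numeric `Retry-After` header is standard HTTP behavior, but nothing in this thread confirms that TestRail sends one, so the code falls back to the default when it is absent.

```python
from typing import Mapping, Optional

DEFAULT_WAIT = 120.0  # "a couple of minutes", per the reply in this thread (illustrative value)


def retry_wait(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the status is not retryable.

    headers should be case-insensitive (e.g. requests' response.headers);
    a plain dict must use the exact key "Retry-After".
    """
    if status == 429 or 500 <= status < 600:
        retry_after = headers.get("Retry-After", "")
        # Use Retry-After only when it is a plain number of seconds;
        # the HTTP-date form falls back to the default wait.
        if retry_after.strip().isdigit():
            return float(retry_after)
        return DEFAULT_WAIT
    return None
```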

Cheers,
Tobias