Login and Shared Health Summary issue
Incident Report for MedicalDirector Product Status
Postmortem

Executive summary

We had two issues following our release. They were investigated at around the same time, but their causes and fixes were different. One prevented some customers from uploading or previewing Shared Health Summaries; the other prevented some customers from logging in.

Impact

Shared Health Summary. All customers were unable to upload or preview Shared Health Summaries. According to the logs, 10 customers experienced this because they used that area of Helix during this time.

Login. Customers who had not previously logged in were unable to log in. They would see a loading spinner that never finished loading, which is what was reported. The issue may also have manifested in other ways, such as an inability to proceed.

Response

The SRE team contacted other engineering team members on Teams to investigate promptly.

Timeline

All times in 24h AEST

  • 09:40 We received log alerts suggesting a library mismatch and began investigating
  • 11:22 One customer reported an issue with Shared Health Summary
  • 11:50 Customers reported issues with logging in
  • 11:52 We identified that the login issue was related to an unhealthy instance of the app service
  • 11:58 We restarted the app service, which fixed the login issue. This was confirmed by our customers
  • 14:52 We created a hot fix for the Shared Health Summary issue
  • 22:00 We deployed the hot fix for the Shared Health Summary issue

Root Cause

Shared Health Summary. Caused by a mismatch between the libraries required by components within Helix.
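
As an illustration only, a mismatch of this kind can be caught before release by comparing what each component requires. The sketch below assumes the mismatch is between library versions, which this report does not specify; the component names, library names and version numbers are hypothetical and are not taken from Helix.

```python
from collections import defaultdict


def find_version_conflicts(requirements):
    """Return libraries that different components pin to different versions.

    `requirements` maps component name -> {library name: required version}.
    The result maps each conflicting library -> {version: [components]}.
    """
    by_library = defaultdict(lambda: defaultdict(list))
    for component, libraries in requirements.items():
        for library, version in libraries.items():
            by_library[library][version].append(component)

    # A library is in conflict when more than one version is required.
    return {
        library: {version: sorted(users) for version, users in versions.items()}
        for library, versions in by_library.items()
        if len(versions) > 1
    }


# Hypothetical components and versions, for illustration only.
example = {
    "summary-preview": {"pdf-render": "2.1.0", "http-client": "4.0.0"},
    "summary-upload": {"pdf-render": "3.0.0", "http-client": "4.0.0"},
}
conflicts = find_version_conflicts(example)
```

Here `conflicts` would list `pdf-render`, because the two components require different versions, and would omit `http-client`, because both agree on it.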

Login. Caused by an unhealthy server instance that was created automatically in response to load.
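
As a hedged sketch of one common safeguard for this class of problem, a newly created instance can be kept out of rotation until it has passed several consecutive health probes. This is not a description of how the app service is configured; the function name, the probe history format and the threshold of three are assumptions made for illustration.

```python
def ready_for_traffic(probe_results, required_successes=3):
    """Return True only if the most recent probes all passed.

    `probe_results` is a list of booleans, oldest first, where True means
    the health probe succeeded. Both the format and the default threshold
    are assumptions for this sketch.
    """
    if required_successes < 1:
        raise ValueError("required_successes must be at least 1")
    if len(probe_results) < required_successes:
        # Not enough history yet: keep the instance out of rotation.
        return False
    return all(probe_results[-required_successes:])
```

With this rule, an instance whose probe history is `[True, False, True, True]` would not yet receive traffic, because only its last two probes passed in a row.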

Resolution

Shared Health Summary. We had a fix within 1 hour and 30 minutes. However, we decided to deploy the hot fix at 10pm on 16/09/21.

Login. We fixed the issue within 8 minutes by restarting the app service.

Posted Sep 20, 2021 - 14:47 AEST

Resolved
We deployed a fix last night at 10pm and the issue relating to Shared Health Summary is now fixed. Moving this Incident to Resolved.
Posted Sep 17, 2021 - 09:06 AEST
Monitoring
We have a fix ready to go for the Shared Health Summaries issue, in which some customers were unable to preview and upload. We will be deploying this fix live at 10pm tonight. We will move this Incident to Resolved once we have deployed the fix and confirmed there are no issues.
Posted Sep 16, 2021 - 13:41 AEST
Identified
We have fixed the Login issue and are currently investigating the Shared Health Summary issue.
Posted Sep 16, 2021 - 12:09 AEST
This incident affected: Cloud Products (Helix).