Helix multiple issues - OPV check not working, Appointment book not loading
Incident Report for MedicalDirector Product Status
Postmortem

Postmortem summary

Status RESOLVED
Incident Helix multiple issues - OPV check not working, Appointment book not loading
Incident started Apr 15, 2021 11:01 am
Incident ended Apr 15, 2021 11:15 am
Time to resolve 14 minutes
Priority P1
Incident manager Head of TechOps
Affected services Helix

Executive summary

Intermittent connectivity between NATS and the Medicare and ePrescribing services running inside the MirthVMs caused degradation and outage of OPV checks and the appointment book.

Postmortem report

Leadup

On 15 April 2021 at 10:59 am, a connectivity issue occurred between NATS and the Medicare and ePrescribing services running in the MirthVMs. These services failed to reconnect to NATS, which resulted in service degradation and led to an outage of the OPV check and appointment book in Helix.

Fault

Intermittent connectivity between NATS and the Medicare and ePrescribing services running inside the MirthVMs caused degradation of OPV checks and appointment creation.

Impact

The entire MT-MEGA and MT-NEXT stack was impacted.

Detection

The Customer Service team raised an incident ticket after two sites contacted them for support.

Response

Site Reliability Engineers responded to the ticket and started investigating the issue.

The Head of TechOps posted an update on the Status Page to notify our users of the partial Helix outage.

Recovery

After analysing the SEQ logs for MT-NEXT and MT-MEGA, it was confirmed that the issue was caused by the connectivity problem between NATS and the Medicare and ePrescribing services running inside the MirthVMs for the customer.

Once the root cause was identified, Site Reliability Engineers initiated a restart of the Medicare and ePrescribing services in all the MirthVMs.

Timeline

11:01 am CS team posted in the incident war room channel.
11:01 am Head of TechOps posted an update on the Status Page.
11:02 am Head of TechOps tagged Site Reliability Engineers to investigate the issue.
11:03 am SRE acknowledged the incident message and started investigating.
11:07 am CS team posted that more sites were encountering similar issues.
11:07 am Head of TechOps acknowledged the message, requested that an incident ticket be raised, and posted that SREs were investigating.
11:09 am SREs restarted the Medicare and ePrescribing services running in the VMs.
11:12 am CS team raised the incident ticket.
11:14 am SREs confirmed the services were restarted and requested the CS team to validate the fix with customers.
11:15 am CS team confirmed the issue was resolved.
11:15 am Status Page was updated and the incident marked as Resolved.
11:16 am Incident Jira ticket was closed.

Blameless root cause

An intermittent connectivity issue between NATS and the Medicare and ePrescribing services running inside the MirthVMs led to degradation and outage of OPV checks and appointment creation. The outage persisted because the Medicare and ePrescribing services running inside the MirthVMs failed to reconnect to NATS after the initial disconnection.
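
The failure mode described above, a client that does not re-establish its connection after a transient drop, is commonly handled with a retry loop using exponential backoff. The following is a minimal sketch in Python, not the actual Helix or Mirth implementation. The function name connect_with_backoff and its parameters are hypothetical, and connect stands in for whatever call opens the messaging connection in a given service.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: Optional[int] = None,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, waiting longer after each failure.

    connect      -- hypothetical callable that opens the connection and
                    raises ConnectionError when the broker is unreachable
    max_attempts -- give up after this many failures (None = retry forever)
    sleep        -- injectable so the loop can be exercised without waiting
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise  # surface the failure so monitoring can alert on it
            # Double the wait on each failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out reconnects when many services drop at once.
            sleep(delay * random.uniform(0.5, 1.0))
```

A service would call this both at startup and from its disconnect handler, so that a dropped connection triggers a new attempt rather than requiring a manual restart. Retrying forever (max_attempts=None) suits long-running services, provided each failed attempt is logged so the condition remains visible.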

Posted Apr 20, 2021 - 11:08 AEST

Resolved
All functions are now fully restored. We apologise for the inconvenience caused and thank you for your patience.
Posted Apr 15, 2021 - 11:15 AEST
Identified
We are currently experiencing a full-service outage with Helix.

The issue has been identified and our engineering and operations teams are working hard to resolve it as soon as possible. Every effort is being made to minimise the impact to you, your staff and your patients.

We will send an update in the next hour or as soon as more information becomes available. We apologise for any inconvenience.
Posted Apr 15, 2021 - 11:01 AEST
This incident affected: Cloud Products (Helix).