Status | RESOLVED |
---|---|
Incident | Helix multiple issues - OPV check not working, Appointment book not loading |
Incident started | Apr 15, 2021 11:01 am |
Incident ended | Apr 15, 2021 11:15 am |
Time to resolve | 14 minutes |
Priority | P1 |
Incident manager | Head of TechOps |
Affected services | Helix |
Intermittent connectivity between NATS and the Medicare and ePrescribing services running inside the MirthVMs caused degradation and an outage of OPV checks and the appointment book.
Leadup
On 15 April 2021 at 10:59 am, a connectivity issue occurred between NATS and the Medicare and ePrescribing services running in the MirthVMs. These services failed to reconnect to NATS, which degraded service and led to an outage of the OPV check and appointment book in Helix.
Fault
Intermittent connectivity between NATS and the Medicare and ePrescribing services running inside the MirthVMs caused degradation of OPV checks and appointment creation.
Impact
The entire MT-MEGA and MT-NEXT stack was impacted.
Detection
The Customer Service team raised an incident ticket after two sites contacted them for support.
Response
Site Reliability Engineers responded to the ticket and started investigating the issue.
The Head of TechOps posted an update on Status Page to notify our users of the partial Helix outage.
Recovery
After analyzing the SEQ logs for MTNEXT and MTMEGA, it was confirmed that the issue was caused by a loss of connectivity between NATS and the Medicare and ePrescribing services running inside the MirthVMs for the customer.
Once the root cause was identified, Site Reliability Engineers initiated a restart of the Medicare and ePrescribing services in all the MirthVMs.
Timeline
11:01 am CS Team posted in the incident war room channel.
11:01 am Head of TechOps posted an update on Status Page.
11:02 am Head of TechOps tagged Site Reliability Engineers to investigate the issue.
11:03 am SRE acknowledged the incident message and started investigating.
11:07 am CS Team posted that more sites were encountering similar issues.
11:07 am Head of TechOps acknowledged the message, requested that an incident ticket be raised, and posted that SREs were investigating.
11:09 am SREs restarted the Medicare and ePrescribing services running in the VMs.
11:12 am CS Team raised the incident ticket.
11:14 am SREs confirmed the services were restarted and requested the CS Team to validate the fix with customers.
11:15 am CS Team confirmed the issue was resolved.
11:15 am Incident updated on Status Page and marked as Resolved.
11:16 am Incident Jira ticket was closed.
Blameless root cause
An intermittent connectivity issue between NATS and the Medicare and ePrescribing services running inside the MirthVMs led to degradation and an outage of OPV checks and appointment creation. After the initial disconnection, the Medicare and ePrescribing services running inside the MirthVMs failed to reconnect to NATS on their own.
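
The failure mode above (a service that does not re-establish its connection after a drop) can be illustrated with a generic retry loop. This is a minimal sketch only, not the code running in the MirthVMs: `reconnect_with_backoff` and its `connect` callback are hypothetical names, and the delay values are assumptions chosen for illustration. NATS client libraries generally provide their own reconnect settings, which would normally be the first thing to review.

```python
import random
import time
from typing import Callable, Optional


def reconnect_with_backoff(
    connect: Callable[[], bool],          # hypothetical: returns True once connected
    max_attempts: Optional[int] = None,   # None means keep retrying indefinitely
    base_delay: float = 0.5,              # assumed starting delay, in seconds
    max_delay: float = 30.0,              # assumed upper bound on the delay
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call connect() until it succeeds and return the number of attempts used.

    Raises ConnectionError if max_attempts is set and every attempt fails.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            if connect():
                return attempt
        except OSError:
            # ConnectionError is a subclass of OSError; treat it as a failed attempt.
            pass
        if max_attempts is not None and attempt >= max_attempts:
            raise ConnectionError(f"gave up after {attempt} attempts")
        # Sleep for a jittered fraction of the current delay, then double the
        # delay (capped at max_delay) so retries do not hammer the broker.
        sleep(delay * random.uniform(0.5, 1.0))
        delay = min(max_delay, delay * 2)
```

With `max_attempts=None` the loop never gives up, which matches the behaviour that would have avoided a manual restart here. A bounded `max_attempts` instead surfaces the failure so that monitoring can alert on it.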