Own the Outage: Smart Cloud Strategies for Healthcare

“Own the outage”: three key words, and one principle, that CloudWave’s engineering teams are trained to bring to every incident we attend. When it comes to helping facilities deal with the fallout from degraded hyperscalers, we encounter some unique challenges in applying this philosophy.

Customer service is at the center of our business; we have a vested interest in helping our partners identify and achieve the shortest and safest path to resolution. If you’re reading this article, you’ve likely heard of—or been impacted by—some of the recent public cloud outages. While many public-facing and financial services organizations received broad media attention, the backend systems supporting healthcare information delivery and availability, from ePrescribing and dictation to compliance and coding services, did not. The loss of any one of these functions is enough to cripple efficiency at a number of health systems.

In the wake of these incidents, I’ve challenged our team to focus on what we can control in the event of a recurrence, knowing that the issue resides outside our walls. For example, can we apply any emergency DNS overrides and possibly accelerate time-to-repoint to a new instance, versus waiting for an automatic refresh? Does the client possess any private cloud or on-premises failover instance that we can assist them in repointing configurations toward?

While working through options in real time with our partners during these events, a few vital patterns have materialized:

Know your dependencies, understand your exposures: Many healthcare ISVs have quietly adopted cloud services, and many of the facilities that consume their products remain unaware. Do you know how (or why) your facility would be impacted if hyperscaler A, B, or C went offline tomorrow? Make it a recurring activity to sit down with your architects and map out your connectivity; more often than not, you’ll discover a new dependency.
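
One way to make that mapping exercise concrete is to keep the dependency list in a form you can query. The sketch below is only an illustration: every service, ISV, and hyperscaler name in it is hypothetical, and it assumes each listed dependency is a hard one (if any upstream is down, the service is treated as impacted). It walks the map to answer "what is exposed if this provider goes offline?"

```python
from collections import deque

# Hypothetical dependency map: each key depends on the items listed for it.
# All names are illustrative and not taken from any real environment.
DEPENDS_ON = {
    "eprescribing": ["isv_rx_gateway"],
    "dictation": ["isv_speech"],
    "coding": ["isv_coding"],
    "isv_rx_gateway": ["hyperscaler_a"],
    "isv_speech": ["hyperscaler_b"],
    "isv_coding": ["hyperscaler_a", "hyperscaler_b"],
}


def impacted_by(outage, depends_on):
    """Return the sorted names that directly or indirectly depend on `outage`."""
    # Invert the map so we can look up "who relies on X?".
    dependents = {}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Breadth-first walk outward from the failed provider.
    seen = set()
    queue = deque([outage])
    while queue:
        current = queue.popleft()
        for name in dependents.get(current, ()):
            if name not in seen:
                seen.add(name)
                queue.append(name)
    return sorted(seen)


if __name__ == "__main__":
    for provider in ("hyperscaler_a", "hyperscaler_b"):
        print(provider, "->", impacted_by(provider, DEPENDS_ON))
```

Treating every dependency as a hard one is deliberately conservative; a real map would also record which ISVs can run from more than one provider.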

The harder they fall: Even the hyperscaler giants are not immune to unexpected downtime events, and you should not be 100% reliant upon them for your operations. Have you implemented a capable private cloud failback plan for when issues arise with any of these services? Do you have an understanding of your data diversity and what your RPO/RTO would be in the event of a sustained outage or a real public cloud infrastructure disaster? In many cases, facilities have decommissioned private cloud ISV solutions without giving due thought to the potential loss in resiliency.
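
As a rough illustration of the RPO/RTO question, the sketch below compares an assumed scenario against recovery targets. The function name, parameters, and sample figures are all hypothetical. It treats the data-loss window as the time between the last good copy of the data and the start of the outage, and the recovery time as a single estimated restore duration.

```python
from datetime import datetime, timedelta


def recovery_gaps(last_good_copy, outage_start, est_restore, rpo_target, rto_target):
    """Compare an assumed outage scenario against RPO and RTO targets."""
    # Data written after the last good copy is what would be lost.
    data_loss_window = outage_start - last_good_copy
    return {
        "data_loss_window": data_loss_window,
        "meets_rpo": data_loss_window <= rpo_target,
        "meets_rto": est_restore <= rto_target,
    }


if __name__ == "__main__":
    # Illustrative numbers only.
    result = recovery_gaps(
        last_good_copy=datetime(2025, 1, 1, 2, 0),
        outage_start=datetime(2025, 1, 1, 2, 45),
        est_restore=timedelta(hours=3),
        rpo_target=timedelta(hours=1),
        rto_target=timedelta(hours=2),
    )
    print(result)
```

In this made-up scenario the 45-minute data-loss window fits a one-hour RPO, but a three-hour restore misses a two-hour RTO, which is the kind of gap worth finding before a sustained outage rather than during one.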

Your SLA is your parachute: What kind of guarantees has your ISV made? What do you know about their service and infrastructure resiliency? Get clear on how they deliver, and how they will react when the unexpected happens. Share these plans and details with your partners so you can collaborate on a playbook.

Unfortunately, outages are inevitable; may every incident sharpen our approach, every challenge become a strategy, and every disruption reinforce our commitment to stand beside our partners.

Tony Rienzo,
VP of Cloud Service Delivery