Practicing Visibility, Resilience, and Readiness in Healthcare IT

High-profile disruptions continue to increase pressure on healthcare IT operations. Having access to a suite of advanced security tools is no longer enough. The gap between “having data” and “taking action” determines whether hospitals survive an incident or face weeks of manual downtime.

To make this transition, healthcare IT leaders must focus on three operational imperatives: visibility, resilience, and readiness.

  • Visibility is the ability to see, monitor, and understand all the components of your IT environment in real time.
  • Resilience is the ability of those systems to withstand, recover from, and adapt to disruptions.
  • Readiness is the state of being prepared to detect, respond to, and mitigate threats before they occur or escalate.

During a recent Cybersecurity Insider Program peer roundtable for healthcare security leaders, CloudWave’s CISO Ashini Surati, VP of Service Delivery Tony Rienzo, and Security and Operations Leader Richard Phung discussed what “good” looks like in the field today. Their roadmap for building operational maturity is summarized below.

 

Visibility: Telemetry and Context

A common misconception in healthcare is equating visibility with the volume of tools installed. However, true visibility is about context. It means knowing exactly what the IT environment has: the intersection of Asset Inventory, Identity Access Management (IAM), and Privileged Access Management (PAM).

Visibility enables a defender to distinguish between a routine admin task and a catastrophic breach. For example, a privileged account logging in from a foreign or unexpected IP address is more than a log entry; it can be a high-fidelity signal of compromised credentials, often the result of social engineering. Recognizing this and making a quick determination are key.

Furthermore, if you are not monitoring at a level that allows you to immediately determine the Where, When, Why, and How of that login, a blind spot exists. In the recent Stryker incident, for example, speculation suggests that social engineering led to compromised Intune privileges. When an attacker gains that level of access, the blind spot isn’t simply that they are in, but the fact that you don’t know where they came from or how they are moving until the malicious activity begins.
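
As a minimal sketch of the kind of check described above, the Python below flags a privileged login that arrives from outside a set of expected networks. The account names, network ranges, and the LoginEvent structure are all hypothetical placeholders; in practice they would come from your asset inventory and your IAM and PAM systems.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration only. A real deployment would load
# these from the asset inventory and the IAM/PAM systems.
EXPECTED_NETWORKS = [ip_network("10.20.0.0/16"), ip_network("192.0.2.0/24")]
PRIVILEGED_ACCOUNTS = {"svc-intune-admin", "dom-admin-01"}


@dataclass
class LoginEvent:
    account: str
    source_ip: str
    timestamp: str  # ISO 8601 string, as it might appear in a log


def is_high_priority(event: LoginEvent) -> bool:
    """Return True for a privileged login from outside the expected networks."""
    if event.account not in PRIVILEGED_ACCOUNTS:
        return False
    addr = ip_address(event.source_ip)
    return not any(addr in net for net in EXPECTED_NETWORKS)
```

A rule this simple only answers the "Where" question. It is a starting point for triage, not a substitute for the context that identity and asset data provide.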

The Fragmented Source Trap

The most dangerous gap for visibility in healthcare environments is the fragmentation of sources. Between legacy systems, cloud resources, and a sprawl of vendor and service accounts, identity is often decentralized.

When suspicious activity occurs, analysts often must pivot across Active Directory, VPN logs, and cloud consoles to identify or understand the behavior. Piecing the story together takes time, and that time gives a threat actor room to move laterally or entrench further.
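
To illustrate what "piecing the story together" involves, here is a small Python sketch that merges events for one account from several log sources into a single, time-ordered view. The Event fields, source names, and sample records are assumptions made for the example and do not reflect any particular product's log schema.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    when: datetime
    source: str   # e.g. "active_directory", "vpn", "cloud_console"
    account: str
    detail: str


def build_timeline(account: str, *sources: list[Event]) -> list[Event]:
    """Merge events for one account from several sources, oldest first."""
    merged = [e for events in sources for e in events if e.account == account]
    return sorted(merged, key=lambda e: e.when)


# Hypothetical sample records from three separate consoles.
ad = [Event(datetime(2024, 5, 7, 9, 12), "active_directory", "svc-backup", "interactive logon")]
vpn = [
    Event(datetime(2024, 5, 7, 9, 10), "vpn", "svc-backup", "session opened"),
    Event(datetime(2024, 5, 7, 9, 11), "vpn", "jdoe", "session opened"),
]
cloud = [Event(datetime(2024, 5, 7, 9, 14), "cloud_console", "svc-backup", "role assignment changed")]

timeline = build_timeline("svc-backup", ad, vpn, cloud)
for event in timeline:
    print(event.when.isoformat(), event.source, event.detail)
```

When identity data is centralized, this merge already exists before the incident starts. When it is fragmented, an analyst performs it by hand while the clock runs.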

The “Day Five” Recovery Risk

Beyond being a hindrance to defense, visibility gaps can potentially sabotage recovery efforts. One of the biggest hurdles in emergency restoration is identifying the “safe” set.

Until you have detailed findings on the incident, including whether it was an insider threat, a malicious outsider, or a compromised service account, a safe restore point cannot be recommended. The team must first identify the point of entry and when it occurred; only backups taken before that point can be considered “clean.”

Imagine your team spends four long days working 24/7 to restore systems from “yesterday’s” backup. On day five, you realize the attacker had persistence in that backup. The environment is still compromised, and the restoration is now trash. You have to start over from zero.
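
The logic behind avoiding that scenario can be sketched in a few lines of Python: given the earliest compromise time established by forensics, choose the newest backup taken before it. The function name and the sample dates are hypothetical, and the result is only a candidate, since a real restore decision would also require validating the backup itself.

```python
from datetime import datetime
from typing import Optional


def latest_safe_backup(
    backups: list[datetime], earliest_compromise: datetime
) -> Optional[datetime]:
    """Return the newest backup taken strictly before the earliest known compromise.

    Returns None when every backup was taken at or after the compromise.
    """
    candidates = [b for b in backups if b < earliest_compromise]
    return max(candidates) if candidates else None


# Hypothetical nightly backups at 01:00 from May 1 through May 6.
nightly = [datetime(2024, 5, day, 1, 0) for day in range(1, 7)]

# Assumed forensic finding: attacker persistence dates back to May 4, 14:30.
entry_time = datetime(2024, 5, 4, 14, 30)

print(latest_safe_backup(nightly, entry_time))
```

Restoring from "yesterday's" backup (May 6 in this example) would bring the attacker's persistence back with it. The candidate here is the May 4 backup, which is why the forensic timestamp has to come before the restore decision.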

 

Resilience: Measured by Containment, Not Detection

Resilience is often framed as the ability to “bounce back.” In a Security Operations (SecOps) context, however, it is measured by containment speed, not detection accuracy.

The 9:15 AM Tuesday Litmus Test

Imagine it is a Tuesday morning at 9:15 AM during peak operational hours. Your endpoint detections start firing. There are clear signs of lateral movement within a department. The help desk suddenly receives a flood of calls about system slowness.

In the 15 minutes between 9:15 and 9:30, the difference between a resilient organization and a fragile one comes down to decisiveness.

Fragile organizations waste critical minutes debating if the alert is real. They wait for additional validation and aim to secure unanimous stakeholder agreement before acting. Meanwhile, the clock is ticking, and the infiltration spreads.

Resilient organizations, on the other hand, do three things well:

  1. Clear Containment Authority: They know exactly who “pushes the button” to isolate a segment. There is no ambiguity about who has the power to make the call.
  2. Predefined Containment Thresholds: They have already decided, under calm conditions, at what point they will pull the plug. They don’t negotiate with the threat in real-time.
  3. Trust in Telemetry: They trust their visibility enough to act. They accept that some of the time, say 5%, they will “overshoot” and act on a false positive. They know that in the other 95%, decisiveness averts a week-long, or even months-long, facility-wide blackout.
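
One way to make the first two points concrete is to write the thresholds and the containment authority down as data, agreed under calm conditions, so that the 9:15 AM decision becomes a lookup rather than a debate. The Python below is a sketch under that assumption; the field names and threshold values are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainmentPolicy:
    """Thresholds agreed in advance. The numbers here are placeholders."""
    min_hosts_with_lateral_movement: int = 2
    min_correlated_detections: int = 3
    authority: str = "on-call security lead"  # who "pushes the button"


def should_isolate(
    policy: ContainmentPolicy,
    hosts_with_lateral_movement: int,
    correlated_detections: int,
) -> bool:
    """Return True when either predefined threshold has been met."""
    return (
        hosts_with_lateral_movement >= policy.min_hosts_with_lateral_movement
        or correlated_detections >= policy.min_correlated_detections
    )


policy = ContainmentPolicy()
if should_isolate(policy, hosts_with_lateral_movement=2, correlated_detections=1):
    print(f"Isolate the segment; decision owner: {policy.authority}")
```

The policy is frozen on purpose: thresholds are set and reviewed outside of an incident, not renegotiated during one.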

Avoiding “No Man’s Land”

One of the biggest time sinks in a recovery process is the period of indecision between the event and the declaration of a disaster, aka “No Man’s Land.” It is the hours or days spent sitting in a circle assessing what might be happening.

The Parallel Path: Remediation vs. Restoration

A major trap for CIOs is treating remediation and restoration as a linear process: doing one, then the other. A resilient leader delegates these as parallel efforts.

  • The forensic team focuses on the analytical part of the conversation, identifying the “Where, When, Why, and How.”
  • The recovery team immediately assesses the backup environment to see what is online and viable. They aren’t taking irreversible actions yet, but they are assessing controls and mapping out the path to recovery.

By the time the forensic team identifies the breach, the recovery team should already have the “safe set” identified. This avoids the “Day Five Reset” mentioned above. It prevents restoring a system and later finding it contains the original vulnerability that could compromise the system again.

The Shift from “Backup” to “Restore” Testing

We talk about backups constantly but rarely talk about integrated restore testing. Validate the data. A finished backup doesn’t mean a usable application. You must test for application-level corruption.

Update your backup controls every time a resource is added or modified in the healthcare IT environment. If you audit your inventory only once a year during a restore test, you will find that the two critical servers you forgot to back up are the ones you need most.

 

Readiness: Building Operational Muscle Memory

If a four-alarm fire starts at 3:00 AM, you do not want the leadership team hunting for a PDF on SharePoint. Operational readiness involves muscle memory. In a mature healthcare organization, the response should be automatic, not a series of panicked questions.

Coordination between the SOC and IT Department: Trusting the “Blinking Lights”

How do you get to a point where you trust the information coming to you from these systems? The answer is that readiness starts with a deep, practiced relationship between the Security Operations Center (SOC) and the IT department. You get to a point of “instantaneous trust” only through repetition.

  • The Single Pane of Glass: In an environment where hospitals stack security tools like building blocks, a SIEM (Security Information and Event Management) system is critical. It enriches telemetry data and correlates conflicting signals from multiple consoles into one high-confidence alert.
  • Knowing your Toolset: Readiness is your team’s experience with its assets and environment. For example, do not stop investigating when your endpoint tools report “quarantined.” A quarantined alert doesn’t mean the threat is gone. It signals you to check who was logged in, what happened before the alert, and what happened after.

The Tabletop Evolution

Most organizations do annual tabletop exercises. The best do them quarterly, and they don’t use PowerPoint.

  • True readiness is tested under pressure. If your tabletop is just a group of people reading slides, you are not building muscle memory. Make it a timed exercise where stakeholders feel the heat of the clock.
  • Ambiguity is the enemy. You need to know exactly who declares the incident, who handles critical communications, and who isolates the network to minimize impact on patient care operations, without looking at a playbook.

Empowerment: Don’t Wait for the Board

A common failure point in healthcare is the “approval chain.” If a CISO sees malicious activity but must call the CIO, who calls the CEO, who then calls the Board for approval to shut down a network, the organization is already compromised.

  • Push Authority Down: Readiness means technical leads on the front lines should have decision-making capability.
  • Support the “False Positive”: Technical teams must feel empowered to act. They should know that if they act on a false positive out of an abundance of caution, leadership will support them. The alternative, waiting for perfect clarity while lateral movement occurs, is far more dangerous.

Separate Testing from “Control Hygiene”

There is a dangerous tendency to wait for the annual restore test to discover what is missing from the backup list. Separate the concept of testing from the ongoing need for up-to-date controls.

  • Continuous Updates: Anytime you add, modify, or decommission a resource, update your backup and security controls that same day.
  • Baseline Assessments: Don’t wait for the audit. If you have 99 things backed up but your environment has 101, you are already behind. Hygiene happens every day; testing happens once a year.
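
The 99-versus-101 gap described above is a set difference, and it can be checked daily rather than annually. This Python sketch assumes you can export two lists of resource names, one from the asset inventory and one from the backup system; the server names are invented for the example.

```python
def backup_gaps(inventory: set[str], backed_up: set[str]) -> dict[str, set[str]]:
    """Compare the asset inventory against the backup job list."""
    return {
        # Resources that exist but are not protected.
        "missing_from_backup": inventory - backed_up,
        # Backup entries for resources that are no longer in the inventory.
        "stale_backup_entries": backed_up - inventory,
    }


# Hypothetical exports: 101 servers in the environment, 99 in the backup list.
inventory = {f"srv-{n:03d}" for n in range(1, 102)}
backed_up = {f"srv-{n:03d}" for n in range(1, 100)}

gaps = backup_gaps(inventory, backed_up)
print(sorted(gaps["missing_from_backup"]))
```

The second key covers the decommissioning case: backup entries that point at resources no longer in the inventory are also a sign that controls have drifted.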

Success Metric: Operational Speed

Ultimately, readiness is measured by how quickly your organization moves from detection to containment. By leveraging SOAR (Security Orchestration, Automation, and Response), you can reduce the “human factors” of fatigue and hesitation, so that your response keeps pace with the threat itself.
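
If detection-to-containment time is the success metric, it helps to compute it the same way after every incident and every exercise. The following Python sketch shows one way to do that; the incident timestamps are invented, and using the median rather than the mean is a choice made for this example so that a single long incident does not dominate the figure.

```python
from datetime import datetime, timedelta
from statistics import median


def time_to_contain(detected: datetime, contained: datetime) -> timedelta:
    """Elapsed time between first detection and confirmed containment."""
    if contained < detected:
        raise ValueError("containment cannot precede detection")
    return contained - detected


# Hypothetical (detected, contained) pairs from past incidents or drills.
incidents = [
    (datetime(2024, 5, 7, 9, 15), datetime(2024, 5, 7, 9, 27)),
    (datetime(2024, 6, 3, 14, 0), datetime(2024, 6, 3, 14, 45)),
    (datetime(2024, 7, 19, 3, 0), datetime(2024, 7, 19, 3, 20)),
]

durations = [time_to_contain(d, c) for d, c in incidents]
median_minutes = median(td.total_seconds() / 60 for td in durations)
print(f"Median time to contain: {median_minutes:.0f} minutes")
```

Tracking this number across quarterly exercises gives leadership a simple trend line for whether practice is actually shortening the response.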

Readiness isn’t a document; it’s muscle memory. A mature healthcare organization shouldn’t need to open a playbook when a crisis hits; it should be clockwork.

Visibility tells you there is a fire. Resilience is how fast you grab the extinguisher. Readiness is knowing you’ve practiced the drill so many times that you can do it with your eyes closed. The next time a “benign” alert fires, watch the coordination. If there’s a debate, there’s a gap. If there’s action, there’s readiness. Contact CloudWave to learn how we can help.
