The recent emergency patch rollout by cybersecurity firm CrowdStrike caused significant global disruption, underscoring the critical need for robust resilience planning, automation, and rigorous testing to mitigate IT system failures. In this episode of TFiR: Let’s Talk, Rob Hirschfeld, Co-Founder and CEO of RackN, talks about the necessity of having alternate control paths and backup systems in place to ensure quick recovery during such incidents.
Software patch that caused global disruption, with focus on CrowdStrike’s processes
- Hirschfeld explains that a software patch caused Windows to crash, inadvertently disrupting IT systems globally.
- He attributes the issue to a combination of system errors, automation failures, and human error.
- Hirschfeld adds that CrowdStrike quickly distributed patches to millions of devices, indicating a thorough QA process.
- Hirschfeld highlights that a postmortem investigation will look into why the patch was released without proper vetting, emphasizing the importance of automation in the process.
IT operations challenges and resilience
- Operations teams face challenges in dealing with infrastructure failures, including limited visibility and a lack of resilience planning.
- According to Hirschfeld, Microsoft and Windows are not to blame for the infrastructure failures associated with the recent disruptions.
- Hirschfeld highlights the complexity of IT systems and the shared responsibility for maintaining their security.
- He emphasizes the importance of regularly patching and updating systems to prevent security issues.
Recent patch failure highlights crucial need for IT resilience and backup systems
- Hirschfeld emphasizes the importance of backup and resiliency systems to recover from vendor-related disruptions.
- The discussion also focuses on the impact the recent patch failure had on customers and whether any measures were taken to mitigate the issue.
- He also stresses the role of resilience in recovering from cyberattacks, particularly when recovery depends on a patch being available.
- He further points out that operations teams lacking resilience may experience prolonged outages when issues arise. In contrast, teams with quick recovery processes can significantly minimize the impact of such disruptions.
Guest: Rob Hirschfeld (LinkedIn)
Company: RackN (Twitter)
Show: Newsroom
This summary was written by Monika Chauhan.





