CrowdStrike Explains Why Bad Update Was Not Properly Tested

CrowdStrike on Wednesday shared information from its preliminary post-incident review, explaining why the update that caused global chaos was not caught by internal testing. 

The cybersecurity giant delivers two types of security content configuration updates to its Falcon agent (sensor): sensor content and rapid response content. 

Sensor content updates provide a wide range of capabilities to assist customers in adversary response, and include long-term reusable capabilities for threat detection. These code updates are not dynamically fetched from the cloud; they undergo rigorous testing, and customers can select which parts of their fleet receive them.

Rapid response content, on the other hand, is not a code update but a proprietary binary file that contains configuration data to improve visibility and detections on a device without requiring code changes. A validator component performs checks on this content before it goes out to customers.  

The problematic update, rolled out on July 19, was a rapid response content update targeting novel attack techniques that abuse named pipes.

The content validator was trusted to identify any issues, based on tests and deployments conducted since March. However, it contained a bug that allowed the bad update to pass validation.

Because no additional testing was conducted, the problematic update was pushed into production, causing roughly 8.5 million devices running the Windows operating system to enter a Blue Screen of Death (BSOD) loop.

The Windows crash was caused by an out-of-bounds memory read that triggered an exception. CrowdStrike said its content interpreter component is designed to “gracefully handle exceptions from potentially problematic content”, but this exception was not gracefully handled. 
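
To illustrate the general idea of handling problematic content gracefully, here is a minimal Python sketch. It is not CrowdStrike's code: the `Rule` type, its `field_index` and `pattern` fields, and the `evaluate` function are all hypothetical names invented for this example. It assumes an interpreter that looks up an input field by an index supplied in the content, and shows a bounds check that skips a malformed rule instead of reading past the end of the inputs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rule:
    """Hypothetical content rule: compares one input field against a pattern."""
    field_index: int  # which input field the rule inspects (comes from content)
    pattern: str      # value the field must equal for the rule to match


def evaluate(rule: Rule, inputs: Sequence[str]) -> bool:
    """Return True if the referenced input field equals the rule's pattern.

    A rule whose index falls outside `inputs` is treated as a non-match,
    so bad content is ignored rather than triggering an out-of-bounds read.
    """
    if not 0 <= rule.field_index < len(inputs):
        return False  # malformed rule: skip it instead of crashing
    return inputs[rule.field_index] == rule.pattern
```

In a memory-unsafe language running in kernel mode, the same missing check can turn into an invalid memory access rather than a catchable error, which is why the check has to happen before the read.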

Moving forward, CrowdStrike plans on improving rapid response content testing, including through local developer testing, content update and rollback testing, stress testing, fuzzing, stability testing, and content interface testing. Additional checks will be added to the content validator for rapid response content, and error handling will be enhanced. 

In addition, the security firm says it’s implementing a staggered deployment strategy for rapid response content, and customers will have greater control over the deployment of these updates.
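
A staggered deployment can be sketched roughly as follows. This is an illustrative Python example under stated assumptions, not a description of CrowdStrike's rollout system: the ring names, the `deploy` callback, and the `max_failure_rate` threshold are hypothetical. The update goes to one group of hosts at a time, and the rollout halts as soon as a group's failure rate exceeds the threshold, so later groups never receive a bad update.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    rings: Sequence[Tuple[str, Sequence[str]]],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> List[str]:
    """Deploy ring by ring and return the names of the rings that were reached.

    `deploy(host)` is assumed to return True when the host stays healthy
    after the update and False otherwise.
    """
    reached = []
    for name, hosts in rings:
        failures = sum(1 for host in hosts if not deploy(host))
        reached.append(name)
        if hosts and failures / len(hosts) > max_failure_rate:
            break  # halt: rings after this one are left untouched
    return reached
```

For example, with rings ordered from a small canary group to the full fleet, a failure in an early ring stops the rollout before the broad ring is touched.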

CrowdStrike announced on Monday that it has found a way to speed up the remediation of systems impacted by the buggy update, claiming that a significant number of devices have already been restored. 

Described as one of the worst IT failures in history, the incident caused significant outages across the world in sectors such as aviation, finance, healthcare, and education.

US House leaders want CrowdStrike CEO George Kurtz to testify to Congress about the company’s role in sparking the widespread outage. 

In the meantime, organizations and users have been warned that threat actors are leveraging this incident for phishing, scams and malware delivery.  
