Google says a large language model (LLM) project has discovered a vulnerability in SQLite that its researchers were unable to find using traditional fuzzing.
The tech giant in June shared details on Project Naptime, which aimed to evaluate the offensive security capabilities of LLMs. Naptime has since evolved into a project named Big Sleep, which is a collaboration between Google’s Project Zero and DeepMind teams.
On Friday, Google announced that the Big Sleep LLM agent, which is still in the research phase, identified its first real-world vulnerability, an exploitable stack buffer underflow in the SQLite open source database engine.
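For context, a stack buffer underflow is an out-of-bounds write that lands before the start of a stack-allocated buffer, typically because an index derived from input goes negative. The minimal C sketch below illustrates the bug class in general terms; it is not SQLite's actual code, and the function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical lookup that signals "not found" with -1 -- the error case
 * the caller forgets to check. */
static int find_column(const char *name) {
    (void)name;
    return -1;
}

void record_column(const char *name) {
    int flags[4] = {0};
    int idx = find_column(name);   /* may be -1 */
    flags[idx] = 1;                /* idx == -1: write lands before flags[0],
                                      corrupting adjacent stack memory */
}

int main(void) {
    record_column("missing");
    puts("done");
    return 0;
}
```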
The issue was discovered in early October and patched by SQLite developers within hours of disclosure. Users were not at risk because the vulnerability was found in code that had yet to be officially released.
However, according to Google, this was a noteworthy finding, possibly the first example of an AI agent finding an exploitable memory safety issue in real-world software.
The SQLite vulnerability was found by asking the Big Sleep agent to review recent commits to the code and look for a security issue similar to a recently patched vulnerability, which was provided to the agent as a starting point.
The blog post published by Google describes the steps taken by the AI until it discovered the SQLite vulnerability.
Google’s researchers then attempted to identify the same vulnerability through fuzzing, but the flaw was not found even after 150 CPU-hours. The company noted that several years ago its AFL fuzzer was quite effective at finding SQLite bugs, but now “it seems the tool has reached a natural saturation point”.
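For readers unfamiliar with the technique, coverage-guided fuzzing works by repeatedly feeding mutated inputs into a small test harness and watching for crashes, with code-coverage feedback steering the mutations toward new paths. The sketch below shows what a minimal libFuzzer-style SQLite harness could look like, assuming the fuzz input is treated as SQL text; it is only an illustration, not the harness Google's researchers used.

```c
/* Minimal libFuzzer-style harness for SQLite (illustrative sketch).
 * The fuzzing engine calls this entry point repeatedly with mutated
 * byte strings, which are executed as SQL against an in-memory database. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    char *sql = malloc(size + 1);
    if (sql == NULL) return 0;
    memcpy(sql, data, size);
    sql[size] = '\0';              /* sqlite3_exec expects a NUL-terminated string */

    sqlite3 *db = NULL;
    if (sqlite3_open(":memory:", &db) == SQLITE_OK) {
        sqlite3_exec(db, sql, NULL, NULL, NULL);   /* SQL errors are expected and ignored */
    }
    sqlite3_close(db);
    free(sql);
    return 0;
}
```

A harness like this would typically be compiled with something like `clang -fsanitize=fuzzer,address fuzz_sqlite.c sqlite3.c` (the file name is hypothetical), so that AddressSanitizer flags out-of-bounds reads and writes the moment a mutated input triggers them.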
“When provided with the right tools, current LLMs can perform vulnerability research,” Google said. “However, we want to reiterate that these are highly experimental results. The position of the Big Sleep team is that at present, it’s likely that a target-specific fuzzer would be at least as effective (at finding vulnerabilities).”
AI has been increasingly used by the cybersecurity industry, including for software vulnerability research. Last week, threat intelligence firm GreyNoise credited an AI-powered tool for spotting attempts to exploit critical vulnerabilities in widely deployed IoT cameras.
Google is not the only company using LLMs to find vulnerabilities. AI security firm Protect AI has developed a static code analyzer that leverages LLMs to detect and explain complex, multistep vulnerabilities.
Others have been looking into how LLM agents can exploit both known and unknown vulnerabilities.
Related: AI Models in Cybersecurity: From Misuse to Abuse
Related: Microsoft Copilot Studio Vulnerability Led to Information Disclosure