ChatGPT “not a reliable” tool for detecting vulnerabilities in developed code

Generative AI – specifically ChatGPT – should not be considered a reliable resource for detecting vulnerabilities in developed code without crucial expert human oversight. However, machine learning (ML) models show strong promise in assisting the detection of novel zero-day attacks. That’s according to a new report from NCC Group, which explores various AI cybersecurity use cases.

The Safety, Security, Privacy & Prompts: Cyber Resilience in the Age of Artificial Intelligence (AI) whitepaper aims to help readers better understand how AI applies to cybersecurity, summarizing the ways cybersecurity professionals can put it to use.

This has been a topic of widespread discussion, research, and opinion this year, triggered by the explosive arrival and growth of generative AI technology in late 2022. There’s been a lot of chatter about the security risks generative AI chatbots introduce – from concerns about sharing sensitive business information with advanced self-learning algorithms to malicious actors using them to significantly enhance attacks. At the same time, many claim that, with proper use, generative AI chatbots can improve cybersecurity defenses.

Expert human oversight still crucial to detecting code security vulnerabilities

A key area of focus in the report is whether a generative AI chatbot can be given source code and prompted to review it for security weaknesses, an interactive form of static analysis that would highlight potential vulnerabilities to developers. Despite the promise and productivity gains generative AI offers in code and software development, NCC Group found it showed mixed results in its ability to effectively detect code vulnerabilities.

“The effectiveness, or otherwise, of such approaches using current models has been the subject of NCC Group research with the conclusion being that expert human oversight is still crucial,” the report read. Using insecure PHP code taken from Damn Vulnerable Web Application (DVWA), ChatGPT was asked to describe the vulnerabilities in a series of source code samples. “The results were mixed and certainly not a reliable way to detect vulnerabilities in developed code.”
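The report does not publish the prompts NCC Group used, but the workflow it describes can be pictured as something like the following minimal Python sketch, which sends a deliberately vulnerable PHP snippet (in the style of DVWA’s SQL injection exercises, not the exact code NCC Group tested) to an LLM via the OpenAI API and asks for a security review:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative DVWA-style vulnerable PHP: user input concatenated into SQL.
php_snippet = """
<?php
$id = $_GET['id'];
$query = "SELECT first_name, last_name FROM users WHERE user_id = '$id'";
$result = mysqli_query($conn, $query);
?>
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a security code reviewer."},
        {"role": "user", "content": f"Identify any security vulnerabilities in this PHP code:\n{php_snippet}"},
    ],
)

# The model's answer still needs expert review: as NCC Group found, it can
# miss real issues or flag non-issues, hence the call for human oversight.
print(response.choices[0].message.content)
```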

Machine learning proves effective at detecting novel zero-day attacks

Another AI defensive cybersecurity use case explored in the report focused on the use of machine learning (ML) models to assist in the detection of novel zero-day attacks, enabling an automated response to protect users from malicious files. NCC Group sponsored a master’s student at University College London’s (UCL) Centre for Doctoral Training in Data Intensive Science (CDT DIS) to develop a classification model to determine whether a file is malware. “Multiple models were tested with the most performant achieving a classification accuracy of 98.9%,” the report read.
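The report does not detail which features or model architecture the UCL project used. As a generic sketch only, malware/benign classification of this kind is often framed as supervised learning over static file features, along these lines (the feature matrix and labels below are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical feature matrix: each row is a file described by static features
# (e.g. byte-histogram buckets, imported-API counts, section entropy).
rng = np.random.default_rng(0)
X = rng.random((1000, 32))          # placeholder features for 1,000 files
y = rng.integers(0, 2, size=1000)   # placeholder labels: 1 = malware, 0 = benign

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# On real labelled samples, held-out accuracy is the figure the report
# quotes (98.9% for the best-performing model); random data will not reach it.
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```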

Threat intelligence involves monitoring multiple online data sources providing streams of intelligence data about newly identified vulnerabilities, developed exploits, and trends and patterns in attacker behavior. “This data is often unstructured textual data from forums, social media, and the dark web. ML models can be used to process this text, identify common cybersecurity nuance in the data, and therefore identify trends in attacker tactics, techniques, and procedures (TTP),” according to the report. This enables defenders to proactively implement additional monitoring or control systems when new threats are particularly significant to their business or technology landscape, it added.
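The report does not specify the ML techniques involved, but one common way to surface themes from unstructured threat-intelligence text is TF-IDF weighting followed by topic extraction. The sketch below uses toy stand-in posts purely to illustrate the idea; a real pipeline would ingest forum, social-media, and dark-web feeds:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Toy stand-ins for unstructured threat-intelligence posts.
posts = [
    "new exploit for unpatched VPN appliance sold on forum",
    "phishing kit targeting payroll portals observed again",
    "ransomware crew shifts to exfiltration-only extortion",
    "credential stuffing against VPN gateways increasing",
    "payroll-themed phishing lures spreading infostealer",
]

# TF-IDF turns free text into weighted term vectors; NMF then groups the
# vocabulary into a few "topics" that can be read as emerging TTP themes.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(posts)

nmf = NMF(n_components=2, random_state=0)
nmf.fit(tfidf)

terms = vectorizer.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top_terms = [terms[j] for j in component.argsort()[-4:][::-1]]
    print(f"topic {i}: {', '.join(top_terms)}")
```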

DevSecOps, Generative AI, Vulnerabilities