Since OpenAI gave the public access to its ChatGPT application in late 2022, stories have appeared about how artificial intelligence (AI) can be exploited by bad actors to write malicious code. But even good actors can create security risks for their supply chains when using AI to produce code.
Security researchers have combined ChatGPT with Codex, another OpenAI product that translates natural language into code, to create a more sophisticated attack vehicle. Without writing a single line of code, they put together an attack that used a phishing email carrying a poisoned Excel file, weaponized with macros that download a reverse shell, a favorite payload for hackers.
Those same researchers also found activity spawned by ChatGPT on the dark web. On one forum, a thread emerged on how to craft a Python-based stealer that searches for common file types on a computer, copies them to a random folder inside the Temp folder, zips them, then uploads them to a hardcoded FTP server.
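The collection stage of that pattern is unremarkable Python, which is exactly why a chatbot can produce it on request. A defanged sketch of just that stage (hypothetical file-type list, and with the FTP exfiltration step deliberately omitted) might look like:

```python
import glob
import os
import shutil
import tempfile
import uuid
import zipfile

# Hypothetical illustration of the collection stage only: find common
# document types, stage them in a random folder under the temp
# directory, and zip them. No upload step is included.
TARGET_EXTENSIONS = ("*.pdf", "*.docx", "*.xlsx")

def collect_and_zip(search_root: str) -> str:
    # Random staging folder inside the system temp directory
    staging = os.path.join(tempfile.gettempdir(), uuid.uuid4().hex)
    os.makedirs(staging)
    for pattern in TARGET_EXTENSIONS:
        # Recursive search for matching files under search_root
        for path in glob.glob(os.path.join(search_root, "**", pattern),
                              recursive=True):
            shutil.copy(path, staging)
    # Zip the staged copies next to the staging folder
    archive = staging + ".zip"
    with zipfile.ZipFile(archive, "w") as zf:
        for name in os.listdir(staging):
            zf.write(os.path.join(staging, name), arcname=name)
    return archive
```

The point is not the code's sophistication but its accessibility: each step is a textbook standard-library call, so the barrier to assembling the whole stealer is a well-phrased prompt rather than programming skill.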
In another case on the forum, ChatGPT was used to produce a Java snippet that downloads PuTTY, a common SSH and telnet client, to a target machine. Once it lands on a system, the client runs covertly using PowerShell. Although the script was written for PuTTY, the researchers point out it could be modified to run any program, including those from common malware families.
AI can be a boon to developers, who are under increasing pressure to accelerate their workflows. But its output needs to be continuously scrutinized for potential risks. Here's what you need to know about the state of generative AI and software supply chain security.
Buggy code is the norm
While AI can be used to construct harmful code, it can also be used in beneficial ways, as users of Codex and its sibling, GitHub Copilot, can attest. However, there are risks to using AI-generated code, especially when it's used by less experienced developers.
Code produced by an AI should be thought of as the product of a novice or junior programmer. That's especially true of code created with ChatGPT, which may omit things like error handling, security checks, encryption, authentication, and authorization.
The code may also be buggy, as the Stack Overflow community discovered soon after ChatGPT became available on the net.
Stack Overflow is an online community for developers. At the heart of the forum is trust. Members trust other members to submit information that reflects what they know to be accurate and that can be validated and verified by their peers.
"Currently, contributions generated by GPT most often do not meet these standards and therefore are not contributing to a trustworthy environment," Stack Overflow declared in a policy statement:
"This trust is broken when users copy and paste information into answers without validating that the answer provided by GPT is correct, ensuring that the sources used in the answer are properly cited (a service GPT does not provide), and verifying that the answer provided by GPT clearly and concisely answers the question asked."
The Stack Overflow ChatGPT ban
In a posting announcing the temporary banning of ChatGPT output from the forum, Stack Overflow explained that ChatGPT information had a high error rate, but looked accurate until it was more closely examined.
"Because such answers are so easy to produce, a large number of people are posting a lot of answers," Stack Overflow continued. "The volume of these answers (thousands) and the fact that the answers often require a detailed read by someone with at least some subject matter expertise in order to determine that the answer is actually bad has effectively swamped our volunteer-based quality curation infrastructure."
The Stack Overflow situation illustrates what can happen when inexperienced developers get their hands on AI tools. Buggy code will not only disrupt the development pipeline; it could also expose the pipeline and its products to security risks during and after development.
Data poisoning poses a serious threat
Generating code with an AI can also provide another point of attack for hackers on software supply chains. For example, researchers at Microsoft, the University of California, and the University of Virginia recently published a paper on poisoning language model data sets.
They explained that tools like GitHub Copilot are trained on massive corpora of code mined from unvetted public sources, which makes them susceptible to data poisoning. Such an attack could train a machine-learning model to suggest insecure code payloads at runtime. The straightforward way to do that is to inject the payload code directly into the training data, but there it can be detected and removed by static analysis tools. The researchers, though, have found ways to avoid that kind of exposure.
In one method they call "Covert," the malicious code never appears in the training data as executable statements. Instead, it's hidden in comments or Python docstrings, which are typically ignored by static detection tools. However, since the attack still requires the malicious code to eventually appear verbatim in the training data, it could be discovered by using signature-based techniques.
Another method they call "Trojan Puzzle" is even more evasive. It never includes suspicious code in the training data, yet it can still induce a model to suggest a malicious payload in its code recommendations. The technique is robust against signature-based dataset-cleansing methods that identify and filter out suspicious sequences from training data.
Zero trust is key to modern software security
AI seems like a win for developers, who are under increasing pressure to build new features and release more rapidly. But AI's output will need serious scrutiny by DevSecOps and app sec teams. Without such scrutiny, organizations can give threat actors increased opportunities to attack their software supply chains.
The broader problem with ChatGPT and other generative AI platforms is that they are only as good as the data they collect. In its current state, generative AI like ChatGPT is vulnerable to malicious actors who can compromise or manipulate the referenced data used by the AI, undermining the accuracy and quality of its output. For now, this fundamental weakness in today's generative AI platforms makes them more a problem than a solution.