Hidden Threat: Code Assistants Know More than They Think

Carding

Developers' secrets can be exposed through code auto-completion features.

A team of researchers from the Chinese University of Hong Kong and Sun Yat-sen University has discovered that AI-based automated code completion tools, such as GitHub Copilot and Amazon CodeWhisperer, can inadvertently reveal sensitive user data. This data may include API keys, access tokens, and other sensitive elements.

The researchers created a tool called Hardcoded Credential Revealer (HCR) to identify such data. The study found that among the 8,127 code completion suggestions generated by Copilot, 2,702 turned out to be valid secrets. For CodeWhisperer, this figure was 129 out of 736.
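The HCR tool's internals are not detailed here, but tools of this kind typically match code completions against regular expressions for known credential formats and then validate candidate matches. A minimal sketch of that pattern-matching step, using illustrative regexes for a few common key formats (the names and patterns below are assumptions, not HCR's actual rules):

```python
import re

# Illustrative patterns loosely modeled on well-known credential formats;
# a real scanner would use a much larger, vetted pattern set and would
# additionally try to validate each candidate against the issuing service.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_test_key": re.compile(r"\bsk_test_[A-Za-z0-9]{24,}\b"),
}

def find_hardcoded_secrets(code: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_string) pairs found in a code snippet."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(code):
            hits.append((name, match))
    return hits

# AWS's documented placeholder key, used here as a harmless example:
snippet = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'
print(find_hardcoded_secrets(snippet))
# → [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```

Matching alone only finds strings that *look* like secrets; the study's distinction between suggestions and "valid secrets" implies an additional verification step that this sketch omits.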

It is worth noting that these secrets were originally published by accident and could have been rotated or revoked before they were ingested by the models. Even so, the study highlights the risks that arise when such accidentally published data is reused as training material.

It was also revealed that the models not only reproduce secret keys from their training data, but also suggest new ones that do not appear in it. This raises questions about what other data could potentially be disclosed.

The study showed that thousands of new unique secrets are published on GitHub every day due to developers' mistakes or indifference to security practices.
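The common mistake behind these daily leaks is embedding a credential as a string literal, where it gets committed and scraped. The standard alternative is to read secrets from the environment at runtime. A minimal sketch (the variable name `PAYMENT_API_KEY` is purely illustrative):

```python
import os

def load_api_key(var_name: str = "PAYMENT_API_KEY") -> str:
    """Fetch a credential from the environment instead of hardcoding it.

    Fails with a clear error if the variable is unset, rather than
    silently falling back to a literal key embedded in source code.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running")
    return key

# Anti-pattern this replaces (never commit lines like this):
# API_KEY = "sk_test_..."
```

Keeping the key out of the source tree entirely means there is nothing for a model trained on public code to memorize in the first place.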

For ethical reasons, the researchers avoided checking for keys that could pose serious privacy risks, such as keys to payment APIs. However, they tested a set of harmless keys and found two working Stripe test keys suggested by both Copilot and CodeWhisperer.

Responding to reports of this threat, GitHub said that since March 2023 the platform has run an AI-based leak prevention system that blocks insecure code patterns in real time.

"In some cases, the model may offer what appears to be personal data, but these offers are fictitious information synthesized from templates in the training data," the company said.

Even if the researchers' conclusions are only partially correct, a precedent has been set. It is therefore essential for technology companies offering such tools to implement additional methods for testing and verifying generated code: in the modern world, no opportunity for leaking secrets and other confidential data should be left open.