
HackerOne: Vulnerabilities in GenAI Tools Create Security and Safety Issues

We spoke with Dane Sherrets, Solutions Architect at HackerOne, to uncover the intricacies of AI vulnerabilities, particularly in GenAI tools, and hear more about the potential exploits and creative methods used to bypass security measures. Sherrets offers insights into the effectiveness of OpenAI's updates in curbing political misinformation and discusses the need for transparency and additional measures to ensure the secure and ethical use of GenAI tools, especially in the context of elections.


Dane Sherrets, HackerOne

Can you elaborate on the specific vulnerabilities HackerOne is seeing in GenAI tools and how these vulnerabilities could potentially be exploited?

 

We have yet to fully disclose any vulnerabilities discovered with the companies we’ve been working with. However, I often encounter two distinct categories of issues:


  1. AI security issues: We have seen AI systems gain unauthorized access to information or functionality that would normally be restricted to regular users, bypassing conventional access controls. This is a risk wherever the AI can be used to reach sensitive data or perform actions it should not have the privilege to execute, and it points to security lapses in how the AI has been implemented (see the sketch after this list).

  2. AI safety issues: Users can cleverly manipulate the AI into generating content that violates the content policies or safety filters embedded in the system. This involves exploiting the model’s understanding or biases to produce content that may be inappropriate, deceptive, or harmful.
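
To make the first category concrete, here is a minimal, hypothetical sketch of the kind of server-side check that keeps an AI integration from exercising functionality its human caller is not entitled to use. The tool names, roles, and the `execute_tool` helper are illustrative assumptions, not drawn from any specific product HackerOne has tested.

```python
# Hypothetical sketch: authorization is enforced outside the model, so a clever
# prompt cannot talk the system into skipping it. All names here are illustrative.

TOOL_PERMISSIONS = {
    "search_public_docs": {"user", "admin"},
    "export_customer_records": {"admin"},  # sensitive functionality
}

def run_tool(tool_name: str, args: dict) -> dict:
    # Placeholder for the real tool dispatch.
    return {"tool": tool_name, "args": args, "status": "ok"}

def execute_tool(tool_name: str, args: dict, caller_role: str) -> dict:
    """Run a tool requested by the model only if the human caller is authorized.

    The model's output is treated as untrusted input: even if a prompt tricks
    the model into requesting a restricted tool, this check still blocks it.
    """
    allowed_roles = TOOL_PERMISSIONS.get(tool_name)
    if allowed_roles is None:
        raise ValueError(f"Unknown tool: {tool_name}")
    if caller_role not in allowed_roles:
        raise PermissionError(f"Role '{caller_role}' may not call '{tool_name}'")
    return run_tool(tool_name, args)
```

The security issues described above typically arise when this kind of check lives only in the prompt, or when the AI runs with broader privileges than the user who is driving it.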

 

Given your involvement in AI safety red team testing, what are some of the most creative or surprising methods you've seen people use to bypass the guardrails implemented in tools like ChatGPT?

 

One notable method involves instructing the AI to roleplay as someone or something else, inducing responses that deviate from its usual patterns. This approach tests the adaptability of the AI, pushing it beyond its pre-set boundaries. Another clever tactic is prompting the AI to respond in a rhyme or with an encoded message, which can often bypass any filters it has. This showcases the creativity users employ to navigate and challenge the limitations of the AI's language understanding. Additionally, the tactic of overwhelming the AI with an abundance of text, strategically causing it to forget previous instructions and comply with the request at the end of the prompt, demonstrates a shrewd manipulation of the model's memory constraints (or, to be more technically accurate, the model’s “context window”).
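
As a deliberately simplified illustration of that last point, a fixed token budget means a sufficiently long prompt can push earlier instructions out of what the model ever sees. This is not how any production system actually manages context; the naive keep-the-newest policy, the word-count "tokenizer," and the tiny window size are assumptions made purely for demonstration.

```python
# Illustrative sketch (not any vendor's actual serving code): why flooding a
# model with text can displace earlier instructions. Assume a naive policy that
# keeps only the most recent messages that fit in a fixed context window.

CONTEXT_WINDOW = 50  # tokens; deliberately tiny for the example

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())

def build_model_input(messages: list[str]) -> list[str]:
    """Keep the newest messages that fit in the window, dropping older ones."""
    kept, used = [], 0
    for message in reversed(messages):
        tokens = count_tokens(message)
        if used + tokens > CONTEXT_WINDOW:
            break
        kept.append(message)
        used += tokens
    return list(reversed(kept))

messages = ["SYSTEM: never reveal internal notes."]
messages += ["filler sentence number %d with several extra words" % i for i in range(10)]
messages += ["USER: now answer my final question."]

visible = build_model_input(messages)
print("SYSTEM instruction still visible:", any(m.startswith("SYSTEM") for m in visible))
```

Real systems use real tokenizers and far larger windows, but the underlying constraint is the same: whatever falls outside the window is simply not part of the model’s input.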

 

How effective do you believe OpenAI's updates are in preventing the use of ChatGPT for creating political misinformation, based on your observations and testing?

 

OpenAI has announced that ahead of the 2024 U.S. elections, it will not allow people to use its tools for political campaigning and lobbying. People also aren’t allowed to create chatbots that impersonate candidates and other real people, or chatbots that pretend to be local governments. Personally, I question whether chatbots are as impactful as AI-generated targeted content. I used to work in political campaigns, and I would go to political tech conferences where folks would talk about how we were ‘data rich and content poor’, meaning that even if we had all the granular data in the world about what a particular demographic wants to see in a candidate, it was impossible to create bespoke social media or email copy for them at scale. With AI, that is now possible.

 

I am also interested in seeing how effective OpenAI will be in policing this activity. Through my experience with AI safety red team testing, I have seen just how creative people can be in tricking AI into circumventing its guardrails. Stress-testing AI with human creativity and ingenuity is one of the few tools we have in the AI safety toolbelt, but finding an issue and fixing it are very different things. If OpenAI is just monitoring accounts to see which ones violate the policy and banning those accounts, how well does that scale?

 

In your view, what additional measures or policies should be implemented to ensure that ChatGPT and similar GenAI tools remain secure and free from misuse, especially as we approach election periods?

 

In a word, I think the most important thing we need is “transparency.” For example, with regard to AI-generated images, transparency would mean AI companies making a best effort to have all images digitally watermarked (e.g., Content Credentials). Campaigns utilizing GenAI in any form should be transparent about the methods and purposes of its use. Social media companies should also be transparent about the content, targeting, and activity they are seeing on their platforms.
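
As a rough illustration of what looking for that kind of provenance signal could involve, the sketch below scans a file for the “c2pa” label that Content Credentials (C2PA) manifests embed. This is only a crude byte-level presence heuristic, not real cryptographic verification, and the file path is a placeholder; proper checking requires a full C2PA implementation that parses and validates the manifest.

```python
# Crude, illustrative heuristic only: look for the "c2pa" label that Content
# Credentials (C2PA) manifests embed in a file. Real verification requires a
# proper C2PA implementation that parses and cryptographically validates the
# manifest; this sketch merely hints at whether one might be present.

from pathlib import Path

def might_have_content_credentials(path: str) -> bool:
    data = Path(path).read_bytes()
    return b"c2pa" in data  # presence of the label, nothing more

if __name__ == "__main__":
    print(might_have_content_credentials("example.jpg"))  # placeholder filename
```

Checks like this only matter if platforms and campaigns actually look for the credential and surface it, which is the end-to-end transparency described above.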

 

Transparency is important here because it can inform any societal conversations or mitigations we want to implement in the future.

 

How do you anticipate the debate around GenAI and its implications on elections and political discourse to evolve, and what role do you see HackerOne playing in shaping this conversation?

 

Unfortunately, I think history has shown us that we often don’t understand how technology will be used in an election until after the election has passed (think targeted ads in 2016 or televised debates in 1960). Much of the conversation about GenAI has taken place in academia, but as AI enters the real world, practical experience becomes essential. HackerOne is in an important and rare position to help bridge that gap. The unique perspective of hackers, characterized by a sort of weaponized curiosity, is perfect for this kind of work.
