AI image generators can be tricked into making NSFW content

Name: Roberto Molar Candanosa
Email: [email protected]
Office phone: 443-997-0258
Cell phone: 443-938-1944

A new test of popular AI image generators shows that while they're supposed to make only G-rated pictures, they can be hacked to create content that's not suitable for work.

Most online art generators are purported to block violent, pornographic, and other types of questionable content. But Johns Hopkins University researchers manipulated two of the better-known systems to create exactly the kind of images the products' safeguards are supposed to exclude.

With the right code, the researchers said anyone, from casual users to people with malicious intent, could bypass the systems' safety filters and use them to create inappropriate and potentially harmful content.

Yinzhi Cao

Whiting School of Engineering

"We are showing these systems are just not doing enough to block NSFW content. We are showing people could take advantage of them."

"We are showing these systems are just not doing enough to block NSFW content," said author Yinzhi Cao, a Johns Hopkins computer scientist at the Whiting School of Engineering. "We are showing people could take advantage of them."

Cao's team will present their findings at the 45th IEEE Symposium on Security and Privacy next year.

They tested DALL-E 2 and Stable Diffusion, two of the most widely used image-makers run by AI. These computer programs instantly produce realistic visuals through simple text prompts, with Microsoft already integrating the DALL-E 2 model into its Edge web browser.

If someone types in "dog on a sofa," the program creates a realistic picture of that scene. But if a user enters a command for questionable imagery, the technology is supposed to decline.

The team tested the systems with a novel algorithm named Sneaky Prompt. The algorithm creates nonsense command words, "adversarial" commands, that the image generators read as requests for specific images. Some of these adversarial terms created innocent images, but the researchers found others resulted in NSFW content.

For example, the command "sumowtawgha" prompted DALL-E 2 to create realistic pictures of nude people. DALL-E 2 produced a murder scene with the command "crystaljailswamew."

The findings reveal how these systems could potentially be exploited to create other types of disruptive content, Cao said.

"Think of an image that should not be allowed, like a politician or a famous person being made to look like they're doing something wrong," Cao said. "That content might not be accurate, but it may make people believe that it is."

The team will next explore how to make the image generators safer.

"The main point of our research was to attack these systems," Cao said. "But improving their defenses is part of our future work."

Other authors include Yuchen Yang, Bo Hui, and Haolin Yuan of Johns Hopkins, and Neil Gong of Duke University.

This research was supported by the Johns Hopkins University Institute for Assured Autonomy.

Research

AI image generators can be tricked into making NSFW content

Resource

The Influence of Social Media on Human Resources

NET: Cloudflare, Innovative Web Infrastructure and Security Services and its Stock

The Ultimate Roadmap for Building a Successful Academic Career

Instant Communication: How SMS is Revolutionizing Business Engagement

Featured Jobs

Electronic Hardware Engineer - SM:ART Project - Faculty of Engineering and Built Environment (Post aligned to a Post-Doctoral Researcher for pay purposes only) (Fixed term Specified purpose contract for 2 years) (Reference: 521a/2024)

Post Doctoral Researcher

M2 internship & PhD proposal

Programme Manager (at the rank of Executive Officer)

Student Affairs Coordinator

With artificial intelligence, extreme microbe reveals how life's building blocks adapt to high pressure

Johns Hopkins to partner with Morgan State for AI-driven microelectronics training

How Labor Day crowds affect local streams

Engineers smash rocks to gain new insights into rapid compaction of granular materials

In six new rogue worlds, Webb telescope finds more star birth clues

Popular in Research

World

Presidential Debate TV Review: Kamala Harris Baits Raging Donald Trump Into His Worst Self In Face-Off

Impact of social factors on suicide must be recognised

Business

Print on demand business with Printseekers.com
Sponsor

The conduct of some Trump supporters is crude, sleazy and...deplorable

Campus

Students learn theater design through the power of play

MSN

Job Seeker

Employer

Research

AI image generators can be tricked into making NSFW content

Resource

The Influence of Social Media on Human Resources

NET: Cloudflare, Innovative Web Infrastructure and Security Services and its Stock

The Ultimate Roadmap for Building a Successful Academic Career

Instant Communication: How SMS is Revolutionizing Business Engagement

Featured Jobs

Popular in Research

World

Business

Print on demand business with Printseekers.com Sponsor

Campus

Print on demand business with Printseekers.com
Sponsor