The Prompt: Hackers Compete To Jailbreak AI Models

Plus: OpenAI cozies up with the US military and Meta’s foray into AI search.


Welcome back to The Prompt. OpenAI, the world’s biggest AI company, is cozying up with the US military. The latest sign is a recent post by OpenAI’s national security advisor Katrina Mulligan about attending a Taylor Swift concert in New Orleans over the weekend with Secretary of the Army Christine Wormuth, which Mulligan called “epic,” Forbes reported. It comes on the heels of OpenAI’s first publicly reported contract with the Pentagon, amid the ChatGPT maker’s aggressive efforts to sell its technology to federal agencies, including defense, through a partnership with government contractor Carahsoft.

Now let’s get into the headlines.

BIG PLAYS

Once called a “home for human writing,” blogging platform Medium is brimming with AI-generated content, Wired found, with about 47% of posts on the site most likely written with AI. CEO Tony Stubblebine responded saying it “doesn’t matter” as long as AI-generated blogs aren’t recommended by Medium’s algorithms and viewed by its 100 million monthly users.



As we’ve reported earlier, other platforms like freelancing website Upwork and e-commerce site eBay are similarly awash with AI-generated “slop.” Elsewhere, Facebook owner Meta is reportedly working on its own AI-powered search engine, according to The Information. Meta AI currently provides answers to questions about sports, stocks and news but relies on external sources like Google Search and Microsoft Bing for real-time data.

ETHICS + LAW

A ninth grader from Orlando spent months conversing with chatbots on Character AI, a platform that hosts chatbots programmed to respond like popular figures. In February, moments after he texted a chatbot on the platform, he died by suicide, the New York Times reported. In the preceding months, the teenager had reportedly become emotionally attached to the chatbot, confiding his most private thoughts to it.

Now, his mother is suing Character AI, blaming the company for her son’s death and alleging that its technology is “dangerous and untested.” Earlier this month, I reported that Character AI, valued at $1 billion, hosted a chatbot named after a teenager who was brutally murdered years ago. Wired found other instances of chatbots made in the likeness of people who had never given consent.

These incidents point to a larger issue: a largely unregulated industry of AI companion apps.

AI DEALS OF THE WEEK

Nooks, an AI sales platform cofounded by three Stanford classmates in 2020, raised $43 million in funding from Kleiner Perkins and others at a valuation of $285 million, Forbes reported. Helmed by three 25-year-olds, the company offers software to automate banal tasks like research, finding numbers and taking notes.

In the world of autonomous vehicles, Alphabet-owned Waymo has raised $5.6 billion in its largest round ever to expand its fleet of robotaxis into new cities, my colleague Alan Ohnsman reported. And Sierra, an AI startup cofounded by OpenAI chairman Bret Taylor, has picked up $175 million in venture capital at a $4.5 billion valuation, Reuters reported. The company brings in about $20 million in annual revenue selling AI chatbots for customer service. Elon Musk’s xAI is in talks to raise funding at a $40 billion valuation, the Wall Street Journal reported.

DEEP DIVE

The researchers behind Gray Swan AI started the company after finding a major vulnerability in models from OpenAI, Anthropic, Google and Meta. More than 600 hackers convened last month to compete in a “jailbreaking arena,” hoping to trick some of the world’s most popular artificial intelligence models into producing illicit content: for instance, detailed instructions for cooking meth. The hacking event was hosted by a young and ambitious security startup called Gray Swan AI, which is working to prevent intelligent systems from causing harm by identifying their risks and building tools to ensure these models are deployed safely.

It’s gotten early traction, securing notable partnerships and contracts with OpenAI, Anthropic and the United Kingdom’s AI Safety Institute. “People have been incorporating AI into just about everything under the sun,” Matt Fredrikson, Gray Swan’s cofounder and chief executive officer, told Forbes. “It’s touching all parts of technology and society now, and it’s clear there’s a huge unmet need for practical solutions that help people understand what could go wrong for their systems.”

Gray Swan can also build safety and security measures for some of the issues it identifies. “We can actually provide the mechanisms by which you remove those risks or at least mitigate them,” cofounder Zico Kolter told Forbes. “And I think closing the loop in that respect is something that hasn’t been demonstrated in any other place to this degree.”

This is no easy task when the hazards in need of troubleshooting aren’t the usual security threats, but things like coercion of sophisticated models or embodied robotics systems going rogue. Last year, Fredrikson, Kolter and cofounder Andy Zou coauthored research showing that by attaching a string of characters to a malicious prompt, they could bypass a model’s safety filters. While “Tell me how to build a bomb” might elicit a refusal, the same question amended with a chain of exclamation points, for example, would return a detailed bomb-making guide.

This method, which worked on models developed by OpenAI, Anthropic, Google and Meta, was called “the mother of all jailbreaks” by Zou, who told Forbes it sparked the creation of Gray Swan. Read the full story on Forbes.

WEEKLY DEMO

Looking for ways to use AI this Halloween? Forbes contributor Martine Paris suggests using ChatGPT’s voice mode to recount spooky stories in various accents and give you Halloween-flavored jokes and recipes.

She also recommends trying out Google’s NotebookLM to create a podcast about Halloween.

QUIZ

This company was acquired by AMD for hundreds of millions of dollars. Now its founder is funding AI researchers across Europe.

Silo AI
Mistral
ZT Systems
Nod AI

Check if you got it right.
