Welcome back to The Prompt. Over the weekend, Meta released a cohort of new AI models called Llama 4, which it claimed perform better and more cost-efficiently than OpenAI’s GPT-4o and Google’s Gemini 2.0 for tasks such as creative writing, coding and summarizing documents. One of these models, called Llama-4-Maverick, was benchmarked on the popular platform LM Arena, where people rate and compare AI models' answers.
But the model Meta benchmarked turned out to be different from the one it had publicly released to developers, TechCrunch reported. Instead, the social media giant benchmarked a more fine-tuned version, as several AI researchers noted on X and as Meta mentioned in the fine print of its own blog post. As a result, LM Arena said it is updating its policies for fair and "reproducible" model evaluations in the future.
“Meta’s interpretation of our policy did not match what we expect from model providers,” it posted on X. “As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.” The fiasco has raised concerns that the company’s claims about its models’ performance may be misleading, and it points to the fact that these benchmarks aren’t always a reliable measure of AI’s capabilities.
Now let’s get into the headlines. Last week, President Trump announced sweeping tariffs on 90 countries that shook the stock market. While those tariffs exclude semiconductors, the powerful chips that undergird AI models, they could make imports of essential materials like aluminium and structural steel more expensive and drive up the cost of building the massive data centers where AI models are trained, Forbes reported.
“By the time we have developed the capacity to domestically produce these systems, we will have lost the AI race," Gavin Baker, chief investment officer at private equity firm Atreides Management, said on X. AI might have also had a role to play in coming up with Trump’s trade policies, which resemble ChatGPT’s, Gemini’s and Grok’s breakdown of how the tariffs were calculated, The Verge reported. The White House denied the claims and published what it said was its formula for the calculations.
Shopify’s billionaire CEO Tobi Lutke said the e-commerce giant is changing its approach to hiring in the age of artificial intelligence. Teams will be required to demonstrate why AI can’t get the job done before asking for more resources and additional headcount, according to a memo to employees that Lutke posted on X. Shopify’s workers are also expected to use AI in their day-to-day jobs, and their performance reviews will include questions about AI usage.
New York City-based Runway, which develops AI models for video and image editing, raised $308 million. It’s now valued at over $3 billion, according to PitchBook and multiple reports. The company’s tools have been used to create scenes in Oscar-winning film Everything Everywhere All at Once.
Runway’s AI software is used by tens of millions of users, including Hollywood filmmakers as well as budding artists, CEO Cristobal Valenzuela told me. The appeal? Runway’s easy-to-use tools. “You don't have to learn this complex 30-year-old software stack or creative tools that have been around for so long.
You just can go from the idea to the final execution in a few minutes with Runway,” he says.
Photo caption: A Microsoft sign is decorated in celebration of the company's 50th anniversary at Microsoft headquarters, Friday, April 4, 2025, in Redmond, Wash. (AP Photo/Jason Redmond)
On Friday, Microsoft celebrated its 50th anniversary.
The company behind the Windows operating system and Microsoft Office suite of applications like Word and Excel used the occasion to highlight its innovations in artificial intelligence, which range from its search engine Bing to coding tool GitHub Copilot to its family of small-sized language models called Phi. It also announced new capabilities for Copilot, including in-depth research and the ability to carry out specific tasks on a person’s behalf. Microsoft Azure AI Foundry, software that allows companies to build AI apps and what are called “agents,” systems that carry out specific tech-related tasks, is used by some 60,000 customers, Microsoft announced.
(The celebration was interrupted by employees who protested Microsoft allegedly selling its software and AI tools to the Israeli military. Microsoft terminated their employment, CNBC reported.) I spoke to Microsoft’s Corporate Vice President for AI platform, Eric Boyd, about how the company has evolved over the years.
Boyd joined Microsoft in 2009 on the Bing team.
Rashi Shrivastava: Is there a key moment in Microsoft’s history where you realized the significance of this technology and how it could change the way we work in the future?
Eric Boyd: For me, it really crystallized when I saw probably GPT-3.5 and this was probably about six months before ChatGPT got released and we were just playing with it.
But it hadn't really broken through to the mainstream. And I remember I was really almost frustrated at the time. I'm like, how come the whole company isn't taking a bet on AI and just refactoring, refocusing everything that they're doing.
And then we saw GPT-4 a few months later and we did exactly that. Satya got all of the key leadership in the company into a room and said, ‘Here we have this new technology. Look at it, understand it.’ And he gave very specific instructions: ‘I want you to come back with plans on how you're going to incorporate this in your products.
We already have spellcheck. Don't make spellcheck a little bit better. I want you to rethink the way that this is going to work in your products.’ Every team had plans, and every team threw those plans out the window and came up with new plans for how they were going to refocus on this.
And it was just an astounding pivot for the company that led to just this watershed where we realized things are going to be radically different. We're going to operate in a very different way. The world's going to be different as a result of that.
Rashi: What is Microsoft's vision for AI in the future?
Eric: Even if we kept doing just no new models, no new capabilities, no new functionality, there's at least five years of adoption to take advantage of the technology that we've already got. But we do have new things coming and I can ask a model to understand complicated things and do reasoning and give me answers back in a chat form. It’s going to change completely again as we move into a world where we expect these agents powered by models to go and do work for them and that I'm now going to be even more productive because I can assign sort of menial tasks.
We now have AI that can do that mind-numbing work really efficiently at massive scale and really well and just provide tremendous benefits to people.
Rashi: Can we expect to see these agents in Microsoft products like Word and some of the other tools that we use in the future?
Eric: I can't imagine that we wouldn't have that. I don't have anything to announce on that, but I definitely would expect that.
Rashi: How has Microsoft's relationship with OpenAI evolved over the years? It seems like the company has been shifting away from its reliance on OpenAI’s models to offer models from other companies like Mistral as well as build its own.
Eric: We have a really great partnership with OpenAI and we continue to work really closely with them. I have literally 20 people in my team who are sitting at OpenAI's office each and every day working side by side with their engineers on the next things that we're looking to produce and move forward on.
Our commitment to our customers is that we're going to give them the best frontier models produced by OpenAI along with the best models at each point on the price-performance curve. I don't see that as a change really in our relationship with OpenAI. It's just making sure that we've got the full segment, all the segments in the market really covered.
In some cases, AI’s medical decisions can be better than those of human doctors, according to a new study by digital health startup K Health and researchers at Cedars-Sinai and Tel Aviv University, Forbes reported. The study, which reviewed 460 patient visits, found that K Health’s AI chatbot, which makes recommendations and diagnoses based on the patient’s medical records and conversations with the patient, matched the doctor’s decisions in two-thirds of cases and in the remaining one-third it offered better-quality care.
A representative of Elon Musk’s Department of Government Efficiency, which has moved through a string of federal agencies to reduce headcounts and slash budgets, appears to be trying to use an AI coding tool to write code for the Department of Veterans Affairs’ codebase, according to Wired. A DOGE hire, Sahil Lavingia, gained access to the department’s systems and is reportedly trying to digitize the organization with the introduction of AI tools for different uses.
The Prompt: Microsoft Bets Big On Its AI Future

Plus: Is AI’s medical advice better than a doctor’s?