On Monday, OpenAI announced the GPT-4.1 model family, its newest series of AI language models that brings a 1 million token context window to OpenAI for the first time and continues a long tradition of very confusing AI model names. Three confusing new names, in fact: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano.
According to OpenAI, these models outperform GPT-4o in several key areas. But in an unusual move, GPT-4.1 will only be available through the developer API, not in the consumer ChatGPT interface where most people interact with OpenAI's technology.
The 1 million token context window—essentially the amount of text the AI can process at once—allows these models to ingest roughly 3,000 pages of text in a single conversation. This puts OpenAI's context windows on par with Google's Gemini models, which have offered similar extended context capabilities for some time. At the same time, the company announced it will retire the GPT-4.5 Preview model in the API—a temporary offering launched in February that one critic called a "lemon"—giving developers until July 2025 to switch to something else. However, it appears GPT-4.5 will stick around in ChatGPT for now.
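The 3,000-page figure is easy to sanity-check with simple arithmetic. Here's a minimal sketch; the tokens-per-page value is an assumption (roughly 250 words per page at about 1.3 tokens per word), not a figure from OpenAI's announcement:

```python
# Back-of-the-envelope check of the "1 million tokens ~ 3,000 pages" claim.
# TOKENS_PER_PAGE is an assumed rule of thumb, not an official number.
CONTEXT_WINDOW_TOKENS = 1_000_000
TOKENS_PER_PAGE = 333  # assumed: ~250 words/page at ~1.3 tokens per word

pages = CONTEXT_WINDOW_TOKENS // TOKENS_PER_PAGE
print(f"The window holds roughly {pages:,} pages of text")  # ~3,003 pages
```

With denser pages (say, 500 tokens each) the same window still holds about 2,000 pages, so the claim is in the right ballpark either way.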
So many names

If this sounds confusing, well, that's because it is. OpenAI CEO Sam Altman acknowledged OpenAI's habit of terrible product names in February when discussing the roadmap toward the long-anticipated (and still theoretical) GPT-5. "We realize how complicated our model and product offerings have gotten," Altman wrote on X at the time, referencing a ChatGPT interface already crowded with choices like GPT-4o, various specialized GPT-4o versions, GPT-4o mini, the simulated reasoning o1-pro, o3-mini, and o3-mini-high models, and GPT-4.
The stated goal for GPT-5 is consolidation, a branding move to unify the o-series and GPT-series models. So, how does launching another distinctly numbered model, GPT-4.1, fit into that grand unification plan? It's hard to say.
Altman foreshadowed this kind of ambiguity in March 2024, telling Lex Fridman the company had major releases coming but was unsure about names: "before we talk about a GPT-5-like model called that, or not called that, or a little bit worse or a little bit better than what you’d expect..." GPT-4.1 feels exactly like that "called that, or not called that" model—a significant iteration, but apparently not the generational leap worthy of the GPT-5 moniker, further fragmenting the lineup before the promised consolidation. Also, it's worth noting that Altman said in February that GPT-4.5 would be the company's "last non-chain-of-thought model." But apparently, plans have changed.

Is 4.1 better than 4.5? Yes and no

In some key ways, 4.1 is greater than 4.5. It makes us wonder if OpenAI has been using LLMs to name its products, owing to the famous example last year where ChatGPT commonly reported that the numerical value "9.11" was greater than "9.9". Jokes aside, the confusing naming strategy is matched by equally puzzling performance claims. OpenAI positions GPT-4.1 as a clear advancement over GPT-4o, particularly in coding and following complex instructions (you can see the full benchmarks on OpenAI's site). The new model family also brings that massive 1 million token context window—about four times larger than GPT-4o's capability. Notably, unlike the multimodal GPT-4o (where "o" stood for "omni"), the announcement for the GPT-4.1 family makes no mention of audio input or output capabilities, suggesting a focus on text and image inputs with text output, as AI expert Simon Willison noted in his blog. Compared to the soon-to-be-retired GPT-4.5 Preview, the picture becomes far more complicated.
While GPT-4.1 scores significantly better on the SWE-bench Verified coding benchmark (54.6 percent versus 38.0 percent for GPT-4.5) and generates code diffs more reliably, OpenAI's benchmark data reveals GPT-4.5 still performed better on academic knowledge tests, instruction following, and several vision-related tasks.
(SWE-bench Verified is an industry benchmark that aims to evaluate how well AI models can understand and modify real-world software repositories to fix bugs or implement new features—essentially measuring how useful the AI would be to actual software engineers in production environments.) This raises the question: Why retire a seemingly more capable model in the API? OpenAI explains that GPT-4.1 delivers "improved or similar performance on many key capabilities at much lower cost and latency." In other words, GPT-4.1 hits a practical sweet spot—good enough performance for most API use cases, but delivered faster and cheaper than the more resource-intensive GPT-4.5 Preview.
GPT-4.5 is very slow and very expensive. The new models come with lower prices compared to their predecessors.
GPT-4.1 costs $2 per million tokens for input and $8 per million tokens for output, representing a 26 percent cost reduction for median queries compared to GPT-4o. GPT-4.1 mini is priced at $0.40 for input and $1.60 for output per million tokens, while GPT-4.1 nano costs just $0.10 for input and $0.40 for output per million tokens.
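These per-token rates translate directly into request costs. Here's a quick back-of-the-envelope estimator using the prices quoted above; the sample request size is hypothetical:

```python
# Cost estimator for the GPT-4.1 family, using the per-million-token
# prices quoted in the article. The sample workload is hypothetical.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-4.1": (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def request_cost(model, input_tokens, output_tokens):
    """Return the dollar cost of a single request for the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A hypothetical long-context request: 100,000 tokens in, 2,000 tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 2_000):.3f}")
# gpt-4.1: $0.216, gpt-4.1-mini: $0.043, gpt-4.1-nano: $0.011
```

At GPT-4.5 Preview's $75/$150 rates, the same request would run about $7.80—roughly 36 times the GPT-4.1 price—which makes the retirement decision below easier to understand.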
In comparison, GPT-4.5's pricing was off the charts, costing $75 per million input tokens and $150 per million output tokens through the API. So 4.1 is an upgrade over 4o that almost matches 4.5 but costs far, far less to run. Got that?

The API-only strategy

So another question remains: Why create a model that outperforms GPT-4o in important ways but not offer it to ChatGPT users? According to OpenAI, many improvements from these research models "have been gradually incorporated into the latest version of GPT-4o" in ChatGPT, with more features planned for future updates.
Essentially, ChatGPT's GPT-4o has become a constantly evolving "brand" model that absorbs capabilities from the company's various research models over time. This creates a two-track system: developers using the API get specific, consistent models with clearly defined capabilities, while regular ChatGPT users receive a single model that changes behind the scenes. Developers can select precisely which model fits their needs and cost requirements, choosing between 4.1, 4.1 mini, 4.1 nano, 4o, and other variants.
Meanwhile, consumers get whatever version of GPT-4o OpenAI decides to push out. But none of it makes the naming any simpler. As one Hacker News commenter astutely observed, "I need an AI to understand the naming conventions that OpenAI is using."
When is 4.1 greater than 4.5? When it’s OpenAI’s newest model.

OpenAI's brand new "GPT-4.1" has a funky name but reasonable performance for the price.