De-extinction. A term we are certain to hear more of in the coming months and years. Humanity has perhaps found its latest look-good project, or maybe there’s more to this.
A question to which the answer isn’t exactly clear yet, but may become so in due course: what do humans want to achieve with de-extinction? The reason I wanted to chat about something that isn’t exactly technology (this is more science, but the intersections are blurring) is that earlier this week, the American biotechnology and genetic engineering company Colossal Laboratories & Biosciences staked a claim that the dire wolf has become the first animal to be resurrected from extinction (I have a particularly specific fascination with one day photographing wolves in the forests of Poland; don’t ask). Romulus and Remus, their first photos released now that they are a bit older than six months, are undoubtedly a furry bundle of cuteness. But are we doing things right by nature? Are humans approaching this (I am talking broadly about de-extinction, not questioning Colossal’s intent or undeniable achievements) as a way to learn, heal ecosystems, perhaps atone for past mistakes? Or is this designed for profit? Think about it, as I circle back to Colossal’s approach.
“Leading edge of genetic engineering and restorative biology” is how they define their work (CRISPR genome editing is crucial, as are cloning techniques such as somatic cell nuclear transfer), and the plan is to bring the Woolly Mammoth back within the next five years. In a way, this may be a great service to our generation and the ones that follow. The dire wolf (Aenocyon dirus) roamed the earth (it’s believed to have predominantly populated North America) during the Pleistocene epoch, going extinct between 10,000 and 13,000 years ago, at the end of the last Ice Age.
This predator, incredibly powerful and larger than today's gray wolves, has seen multiple depictions in popular culture. There can of course be an argument that Romulus and Remus aren’t thoroughbred dire wolves, and that argument does have legs to stand on. Are they "real" members of the extinct species, or just hybrids? This de-extinction methodology may well be limited to species with recoverable DNA and close living relatives.
Romulus and Remus, along with their sister Khaleesi, are genetically modified gray wolves (Canis lupus), created with the specific goal of replicating the extinct predator’s physical appearance. Weight may prove critical in answering whether the hybrid approach works. Romulus and Remus reportedly already weigh around 80 pounds; by the time they are fully mature, they should each tip the scales at around 140-150 pounds (roughly 63 to 68 kilograms), which would be closer to what scientists believe dire wolves weighed in peak health.
For context, typical gray wolves generally weigh between 70 and 145 pounds, depending on the subspecies.
HUMAN-LIKE?
OpenAI’s GPT-4.5 and Meta’s Llama models have passed the Turing Test, a benchmark proposed by Alan Turing in the 1950s to assess whether a machine can exhibit intelligent behaviour indistinguishable from humans.
Researchers Cameron R. Jones and Benjamin K. Bergen from the University of California San Diego found that GPT-4.5 performed so convincingly that judges identified it as human 73% of the time, significantly more often than they correctly identified actual human participants. Meta’s Llama-3.1-405B achieved a 56% success rate, essentially matching human performance (around 50%).
That leads me to another question, of which there seem to be many this week — is AI really behaving and responding in a more human-like manner, or have we tuned our minds in such a way that AI gains more acceptability amidst our lives and workflows if we feel it is more human? Read more about the Turing Test, and what it means...
AI PERCEPTION
If you are using Gemini or Copilot on your phone, there’s a big change on the way. Google has now unlocked some new smarts with Gemini Live: basically, an ability to talk live with the artificial intelligence about anything you see. That can be something you point at in the physical world using your phone’s camera, or anything that may be on your phone’s screen at the time.
A while ago, I had described this AI vision as fascinating and terrifying. For now, these Gemini Live features are available across Google’s Pixel 9 series as well as Samsung’s Galaxy S25 smartphones, and without needing a Gemini Advanced subscription plan.
Expect a wider rollout in the coming weeks, though Gemini Live as it stands can be accessed on any other recent Android phone if you have a subscription in place. It’s a similar tale of versatility for Microsoft’s Copilot Vision, which makes its way to Android phones, the Apple iPhone, as well as Windows PCs (as a native app, more so). The vision (no pun intended) is for users to interact with their surroundings in real time using their phone's camera, or through their screen on Windows PCs.
On mobile, it will be able to analyse a real-time video feed or photos to provide information and suggestions (e.g., identifying plants, offering design tips).
The native Windows app allows users to call upon Copilot while working across multiple applications, browser tabs, or files, enabling tasks like searching, changing settings, organising files, and collaborating on projects without switching apps. Why am I not too enthusiastic about Microsoft, Windows and AI? Read our coverage of phones and PCs adopting AI smarts...
OPEN AND REASONING
Late last month, Google rolled out Gemini 2.5, which they claim is their “most intelligent AI model” yet.
The Gemini 2.5 Pro Experimental, they insist, leads the benchmarks quite significantly. A few days later, this model, which was first rolled out only to premium subscribers, was made available to everyone willing to try it, albeit with rate limits.
And Google says they’re looking at ways to make this available in the Gemini app too; for now, it’s desktop only. This model is positioned as a "thinking model" with enhanced reasoning and coding capabilities, defined by its ability to work through multi-step logic, nuance and, for the sake of benchmarks, mathematical and coding problems. It has a significantly large context window of up to 1 million tokens in its experimental form; simply put, this means it can process and understand massive amounts of information in one go.
Interesting to note, Gemini 2.5 Pro achieved a leading score on the "Humanity's Last Exam" (HLE) benchmark, designed to assess complex reasoning and expert-level thinking. I’ll talk about this for a moment.
Humanity's Last Exam (HLE) is a recent benchmark whose primary focus is to evaluate the advanced reasoning and knowledge capabilities of AI models across a number of academic disciplines. It is designed to remain a significantly more challenging test than existing benchmarks have proved to be, keeping pace with rapid AI evolution. Scores are based on accuracy, as well as calibration error.
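To make that last point concrete, here is a minimal Python sketch of how an accuracy score and a simple binned RMS calibration error could be computed over a set of graded answers. The field names, bin count and sample data are illustrative assumptions, not HLE’s actual grading code.

```python
# Minimal sketch (not HLE's actual grading code): scoring a model on
# accuracy plus a simple binned RMS calibration error.
# Each record holds whether the answer was correct and the model's
# self-reported confidence (0.0 to 1.0); field names are illustrative.

def accuracy(records):
    return sum(r["correct"] for r in records) / len(records)

def rms_calibration_error(records, bins=10):
    # Group predictions by confidence bin, then compare average
    # confidence in each bin against actual accuracy in that bin.
    binned = [[] for _ in range(bins)]
    for r in records:
        idx = min(int(r["confidence"] * bins), bins - 1)
        binned[idx].append(r)
    total, weighted_sq_err = len(records), 0.0
    for bucket in binned:
        if not bucket:
            continue
        avg_conf = sum(r["confidence"] for r in bucket) / len(bucket)
        weighted_sq_err += (len(bucket) / total) * (avg_conf - accuracy(bucket)) ** 2
    return weighted_sq_err ** 0.5

results = [
    {"correct": True, "confidence": 0.9},
    {"correct": False, "confidence": 0.8},
    {"correct": True, "confidence": 0.6},
    {"correct": False, "confidence": 0.3},
]
print(f"accuracy: {accuracy(results):.0%}")
print(f"RMS calibration error: {rms_calibration_error(results):.2f}")
```

A well-calibrated model, in other words, is penalised not just for wrong answers but for being confidently wrong.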
Emerging from this benchmark, Gemini 2.5 Pro led the way with an 18% score, followed by OpenAI’s GPT-4o (14%), GPT-4.5 (6.4%) and Claude 3.7 Sonnet 64k Extended Thinking (8.9%).
SCOUTS, MAVERICKS AND BEHEMOTHS
Things began well for Meta when they announced a complete collection of Llama 4 models. There’s the Llama 4 Scout, which is claimed to be a small model capable of “fitting in a single Nvidia H100 GPU” (and with a 10M context window), the Llama 4 Maverick, which will rival the GPT-4o and Gemini 2.0 Flash models, and the still-being-trained Llama 4 Behemoth, which Meta CEO Mark Zuckerberg claims will be the “highest performing base model in the world”.
I’ll go back to the Llama 4 Scout for a moment, and specifically the 10M context window (10M being 10 million tokens). Higher is better for processing complex, successive sequences of information and analysis. Compare the Llama 4 Scout’s 10 million token context window with Anthropic's Claude 3.5 Sonnet and Claude 3.7 Sonnet, which have a 200,000 token context window, and OpenAI’s GPT-4.5 and the o1 family with 128,000 token context windows; incidentally, that’s the same as Mistral Large 2 and DeepSeek R1.
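For a rough sense of what those numbers mean in practice, here is a small back-of-envelope sketch. The conversion constants (about 0.75 English words per token and 500 words per page) are illustrative assumptions, not vendor-published figures.

```python
# Rough back-of-envelope: what these context windows mean in plain text.
# The words-per-token and words-per-page constants are assumptions.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

context_windows = {
    "Llama 4 Scout": 10_000_000,
    "Claude 3.5 / 3.7 Sonnet": 200_000,
    "GPT-4.5 / o1 family": 128_000,
    "Mistral Large 2 / DeepSeek R1": 128_000,
}

for model, tokens in context_windows.items():
    words = tokens * WORDS_PER_TOKEN
    pages = words / WORDS_PER_PAGE
    print(f"{model}: {tokens:,} tokens ~ {words:,.0f} words ~ {pages:,.0f} pages")
```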
Impressive? That is where things went pear-shaped for Meta. AI researchers began to dig through the benchmarking done on the open platform LMArena. Turns out, there was fine print which even the benchmark platform wasn’t aware of; they released a statement later.
Meta had shared with them a Llama 4 Maverick model that was “optimised for conversationality”. Not exactly the spec customers would get, is it? “Meta’s interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customised model to optimise for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future,” as factual as it gets, amidst LMArena’s understandable frustration.