Privacy and the future of data mining in the new AI-powered internet


The search business is ending, and the era of LLM search is here. As the new internet evolves, we have an opportunity to fix its data privacy problems, but do we want to?

"Alexa, how many goals has Cristiano Ronaldo scored for Manchester United?" asked the user. The voice replied, "Cristiano Ronaldo played 346 games for Manchester United between 2003 and 2009, scoring a total of 145 goals." Compare this instant reply to a traditional search engine, and we know where the future of the search business is headed — a user types the keywords "number of goals Cristiano Ronaldo for Man U".

Within microseconds, a list of web links appears, featuring a detailed career track record and a table of statistics. The user clicks on one of the links and painstakingly scrolls through the page, looking for the exact number. In the case of simple queries like goals scored or runs made, the search is easy.



Often, search queries are more complex than looking up a star player's statistics. Things get messier when numerous websites carry conflicting accounts or nuanced opinions about historical facts or controversial news. Google Search sidesteps this problem by filtering for the most credible sources and placing those results at the top of the page.

The search rankings created a pecking order in the internet world. Even so, they did little to reduce the manual work or reading time of users. No wonder, then, that the search engine giant has shifted to an AI Overview that mimics the instant responses of Alexa, Siri, or large language models (LLMs) like ChatGPT and Grok.

Ask Llama whether regular fasting is beneficial for cancer patients, and it immediately responds in the affirmative. Follow up with a question about credibility, and it provides three or four research paper citations and names a University of Southern California study, which concludes that fasting can make cancer cells more sensitive to chemotherapy and protect normal cells.

In contrast to the homework and extended reading that search engines demand, LLMs are short, crisp and pointed in their responses, even though they are just as likely as search engines to produce inaccurate answers. The convenience of LLMs over search results is undeniable: they save a few clicks and some reading effort, particularly for those who want instant responses rather than long-winded descriptions.

The internet is free, but at what cost?

Search engines operate on a business model wherein their service is free for internet users.

The primary source of revenue is advertising: generating cash flow by selling ad space on search results. The "Sponsored Ads" slot atop the regular search results is prime real estate bought by businesses that want to attract clicks and eyeballs to their websites. Under the pay-per-click (PPC) model, the search engine earns money only when the user clicks on the promoted content.
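The economics of that model can be sketched in a few lines: under PPC, revenue scales directly with the click-through rate (CTR), so anything that makes ads more clickable, such as personalisation, multiplies income. The CTR and cost-per-click figures below are illustrative assumptions, not real industry numbers.

```python
def revenue_per_1000_impressions(ctr: float, cost_per_click: float) -> float:
    """Expected platform revenue per 1,000 ad views under pay-per-click.

    Only clicked ads earn money, so revenue = impressions * CTR * CPC.
    """
    return 1000 * ctr * cost_per_click

# Hypothetical numbers: personalisation triples the click-through rate.
generic = revenue_per_1000_impressions(ctr=0.01, cost_per_click=0.50)
targeted = revenue_per_1000_impressions(ctr=0.03, cost_per_click=0.50)

print(f"Generic ads:      ${generic:.2f} per 1,000 impressions")
print(f"Personalised ads: ${targeted:.2f} per 1,000 impressions")
```

Tripling the CTR triples revenue for the same ad inventory, which is precisely why user data that improves targeting is so valuable to the platform.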

Big Tech companies in the search business can therefore increase their profits by displaying ads that appeal to individual users, enhancing the probability of a click. Collecting user data enables search engines to determine what content appeals to each user and serve personalised custom ads rather than generic content. Unsurprisingly, search engine giants have been accused of privacy violations over the past 15 years, because the free model is, in reality, funded by the value of users' personal information, which is used to run personalised marketing campaigns.

As revenues are tied to the number of clicks, or click-through rate, there is an incentive for search engine companies to invade users' privacy and serve click-worthy ads based on demographics, age, search history, and user taste. Even family background and personal information aren't off limits for trillion-dollar companies, whose algorithms manipulate users into clicking on their ads.

Changing user behaviour

With the rise of artificial intelligence (AI) through LLMs, the business model of search engines is facing disruption.

Targeted ads cannot be displayed if users shift from time-consuming internet searches to AI models like ChatGPT, Llama, or Grok. The digital advertising model cannot be sustained if user data cannot be utilised for income generation. AI companies like OpenAI, xAI and Meta are fast chipping away at the search engine industry's market share.

Billions of dollars are being spent on graphics processing units (GPUs) and advanced AI models that can generate Studio Ghibli-style images, analyse an X-ray report or assist coders in doing their jobs. The free internet model of search engines is shifting to a free model of AI chatbots without a corresponding source of revenue.

New internet, new data mining methods

Search engine users are slowly shifting to AI LLMs for their information needs.

Running LLMs is an expensive affair, and the new internet search is costlier than the traditional model, which means it needs better user targeting and monetisation capabilities. The new LLM-based internet requires user data to ensure users can be monetised in a rapidly changing landscape. Companies have deployed new data mining techniques, best illustrated by a simple experiment: check your location history on Google Maps, or ask ChatGPT a simple question: "Tell me about myself." In most cases, it provides remarkably precise information about the user, based on their ChatGPT usage.

The Studio Ghibli Effect

A few days back, OpenAI released image generation capabilities in ChatGPT that mimic the style of the iconic Japanese animation studio, Studio Ghibli. This took the internet by storm, with tens of millions of users around the world creating digital avatars in the signature Ghibli style.

Necessity dictates that any innovation should solve a problem, and I am unsure what problem ChatGPT's new image generation solves. But the costs OpenAI must bear for this free (and limited) service are significant. We all know there are no free lunches, so why has OpenAI done this? Who will pay these costs? Users share their psychological profiles and personal tastes in the form of queries while interacting with AI chatbots.

Each internet user leaves a digital trail for the AI giants of the future, failing to recognise that nothing in the world comes for free, including AI models, search engines or social media. If you are not paying for the product, you are the product. It's only a matter of time before AI chatbots shift to a paid model or create their own version of personalised ads to keep the multi-billion-dollar GPU fleets running.

In Conclusion

Governments worldwide are coming to terms with the rapid growth of AI. While the private sector is making significant strides in training AI models, the public sector isn't keeping pace with the booming technology's rapid development. Unless sensitive personal information is brought under the ambit of strict privacy laws, the internet will soon become a free-for-all where Big Tech misuses its monopolistic position to invade the lives of internet users.

The world is set to witness a repeat of the privacy breaches earlier attributed to search engine giants, this time by AI companies. The search business is ending, and the era of LLM search is here. As the new internet evolves, we have an opportunity to fix its data privacy problems, but do we want to?

Ankush Tiwari is the founder and CEO of pi-labs.ai, a cybersecurity and intelligence solutions company.