Large concept models (LCMs) offer some exciting prospects. In today’s column, I explore an intriguing new advancement for generative AI and large language models (LLMs) consisting of moving beyond contemporary words-based approaches to sentence-oriented approaches. The extraordinary deal is this.
You might be vaguely aware that most LLMs currently focus on words and accordingly generate responses on a word-at-a-time basis. Suppose that instead of looking at the world via individual words, we could use sentences as a core element. Whole sentences come into AI, and complete sentences are generated out of AI.
To do this, the twist is that sentences are reducible to underlying concepts, and those computationally ferreted-out concepts become the esteemed coinage of the realm for this groundbreaking architectural upheaval of conventional generative AI and LLMs. The new angle radically becomes that we then design, build, and field so-called large concept models (LCMs) in lieu of old-fashioned large language models. Let’s talk about it.
This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here). For my coverage of the top-of-the-line OpenAI ChatGPT o1 and o3 models and their advanced reasoning functionality, see the link here and the link here.

Same Old Is Same Old

There is an ongoing concern in the AI community that perhaps AI researchers and AI developers are treading too much of the same ground right now. We seem to have landed on an impressive architecture contrivance for how to shape generative AI and LLMs and few want to depart from the success so far attained. If it isn’t broken, don’t fix it.
The problem is that not everyone concurs that the prevailing architecture isn’t actually broken. To quickly clarify, by broken I mean that the issue is more one of limitations and constraints than of something being inherently wrong. A strong and vocal viewpoint is that we are hitting the topmost thresholds of what contemporary LLMs can accomplish.
There isn’t much left in the gas tank, and we are soon to hit a veritable wall. As such, there are brave souls who are seeking alternative architectural avenues. Exciting but a gamble at the same time.
They might hit the jackpot and discover the next level of AI. Fame and fortune await. On the other hand, they might waste time on a complete dead-end.
Smarmy cynics will call them foolish for their foolhardy ambitions. It could harm your AI career and knock you out of getting that sweet AI high-tech freewheeling job you’ve been eyeing for the longest time. I continue to give airtime to those who are heads-down seriously aiming to upset the apple cart.
For example, my analysis of the clever chain-of-continuous thought approach for LLMs merits dutiful consideration, see the link here. Another exciting possibility is the neuro-symbolic or hybrid AI approach that marries artificial neural networks (ANNs) with rules-based reasoning, see my discussion at the link here. There is no doubt in my mind that a better mousetrap is still to be found, and all legitimate new-world explorers should keep sailing the winds of change.
May your voyage be fruitful.

Doing A Deep Dive On Upheaval

The approach I’ll be identifying this time around has to do with the existing preoccupation with words. Actually, it might be more appropriate to say a preoccupation with tokens.
When you enter words into a prompt, those words are converted into numeric values referred to as tokens. The rest of the AI processing computationally crunches on those numeric values or tokens, see my detailed description of how this works at the link here. Ultimately, the AI-generated response is in token format and must be converted back into text so that you get a readable answer.
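To make the words-to-tokens round trip concrete, here is a minimal sketch. It assumes a toy word-level vocabulary; the function names (build_vocab, encode, decode) are made up for illustration, and production LLMs typically use learned subword tokenizers rather than whole words.

```python
def build_vocab(texts):
    """Assign each distinct lowercase word an integer ID, in first-seen order."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn text into a list of numeric token IDs (words must be in the vocab)."""
    return [vocab[word] for word in text.lower().split()]


def decode(ids, vocab):
    """Turn numeric token IDs back into readable text."""
    inverse = {token_id: word for word, token_id in vocab.items()}
    return " ".join(inverse[token_id] for token_id in ids)


# Toy training text; purely an assumption for the example.
vocab = build_vocab(["plan a road trip", "a trip by road"])
ids = encode("plan a road trip", vocab)
text = decode(ids, vocab)
```

Everything between encode and decode operates on the integers alone, which is the sense in which the AI "crunches" on tokens rather than words.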
In a sense, you give words to AI, and the AI gives you words in return (albeit via the means of tokenization). Do we have to do things that way? No, there doesn’t seem to be a fundamental irrefutable law of nature that says we must confine ourselves to a word-at-a-time focus. Feel free to consider alternatives.
Let your wild thoughts flow. Here is an idea. Imagine that whole sentences were the unit of interest.
Rather than parsing and aiming at single words, we conceive of a sentence as our primary unit of measure. A sentence is admittedly a collection of words. No disagreement there.
The gist is that the sentence is seen as a sentence. Right now, a sentence happens to be treated as a string of words. Give the AI a sentence, and you get back a generated sentence in return.
Boom, drop the mic.

Sentences Beget Concepts Which Beget Answers

Making sense of sentences is a bit of a head-scratcher. How do you look at an entire sentence and identify what the meaning or significance of the sentence is? Aha, let’s assume that sentences are representative of concepts.
Each sentence will embody one or more concepts. If you closely inspect a sentence, perhaps you can ferret out the set of concepts that underlie it. The beauty is that we can then work with concepts and for the moment set aside sentences.
Say what? Yes, the steps are that we will take an entered sentence, computationally squeeze out the implied concepts, and we will then use those identified concepts as essentially our “tokens” (well, they will be numeric constructs that we can relate to other numeric constructs that are also concepts). After playing around with the concepts in their numeric configuration, the goal is to produce an output-bound set of numeric-represented concepts that captures what the AI has derived as an answer or response. People want to see text-based answers, so the outbound set of concepts needs to get turned back into text, consisting of sentences, and subsequently presented to the user.
Give the AI a sentence, the sentence gets juiced into concepts, the concepts are used to do processing, and the resultant answer that is in concepts gets converted back into a sentence.

Some More Meat On The Bones

All the heavy lifting takes place in something we will anoint as a concepts-only space. This is a mathematical and computational multi-dimensional structure that relates concepts in numeric formats to other concepts in numeric formats.
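A rough way to picture a sentence landing in a numeric space and being related to other entries there is sketched below. Everything in it is an assumption for illustration: the word-count encoder is a crude stand-in for a learned sentence encoder (it does not truly capture meaning), and the nearest-neighbor lookup is a stand-in for a real decoder that would generate fresh text.

```python
import math

# Hypothetical bank of sentences that the toy decoder can choose from.
SENTENCES = [
    "plan a road trip along the coast",
    "book a hotel near the beach",
    "train a neural network on text",
]

# One dimension per distinct word in the bank.
VOCAB = sorted({word for s in SENTENCES for word in s.split()})


def embed(sentence):
    """Toy encoder: one count per vocabulary word (unknown words are ignored)."""
    words = sentence.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def cosine(u, v):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_sentence(vector):
    """Toy decoder: return the stored sentence whose vector is most similar."""
    return max(SENTENCES, key=lambda s: cosine(vector, embed(s)))


query = embed("a road trip to the beach")
closest = nearest_sentence(query)
```

The point of the sketch is only the shape of the flow: text goes in, a vector is compared with other vectors, and text comes back out.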
A quite nifty consequence arises. If this is done carefully, to some degree the language being used for the sentences is merely plug-and-play. Here’s why.
The usual generative AI or LLM data training gets somewhat trapped in the mainstay natural language used for data training, such as using English language content on the Internet as the scanned data. You can at times push the AI toward other languages, which has an interesting twist all its own, see my discussion at the link here. Anyway, since the core of this alternative concepts-based approach has to do with concepts in numeric formats, the sentences coming in and going out can be switched rather readily to whichever language you prefer.
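The plug-and-play idea can be sketched with a language-independent core wrapped by per-language lookups. The tables, concept labels, and function names below are hypothetical; an actual system would rely on learned encoders and decoders rather than hand-written dictionaries.

```python
# Hypothetical per-language tables mapping sentences to shared concept labels.
TO_CONCEPT = {
    "en": {"hello": "GREETING", "goodbye": "FAREWELL"},
    "fr": {"bonjour": "GREETING", "au revoir": "FAREWELL"},
}

# Hypothetical per-language tables mapping concept labels back to sentences.
FROM_CONCEPT = {
    "en": {"GREETING": "hello", "FAREWELL": "goodbye"},
    "fr": {"GREETING": "bonjour", "FAREWELL": "au revoir"},
}


def core(concept):
    """Language-independent step; in this toy it simply echoes the concept."""
    return concept


def respond(sentence, source_lang, target_lang):
    """Encode in one language, process as a concept, decode in another."""
    concept = TO_CONCEPT[source_lang][sentence.lower()]
    return FROM_CONCEPT[target_lang][core(concept)]


reply = respond("Bonjour", "fr", "en")
```

Note that core never sees which language was used, so swapping languages only means swapping the tables on either side of it.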
This isn’t a no-brainer though and realize that added work would be required. The general point is that the universality across languages is an intriguing potential bonus. One assumption is that concepts are universal.
Do you think that’s the case or do the language and concepts go tightly hand-in-hand (see my coverage of that debate at the link here)?

Unpacking The Suitcase

You are now pretty much at the 30,000-foot level, which is sufficient to do some assessment of this innovation.

First, we’ve covered that there are presumably three ways to see the world:

Second, allow me to introduce to you six major steps entailing the sentences and concepts embodiment:

I hope that helps to lay out the overarching precepts.

Example Of LLM Versus LCM At Work

Allow me to share with you a quick example of how a large concept model or LCM would work.
I’ll make the example simple to concentrate on the crux of things. It is show-and-tell time. Suppose I am aiming to put together a road trip.
Everyone loves road trips. You can log into most of the major generative AI and LLMs to ask for assistance in planning a road trip. Use whichever you like, such as OpenAI ChatGPT, Anthropic Claude, Microsoft CoPilot, Google Gemini, Meta Llama, etc.
If I used a conventional contemporary LLM, this is what it might look like:

What happened under the hood? The LLM would word-by-word convert my prompt “Plan a road trip...” into a series of numeric tokens covering the words that were entered. The tokens would be run through various architectural transformers, decoders, encoders, and so on. The tokens generated would eventually get converted back into words.
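The word-at-a-time flavor of generation can be illustrated with a deliberately tiny stand-in. This sketch assumes a bigram counter in place of a transformer and a made-up three-sentence corpus; it shows only the loop structure, in which each new token is picked based on what came before and then appended.

```python
from collections import Counter, defaultdict

# Made-up training text for the toy model.
CORPUS = (
    "day 1 drive to the coast . "
    "day 1 drive to the coast . "
    "day 2 drive to the hills ."
)


def train_bigrams(text):
    """Count, for each token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_tokens=6):
    """Greedy generation: repeatedly append the most frequent next token."""
    out = [start]
    for _ in range(max_tokens - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # no known continuation for the last token
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


model = train_bigrams(CORPUS)
itinerary = generate(model, "day")
```

A real LLM conditions on the whole preceding context rather than just the previous token, but the one-token-at-a-time loop is the same basic pattern.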
The result, as you can see, is the response by the AI that says “Day 1: Drive 4.5 hours to...”

Internal Processing By An LCM

In contrast, let’s show what the LCM’s steps would consist of (I’ll include the steps midstream, so they are easier to envisage):

The output of the LCM could be the same as the output produced by the LLM. I’m not saying this is always the case, and in fact, it would rarely be the case.
You can say the same about different LLMs, namely that the output produced by a given LLM such as ChatGPT won’t likely be identical to the output from Claude.

Handy AI Research About Large Concept Models

I hope that you find this to be a fascinating topic. For the nitty-gritty details of the LCM approach, take a good look at a newly released research study entitled “Large Concept Models: Language Modeling in a Sentence Representation Space” by Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R. Costa-jussà, David Dale, Hady Elsahar, Kevin Heffernan, João Maria Janeiro, Tuan Tran, Christophe Ropers, Eduardo Sánchez, Robin San Roman, Alexandre Mourachko, Safiyyah Saleem, Holger Schwenk, arXiv, December 12, 2024, which made these salient points (excerpts):

This is the kind of thinking outside the box we need to have for venturing beyond the norm of current times generative AI and LLMs.

Where We Boldly Must Next Go

Is a posited LCM approach the winner-winner chicken dinner? Don’t know, no one can say for sure at this time. Should we be avidly coming up with new ideas and stretching the boundaries of what we are doing at the keystones or base roots of AI? Absolutely.
Innovation is the watchword. It is too early to toss in the towel and declare that existing AI architectures are the end-all. We must be more open-minded.
If the prevailing conventional approach to AI is a potential dead-end (trolls – I’m not saying it is, just asking a reasonable question, thank you), I think we would all prefer to have backup approaches already underway. Sitting around with a blank stare and saying woe is us just ought not to be a strategy. Multiple paths and a Darwinian battle involving creative ideas and novel AI approaches would seem a more prudent route.
Final thoughts for now. Albert Einstein famously said this: “Creativity is intelligence having fun.” Let loose with your creativity if you are in the AI field.
Don’t be myopic. Reach for the sky. As you do so, please have a modicum of practicality and do not go off the deep end.
The last word on creativity goes to Einstein once again in this equally popular quote: “Creativity is contagious. Pass it on.” Okay, enough said, the creativity baton is being passed along to you, so let’s all get to work.
Technology
AI Is Breaking Free Of Token-Based LLMs By Upping The Ante To Large Concept Models That Devour Sentences And Adore Concepts
Generative AI based on LLMs might be old hat. New approaches are brewing. One is the advent of large concept models (LCMs). Here's the inside scoop on the future of AI.