
How we went from legacy NLP to modern-era generative AI. In today’s column, I reveal how modern-day generative AI and large language models (LLMs) process natural language fluently via statistical pattern-matching, versus how old-fashioned AI systems, such as those long underpinning Siri and Alexa, relied on semi-clunky grammar rules instead. I bring this up because one of the most common questions I get asked is why prior AI did such a half-baked, irksome job of interacting with humans.
Indeed, conversations with those older natural language processing (NLP) systems were noticeably stilted and exasperating, while contemporary generative AI seems almost humanlike in carrying on conversations. What were the changes in AI that led to a tremendous uplift in NLP from being scratchy to becoming smooth sailing? Let’s talk about it. This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
There are two main approaches to how natural language processing is devised:
(1) Legacy NLP: The Rules-based Approach. AI developers set up AI that makes use of grammar rules so that the AI computationally examines sentences to figure out the syntactic and semantic elements of a sentence based on conventional natural language rules.
(2) Modern NLP: The Data Patterns Approach. AI developers set up generative AI and LLMs by doing data training on a vast array of human-written sentences; the AI statistically and mathematically identifies computational patterns underlying human writing, allowing it to mimic or parrot natural language.
Some judicious unpacking of those two approaches might be insightful. The legacy approach to NLP consists of parsing sentences based on the fundamental grammar rules that you learned about in elementary school.
I’m sure you remember those rules vividly. You examine a sentence to figure out where the subject is, where the verbs are, where the nouns are, and so on. Step by step, you identify the sentence structure.
This ultimately enables you to interpret what the meaning of the sentence is. Your effort involves an analysis of the syntax or syntactic elements of a sentence. In addition, you undertake a semantic analysis, seeking to grasp the underlying message of the words that have been strung together in a particular formulation (technically, this includes the use of lexicons, ontologies, and other linguistic theories and apparatus).
This is conventionally known as the rules-based or symbolic method to interpret sentences. The modern approach to NLP consists of generative AI and LLMs leveraging large-scale pattern-matching of human writing, typically scanned across the Internet. The AI statistically ascertains how sentences are generally composed.
Some words we use more than other words. We use words in certain parts of sentences and follow them with other words in a detectable pattern. If this pattern-matching is done at a large enough scale, the patterns provide a relatively reliable computational means of mimicking how humans write.
A special internal data structure in the LLM captures the mathematical mappings of how words tend to associate with others (this data structure is known as an artificial neural network or ANN; see my explanation at the link here). Based on this elaborate and extensive computational pattern matching, a sentence that you feed into generative AI can seemingly be answered with sentences that generally conform to what a human might have said to the query or question that you’ve asked. This is conventionally known as data-patterning or a sub-symbolic method.
A quick comparison will highlight the similarities and differences between the two NLP approaches. First, in the rules-based approach, a nifty aspect is that grammar rules are easy for humans to comprehend. AI developers can program AI with those rules.
Using those rules, the AI cranks through sentences and takes them apart, piece by piece. It is almost like when a teacher tells you to take apart a sentence and indicate what the elements of the sentence are (note that I’m not implying that AI and the human mind are on par; to clarify, they aren’t). In contrast to the grammar rules angle, the generative AI LLM approach simply associates data in the form of text and words with other such data.
No rules per se are needed or used. Likewise, AI developers don’t need to feed generative AI the rules of natural language. The AI developers rely instead upon massive-scale pattern matching to automatically find patterns in how humans compose sentences.
Therefore, a crucial aspect involves the AI developers sourcing a large enough body of writing for reliable patterns to be derived. Too little data probably won’t be enough to land on useful and usable patterns. With the rules-based approach, there isn’t a similar need to feed in tons of writing samples.
You only need enough to test that the parsing rules are working the way you expect them to function. A downside of the pattern-matching approach is that the mathematical and computational patterns tend to be so complex that there isn’t a straightforward way to pinpoint how the generative AI is figuring out the responses being generated. Sure, you can trace that this number became this number, and that number became that number, but there aren’t readily apparent grammar rules that you can point to in the AI.
In that manner, the rules-based approach is somewhat easier to debug, making it simpler to see what the AI is doing under the hood. The rules-based approach is considered more predictable or deterministic. The pattern-matching approach is less predictable and said to be non-deterministic because it uses statistics and might at times veer off course (this leads to so-called AI hallucinations, which I discuss in detail at the link here; that’s when the AI wackily emits sentences that seem correct but are not grounded in facts or truths).
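To make that predictability point a bit more concrete, here is a minimal, purely illustrative Python sketch of my own (the probabilities are made up and this is not the internals of any particular LLM) contrasting a deterministic, rule-like choice of the next word with statistical sampling:

```python
# Sketch: why statistical generation can be non-deterministic.
# A toy next-word distribution, sampled with and without randomness.
# The probabilities below are invented purely for illustration.
import random

next_word_probs = {"mouse": 0.72, "ball": 0.18, "laser": 0.10}

def pick_deterministic(probs):
    # Rules-like behavior: always take the single most likely continuation.
    return max(probs, key=probs.get)

def pick_sampled(probs):
    # LLM-like behavior: sample in proportion to probability, so repeated
    # runs can yield different outputs.
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

print(pick_deterministic(next_word_probs))                  # always "mouse"
print([pick_sampled(next_word_probs) for _ in range(5)])    # varies run to run
```

The deterministic pick returns the same answer every time, while the sampled pick can vary from one run to the next, which is the crux of the non-determinism in data-patterning NLP.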
Okay, so in a head-to-head battle, which of the two approaches is the winner? It partially depends on what you are using as the criteria for being the winner. If your aim is fluency, the data patterning approach comes out ahead of the game. But if you want to have preciseness and high predictability, you might opt to stick with the rules-based NLP.
That’s partially why the legacy Siri and Alexa didn’t switch over to generative AI as the underlying NLP overnight. The concern of the vendors was that if they made the switch and their AI started doing oddball things, people would get darned upset (rightfully so). It made more sense to keep the legacy NLP in place, ensuring dependability, and meanwhile move gradually and cautiously toward modern-era NLP.
For a deeper analysis of the NLP differences, see my coverage at the link here. I will give you a brief example of how a sentence would be handled in each of the two respective approaches. The sentence I will use is this: “The cat chased the mouse before hiding under the couch.
” Take a close look at the sentence. Put on your grammar rules hat. Do you remember enough of your grade school English classes to parse the sentence? I’m sure that you quickly came up with these parsed elements by examining each word in the sentence:
"The" → Determiner
"cat" → Noun
"chased" → Verb (past tense)
"the" → Determiner
"mouse" → Noun
"before" → Subordinating conjunction
"hiding" → Verb
"under" → Preposition
"the" → Determiner
"couch" → Noun
"." → Punctuation
The sentence structure can be depicted this way:
Subject: [The cat]
Verb: [chased]
Object: [the mouse]
Subordinate Clause: [before hiding under the couch]
The semantic interpretation would be something like this. There are two agents involved: a cat and a mouse. Cats typically chase mice.
That’s nothing out of the ordinary. Hiding under an object is a common spatial relation. This also seems to be a relatively common or expected activity.
That’s about all we can say. We are limited in identifying any bigger-picture meaning, such as an emotional tone, since we aren’t informed whether the cat is playing or hunting. Let’s go ahead and have a user enter the same sentence into a contemporary generative AI app.
The sentence still is: “The cat chased the mouse before hiding under the couch.” The first step consists of the AI turning the words into numbers. The numbers are referred to as tokens, and the conversion of words into tokens is known as tokenization.
Sometimes, words are split into sub-parts, and more than one token is used to represent the given word. See my discussion at the link here for a step-by-step indication of how tokenization works. Here is an example of the words being converted into their numeric token values.
"The": 464 " cat": 9226 " chased": 3372 " the": 262 " mouse": 19530 " before": 960 " hiding": 23478 " under": 818 " the": 262 " couch": 10550 ".": 13 Those numbers have no particular meaning for you and me. They are merely internal numbers in the AI that will be used to statistically associate these tokens with other tokens based on the initial overall data training and mathematical and computational pattern matching that was undertaken.
The tokens are mapped into the internal structures of the AI. This is referred to as placing the tokens in a high-dimensional vector space that shows their associations with other tokens. For example, the token 9226 (which represents “cat”) would undoubtedly be closely statistically associated with token 19530 (which represents “mouse”).
This makes sense: if you looked at tons of sentences on the Internet, you would certainly discover that the word “cat” and the word “mouse” are often used in the same sentence or in sentences that are very near each other. Likewise, the token 3372 (representing the word “chased”) would be closely associated with both the token 9226 (“cat”) and 19530 (“mouse”). After doing that inspection and look-up, generative AI is devised to respond to a prompt that a user enters.
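As a brief aside, here is a toy illustration of how such “closeness” between tokens can be measured, using cosine similarity over made-up four-dimensional vectors; real LLM embeddings are learned during data training and typically have hundreds or thousands of dimensions.

```python
# Sketch: toy word vectors illustrating how associations are measured in a
# vector space. These 4-dimensional vectors are invented for illustration;
# real LLM embeddings are learned during training and are far larger.
import numpy as np

embeddings = {
    "cat":    np.array([0.9, 0.1, 0.8, 0.0]),
    "mouse":  np.array([0.8, 0.2, 0.7, 0.1]),
    "chased": np.array([0.6, 0.7, 0.5, 0.2]),
    "couch":  np.array([0.1, 0.0, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: closer to 1.0 means more associated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["mouse"]))    # high: often co-occur
print(cosine_similarity(embeddings["chased"], embeddings["cat"]))   # fairly high
print(cosine_similarity(embeddings["cat"], embeddings["couch"]))    # lower: weaker association
```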
Thus, after mapping the entered sentence, the generative AI would assemble tokens that would respond to the prompt. Those tokens are then converted back into words. The generated response might be this: "The sentence describes a cat pursuing a mouse and then seeking cover under a couch, possibly after the chase ended.
" Notice that this mimics the kind of reply that a human might have said to that same sentence. Again, this is based on pattern-matching of human writing. As you’ve perhaps gleaned, the NLP rules-based or symbolic approach is somewhat rigid due to being dependent upon programmed grammar rules.
The AI developer might have inadvertently left out some needed grammar rules or failed to specify all of them. Sad face. The data patterning or sub-symbolic approach tends to be more fluent, flexible, and context-aware.
Happy face. However, as noted, it is less predictable and can even produce confabulations. Ugh.
It seems that you are darned if you do and darned if you don’t. Right now, there is little question that the data patterning approach is edging out the legacy NLP approach. People crave fluency.
On the other hand, if you are providing NLP in a life-critical setting, such as a doctor’s use of AI for medical care, you might lean toward predictability over fluency. I’ve got an “Aha!” moment for you. Please do not fall into the trap that many do, namely thinking that we ought to dump the rules-based approach into the deep blue sea.
That would be a mistake. The rules-based approach is quite handy in circumstances where you want the NLP to be highly predictable. The data patterning approach can get you into trouble if it veers off into never-never land.
Accordingly, some advocate for a hybrid approach (see my in-depth discussion on the neuro-symbolic hybrid methods at the link here). Yes, fortunately, you can combine the rules-based NLP with the data patterning NLP. If done successfully, you get the best of both worlds.
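To give a flavor of what a hybrid might look like, here is a simplified, purely illustrative Python sketch of my own (not any particular production architecture) in which a data-patterning component drafts a response and a rules-based layer vets it before it reaches the user:

```python
# Sketch of one simplified hybrid wiring: a data-patterning component drafts
# a response, then a rules-based validator checks it before it reaches the
# user. generate_draft() is a stand-in for a real LLM call, and the rules
# below are deliberately trivial and purely illustrative.

FORBIDDEN_CLAIMS = ["guaranteed cure", "100% accurate diagnosis"]

def generate_draft(prompt: str) -> str:
    # Stand-in for an LLM call (e.g., an API request) that returns fluent text.
    return "The cat chased the mouse and then hid under the couch."

def rules_check(text: str) -> bool:
    # Symbolic layer: reject drafts that violate hand-written constraints.
    lowered = text.lower()
    return not any(claim in lowered for claim in FORBIDDEN_CLAIMS)

def hybrid_respond(prompt: str) -> str:
    draft = generate_draft(prompt)
    if rules_check(draft):
        return draft
    return "I can't provide that statement; please consult a qualified source."

print(hybrid_respond("Summarize: The cat chased the mouse before hiding under the couch."))
```

The fluency comes from the data-patterning side, while the hand-written rules supply a predictable backstop.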
I should note that if done poorly, you are bound to get the worst of both worlds. It’s a double-edged sword that way. Congrats on having learned about legacy NLP versus modern-era NLP.
You are decidedly in the know. Good for you. As per the great words of Benjamin Franklin: “An investment in knowledge always pays the best interest.”