In today's column, I explore an innovative design approach that rejiggers the inner workings of generative AI and large language models (LLMs). The deal is this. Suppose that instead of being hyper-focused on words and language, we leveraged the internal mathematical representations within AI that seem to reflect a form of computational reasoning. By leaning heavily into those internal mechanisms, AI might be speedier and possibly improve in novel ways that could be quite remarkable.
The catchy moniker given to this unusual path is that we would aim to firmly establish a so-called chain of continuous thought.
Is this a solid approach or will it be unable to move the needle?
Let's talk about it.
This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here). For my coverage of the top-of-the-line ChatGPT o1 model and its advanced functionality, see the link here and the link here.
Before I dive into the AI particulars, there are facets concerning human thinking and the crucial role of natural languages that we ought to discuss.
Some contend that the human mind is formed by the nature of everyday language. In fact, there is a well-known line that language is everything when it comes to intelligence and the advent of humankind. Humans outdo animals due to our advanced thinking processes, which in turn are claimed to arise due to the adoption of language. Language is what spurs our brain and mind to formulate into intelligence.
A famous oft-cited example consists of the assertion that Eskimos purportedly have a multitude of words for snow beyond the vocabulary of non-Eskimos. By having numerous ways to refer to variations and subtleties of snow, overall thinking processes regarding snow are seemingly more in-depth. You could say that language is then the means by which we shape our views of the world. This is a popular linguistic relativity theory and is generally referred to as Whorfianism or the Sapir-Whorf hypothesis (named after linguists Edward Sapir and Benjamin Lee Whorf).
All of that seems quite compelling and serves as a prevailing belief about the integral intertwining of mind and language.
Not everyone necessarily buys into that notion.
Some brain-imaging studies using functional MRI (fMRI) have argued that the language areas of the brain are at times not actively engaged when we are amid deep thinking. For example, if you are thinking deeply about solving a difficult math problem, only the reasoning portions of your brain appear to be at work (according to some research studies), while the language regions remain largely quiet. This raises questions about the linkage between thinking and the crutch of language.
It is a proverbial chicken-or-egg conundrum.
One viewpoint is that language drives our ability to think. A contrasting perspective is that our thinking is essentially independent of language and that we only end up using language as a form of communication. Language isn't a driver; it is just along for the ride.
The topic is a highly complex and hotly debated consideration. I'm not going to try and settle the contentious issue here. My aim merely was to raise the topic so that we can see how this might in some analogous manner be applied to the design of AI.
Let's do that.
I'd like to shift gears and move into the realm of AI.
You might be aware that contemporary generative AI and LLMs are usually based on the processing of words. This type of AI is based on scanning a wide swath of data across the Internet, such as essays, narratives, poems, and the like. Major generative AI apps including OpenAI ChatGPT, Anthropic Claude, Meta Llama, Google Gemini, Microsoft Copilot, and others rely upon the pattern-matching of human writing found online.
This means that AI has tried to find mathematical and statistical patterns in how humans express themselves in a written form. The AI then computationally seeks to mimic that writing and does so with a remarkable imitation of fluency (additional details of how AI works are discussed in my posting at the link here).
When you enter words into a prompt, the words are turned into numeric values that are subsequently used during the internal processing within the AI. This is known as tokenization, namely converting words into tokens or numeric values. After the AI conducts its internal machinations, the resulting devised response is initially in token format and has to be detokenized back into words. For my detailed coverage of the tokenization process, see the link here.
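To make the tokenization step concrete, here is a minimal sketch using a toy word-level vocabulary. Real LLMs rely on subword tokenizers (such as byte-pair encoding), so the vocabulary, function names, and token values below are purely illustrative assumptions and not any vendor's actual API.

```python
# Minimal sketch of tokenization and detokenization with a toy word-level vocabulary.
# Real LLMs use subword tokenizers (e.g., byte-pair encoding); this is illustrative only.

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}
inverse_vocab = {token_id: word for word, token_id in vocab.items()}

def tokenize(text: str) -> list[int]:
    """Convert words into numeric token IDs."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def detokenize(token_ids: list[int]) -> str:
    """Convert token IDs back into words."""
    return " ".join(inverse_vocab[token_id] for token_id in token_ids)

prompt = "The cat sat on the mat"
token_ids = tokenize(prompt)        # [0, 1, 2, 3, 0, 4]
restored = detokenize(token_ids)    # "the cat sat on the mat"
print(token_ids, "->", restored)
```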
You could reasonably claim that the whole kit and caboodle is being driven by words. In that manner, it is equally reasonable to say that contemporary generative AI and LLMs are shaped around natural language. They tend to take in natural language, process natural language, and output natural language.
As an aside, this isn't quite always the case since there are uses of AI that involve text-to-image, text-to-audio, and text-to-video, see my coverage at the link here. The acknowledgment is that AI is not limited solely to language per se and can be devised to incorporate visual and auditory data too.
I want to get another crucial element about AI onto the table so that we can piece things together. Hang in there.
The latest advances in generative AI have gone whole-hog into the use of a chain-of-thought (CoT) as a key component inside the AI. A notable example consists of OpenAI's latest ChatGPT o1 advanced AI model, see my in-depth analysis at the link here. A core approach underlying the design and construction of o1 integrally exploits CoT techniques.
Allow me a moment to explain why chain-of-thought is so notable.
Chain-of-thought is an overall common phrase often used when discussing human thinking and reasoning. A person playing a chess game might contemplate their next move. Rather than rashly moving, they are likely to also contemplate their subsequent move. In their mind, they create a kind of chain of thoughts about some number of moves and countermoves that might arise. Based on that chain of imagined steps or moves, they decide what actual move to next make.
Rinse and repeat.
In the AI field, these same concepts have been applied to AI systems of various kinds. For example, an AI system that plays chess will look ahead at many moves. That is partially why chess-playing programs are so good these days. Whereas a human might mentally be limited to assessing a handful of moves ahead, an AI system can look at a much greater depth by utilizing vast computational resources.
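For readers who like to see the mechanics, here is a bare-bones sketch of depth-limited look-ahead of the kind a game-playing program performs. The legal_moves, apply_move, and evaluate functions are hypothetical placeholders that a real chess engine would have to supply; the toy number game at the bottom exists only to show that the plumbing runs.

```python
# Bare-bones sketch of depth-limited look-ahead (negamax style).
# legal_moves, apply_move, and evaluate are hypothetical placeholders.

def best_move(state, depth, legal_moves, apply_move, evaluate):
    """Pick the move whose imagined chain of moves and countermoves scores best."""
    def search(s, d):
        if d == 0:
            return evaluate(s)                       # score the imagined position
        scores = [-search(apply_move(s, m), d - 1)   # the opponent replies next
                  for m in legal_moves(s)]
        return max(scores) if scores else evaluate(s)

    return max(legal_moves(state),
               key=lambda m: -search(apply_move(state, m), depth - 1))

# Toy usage: the "state" is just a number, and the moves add or subtract one.
print(best_move(0, 3, lambda s: [1, -1], lambda s, m: s + m, lambda s: s))
```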
Please be cautious when using the chain-of-thought phrase in the context of AI since it implies that AI can form human thoughts, which is not the case at this time. Referring to chain-of-thought in an AI context represents an unfortunate anthropomorphism of AI. Anyway, despite that qualm, the idea is that if humans think in a series or chain of thoughts, perhaps it is prudent to devise AI to work in a chain-like fashion too.
Aha, you are now ready to get to the grand reveal.
First, here is the bring-it-all-together setup. I will walk you through a highly simplified sketch of how conventional generative AI's inner workings tend to take place. Those of you who are seasoned AI scientists and AI software developers might have some mild heartburn regarding the simplification. I get that. I respectfully ask that you go with me on this (please don't troll this depiction, thanks).
Envision this. A person enters a prompt into generative AI. The words are converted into tokens. Those tokens begin to flow down a type of processing assembly line. So far, so good.
At various stopping points along the processing line, tokens are fed into specific processing components.
Each processing component performs various calculations. Once the result of each component has been arrived at, the component provides as output more tokens. Those tokens then flow along the processing line to the next component. On and on this goes.
Assume for the sake of discussion that each of those processing components is devising a said-to-be chain-of-thought. In other words, the component receives tokens, processes the tokens, formulates an AI-based chain-of-thought, and then uses that CoT to produce more tokens. Those tokens are emitted from the component so that the tokens can continue along to the next component for further processing.
It goes this way overall: tokens arrive at a component, the component formulates an internal chain-of-thought, that chain-of-thought is used to produce a fresh batch of tokens, and those tokens flow along to the next component, where the cycle repeats.
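To keep that assembly-line imagery grounded, here is a deliberately toy sketch of the conventional token-in, token-out flow. The component arithmetic is a stand-in and bears no resemblance to an actual transformer layer; the point is simply that tokens are regenerated at every stop.

```python
# Highly simplified sketch of the conventional token-driven pipeline:
# each component takes tokens in and hands tokens out to the next component.
# The component math is a stand-in, not a real model layer.

def component(tokens: list[int], stage: int) -> list[int]:
    """One processing stop: consume tokens, do internal work, emit tokens."""
    internal_chain_of_thought = [t + stage for t in tokens]  # stand-in for hidden reasoning
    return [t % 50_000 for t in internal_chain_of_thought]   # re-emitted as tokens; the CoT is discarded

def pipeline(prompt_tokens: list[int], num_stages: int = 4) -> list[int]:
    tokens = prompt_tokens
    for stage in range(num_stages):
        tokens = component(tokens, stage)  # tokens in, tokens out, at every stop
    return tokens                          # only at the very end are tokens turned back into words

print(pipeline([101, 2009, 2003]))
```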
Let's give some reflective thought to this process. The entire process is fully dependent upon tokens. Tokens are going into components and coming out of components. They permeate the processing.
Tokens, tokens, and more tokens.
Meanwhile, within the components, there are chain-of-thought formulations. Those CoTs are essentially being discarded after each component does its business. The CoT produces tokens, and those tokens are what keep things going. Once the tokens come out of a component, the CoT inside the component no longer has any use to us. It just sits there and languishes.
You might be saying to yourself, yes, I get this, but why is this worth pointing out?
I'm quite happy that you asked.
Remember that we earlier talked about the intertwining of thinking and natural language. Recall that some believe the two are integral. Others say that maybe thinking can sit on its own and merely use language when necessary.
Well, in a broad way, you could roughly suggest that the generative AI internal processing is utterly immersed in language. Those tokens, which are essentially words, drive the entire processing endeavor. We go to a component and feed in language. The component undertakes its internal mathematical and computational effort and then goes out of its way to purposely produce tokens so that the rest of the assembly line functions based on language.
Maybe we can rejigger this.
Put on your thinking cap and let's rework this assembly line.
Suppose that instead of dealing with tokens flowing here and there, we took the chain-of-thought and moved that down the line. It goes like this. A component would receive a chain-of-thought from some other component. The component receiving the CoT uses that as the grist for doing whatever the component is supposed to do. The result from the component is yet another newly devised chain-of-thought that then flows further along the line.
At some point, perhaps toward the end of the entire processing cycle, we could revert to using tokens. Those tokens then are finally converted back into words to display a readable response to the user. Voila, tokens are sparingly utilized versus overwhelmingly utilized.
This doesn't necessarily need to be done to an extreme. For example, maybe we decide that just some components will use this approach, while others will continue to rely on the processing of tokens. That's perfectly fine. We can make changes to some areas of the processing and leave other areas as they customarily function.
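Here is a similarly toy sketch of the reworked line, in which a continuous thought vector flows from component to component and tokens appear only at the edges. The vector size and the component math are illustrative assumptions, not a real model.

```python
# Toy sketch of the reworked line: a continuous "thought" (a vector of numbers)
# flows between components; tokens appear only at the start and the end.
# Vector size and component math are illustrative assumptions.

import random

DIM = 8  # size of the continuous thought vector (illustrative)

def embed(tokens: list[int]) -> list[float]:
    """Turn the prompt tokens into an initial continuous thought."""
    values = [(t % 97) / 97.0 for t in tokens[:DIM]]
    return values + [0.0] * (DIM - len(values))

def latent_component(thought: list[float]) -> list[float]:
    """Consume a chain-of-thought vector, emit a new one -- no tokens in between."""
    return [0.5 * x + random.uniform(-0.01, 0.01) for x in thought]

def readout(thought: list[float]) -> list[int]:
    """Only at the very end do we convert back into tokens for the user."""
    return [int(abs(x) * 50_000) % 50_000 for x in thought]

thought = embed([101, 2009, 2003])
for _ in range(4):                  # several components, each passing latent thoughts along
    thought = latent_component(thought)
print(readout(thought))             # tokens are produced sparingly, only at the end
```

Notice that in this sketch everything in the middle is reasoning-space traffic; a hybrid version could simply wrap token conversion around whichever components still rely on it.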
What's the payoff to this refurbishment?
First, the computational effort required for constantly producing tokens throughout the processing cycle is reduced significantly. Simply stated, if you aren't continually using tokens then there isn't any need to generate them per component. We have streamlined that aspect.
Second, having fewer token-producing steps suggests potential savings in the overall time and cost of processing. It would presumably be cheaper since there would be fewer computational charges piling up. It would presumably be speedier since we have, to some degree, eliminated a middleman step that didn't seem to be providing noteworthy added value.
Third, maybe we might discover that by relying on chain-of-thought rather than tokens, we could improve or embellish the chain-of-thought approach on a grander scale. AI scientists and AI researchers might be freed up from a mental preoccupation or distraction regarding tokens and become especially laser-focused on chain-of-thought.
The mind-bending idea is that rather than being constrained within the language space, we alter our mindset to focus on the so-called reasoning space. The reasoning space is reflected in the chain-of-thought elements. Tokens are reflective of the language space. The chain-of-thought as a root consideration is what we will now flow along the assembly line.
It is a continuous use of chain-of-thought in the sense that one chain-of-thought after another is moving along and has become a core processing substance. We might refer to this as a chain of continuous thought (side note: this naming choice admittedly still has that anthropomorphic feel, but we'll go with it).
Conventionally, a chain-of-thought is a hidden state, embedded within a component. Those chain-of-thought instances do not normally see the light of day. They now become the star of the show, somewhat, internally at least. The beauty of concentrating on CoTs is that we could then apply other clever techniques. For example, we might be able to optimize using methods such as a gradient descent technique (see my discussion at the link here).
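As a small illustration of why this matters, here is a PyTorch-style sketch (assuming PyTorch is installed) of gradients flowing directly through a continuous thought vector so that gradient descent can adjust it, something that discrete tokens do not readily permit. The tiny layer, loss, and training signal are illustrative stand-ins, not the actual technique used in any particular model.

```python
# Sketch: because the chain-of-thought is a continuous vector rather than discrete
# tokens, gradients can flow through it and gradient descent can adjust it directly.
# The tiny layer and loss below are illustrative stand-ins, not a real LLM.

import torch

torch.manual_seed(0)
thought = torch.randn(8, requires_grad=True)    # a continuous "thought" vector
next_component = torch.nn.Linear(8, 8)          # stand-in for a downstream component
target = torch.zeros(8)                         # stand-in training signal

optimizer = torch.optim.SGD([thought], lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(next_component(thought), target)
    loss.backward()                             # gradients flow through the thought itself
    optimizer.step()                            # a discrete token would block this step

print(loss.item())
```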
Remember too that we aren't eliminating the use of tokens and instead will be converting back and forth to intermittently use the language space or the reasoning space, as appropriate. Sometimes the language space will be the best course of action. Sometimes the reasoning space will be the best course of action.
We can pick and choose.
A heads-up note is that as mentioned earlier, this is just a macroscopic high-level sketch to give you a semblance of what this conception consists of. There are tons more details involved.
Those of you who want the AI nitty-gritty should consider reading a fascinating paper entitled "Training Large Language Models To Reason In Continuous Latent Space" by Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, arXiv, December 11, 2024, which lays out this latent-reasoning approach in technical depth.
I applaud these kinds of outside-the-box AI research efforts.
I'll offer a few final thoughts and then conclude.
Believe it or not, in some respects, the AI field at times is stale.
That probably seems shocking. We daily get announcements about this new AI or that new AI.
Here's the rub. We need to avoid falling into a mental trap of doing the same things over and over. To some degree, I liken this to the movie business. Having built lots of AI systems for the entertainment industry, I observed an overall willingness to keep producing movies that sit squarely in the mainstream. There is immense risk aversion at hand, including a reluctance to venture toward independent films or take chances on creative plots.
Don't let AI become stagnant in the sense of dogmatically pursuing AI in only one particular way. Put all options on the chalkboard. Continuously evaluate and reassess. Try new things.
I am reminded of a remark often attributed to Albert Einstein: "We cannot get to where we dream of being tomorrow unless we change our thinking today."
A truism worthy of tokenizing and thoughtfully processing on a continuous basis.