The Hippocampus of AGI
Are LLMs AGI or just a small piece of a bigger system?
The Imperial March
We’ve built the hippocampus of AGI: the LLM. It’s 2024, the state of the art is quite amazing, and it will undoubtedly continue to revolutionise many knowledge-based industries. LLMs will force humans to rethink our relationship with tokens (tokens being the subword units these models read and write). How many professionals lose more than 20% of their time simply parsing tokens? Doctors analysing medical records, lawyers interpreting legislation, software developers glaring at documentation. Since its inception, the internet’s net token count has increased exponentially and shows no signs of slowing down. Net token verbosity might behave something like the total entropy of the universe: it only ever grows. Verbosity, when weaponised against you, serves as an excellent smoke screen. Have you ever read Instagram’s privacy policy? Probably not, nor will you. Ponder for a moment how Meta’s lawyers (or perhaps Llama 3) inject thousands of words into those documents to obfuscate the number of ways they’re decimating our privacy in the quest for a stronger performance in Q3. Somewhere in that document lies the sentence “you are our product”, just written with different tokens.
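If you’ve never looked at tokens directly, here is a minimal sketch using the tiktoken library (assuming it is installed); the sentence is arbitrary and the exact split depends on the encoding you pick.

```python
# Minimal illustration of subword tokenisation (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-class models
text = "Doctors analysing medical records lose hours parsing tokens."
ids = enc.encode(text)

print(len(ids), "tokens")
print([enc.decode([t]) for t in ids])  # the subword pieces the model actually sees
```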
Okay, so LLMs are very powerful and great. But does ChatGPT passing the Turing test warrant the imperial march towards AGI that’s currently happening in Silicon Valley? Probably, but the predictions are wildly optimistic. AGI is not close, and the error bars on even the best predictions are insanely wide.
I’ve learned to take predictions from optimists like Altman and Musk with a boulder of salt because their timelines are significantly contracted (estimated AGI in 2025 and 2026 respectively). The real purpose of an optimistic expectation is to cultivate a sense of maniacal urgency that magnetises the world’s best engineers toward the right problems early on. The timescale on which consumers actually get their hands on AGI is a very different one. The same applies to fusion: for those tirelessly working toward infinite clean energy, the promise is always just five years away! To consumers, however, commercial fusion won’t become a reality until reactors can be miniaturised and mass-produced at scale, and that sounds more like a 10+ year project. Unless we get quantum computers to unlock a new frontier of materials science, but that’s a topic for another post.
So how can we call the omnipotent being, the great o1, AGI when it’s imprisoned on Apple Silicon? How can The AGI help pack your groceries when it has no concept of touch? The definition of Artificial General Intelligence is murky and not well standardised, but for the sake of completeness I hope all experts are including successful physical emulation of human activity in the real world. Otherwise, what good would The AGI be?
My prediction is that whatever form AGI takes, it will incorporate some type of LLM architecture as a substructure of its superstructure. LLMs have incredibly fast information-retrieval capabilities and, combined with their deep attention, could serve as excellent memory banks. The AGI’s reasoning cortex can generate higher-order plans whilst communicating with the hippocampus for memory retrieval, to better understand the context of the environment. For example, an iRobot Roomba roaming the floors of a hotel is about to enter room 13 through the doggy door, but it queries its recently updated memory bank, learns that guests are scheduled to be there, and skips the room. The memory bank was an LLM; the path planning came from a different system.
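Here is a toy sketch of that division of labour. Everything in it is hypothetical: query_memory_bank stands in for a call to an LLM over some API, and the “reasoning cortex” is reduced to a single skip-or-clean decision.

```python
# Toy sketch: a planner ("reasoning cortex") consults an LLM memory bank
# ("hippocampus") before acting. All names and data here are hypothetical.

def query_memory_bank(question: str) -> str:
    """Stand-in for an LLM call; a real system would hit a hosted model."""
    knowledge = {"room 13": "Guests are checked into room 13 until Thursday."}
    for key, fact in knowledge.items():
        if key in question.lower():
            return fact
    return "No relevant memory found."

def plan_next_action(candidate_room: str) -> str:
    """Planner: decide whether to clean a room after consulting memory."""
    memory = query_memory_bank(f"Is anyone scheduled to be in {candidate_room}?")
    if "guests" in memory.lower():
        return f"skip {candidate_room}"   # occupied, leave it alone
    return f"clean {candidate_room}"      # free, go ahead

print(plan_next_action("room 13"))  # -> skip room 13
print(plan_next_action("room 14"))  # -> clean room 14
```

In a real system the memory bank would be a large model queried in natural language and the planner would be a navigation stack; the point is only that retrieval and planning are separate components talking to each other.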
Being a Scale Maximalist
It will most likely pay off to be a scale maximalist. The $100 billion Microsoft data centre intended for OpenAI will produce a neural net of unimaginable scale that could result in emergence. If GPT-4 has 1.7 trillion parameters and the average human brain has roughly 100 trillion synapses (oversimplifying a synapse to be equivalent to a parameter), then even our state-of-the-art LLMs are underparametrised by a factor of roughly 60. That’s if Elon doesn’t get there first with the Colossus cluster. It’s already an amazing feat to make 100,000 NVIDIA Hopper GPUs coherent, so imagine the results with 10x that.
If scaling laws hold, only a few companies (Google, OpenAI, and xAI) might be able to capitalise a cluster of 1 million interconnected GPUs. I’m tempted to strike OpenAI from that list since they’re constrained by their relationship with Microsoft, whereas Google owns a physical money-printing machine and Elon has infinite resources and a bone to pick. However, the scaling laws won’t matter if we run into the data wall first.
Hitting the Data Wall
Yes, we will eventually hit the public data wall, probably by 2032. There is a lot of data, but it is not infinite, and despite the scale of internet training data, it is unlikely to contain the tokens necessary to discover a smooth solution to the Navier-Stokes equations or to advance cancer drug discovery.
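To put rough numbers on that, here is a back-of-the-envelope sketch. The 20-tokens-per-parameter ratio is the Chinchilla compute-optimal heuristic, the parameter and synapse counts are the ballpark figures from the previous section, and the public-text stock is an assumption, so treat the output as an illustration rather than a forecast.

```python
# Rough numbers behind the data-wall worry. The 20 tokens/parameter ratio is
# the Chinchilla compute-optimal heuristic; the public-text stock is an
# assumed ballpark (a few hundred trillion tokens), not a measured value.

gpt4_params = 1.7e12        # rumoured GPT-4 parameter count
brain_synapses = 100e12     # ~100 trillion synapses, naively treated as parameters
print(f"parameter gap: ~{brain_synapses / gpt4_params:.0f}x")  # ~59x

tokens_needed = 20 * brain_synapses   # compute-optimal tokens for a brain-scale model
public_text_tokens = 300e12           # assumption: usable public text, in tokens
print(f"needed: {tokens_needed:.1e} vs public stock (assumed): {public_text_tokens:.1e}")
print("data wall first" if tokens_needed > public_text_tokens else "compute wall first")
```

Even granting generous assumptions about the public stock, a brain-scale model trained compute-optimally blows straight past it.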
At worst, we end up producing a machine that perfectly emulates human behaviour; at best, the 10 GW data centres of the future produce an LLM with emergent capabilities. But we don’t know this for sure. It’s still just a huge bet.
If we stick to our current data-inefficient but compute-efficient paradigm (no reason to deviate yet), then we will have to find a way past the public data wall. There is a reason I’m calling it the public data wall. We live in the Zettabyte era (10²¹ bytes), so it’s hard to imagine that our AI models are chewing up a significant portion of that exponent term. But our private data reserves are equally impressive. JP Morgan alone holds 450 petabytes in its chest, locked away from all non-staff. There’s an argument to be made that private data could hold more value for unlocking cancer-curing AGI than scraping all of Wikipedia and Reddit. Who knows, maybe a clunky blog written by a conspiracy theorist from his aunt’s basement could hold the answer to the Riemann Hypothesis, but probably not.
Even the slightest chance of emergence is worth the investment. Many astute individuals claim it is self-evident that LLMs cannot supersede human-level intelligence, since an array of matrix-multiplying transformer encoders and decoders cannot reason. But what is reason? Do we even understand how humans reason? LLMs are already showing inklings of reasoning by learning overarching structures such as language syntax and using them to translate low-resource languages in both directions.
So maybe what’s needed is for JP Morgan, Meta, and Pfizer to sign licensing deals with AI labs so we can barge through the data wall without turning to lame and uncreative compromises like synthetic data generation. Seriously, whoever thinks it’s a good idea to let these hallucination-prone models train themselves has never listened to a musician endlessly practise a song on a poorly tuned instrument. Without access to higher-quality input, they’ll only reinforce the flaws, no matter how much they practise. There might be an exception for easy-to-verify training sets like NP problems and computer programs. Certainly not natural language.
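To see why verifiability matters, here is a small sketch using subset-sum (an NP-complete problem) as a stand-in example: checking a proposed answer is cheap, while producing one by brute force is exponential. That asymmetry is what makes this kind of data safer to self-generate and filter than free-form natural language.

```python
# Easy to verify, hard to generate: subset-sum as a stand-in for "NP problems".
from itertools import combinations

def verify(numbers, target, candidate):
    """Checking a proposed subset is cheap: one pass over the candidate."""
    return candidate is not None and set(candidate) <= set(numbers) and sum(candidate) == target

def solve(numbers, target):
    """Finding a subset by brute force is exponential in len(numbers)."""
    for r in range(1, len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return list(combo)
    return None

nums, target = [3, 9, 8, 4, 5, 7], 15
answer = solve(nums, target)                 # expensive search
print(answer, verify(nums, target, answer))  # -> [8, 7] True (cheap check)
```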
What About The Rest of the Superstructure?
As amazing as LLMs are (and will continue to be), the race to build the pre-frontal cortex is more vital.
If AGI ends up being a network of different models designed to communicate with each other over extremely low latencies, each model can specialise in certain tasks, similar to how our brain is architected: the occipital lobe specialises in vision, the parietal lobe in spatial awareness, and so on. The billion-dollar question is: how will different models built by different tech companies communicate with each other to produce a stronger super-system? Could this be the TCP/IP moment of AI?
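No such standard exists yet, but purely as an illustration of the idea, here is a toy sketch of what a shared wire format between specialist models might look like; every field name below is invented.

```python
# Toy sketch of a shared "wire format" between specialist models.
# The schema and the model names are invented for illustration only.
import json

def make_message(sender: str, recipient: str, task: str, payload: dict) -> str:
    """Serialise a request from one specialist model to another."""
    return json.dumps({
        "sender": sender,        # e.g. a vision model (the "occipital lobe")
        "recipient": recipient,  # e.g. a planning model (the "pre-frontal cortex")
        "task": task,            # what the recipient is being asked to do
        "payload": payload,      # task-specific content
    })

# Hypothetical exchange: a vision specialist hands off to a planner.
msg = make_message(
    sender="vision-model",
    recipient="planner-model",
    task="plan_grasp",
    payload={"object": "apple", "position_cm": [12.0, 3.5, 40.0]},
)
print(json.loads(msg)["recipient"], "<-", json.loads(msg)["task"])
```

Whatever the real thing ends up being, the hard part is less the envelope than the semantics: agreeing on what a “task” means to a model trained by a different company.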