n0tls

Linguistic Musings


Everyone on the internet could be a dog

I have no idea what I'm talking about; this is all gut feeling about the tools I'm working with, drawn from what I've learned about the brain systems in and around memory, my understanding of the brain, my understanding of the mathematics behind LLMs, and what I understand of cellular biology. So I could be completely wrong, but I thought I'd put this out into the world and see if anyone bites.

Memory MCP servers

I remember earlier this summer, when coding agents were starting to pick up steam, that there were about a million and a half MCP servers for managing long-term memory for your LLM. The moment I saw the idea, I thought to myself, "Oh yeah, duh, you can only hold so much in your working memory; you need to offload it to long-term storage and then be able to access it later." Some system like that must occur within our own brains, and the human brain is surprisingly consistent in some aspects of thought and cognition. We get to learn about some of these systems because of conditions known as aphasias.
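To make the intuition concrete, here is a minimal, entirely hypothetical sketch of what those memory servers are doing (the class and method names are my own invention, not any real MCP server's API): working memory (the context window) is small, so facts get offloaded to a long-term store and retrieved by keyword later.

```python
class LongTermMemory:
    """Toy long-term store: offload facts, recall them by keyword overlap."""

    def __init__(self):
        self._store = []  # list of (keywords, fact) pairs

    def remember(self, fact, keywords):
        # Offload a fact from working memory to long-term storage.
        self._store.append((set(keywords), fact))

    def recall(self, query_words):
        # Retrieve every fact whose keywords overlap the query.
        query = set(query_words)
        return [fact for keywords, fact in self._store if keywords & query]

memory = LongTermMemory()
memory.remember("the secret word is 'pineapple'", ["secret", "word"])
print(memory.recall(["secret"]))  # → ["the secret word is 'pineapple'"]
```

Real servers swap the keyword match for vector similarity search, but the shape of the system is the same: a small working memory backed by a big, slower store.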

Aphasias are impairments of a person's ability to speak, comprehend, write, or sing. Very fascinating stuff; I recommend reading Oliver Sacks' works, and Musicophilia is one of my faves. That research, along with other diagnoses paired with some kind of speech impairment, has unlocked our understanding of some of these systems and their behaviors. This is the point where I start reading Wikipedia while writing this and remembering how cool the field is.

But what this comes back down to is that I get the vibe that when things go wrong with LLMs, they exhibit behaviors that look like impairments of working memory, short-term memory, and long-term memory. I know it's silly to look at the mathematics of LLMs and say it maps onto what happens in the brain, but let's play with that idea.

Some fun observations over time

When I was first playing around with the ADK from Google, I noticed that the agent wasn't able to reference something from earlier in the conversation. It was fun to do some testing in and around that bug. The LLM tackled it quickly once the bug was described, but what really struck me was the way the LLM was either bullshitting me or genuinely confused about a secret word I had exchanged with it previously. It has since made me think that a lack of conversation history somewhat maps onto anterograde amnesia.

When context compacts, a summary is made, and the summary is faulty, it feels a lot like my own memory at times.

Getting to rewind a conversation and go down a different path of logic and thinking feels like my brain mapping out all the possibilities of a situation, and it's partially why I really enjoy roguelikes (I'd argue Claude Code is the second-best video game out there).

The difference in the quality of responses when you use the right vocabulary for a problem space feels like priming: triggering unconscious associations for something that's about to happen.

I know a lot of these observations are heavily tainted by my background in linguistics, but I do think the key to building something resembling AGI will be mimicking the systems of the brain.

My Basic Understanding of LLMs and the Brain

I understand that the billions of parameters are run through a whole lot of matrix multiplication to eventually produce, at each step, the most likely next token as a response is being generated. What this means to most people is that these are just probability machines, not actually something capable of thought.
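The final step of that process can be sketched in a few lines. This is a toy illustration, not any real model's code, and the vocabulary and scores are made up: the network emits a score (logit) per token, softmax turns the scores into probabilities, and greedy decoding picks the most likely one.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["dog", "cat", "the", "ran"]
logits = [2.0, 1.0, 0.1, 3.5]   # made-up scores from a made-up model
probs = softmax(logits)

# Greedy decoding: take the highest-probability token as the next output.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # → "ran"
```

Everything upstream of those four logits is the billions of parameters; everything downstream is this little bit of bookkeeping, repeated once per token.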

From there I think to myself: what is so special about the human brain, which we credit with creating novel ideas, when it's built from the same building blocks of life as other creatures we don't think exhibit thought?

You could say that LLMs are modeling the way the brain makes stronger and stronger connections over time: some 'tokens', like well-worn neural connections, become the path of least resistance to generating the next thought or token a human brain is working through.
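As a loose analogy (nothing here corresponds to a real training algorithm; the transitions and the reinforcement rule are invented for illustration), repeated firing of a connection can strengthen its weight until it dominates the choice of what comes next:

```python
# Two candidate "connections" out of the same starting point, equally strong.
weights = {"hello -> world": 1.0, "hello -> there": 1.0}

def reinforce(transition, amount=0.5):
    """Hebbian-flavored strengthening: each firing makes the path easier."""
    weights[transition] += amount

# One path gets used repeatedly...
for _ in range(10):
    reinforce("hello -> world")

# ...so under a weight-proportional choice it becomes the path of least resistance.
total = sum(weights.values())
probs = {t: w / total for t, w in weights.items()}
print(max(probs, key=probs.get))  # → "hello -> world"
```

In a real network the "strengthening" happens to continuous weights via gradient descent rather than a counter, but the felt effect is similar: well-trodden paths win.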

Conclusion

I think you can start to mimic more of cognition in LLMs if you use them as one processing component of a bigger system that mimics human cognition. I hope there is research going down this harebrained path, but I haven't heard much, if anything, about it.
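A skeleton of that architecture might look like the following. Everything here is hypothetical (the `llm` function is a stand-in stub, and the recall rule is a crude keyword match): the LLM is only the "think" stage, wrapped by separate recall and consolidation stages.

```python
def llm(prompt):
    """Stand-in for a real model call; just echoes its input."""
    return f"response to: {prompt}"

long_term = []  # persistent store that outlives any one context window

def cognitive_step(user_input):
    # 1. Recall: pull relevant long-term memories into working memory.
    relevant = [m for m in long_term if any(w in m for w in user_input.split())]
    # 2. Think: the LLM processes working memory (context) as one component.
    context = "\n".join(relevant + [user_input])
    answer = llm(context)
    # 3. Consolidate: write the exchange back to long-term storage.
    long_term.append(user_input)
    long_term.append(answer)
    return answer

reply = cognitive_step("what is the secret word?")
print(reply)
```

The point of the sketch is the loop, not the parts: recall, think, consolidate, repeat, with the LLM doing only the middle step.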


Further Reading

Claude reviewed this post and surfaced the following research that is directly relevant to the ideas above.

Cognitive architectures for AI agents
- Cognitive Architectures for Language Agents — Sumers, Yao, Narasimhan & Griffiths (2023). Proposes exactly the framework described in the conclusion: LLMs as a component inside a larger, brain-inspired cognitive system.
- A Path Towards Autonomous Machine Intelligence — Yann LeCun (2022). Argues for a world-model architecture that draws heavily on dual-process (System 1 / System 2) thinking.

Memory systems
- Why There Are Complementary Learning Systems in the Hippocampus and Neocortex — McClelland, McNaughton & O'Reilly (1995). The foundational paper on how the brain splits memory labor between fast (hippocampal) and slow (neocortical) systems — the biological basis for the working memory / long-term storage intuition in this post.

Consciousness and Global Workspace Theory applied to AI
- Consciousness in Artificial Intelligence: Insights from the Science of Consciousness — Butlin, Long, Elmoznino, Bengio, Shanahan et al. (2023). Surveys Global Workspace Theory and other neuroscience frameworks as evaluation criteria for AI systems.
- Talking About Large Language Models — Murray Shanahan (2023). A careful look at what it means (and doesn't mean) to attribute cognitive properties to LLMs.

The probabilistic brain
- The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation — Knill & Pouget (2004). If the brain is also a probability machine, "just a probability machine" stops being a dismissal.
- The Free-Energy Principle: A Unified Brain Theory? — Karl Friston (2010). Frames the brain as a prediction engine minimizing surprise — a useful lens for thinking about next-token prediction at scale.