Ever wonder how you can navigate a new neighborhood fairly easily, or figure out a complex project at home? You probably manage it without breaking much of a sweat, finding your way or lining up the steps without mapping every single option. Now, think about artificial intelligence. While AI can crush specific games or crunch numbers, building an AI that navigates the messy, partially known real world like we do is still a huge challenge. Why are we so good at this complex planning, often finding solutions that seem impossibly hard for computers? And why do lab tests sometimes show us taking paths that aren’t technically the absolute ‘best’?
This puzzle is key to understanding intelligence, both ours and the artificial kind. Standard AI often sees planning as exploring a giant branching tree of choices and results. The bigger the tree, the tougher the problem. But humans clearly don’t operate that way. We don’t seem to carry around a perfect, detailed blueprint of the world. A team of researchers from Dalhousie University, University of Waterloo, MIT, and Cornell University has a fascinating alternative idea. What if our mental maps aren’t like static pictures, but more like flexible computer programs?
Marta Kryven, Cole Wyeth, Aidan Curtis, and Kevin Ellis suggest our knack for planning comes from a core belief: the world usually follows predictable patterns. Instead of memorizing every last detail, maybe we build mental models using compact programs that capture repetition, symmetry, and reusable chunks. Think of recognizing the standard layout of office floors or the way streets often form grids. This “concepts as programs” idea pictures our brains as constantly looking for the world’s underlying code to navigate efficiently. Let’s dive into their study.
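To make the "concepts as programs" idea concrete, here is a minimal sketch in Python, under our own assumptions: the function names (`make_room`, `office_floor`) and the layout are illustrative, not taken from the paper. The point is that a short program with loops and a reusable "room" chunk can generate a whole floor plan that would otherwise need tile-by-tile storage.

```python
def make_room(width, height):
    """A reusable chunk: one rectangular room as a set of floor tiles."""
    return {(x, y) for x in range(width) for y in range(height)}

def office_floor(num_rooms, room_w=4, room_h=3):
    """Repetition: identical rooms placed side by side along a corridor."""
    tiles = set()
    for i in range(num_rooms):
        offset = i * (room_w + 1)          # +1 leaves a wall between rooms
        tiles |= {(x + offset, y) for (x, y) in make_room(room_w, room_h)}
        tiles.add((offset + room_w, 1))    # a doorway through that wall
    return tiles

# The whole floor is captured by a handful of parameters, not a pixel map.
floor = office_floor(num_rooms=5)
print(len(floor))
```

Notice the compression: changing `num_rooms` from 5 to 50 changes one number in the "mental program," while a static picture of the same floor would grow tenfold.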
Why blueprints and brute force fall short

Why is thinking about maps as programs potentially a game changer? Look at how typical AI handles planning, especially when it doesn’t have all the information. This situation is often modeled as a POMDP, or Partially Observable Markov Decision Process. Finding the best solution usually involves calculating odds for every possible scenario and planning across all that uncertainty. This approach quickly becomes overwhelmingly complex, even for fairly simple environments. It just doesn’t feel like the smooth way humans get around.
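A toy calculation shows why exact planning over uncertainty blows up. Suppose each unobserved cell of a maze is independently either open or blocked (a simplifying assumption of ours, not the paper's model): the number of possible worlds a planner must reason over doubles with every hidden cell.

```python
def num_possible_worlds(num_hidden_cells):
    """Each unobserved cell can be open or blocked: 2^n consistent worlds."""
    return 2 ** num_hidden_cells

for n in (10, 20, 40):
    print(n, num_possible_worlds(n))
```

At 40 hidden cells there are already over a trillion worlds to weigh, which is why exhaustive POMDP solving feels nothing like how people actually walk into an unfamiliar building.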
Plus, there’s that weird disconnect. We handle the structured complexity of real life really well. Think city grids, modular furniture, trails in a park. But put people in simplified lab tasks designed without clear structure, and they often don’t follow the mathematically ‘optimal’ path. Researchers used to chalk this up to mental limits, like only thinking a few steps ahead. But Kryven and her colleagues think that might miss the point. Maybe we aren’t flawed planners. Maybe we’re just incredibly good planners specifically for the structured kind of world we actually live in. We look for patterns, and we use them.
AI researchers have tried to tackle complexity with strategies like hierarchical planning (breaking big problems into small ones) or recognizing similar game states. But automatically learning and using the kind of “common sense” structural knowledge we have remains a major hurdle.
Meet GMP: Planning like a coder

To put their idea to the test, the researchers built a computer model called Generative Modular Planning, or GMP. This model works on the principle of cognitive maps as programs. It doesn’t store an exact picture of a place. Instead, it figures out a simple program that captures its basic structure.
GMP has two main parts. First, a structure-finding step: a large language model (GPT-4 in this study) looks at the environment and writes a compact program describing its repeating modules, such as rooms and hallway sections. Second, a modular planner: it uses that program to search for paths one module at a time, rather than over the whole maze at once.
This way of planning is smart within each module. It finds the best path inside that recognized piece. But connecting these smart local paths might lead to a global route that’s slightly longer than if a planner looked at the whole map perfectly. This possibility of clever, efficient, yet maybe slightly indirect routes was exactly the kind of human-like behavior the researchers were watching for.
So, do people actually plan like the GMP model? The team used a Maze Search Task to find out. Thirty participants navigated 20 different mazes on a computer, seeing the world from a first-person view. Parts of the maze were hidden until they moved close enough. Their goal: find the hidden exit, marked by a red tile.
These weren’t just any mazes. They were specifically designed with clear, repeating structures. They had modular layouts made of distinct pieces, like certain room shapes or hallway sections. This setup was perfect for seeing if people would naturally explore module by module, or if they’d take shortcuts cutting across modules if that seemed mathematically shorter, as traditional optimal planners might predict.
The team compared people’s paths against three models: GMP, plus two traditional optimal planners that treat the maze as one undifferentiated search problem rather than a collection of modules.
The mazes were designed so the traditional models would usually suggest non-modular paths, making it easy to see which strategy people preferred.
We are modular planners

The findings were pretty clear. People overwhelmingly used modular strategies. They explored the structured mazes chunk by chunk, moving systematically from one recognized section to the nearest next one. This wasn’t just a fluke; it was the consistent pattern across different maze designs and most participants.
The researchers looked closely at “discriminating decisions”. These were points in the maze where the GMP model suggested a different move than the traditional models. In these key moments, GMP did a significantly better job predicting what people would actually do. People weren’t just being randomly inefficient; they were being systematically modular. Their behavior lined up beautifully with the strategy you’d expect if they were using program-like mental maps.
One of the really neat parts of this study is how they used a large language model. It wasn’t making decisions. It was acting like a stand-in for human structural perception. Because LLMs are trained on mountains of human writing and code, they seem to absorb common ways humans structure things, including spaces. When asked to write a program for the maze, GPT-4 came up with structural breakdowns, the chunks and rules, that matched how people later navigated.
This hints that LLMs might be useful for more than just generating text. They could potentially help us understand the built-in assumptions and mental shortcuts, the “inductive biases,” we humans use to make sense of everything. Here, it helped translate a visual maze into a useful, code-like structure perfect for efficient planning.
Changing how we think about mental maps and AI

This research challenges the old idea of cognitive maps as simple, static pictures in our heads. Thinking of them as active, generative programs makes computational sense. It explains how we handle the complex, uncertain real world with limited brainpower. It explains our efficiency in structured places, and maybe even why we sometimes take paths that aren’t mathematically perfect but are much easier to figure out and remember.
For artificial intelligence, this offers a practical path forward. The GMP model shows the power of finding structure first, then planning modularly. AI agents built this way could potentially navigate complex, partly known environments much more efficiently, needing far less memory and processing power. It points towards AI that plans more like we do, by spotting patterns instead of just crunching possibilities.
Sure, there are still questions. The current GMP model makes simple assumptions about moving between chunks. Future research needs to explore how we might prioritize certain areas based on past experience or current goals. How do we adjust our mental programs when the world doesn’t match our expectations? How much do our goals influence the structures we perceive? Even with these open questions, this study gives us a powerful new way to think about how we find our way.
In the end, it suggests something profound about us. Our amazing ability to navigate and act effectively in our complex world might come down to our brains being expert pattern-finders, constantly spotting the underlying code of the structured reality around us and representing it not just as a scene, but as a program ready to run.