Urban change usually sneaks up on us. A new café here. A painted overpass there. But what if you could see an entire decade of a city’s visual transformation, automatically captured, sorted, and explained by AI?
That is exactly what a new research project called Visual Chronicles set out to do. Developed by researchers from Stanford and Google DeepMind, this system used multimodal large language models (MLLMs) to analyze over 40 million Google Street View images from New York City and San Francisco. It spotted trends humans would not easily notice.
The impossible problem of scale

Tracking small changes over time is nothing new in computer vision. But most previous work relied on labels or focused on specific objects, like cars or faces. This project was different. The goal was open-ended: what changed most often in these cities over a decade?
Simple question. Brutally hard in practice.
Large language models are good at reasoning about images, but they struggle when the dataset grows beyond a few thousand images. Visual Chronicles was dealing with millions. So the researchers designed a bottom-up strategy. First, detect tiny local changes like a new sign or a removed tree. Then, cluster them into broader city-wide trends.
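The first stage of that bottom-up strategy can be sketched in a few lines. This is an illustrative mock, not the project's code: `query_mllm` is a hypothetical stand-in for a real multimodal model call, stubbed here with a fixed answer so the sketch runs on its own.

```python
# Sketch of local change detection: show the model two captures of the
# same location from different years and ask for one concrete change.
from dataclasses import dataclass

@dataclass
class Capture:
    location_id: str
    year: int
    image_path: str

def query_mllm(prompt: str, images: list[str]) -> str:
    """Hypothetical MLLM call. A real system would send the prompt and
    images to a model endpoint; here we return a stubbed answer."""
    return "a new outdoor dining shed appeared"

def detect_local_change(before: Capture, after: Capture) -> str:
    prompt = (
        f"These two photos show the same spot in {before.year} and "
        f"{after.year}. Describe one concrete visual change, or say 'none'."
    )
    return query_mllm(prompt, [before.image_path, after.image_path])

change = detect_local_change(
    Capture("nyc_001", 2013, "nyc_001_2013.jpg"),
    Capture("nyc_001", 2023, "nyc_001_2023.jpg"),
)
```

Keeping the question local and concrete is what lets the model answer reliably; open-ended prompts over millions of images are where MLLMs break down.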
AI’s detective work on the streets

Here is how it worked in action: first, local change detection on pairs of images taken at the same spot years apart; then trend discovery, clustering the thousands of similar findings into recurring patterns.
This hybrid approach let the system detect subtle changes. Outdoor dining setups after COVID-19. New solar panels on rooftops. All spotted without drowning in data or generating abstract answers like “economic growth.”
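The second stage, grouping free-text change descriptions into broader trends, can be approximated with a toy example. The real system uses the MLLM itself for clustering; this sketch substitutes naive string similarity from the standard library purely for illustration, with invented sample descriptions.

```python
# Toy clustering of change descriptions: greedily group each new
# description with the first cluster whose representative is similar
# enough, otherwise start a new cluster.
from difflib import SequenceMatcher

def cluster_descriptions(descriptions: list[str],
                         threshold: float = 0.6) -> list[list[str]]:
    clusters: list[list[str]] = []
    for desc in descriptions:
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            if SequenceMatcher(None, desc, cluster[0]).ratio() >= threshold:
                cluster.append(desc)
                break
        else:
            clusters.append([desc])
    return clusters

found = cluster_descriptions([
    "new outdoor dining shed",
    "new outdoor dining structure",
    "overpass painted blue",
])
```

The two outdoor-dining descriptions end up in one cluster and the overpass in another; a production system would use semantic embeddings or model judgments rather than character-level similarity.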
So what did it find?

In New York City, the AI spotted a dramatic rise in several streetscape features over the decade.
In San Francisco, the decade’s signature trends looked different.
The researchers also focused on the pandemic period, capturing how city streets adapted after 2020. Outdoor dining exploded in San Francisco, with 1482 new setups recorded between 2020 and 2022 alone.
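Figures like the 1482 outdoor-dining setups come from counting verified detections inside a time window. A minimal sketch, with sample data invented for the example:

```python
# Count detections of a given trend label within a year range.
# The detection list below is made up for illustration.
detections = [
    ("outdoor dining", 2019),
    ("outdoor dining", 2021),
    ("outdoor dining", 2021),
    ("outdoor dining", 2022),
    ("solar panels", 2021),
]

def count_in_window(dets, label, start, end):
    return sum(1 for name, year in dets
               if name == label and start <= year <= end)

n = count_in_window(detections, "outdoor dining", 2020, 2022)
```

Here `n` is 3: the 2019 sighting falls outside the window, exactly the kind of filtering behind the "between 2020 and 2022" figure.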
And then there was the blue overpass. A freeway section in San Francisco was painted ‘Coronado Blue,’ a detail spotted 481 times in Street View images after 2020.
In New York, the system was also used to track retail store changes. It revealed two opposite trends in how storefronts turned over.
And, because why not, the researchers ran a final experiment, asking the AI to look at random images and find “unusual things.”
The winner? Giant abstract sculptures scattered across New York City. Over 200 instances of public art installations, all grouped by the model.
Why this matters far beyond Street View

Visual Chronicles shows how future AI tools could let companies, governments, or researchers track changes in any large visual dataset. Satellite images. Factory floors. Any place that changes over time.
It is also a warning. AI does not just “see” images. It explains them back to us in ways that shape what we believe is happening. The more we trust these automated trend reports, the more we need systems that balance AI’s speed with human caution.
Visual Chronicles is an early example of that kind of system. It is precise enough to find real patterns, scalable enough to handle millions of images, and grounded enough to keep its storytelling tied to visual evidence.