Long-form writing has always been a challenge for AI. While large language models can churn out short, well-structured responses, they falter when faced with lengthy, complex documents—the kind that require reasoning, organization, and adaptability. A new study, posted on arXiv by researchers from KAUST and The Swiss AI Lab IDSIA, introduces a framework designed to address exactly this problem: Heterogeneous Recursive Planning (HRP).
Why AI struggles with long-form writing

AI-generated text often follows a rigid, predetermined workflow. Most models rely on an outline-first approach, where they create a structured plan before filling in the content. While this method helps with coherence, it lacks adaptability—a critical flaw in creative writing, technical reports, or any form of writing that evolves dynamically.
Think of a mystery novelist introducing an unexpected plot twist mid-chapter. A human writer would adjust the storyline, refine character motivations, and weave the new development into the existing narrative. Traditional AI models stick to the original plan, often producing awkward, disjointed writing when new elements arise.
How HRP mimics human writing adaptability

The researchers behind HRP propose a more flexible system. Instead of forcing AI to follow a fixed outline, their model breaks writing into three interdependent tasks: retrieval, reasoning, and composition.
Rather than outlining first and writing second, HRP interleaves these processes, allowing AI to re-plan dynamically. The model uses recursive task decomposition, meaning it breaks down writing into smaller subtasks and adjusts them as needed—much like a human revising their thoughts while writing.
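Recursive task decomposition can be pictured as a tree of typed subtasks. The sketch below is illustrative only—the task names, classes, and decomposition rule are my own simplification, not the paper's implementation—but it shows the core idea: a composition task spawns retrieval and reasoning children, and a depth-first walk runs those children before their parent writes anything.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of recursive decomposition (not the paper's code):
# a writing goal breaks into typed subtasks, any of which may break
# down further.
RETRIEVE, REASON, COMPOSE = "retrieve", "reason", "compose"

@dataclass
class Task:
    kind: str                                  # one of the three task types
    goal: str                                  # what this subtask should achieve
    subtasks: list = field(default_factory=list)

def decompose(task: Task) -> Task:
    """Toy rule: a composition task spawns retrieval and reasoning
    children before it is allowed to produce text."""
    if task.kind == COMPOSE and not task.subtasks:
        task.subtasks = [
            Task(RETRIEVE, f"gather facts for: {task.goal}"),
            Task(REASON, f"organize the argument for: {task.goal}"),
        ]
    return task

def flatten(task: Task):
    """Depth-first walk: children run before their parent, so
    retrieval and reasoning always precede composition."""
    for sub in task.subtasks:
        yield from flatten(sub)
    yield task

root = decompose(Task(COMPOSE, "write the report section"))
order = [t.kind for t in flatten(root)]
# order: retrieval, then reasoning, then composition
```

Because `decompose` can be applied again to any child, the same rule scales to nested sections—which is what makes the planning "recursive" rather than a one-shot outline.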
Traditional writing AI operates on a sequential logic: plan, then execute. HRP treats writing as a fluid process where tasks influence one another in real time. If a new fact emerges mid-writing, the system doesn’t just insert it awkwardly—it recalculates its relevance, revises related sections, and adjusts the composition accordingly.
This is achieved through state-based task scheduling, where tasks exist in an interdependent hierarchical structure rather than a linear sequence. Instead of treating retrieval, reasoning, and composition as isolated stages, the model allows them to interact. If reasoning changes a key argument, retrieval adjusts, and composition updates accordingly.
To test HRP, researchers evaluated it on two writing tasks: fiction writing and technical report generation.
HRP outperformed state-of-the-art models like GPT-4o, STORM, and Co-STORM in every key metric, including coherence, logical depth, and adaptability. In fiction writing, HRP-generated stories showed better plot development and character consistency. In technical reports, HRP improved fact accuracy, organization, and citation integration.
Instead of treating AI like a content generator, HRP models act more like a human writer—constantly researching, refining, and reorganizing ideas.
HRP brings us closer to AI that doesn’t just follow instructions but actively refines its own thought process. The days of AI writing rigid, pre-structured content may soon be over. Instead, we’re entering an era where AI can think like a writer—analyzing, revising, and adapting as it goes.
AI writing has always been stuck in a weird loop—great at spitting out coherent paragraphs, terrible at knowing when to stop, when to rethink, when to adjust. That’s where HRP is genuinely exciting. Instead of treating writing like a fill-in-the-blanks exercise, it lets AI think while it writes, adjusting mid-stream like a human would. The results? More coherent storytelling, better argumentation, and writing that doesn’t collapse the moment a new idea needs to be introduced. The ability to bounce between retrieval, reasoning, and composition means AI-generated reports might finally stop reading like a collection of stitched-together Wikipedia excerpts. And in fiction, AI could move past formulaic templates and actually craft something that feels alive.
But let’s not get carried away. More flexibility means more complexity. Recursive planning sounds great until you realize it introduces a ton of computational overhead—meaning it’s slower, harder to optimize, and more expensive to run at scale. And there’s another issue: where does human intent fit in? If an AI is constantly tweaking its plan as it writes, how do we ensure it’s still following the original goal?
We’ve seen AI go off the rails before, and more decision-making power means more chances to lose sight of what matters.
Featured image credit: Kerem Gülen/Imagen 3