I’ve spent the last few weeks building and testing a RAG-based workflow for structural novel editing—specifically, trying to use a local LLM setup to analyze a full manuscript (100K+ words, 100+ scenes, multiple POVs) in a way that could meaningfully assist a Draft 1 -> Draft 2 revision.
The idea was not to generate prose or rewrite scenes, but to use AI as an analytical tool: something that could reconstruct structure, identify pacing issues, trace character arcs, and surface real editorial problems so that I, as the writer, could make better and faster decisions.
In theory, this is exactly what Retrieval-Augmented Generation (RAG) should enable.
In practice, it does not work. This post will explain why and what I plan to do next.
The Goal: Structural Editing, Not Writing
To be clear, the goal here was never “AI writes the book.” That’s not interesting and not useful. I want to write my book. Some will say that even going this far is dangerous, that allowing LLMs access to my work poisons it. There’s too much mysticism and fear-mongering there for my taste. At the end of the day, I’ve cleared my own bar: no LLM is writing for me, no public server has access to my written text, and every single final decision of what to change or keep or write rests squarely on my own head.
The Setup (RAG + Local LLM)
The stack was straightforward, at least on paper.
Local models running through Ollama. A few different options tested, mostly Llama and Mistral variants.
Scene-based chunking, with each scene as its own document, plus a short synopsis to improve retrieval.
A RAG pipeline sitting on top, embedding everything and pulling top-K chunks per query.
A growing library of prompts aimed at editorial tasks.
If you’re newer to this space, the core idea is simple:
You can’t fit a 100K-word novel into a model’s context window. So instead, you break it into pieces and retrieve the “relevant” ones when you ask a question. The assumption is that if the model sees the right pieces, it can reason about the whole.
That assumption is where everything breaks.
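For readers who want to see the chunk-and-retrieve idea in code, here is a minimal Python sketch. It is not the actual pipeline: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the scene texts, ids, and query are invented placeholders, not taken from any manuscript.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, scenes, k):
    """Return the ids of the k scenes most similar to the query."""
    q = embed(query)
    ranked = sorted(scenes, key=lambda sid: cosine(q, embed(scenes[sid])), reverse=True)
    return ranked[:k]

# Hypothetical placeholder scenes, one "document" per scene.
scenes = {
    "scene_01": "Ana hides the key under the floorboard.",
    "scene_02": "The town argues about the new bridge.",
    "scene_03": "Ana retrieves the key and opens the cellar.",
}

top = retrieve("Where is the key?", scenes, k=2)
print(top)
```

Only the scenes in `top` would be pasted into the model's prompt. Everything else in the book is invisible for that query, which is the assumption the rest of this post pushes on.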
Why It Breaks
At first glance, the system looks competent. It summarizes cleanly. It can talk about themes. It produces feedback that sounds like something an editor might say.
But the moment you push into real structural questions, it falls apart.
Retrieval Bias
RAG retrieves what is relevant, not what is necessary. That distinction kills it.
Character arcs, pacing, thematic development—these are not local problems. They are distributed across the manuscript, and more importantly, they live in the relationships between scenes, not the scenes themselves.
What the system actually does is pull a handful of strong moments and build an answer from those. A turning point here, a climactic scene there, maybe an early setup if you’re lucky.
The result is clean, confident, and incomplete. The retriever is only allowed to select up to a fixed number of scenes (the top-K limit). If the material your query needs is spread across more scenes than that, or lives in subtext rather than in words that resemble the query, you’re in for a bad time.
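The ceiling that a top-K limit imposes is simple arithmetic, sketched below. The function name and the figure of 40 relevant scenes are hypothetical, chosen only for illustration, and the calculation is a best case: it assumes every retrieved scene is actually one of the relevant ones.

```python
def max_coverage(relevant_scene_count, top_k):
    """Best-case share of the relevant scenes that one top-K query can include."""
    if relevant_scene_count <= 0:
        raise ValueError("relevant_scene_count must be positive")
    return min(top_k, relevant_scene_count) / relevant_scene_count

# Hypothetical: a character arc that touches 40 scenes.
for k in (5, 10, 20):
    print(f"top-K={k}: at most {max_coverage(40, k):.0%} of the arc is visible")
```

In practice the share is lower, because some of the K slots go to scenes that merely resemble the query.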
More Context Doesn’t Save It
The obvious fix is to pull more. Increase top-K so it can retrieve additional scenes for analysis. Expand the context window. Feed the model more of the book per query.
I tried all of it. It doesn’t solve the problem. It just changes the way it fails.
With limited context, the answers are biased. With expanded context, they become bloated and unfocused. The model starts flattening distinctions, hedging its claims, and defaulting to vague generalities. You lose the ability to keep the model’s attention on the scenes that actually matter for the question.
At a certain point, more context stops helping and starts interfering. You don’t get better insight. There doesn’t appear to be an “ideal” setting at which every query is guaranteed to return the level of detail you need. What’s more, even if you could find such a sweet spot for a specific query (“Trace the character arc and growth of CHARACTER X over the full novel”), another query that covers far more or far fewer scenes would be badly served by the same settings and would give you either vague answers or an overly literal focus. You’re damned either way.
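To put rough numbers on the budget side of this trade-off, here is a back-of-the-envelope sketch. The manuscript size comes from this post (about 100K words over about 100 scenes, so roughly 1,000 words per scene). Everything else is an assumption: the 1.3 tokens-per-word ratio is a loose rule of thumb that varies by tokenizer, and the context sizes and reserved token count are illustrative values, not measurements from my setup.

```python
def scenes_that_fit(context_tokens, reserved_tokens, words_per_scene, tokens_per_word=1.3):
    """Estimate how many whole scenes fit in one prompt.

    reserved_tokens covers the instructions and the space left for the answer.
    tokens_per_word is an assumed average, not a property of any specific model.
    """
    budget = context_tokens - reserved_tokens
    tokens_per_scene = words_per_scene * tokens_per_word
    return max(0, int(budget // tokens_per_scene))

words_per_scene = 100_000 // 100  # roughly 1,000 words per scene

for window in (8_192, 32_768):  # assumed context window sizes
    n = scenes_that_fit(window, reserved_tokens=1_192, words_per_scene=words_per_scene)
    print(f"{window} tokens: about {n} scenes per query")
```

This only shows how many scenes can be squeezed in. It says nothing about whether the model reasons well over them, which is where the larger settings let me down.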
The Real Problem: Novels Are Systems
This is the core issue. RAG works by atomizing the manuscript—breaking it into chunks and retrieving them independently so that the LLM can perform the word- and sentence-level analysis of the text.
But crucially, a novel is not a collection of independent chunks.
It’s a system where scenes depend on prior states, where character decisions accumulate over time, where setups and payoffs can be separated by tens of thousands of words, and where pacing is defined by contrast, not proximity.
If there is one insight you take away from this series, it should be this:
Structural editing requires you to hold a system in your head and evaluate how it behaves as a whole. RAG never actually reconstructs that whole. It approximates it from fragments. And that approximation is not good enough.
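One way to make the fragments-versus-system point concrete is to treat setups and payoffs as links between scenes and ask how many links survive retrieval intact. The sketch below is illustrative only: the scene numbers, the link list, and the `links_visible` helper are all hypothetical.

```python
def links_visible(links, retrieved):
    """Keep only the (setup, payoff) pairs where both scenes were retrieved."""
    seen = set(retrieved)
    return [(setup, payoff) for setup, payoff in links if setup in seen and payoff in seen]

# Hypothetical setup -> payoff pairs, by scene number.
links = [(3, 71), (12, 48), (12, 95), (30, 31)]

# Hypothetical retrieval result: 5 of the 7 scenes involved above.
retrieved = [12, 30, 31, 71, 95]

visible = links_visible(links, retrieved)
print(f"{len(visible)} of {len(links)} links intact: {visible}")
```

Here most of the individual scenes were retrieved, yet only half of the relationships between them reach the model whole. A link with one end missing looks, to the model, like a scene with no setup or no payoff at all.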
It Misses What Actually Matters
This is where the experiment really fails.
Because I know the draft of this novel and have previously edited it “the old-fashioned way,” I know what it should be catching. The weak arcs. The dragging middle. The redundant sequences that needed to be cut entirely. Beta readers had already identified weaknesses, which I fixed, and the manuscript ultimately ended up as a nice first novel.
But my AI toolbox doesn’t find the issues, not reliably, not in a way you could trust. It finds something. But there is a massive difference between sounding like an editor and being useful as one. Trusting even a system more sophisticated than mine is far too much of a leap of faith for me.
I know “professional” editing tools exist that promise what I’ve been trying to build here. But I’ve seen underneath the hood. I’ve battled with the constraints. And though I fully appreciate that engineers far more clever than I may have come up with solutions to these problems, I am no longer interested in such pursuits. Oz the Great and Powerful has become the schmuck from Kansas.
The Core Conflict
After enough testing, the conclusion is straightforward: A 100K-word novel is too large, too interconnected, and too dependent on global coherence for a RAG-based system to analyze structurally in a reliable way. This isn’t just about model quality or hardware limits. It’s a mismatch between method and problem.
RAG is local. Structural editing is global.
Trying to bridge that gap by increasing retrieval scope doesn’t fix the issue. It just hides it.
Lessons Learned
A few things became very clear through this process.
LLMs are strong at interpreting text, but weak at understanding systems composed of multiple pieces of text.
Retrieval is not understanding. Pulling the “right” chunks does not mean the model understands the structure those chunks belong to, and even getting it to identify what’s “right” was challenging.
More context does not guarantee better insight. Beyond a certain point, it degrades it.
Structural editing is fundamentally a global and relational task. It requires persistent awareness of interactions across the entire manuscript.
And maybe most importantly: plausible output is dangerous. The system sounds right far more often than it actually is.
Where This Leaves Me
The original goal was to find a way to accelerate structural revision without sacrificing quality.
This approach doesn’t do that.
So I’m going back to the process that actually works: reading, thinking, reverse-outlining, rewriting, and eventually putting the manuscript in front of other human beings who can do the same.
That’s the plan for LANTERN over the next few months.
Final Thought
There’s a broader point here that I think is worth stating clearly.
The human mind is still the best tool we have for reading, interpreting, and revising long-form narrative. I’ve seen the kinds of things that LLMs and RAG can do in many different contexts, and it is always important to understand where they run up against their limits and why. I feel convinced that I have discovered one of those limits for myself, in my own workflow.
That doesn’t mean tools like this are useless. They’re not. They’re powerful in specific contexts, especially at smaller scales or for localized analysis. But they are not shortcuts to understanding a novel. They are not shortcuts for your writing, and to treat them as such compromises far too much.
And right now, they cannot replace the act of holding a story in your head, seeing how it moves, and deciding what it lacks.
That part is still ours.
I am glad I went through this process. It is attractive to think that a new tool can optimize your workflow and allow you to focus on the ideas and tasks that matter most as a writer. The entire point was to explore the possibilities of LLMs, RAG, and these new AI tools, and although I found them to be completely insufficient for structural editing, that’s an important conclusion to have reached with my own work.
