Introduction
This post is part of Editing Novels Faster, a series documenting an experiment in using AI as an editorial aid rather than a writing tool. The aim is not automated prose generation, but reducing the cognitive drag of large-scale revision, especially the structural and developmental work that becomes difficult once a manuscript outgrows working memory.
This episode is about preparing the tools, and explaining why each is a necessary part of the process. What follows is not an optimal workflow. It is the workflow I actually built, as a novice programmer and tool user, with the explicit intention of learning where it breaks and how it might be improved.
This series documents what I did, not what is ideal. One of its goals is to expose the friction points clearly enough that they can be improved, either by me over time or by others approaching the same problem with deeper technical skill. If this process feels manual in places, that is because it was.
Why Not Just Use a Hosted LLM?
Before getting into the stack itself, it’s worth addressing an obvious question: why do any of this at all? Why not simply paste sections of a manuscript into a hosted LLM and ask for feedback?
Anyone who has tried to do serious revision work this way already knows the answer: online models are poorly suited to long-form editing. Context windows are small relative to a novel, which forces constant summarization and re-pasting. Earlier material drops out of view. Comparisons across distant scenes become unreliable. The model’s attention shifts toward whatever was mentioned most recently, whether or not it is structurally important.
There are also practical concerns. Privacy is nontrivial when working with unpublished manuscripts. Token limits discourage exploratory questioning. Conversations fragment across sessions. Even when the model performs well locally, the workflow itself becomes the bottleneck.
Most importantly, hosted LLMs are not designed to let you talk to your whole manuscript. They are designed to respond to prompts, not to persistently hold, retrieve, and cross-reference a body of text over time.
Hosted LLMs are not the right tool for a principled approach to developmental editing. Full stop.
The stack described here exists to attempt to solve that specific problem.
What Can We Actually Change? Local LLMs and RAG
So what can we use? And how does it solve the problems described above?
We host our own LLM on a local machine, and we make use of a technique called Retrieval-Augmented Generation (RAG).
RAG fundamentally alters the interaction model. Instead of forcing the manuscript into the prompt, the manuscript lives alongside the model and is pulled in selectively as needed. Questions can be asked at the level of structure rather than excerpts. Patterns can be surfaced without manual curation. The system can retrieve what is relevant without pretending to remember everything.
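To make the retrieval idea concrete, here is a deliberately toy sketch of the mechanism: represent each scene as a vector, score scenes against a question by similarity, and hand only the best matches to the model. Real RAG systems use neural embeddings and a vector database; this stdlib-only version uses bag-of-words counts and cosine similarity purely to illustrate the interaction model. The scene snippets are hypothetical.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use neural embeddings)."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, scenes, k=2):
    """Return the k scenes most similar to the query; only these go into the prompt."""
    q = embed(query)
    return sorted(scenes, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

# Hypothetical scene snippets, standing in for a full exported corpus.
scenes = [
    "Mara confronts her brother at the harbor.",
    "A storm delays the expedition for three days.",
    "Mara's brother leaves the harbor at dawn.",
]
top = retrieve("Mara's brother at the harbor", scenes)
```

The point of the sketch is the shape of the loop, not the scoring: the manuscript stays outside the prompt, and only the retrieved scenes are ever shown to the model.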
Running this stack locally compounds those benefits. The manuscript never leaves your own machine. Context persists across sessions. Experiments are not metered by token cost. The workflow encourages iteration rather than discouraging it. Failure becomes cheap, which is essential when the goal is exploration rather than performance.
This is not about power. A smaller local model paired with the right retrieval setup is often more useful for revision than a larger hosted model constrained by interface and context.
The Tools We Need
With that justification in place, the requirements of the stack become clearer. At a minimum, this project needs three categories of tools, each serving a distinct role.
The first is word processing. This is where the novel lives, where writing and revision actually happen, and where authorial control remains absolute. For me, that tool is Scrivener, with some supporting work done in OpenOffice Writer (or Microsoft Word). It already encodes the structure of the book correctly: parts, chapters, scenes, and their relationships. Nothing in this workflow is meant to replace or diminish Scrivener’s role as the canonical home of the manuscript. Any word processor will do here, but some features will smooth the generation of the reference corpus later on.
The second is a language model. Not as a writer, but as an analytical engine. The model needs to be capable of reading prose, tracking concepts, and responding coherently to questions about structure, repetition, pacing, and emphasis. Its prose quality is largely irrelevant. What matters is reliability, consistency, and the ability to reason over retrieved text. This is one of the “gains” of being deliberate about our use case: smaller models may not cost us very much at all.
The third is retrieval, specifically a Retrieval-Augmented Generation (RAG) layer. This is the connective tissue. Without retrieval, the model is blind to the manuscript as a whole. With it, the model can be asked questions that span scenes, chapters, or thematic threads without forcing the entire book into a single prompt. This is the key enabling mechanism. A good RAG system can feasibly store the entire novel in a queryable database accessible to my local LLM of choice.
Providing the Data: Scenes as the Atomic Unit
With the tools in place, the most important question became: what, exactly, are we giving them?
A novel, as it exists in a writing application, is not a dataset. It is a hierarchy optimized for composition, not interrogation. Treating it as if it were already analyzable leads to confusion later, when it becomes unclear whether a problem lies in the model, the query, or the structure of the text itself.
The correct unit of analysis for structural revision is the scene. Chapters are too coarse. They often contain multiple functions and internal shifts. Paragraphs are too fine. They lose narrative intent when separated from context. Scenes are where decisions happen. They have purpose, momentum, and consequences. They are the units that repeat, mirror, stall, or accelerate over the course of a novel.
Arriving at that conclusion was easy. Executing it was not.
What we want is clear: each scene of the novel as its own separate file. These files will form the database that lets the RAG layer function as intended.
Several approaches that looked promising on paper failed in practice. Others technically worked but introduced enough fragility to be unusable. Eventually, it became clear that cleverness was the problem. Every attempt to preserve elegance was adding failure modes.
The solution was reductive: one scene, one file. No synthesis. No transformation beyond what was strictly necessary. The end result was unambiguous: 122 scenes, each exported as its own text file.
I went through many attempts to get at this, but ultimately had to do something really tedious: go back to my original OpenOffice Writer document and use the “master document” feature to export every scene (after some manual style edits). I would like very much to have a better solution for my second novel, LANTERN. But, crucially, it got the job done.
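For what it’s worth, here is one lighter-weight alternative I might try next time: export the whole manuscript as a single plain-text file and split it programmatically. This sketch assumes scene breaks are marked by a separator line (the “***” marker is my assumption, not necessarily what any given export produces), and the function and file names are hypothetical.

```python
from pathlib import Path

def split_scenes(manuscript: str, marker: str = "***"):
    """Split a plain-text manuscript into scenes on a separator line.
    The '***' marker is an assumption; use whatever your export emits."""
    scenes, current = [], []
    for line in manuscript.splitlines():
        if line.strip() == marker:
            if current:
                scenes.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    if current:
        scenes.append("\n".join(current).strip())
    return [s for s in scenes if s]

def write_scene_files(scenes, out_dir):
    """One scene, one file: scene_001.txt, scene_002.txt, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, text in enumerate(scenes, start=1):
        (out / f"scene_{i:03d}.txt").write_text(text, encoding="utf-8")

# Tiny demonstration on a stand-in manuscript.
sample = "Scene one.\n***\nScene two.\n***\nScene three."
scenes = split_scenes(sample)
```

Whether this beats the master-document route depends entirely on how cleanly the export preserves scene breaks, which I have not verified.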
Now I can play around and see if any of this is worthwhile.
Taking Stock Before Asking Questions
With the scenes externalized, the novel could finally be treated as a database. Each scene was now an independent object that could be embedded, retrieved, and compared without reference to its neighbors.
No additional metadata was added at this stage. Point-of-view, thematic labels, and structural tags were all intentionally deferred.
Once exported, the corpus was frozen. From that point forward, the files were treated as a dataset. Any change would invalidate comparisons. Every scene had a simple header:
Scene #
Chapter #
Part #
TEXT
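That header is simple enough that downstream tooling can recover the metadata with a few lines of parsing. This sketch assumes each file literally begins with lines like “Scene 14”, “Chapter 3”, “Part 1”, followed by a “TEXT” line and then the scene body; the exact formatting is my assumption, so adjust the parsing to match the real files.

```python
def parse_scene_file(raw: str):
    """Parse the simple per-scene header into metadata plus body.
    Assumes lines like 'Scene 14', 'Chapter 3', 'Part 1', then a
    'TEXT' line, then the scene body (formatting is an assumption)."""
    lines = raw.splitlines()
    meta, body_start = {}, 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.upper() == "TEXT":
            body_start = i + 1
            break
        for field in ("Scene", "Chapter", "Part"):
            if stripped.startswith(field):
                meta[field.lower()] = int(stripped.split()[1])
    return meta, "\n".join(lines[body_start:]).strip()

# Hypothetical scene file contents.
raw = "Scene 14\nChapter 3\nPart 1\nTEXT\nThe harbor was empty at dawn."
meta, body = parse_scene_file(raw)
```

Keeping the metadata in-band like this means the files remain self-describing even if they are later moved or renamed.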
This is where the episode intentionally stops. The stack exists. The data exists.
At this point, the questions finally change. Which local LLM makes sense for this task? How should a RAG system actually be used? What does ‘embedding’ mean in practice, rather than in theory? And which tools are necessary to make any of this usable day to day?
All of that is for Episode 2.
That pause is deliberate. Indexing text does not produce understanding. The moment queries begin, interpretation enters the system, along with the risk of confirmation bias. Stopping here preserves a clean boundary between preparation and experiment.
What Comes Next
The next episode begins where this one ends. I will lay out the specific choices I made: which language models, which programs for interacting with them, how the data processing was handled, everything.
And, spoiler alert, I already know the answer.
Every control and countermeasure I implemented to work around those thorny LLM issues couldn’t rescue the effort from one simple contradiction: an LLM analyzes a narrow context, while a novel is a complete work spanning tens of thousands of words.
I was not able to reconcile that contradiction. LLM+RAG failed to deliver as a useful tool.
Episode 2 will lay out this failure and the lessons learned. If you want to follow along, you can subscribe for updates when it goes live.
