Agent-powered changelog automation§

Can we completely automate documentation with agents? In the spirit of triangulation, here’s a case study detailing my recent attempt to automate 100% of the Pigweed changelog with an agent-powered workflow.

Motivation§

Should we automate documentation at all? To me, the ultimate goal of technical writing is 100% complete, accurate, and effective docs at all times. Real-world docs projects rarely achieve this feat, which suggests that technical writing as an engineering discipline is still in a nascent phase. We should continue exploring all new tools and methodologies that might get us closer to the ultimate goal.

Why this approach (agent-powered automation) rather than some other docs scaling strategy? In this case the other options are either not available or have already been pushed to their limits. More headcount is off the table. It’s not feasible to distribute docs responsibilities much further than we already do. And the classic approach to changelog automation solves only 40-50% of the end-to-end process in my case, at best.

Why automate a changelog rather than some other content type? Changelogs arguably involve more toil than any other content type. The risk is also lower: our particular changelog is medium-value content at best, and unproven approaches should never be tried out on mission-critical content.

Requirements§

The Pigweed changelog is more of a monthly highlights reel in the tradition of Visual Studio Code Updates than a chronological ledger in the style of Keep a Changelog. Here are the requirements, in no particular order.

Top news only§

The changelog should be a digestible summary of the most important news. We don’t comprehensively detail every new feature, bug fix, API change, etc. In practice, this means a lot of grouping, filtering, and sorting:

  • Related commits should be grouped into a “story”. Given commits A, B, and C that are all related to feature X, the changelog should have a single section focusing on feature X that links out to commits A, B, and C for extra context.

  • Stories with zero or low user-facing impact should be omitted completely.

  • Stories should be organized as an inverted pyramid, where the most important stories are presented first.

Sufficient context§

The changelog should clearly explain the motivation behind a new feature, provide code examples demonstrating basic usage of new APIs, and so on.

Monthly cadence§

The team should be able to keep the changelog going when I’m OOO. The core Pigweed team is small, mostly software engineers (SWEs) with many time-sensitive responsibilities. It’s not realistic to ask these SWEs to spend 2-3 days each month toiling through a manual changelog authoring process. This is why I’m aiming for 100% automation. Ideally, my teammates can just fire off a command, wait a while, and then come back to a draft that’s basically ready for review and properly glued into the docs build system.

Thorough analysis§

Although we don’t document every commit in the changelog, there must be a strong guarantee that the automation comprehensively reviews every commit. 200 to 400 commits merge into the Pigweed codebase every month. It needs to be easy for us humans to verify that the agent properly analyzed every commit.

Implementation§

To invoke the automation, we use a prompt like this:

@docs/agents/changelog/AGENTS.md create a changelog update for april 2026

We initially shipped the changelog automation as an agent skill and then realized that the downsides of exposing it as a bona fide skill outweigh the upsides. I only need to invoke this workflow 1-3 times a month, and if the skill gets invoked at the wrong time it could majorly degrade agent effectiveness, e.g. when someone mentions the changelog in the middle of a complex debugging session.

Automation source code: //docs/agents/changelog/

Naive attempt§

I started with a single file containing natural language instructions, i.e. what you see in most SKILL.md files. I gave up on this minimal approach because I could not get the agent to reliably analyze every commit closely. It was always looking for shortcuts, e.g. only reading the first line of every commit message rather than the full message and diff, as instructed. Things I tried:

  • Phrasing and structuring the instructions in many ways.

  • Telling it exactly which git commands to run and when.

  • Workshopping the file with the agent itself.

  • Giving the agent a persona.

  • YELLING AT THE AGENT.

Other times the agent would create scripts that completely ignored the process that I had defined. The concerning thing about these “helpful scripts” is that when I inspected the output, it looked correct at first glance. It sometimes took quite a while to realize that the agent was Procedurally Gaslighting me.

The reliable gambiarra§

Gambiarra is the name given in Brazil to the practice of carrying out repairs and inventions using alternative materials, improvisation, and a sense of spontaneous and immediate creativity. A gambiarra is a temporary solution that can turn out to be permanent. – Fred Paulino

I eventually coaxed the agent into reliably satisficing all the Requirements by leaning heavily on custom scripts and structured data (a sketch of the core loop follows the list):

  1. The agent runs a custom script that returns only a small batch of commits.

  2. The agent must process all of these commits into a structured data file. In this file the agent groups the commits into stories, drafts content for each story, and assigns a score (representing user-facing impact) to each story.

  3. When the agent attempts to get another batch of commits, the custom script first verifies the integrity of the structured data. E.g. if the agent forgot to process a certain commit or hallucinated a SHA, the custom script refuses to yield another batch until the errors are fixed.

  4. The process repeats until all commits have been processed.

  5. Another script transforms the structured data into reStructuredText.

  6. The agent glues the new reStructuredText document into the docs build.
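Here’s a minimal sketch of how steps 1 and 3 might fit together. Everything in it is hypothetical: the real scripts live in //docs/agents/changelog/, and the file names, schema, and batch size below are made up for illustration.

#!/usr/bin/env python3
"""Hypothetical sketch of the batch-and-verify loop (steps 1 and 3).

The script owns a ledger of the SHAs it has yielded; the agent owns
stories.json. The next batch is withheld until every yielded commit
appears in a story and every referenced SHA actually exists.
"""
import json
import pathlib
import subprocess
import sys

BATCH_SIZE = 10  # made-up batch size
STORIES = pathlib.Path("stories.json")  # agent-owned structured data
YIELDED = pathlib.Path("yielded.json")  # script-owned ledger


def all_commits(since: str, until: str) -> list[str]:
    """Every SHA merged in the window, oldest first."""
    out = subprocess.run(
        ["git", "log", "--reverse", "--format=%H",
         f"--since={since}", f"--until={until}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.split()


def load(path: pathlib.Path, default):
    return json.loads(path.read_text()) if path.exists() else default


def main() -> None:
    commits = all_commits(sys.argv[1], sys.argv[2])
    stories = load(STORIES, {"stories": []})["stories"]
    yielded = set(load(YIELDED, []))
    processed = {sha for story in stories for sha in story["commits"]}

    # Integrity checks: a hallucinated SHA or a skipped commit blocks
    # the next batch until the agent fixes the structured data.
    if hallucinated := processed - set(commits):
        sys.exit(f"ERROR: unknown SHAs in {STORIES}: {sorted(hallucinated)}")
    if skipped := yielded - processed:
        sys.exit(f"ERROR: unprocessed commits: {sorted(skipped)}")

    batch = [sha for sha in commits if sha not in processed][:BATCH_SIZE]
    if not batch:
        print("DONE: all commits processed.")
        return
    YIELDED.write_text(json.dumps(sorted(yielded | set(batch))))
    print("\n".join(batch))


if __name__ == "__main__":
    main()

The error messages double as the feedback loop: the agent sees exactly which SHAs to fix, and we humans can verify comprehensive analysis by checking that the script eventually prints DONE.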

Discussion§

Can we completely automate documentation with agents? Here are some thoughts in light of the changelog automation experience.

Tight coupling§

Notice how the implementation is tightly coupled to the requirements, which in turn are tightly coupled to “changelog” as a content type. A tutorial, for example, is likely to have a different set of requirements, and therefore will likely need a different automation approach.

Procedural Gaslighting§

My biggest worry is the Procedural Gaslighting problem. Most skills are no more than natural language instructions in a single file. When I tried that approach, the agent’s output looked correct at first glance, but on deeper inspection turned out to be meaningless. When I dug into the agent’s work, it seemed majorly incentivized to sneak in “helpful scripts” that in practice completely ignored the prescribed workflow. I had to create a rather complex, single-purpose gambiarra in order to force the agent to do its job thoroughly.

Most complete solution yet§

The agent-based changelog automation is still my most complete solution to date, by far. My previous best automated perhaps 40-50% of the complete, end-to-end changelog authoring process, whereas I reckon that the agent-based solution has brought it up to 70-80%.

Appendix: Previous attempts§

A brief history of all the ways I’ve tried to automate changelogs and where each approach falls short in relation to my current work on the Pigweed changelog.

Ye Olde Changelog Script§

The basic idea is to leverage metadata and content that already exists somewhere in the codebase and its related artifacts, such as an issue tracker. For example, suppose that your codebase requires all commit messages to follow the conventional commits spec. You can use this metadata to sort and organize the changelog. All commits of type feat go into the New features section of the changelog. For the changelog content, perhaps you use the first line of the commit messages. This is just one example. There are many variations on this theme.
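For illustration, a minimal version of such a script might look like the sketch below. The type-to-section mapping and output format are made up; the point is that everything is driven by metadata that deterministic code can parse.

#!/usr/bin/env python3
"""Illustrative "Ye Olde Changelog Script": bucket a month of commits
by conventional commit type, using only the first line of each message.
"""
import re
import subprocess
from collections import defaultdict

# Hypothetical mapping from commit type to changelog section.
SECTIONS = {"feat": "New features", "fix": "Bug fixes", "docs": "Docs"}

subjects = subprocess.run(
    ["git", "log", "--format=%s", "--since=1 month ago"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

grouped = defaultdict(list)
for subject in subjects:
    # Conventional commits look like "feat(scope): summary".
    if match := re.match(r"(\w+)(?:\([^)]*\))?!?: (.+)", subject):
        grouped[match.group(1)].append(match.group(2))

for commit_type, section in SECTIONS.items():
    if grouped[commit_type]:
        print(f"{section}\n{'-' * len(section)}")
        for summary in grouped[commit_type]:
            print(f"* {summary}")
        print()

A script like this can only ever be as good as the metadata and first lines it consumes, which is exactly the limitation that sank it for Pigweed.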

The first incarnation of the Pigweed changelog circa 2023 was powered by Ye Olde Changelog Script. The main issues were:

  • Insufficient metadata. As mentioned in Top news only, we want to surface the most interesting updates only, which requires a lot of grouping, ranking, and filtering of commits. It is theoretically possible to do this grouping, ranking, and filtering completely deterministically, but it requires a lot of metadata, more than what’s available in the Pigweed codebase. See Process-heavy metadata.

  • Insufficient content. What content do you surface to explain each change? A lot of changelogs use the first line of the commit message. Others that follow the Keep a Changelog model require you to update a CHANGELOG.md file. Neither of those provides the sufficient context that we want to surface in the Pigweed changelog. Pulling from the official docs would have the opposite problem: too much information.

  • Still too much toil. Combined, the insufficient metadata and content problems meant that I still had to do a lot of manual authoring, organizing, and editing.

Process-heavy metadata§

Process is one way to solve the insufficient metadata problem mentioned in Ye Olde Changelog Script. For example, in Pigweed I have proposed requiring all commits to be associated with issues, e.g. your change is blocked from merging until you add Issue: <number> to your commit message. With this metadata I could do a lot more changelog grouping, ranking, and filtering completely deterministically. My teammates are understandably hesitant to add any process that further slows down development velocity. They already have to structure their commit messages a certain way, ensure that all tests pass in both upstream Pigweed and downstream projects that depend on Pigweed, update docs, etc.

Another downside is uncertainty regarding the quality of the metadata. Continuing the example of requiring commits to be associated with issues: contributors sometimes mistype the issue number, or they’re in a rush and associate the commit with the wrong issue.
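For concreteness, the proposed gate could be as small as a commit-msg hook. This is a hypothetical sketch, not something Pigweed has adopted:

#!/usr/bin/env python3
"""Hypothetical commit-msg hook: reject commits without an issue trailer."""
import re
import sys

# Git passes the path of the commit message file as the first argument.
message = open(sys.argv[1], encoding="utf-8").read()
if not re.search(r"^Issue: \d+$", message, re.MULTILINE):
    sys.exit("commit rejected: add an 'Issue: <number>' trailer")

Note that a hook like this can only check that some issue number is present; it can’t catch the mistyped-number and wrong-issue problems described above.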

Fine-tuning side quest§

Back in 2023 I was interested in fine-tuning models as a means of creating expert writers. I did manage to get this working, but in hindsight I had only solved a small part of the overall changelog automation problem. Also, it ended up being an unnecessary problem to solve. Nowadays most writing style issues can be fixed with few-shot prompts.

Poor man’s agent§

This was an evolution of Ye Olde Changelog Script where I used the Gemini API to generate the missing metadata and content. The main limitation was that the scripting got brittle and complex. I do still like this approach, however, because it provides the strongest guarantee of comprehensive review.
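The core idea looked roughly like the sketch below, assuming the google-generativeai SDK. The prompt, model name, and score scale are all made up; what matters is that the deterministic loop, not the model, decides which commits get analyzed.

#!/usr/bin/env python3
"""Sketch of the "poor man's agent": a script walks every commit and
asks Gemini to generate the missing changelog metadata and content."""
import os
import subprocess

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model

def describe(sha: str) -> str:
    """Summarize and score one commit's user-facing impact."""
    diff = subprocess.run(
        ["git", "show", sha], capture_output=True, text=True, check=True
    ).stdout[:20_000]  # truncate huge diffs to fit the context window
    prompt = (
        "Summarize the user-facing impact of this commit in two "
        "sentences, then rate the impact from 0 (none) to 5 (major):\n\n"
        + diff
    )
    return model.generate_content(prompt).text

# The comprehensive-review guarantee comes from this plain old loop:
# every SHA in the window gets analyzed, no agent discretion involved.
for sha in subprocess.run(
    ["git", "log", "--format=%H", "--since=1 month ago"],
    capture_output=True, text=True, check=True,
).stdout.split():
    print(sha, describe(sha), sep="\n", end="\n\n")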

Bona fide agent§

The current implementation. The promise of this approach is that it seems like “the best of both worlds”: the agent itself can invoke Ye Olde Changelog Script as its starting point and then take it from there.