Agent-powered changelog automation§

Can we completely automate documentation with agents? In the spirit of triangulation, here’s a case study detailing my recent attempt to automate 100% of the Pigweed changelog with an agent-powered workflow.

Motivation§

Should we automate documentation with AI at all? It’s a complex topic. Collectively, we’re nowhere close to the ultimate goal of technical writing: 100% complete, accurate, discoverable, and effective docs at all times. I think we need to keep an open mind to new tools and methodologies that might get us closer to the ultimate goal.

Why agent-powered automation rather than some other docs scaling strategy? In this case the other options are either not available or already pushed to their limits. More headcount is off the table. We already distribute docs responsibilities widely across the team, and it’s not realistic to ask teammates to take on much more. The classic approach to changelog automation only solves 40-50% of the end-to-end process in my case, at best.

Why automate a changelog rather than some other content type? Changelogs arguably have the highest amount of toil. Also, the risk is lower. Our particular changelog is medium-value content at best. I wouldn’t use any unproven approach on mission-critical content.

Requirements§

The Pigweed changelog is more of a monthly highlights reel in the tradition of Visual Studio Code Updates rather than a chronological ledger in the style of keep a changelog. Here are the requirements, in no particular order.

Top news only§

The changelog should be a digestible summary of the most important news. We don’t comprehensively detail every new feature, bug fix, API change, etc. In practice, this means a lot of grouping, filtering, and sorting:

  • Related commits should be grouped into a “story”. Given commits A, B, and C that are all related to feature X, the changelog should have a single section focusing on feature X that links out to commits A, B, and C for extra context.

  • Stories with 0 or low user-facing impact should be omitted completely.

  • Stories should be organized as an inverted pyramid, where the most important stories are presented first.
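The filtering and sorting rules above can be sketched in a few lines of Python. This is only an illustration, not the Pigweed implementation: the `Story` class, the 0-10 `score` scale, and the `min_score` threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """A group of related commits presented as one changelog section."""

    title: str
    score: int  # Hypothetical 0-10 rating of user-facing impact.
    commits: list[str] = field(default_factory=list)  # SHAs to link out to.


def select_top_stories(stories: list[Story], min_score: int = 4) -> list[Story]:
    """Drops low-impact stories and orders the rest by descending impact."""
    kept = [story for story in stories if story.score >= min_score]
    # sorted() is stable, so stories with equal scores keep their input order.
    return sorted(kept, key=lambda story: story.score, reverse=True)
```

The hard part is not this code. It is deciding which commits belong to the same story and what score each story deserves, which is where the judgment calls live.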

Sufficient context§

The changelog should clearly explain the motivation behind a new feature, provide code examples demonstrating basic usage of new APIs, and so on.

Monthly cadence§

The team should be able to keep the changelog going when I’m OOO. The core Pigweed team is small, mostly software engineers (SWEs) with many time-sensitive responsibilities. It’s not realistic to ask these SWEs to spend 2-3 days each month toiling through a manual changelog authoring process. This is why I’m aiming for 100% automation. Ideally, my teammates can just fire off a command, wait a while, and then come back to a draft that’s basically ready for review.

Thorough analysis§

Although we don’t document every commit in the changelog, there must be a strong guarantee that the automation comprehensively reviews every commit. 200 to 400 commits merge into the Pigweed codebase every month. It needs to be easy for us humans to verify that the agent properly analyzed every commit.

Examples§

The following changelogs were generated with the agent-powered automation and mostly satisfice all of the Requirements.

Implementation§

We initially shipped the changelog automation as an agent skill and then realized that the downsides of exposing it as a bona fide skill are bigger than the upsides. I only need to invoke this workflow 1-3 times a month. Usually nobody else needs it. If the skill gets invoked at the wrong time it could majorly degrade agent effectiveness. E.g. someone mentions changelog in the middle of a complex debugging session and it kicks off the whole changelog authoring process.

Naive attempt§

I started with a single file containing natural language instructions, i.e. what you see in most SKILL.md files. I gave up on this minimal approach because I could not get the agent to reliably analyze every commit closely. It was always looking for shortcuts. E.g. only looking at the first line of every commit message, rather than the full message and diff, as instructed. Things I tried:

  • Phrasing and structuring the instructions in many ways.

  • Telling it exactly which git commands to run and when.

  • Workshopping the file with the agent itself.

  • Giving the agent a persona.

  • YELLING AT THE AGENT.

Other times the agent would create scripts that completely ignored the process that I had defined. The concerning thing about these “helpful scripts” is that when I inspected the output, it looked correct at first glance. It sometimes took quite a while to realize that the agent was Procedurally Gaslighting me.

The reliable gambiarra§

Gambiarra is the name given in Brazil to the practice of carrying out repairs and inventions using alternative materials, improvisation, and a sense of spontaneous and immediate creativity. A gambiarra is a temporary solution that can turn out to be permanent. – Fred Paulino

I eventually coaxed the agent into reliably satisficing all the Requirements by leaning heavily on custom scripts and structured data. Here’s the workflow.

  1. I invoke the automation with a prompt like this.

    @docs/agents/changelog/AGENTS.md create a changelog
    update for april 2026
    
  2. The agent reads AGENTS.md and sees that it’s supposed to run a custom script. This black box script provides only a small batch of commit data to the agent.

  3. The agent must process each batch of commits into a structured data file. At this stage the agent is grouping the commits into stories, drafting content for each story, and assigning a score (representing user-facing impact) to each story.

  4. When the agent attempts to get another batch of commits, the custom script first verifies the integrity of the structured data. E.g. if the agent forgot to process a certain commit or hallucinated a SHA, the custom script refuses to yield another batch until the errors are fixed.

  5. The process repeats until all commits have been processed.

  6. Another script transforms the structured data into a document.

  7. The agent glues the new document into the docs build.

  8. I leave feedback as TODOs inside the structured data file and manually iterate with the agent a couple of times. I may also rewrite some of the content by hand at this point.

Automation source code: //docs/agents/changelog/
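To make steps 2 through 5 more concrete, here is a minimal sketch of the gating idea. It is not the actual Pigweed script. The `BatchGate` class, the batch size, and the shape of the structured data (a list of stories, each with a `commits` list of SHAs) are assumptions for illustration. The sketch also assumes that every commit must land in some story, including low-impact stories that later get filtered out.

```python
from dataclasses import dataclass


@dataclass
class BatchGate:
    """Hands out commits in small batches and validates the agent's work."""

    all_shas: list[str]  # Every commit in the reporting window, in order.
    batch_size: int = 5  # Hypothetical batch size.
    issued: int = 0  # Number of commits handed out so far.

    def validate(self, stories: list[dict]) -> list[str]:
        """Returns a list of integrity errors in the structured data."""
        recorded = {sha for story in stories for sha in story["commits"]}
        expected = set(self.all_shas[: self.issued])
        errors = [f"unprocessed commit: {sha}" for sha in sorted(expected - recorded)]
        # A SHA that was never issued is either hallucinated or premature.
        errors += [f"unknown SHA: {sha}" for sha in sorted(recorded - expected)]
        return errors

    def next_batch(self, stories: list[dict]) -> list[str]:
        """Yields the next batch only if all issued commits are accounted for."""
        errors = self.validate(stories)
        if errors:
            raise ValueError("; ".join(errors))
        batch = self.all_shas[self.issued : self.issued + self.batch_size]
        self.issued += len(batch)
        return batch
```

The design choice that matters is that the agent never sees the full commit list. It can only make progress by passing the check, so skipping a commit or inventing a SHA stalls the workflow instead of silently corrupting the output.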

Discussion§

Let’s revisit the question from the opening of this post: can we completely automate documentation with agents? Here are some thoughts in light of my changelog automation experience.

Tight coupling§

Notice how the implementation is tightly coupled to the requirements, which in turn are tightly coupled to “changelog” as a content type. A tutorial, for example, is likely to have a different set of requirements, and therefore will likely need a different automation approach.

Procedural Gaslighting§

My biggest worry is the Procedural Gaslighting [1] problem. Most skills are no more than natural language instructions in a single file. When I tried that approach, the agent’s output looked correct at first glance, but on deeper inspection turned out to be meaningless. When I dug into the agent’s work, it seemed majorly incentivized to sneak in “helpful scripts” which in practice completely ignored the prescribed workflow. I had to create a rather complex, single-purpose gambiarra in order to force the agent to do its job thoroughly. To be clear, ceteris paribus, docs will get created and updated more in a world with agents than a world without. But to achieve the ultimate goal of 100% complete, accurate, and effective docs, we will probably need tools that follow prescribed processes rigorously and reliably. The idea that “agents need control flow, not more prompts” resonated with my changelog automation experience.

Most complete solution yet§

The agent-based changelog automation is still my most complete solution to date, by far. My previous best record was probably 30-40% of the complete, end-to-end changelog authoring process, whereas I reckon that the agent-based solution has brought it up to 70-80%.

Appendix: Previous attempts§

A brief history of all the ways I’ve tried to automate changelogs and where each approach falls short in relation to my current work on the Pigweed changelog.

Ye Olde Changelog Script§

The basic idea is to leverage metadata and content that already exists somewhere in the codebase and its related artifacts, such as an issue tracker. For example, suppose that your codebase requires all commit messages to follow the conventional commits spec. You can use this metadata to sort and organize the changelog. All commits of type feat go into the New features section of the changelog. For the changelog content, perhaps you use the first line of the commit messages. This is just one example. There are many variations on this theme.
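A minimal sketch of this kind of script might look like the following. It assumes commit subjects follow the conventional commits format (`type(scope): summary`). The `SECTIONS` mapping and the function name are invented for the example.

```python
import re
from collections import defaultdict

# Matches subjects such as "feat(parser): Add streaming mode" or "fix!: Bar".
SUBJECT_RE = re.compile(
    r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?!?: (?P<summary>.+)$"
)

# Hypothetical mapping from commit type to changelog section title.
SECTIONS = {"feat": "New features", "fix": "Bug fixes"}


def build_changelog(subjects: list[str]) -> dict[str, list[str]]:
    """Groups commit subject lines into changelog sections by commit type."""
    changelog = defaultdict(list)
    for subject in subjects:
        match = SUBJECT_RE.match(subject)
        if match is None or match["type"] not in SECTIONS:
            continue  # Skip non-conforming commits and types we don't surface.
        changelog[SECTIONS[match["type"]]].append(match["summary"])
    return dict(changelog)
```

Everything the script knows comes from the subject line. It has no way to tell that three `feat` commits belong to the same feature, or that one `fix` matters far more than another.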

The first incarnation of the Pigweed changelog circa 2023 was powered by Ye Olde Changelog Script. The main issues were:

  • Insufficient metadata. As mentioned in Top news only, we want to surface only the most interesting updates, which requires a lot of grouping, ranking, and filtering of commits. It is theoretically possible to do this grouping, ranking, and filtering completely deterministically, but it requires a lot of metadata. More than what’s available in the Pigweed codebase. See Process-heavy metadata.

  • Insufficient content. What content do you surface to explain each change? A lot of changelogs use the first line of the commit message. Others that follow the keep a changelog model require you to update a CHANGELOG.md file. Neither of those provides the sufficient context that we want to surface in the Pigweed changelog. Pulling from the official docs would have the opposite problem: too much information.

  • Still too much toil. Combined, the insufficient metadata and content problems meant that I still had to do a lot of manual authoring, organizing, and editing.

Process-heavy metadata§

Process is one way to solve the insufficient metadata problem mentioned in Ye Olde Changelog Script. For example, for Pigweed I have proposed requiring all commits to be associated with issues. E.g. your change is blocked from merging until you add Issue: <number> in your commit message. With this metadata I could do a lot more changelog grouping, ranking, and filtering completely deterministically. My teammates are understandably hesitant to add any process that further slows down development velocity. They already have to structure their commit messages a certain way, ensure that all tests pass in both upstream Pigweed and downstream projects that depend on Pigweed, update docs, etc.

Another downside is uncertainty regarding metadata quality. Continuing the example of associating commits with issues, contributors sometimes incorrectly type the issue number. Or they’re in a rush and associate the commit with the wrong issue.
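For illustration, a merge check for the Issue: <number> requirement could be as small as the sketch below. The function names and the `known_issues` set (standing in for a lookup against the issue tracker) are hypothetical.

```python
import re

# Matches a line such as "Issue: 1234" anywhere in the commit message.
ISSUE_RE = re.compile(r"^Issue: (\d+)$", re.MULTILINE)


def find_issue_numbers(commit_message: str) -> list[int]:
    """Extracts every issue number referenced by an 'Issue: <number>' line."""
    return [int(number) for number in ISSUE_RE.findall(commit_message)]


def check_commit(commit_message: str, known_issues: set[int]) -> list[str]:
    """Returns reasons to block the commit; an empty list means it may merge."""
    issues = find_issue_numbers(commit_message)
    if not issues:
        return ["missing 'Issue: <number>' line"]
    return [f"unknown issue: {n}" for n in issues if n not in known_issues]
```

A check like this catches a missing line or a typo that points at a nonexistent issue. It cannot catch a commit associated with the wrong existing issue, which is exactly the metadata quality problem described above.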

Fine-tuning side quest§

Back in 2023 I was interested in fine-tuning models as a means of creating expert writers. I did manage to get this working, but in hindsight I had only solved a small part of the overall changelog automation problem. Also, it ended up being an unnecessary problem to solve. Nowadays most writing style issues can be fixed with few-shot prompts.

Poor man’s agent§

This was an evolution of Ye Olde Changelog Script where I used the Gemini API to generate the missing metadata and content. The main limitation was that the scripting got brittle and complex. I do still like this approach, however, because it provides the strongest guarantee of comprehensive review.

Bona fide agent§

The current implementation. The promise of this approach is that it seems like “the best of all worlds”. I.e. the agent itself can invoke Ye Olde Changelog Script as its starting point and then take it from there.