Changelog automation with agent skills§

I am the docs lead for Pigweed. We aim to publish a changelog every month, which covers about 200-300 commits. It is toilsome work. Even with substantial automation, it still takes me 1-2 days each month. When I’m out, it’s not feasible for Pigweed’s small core team to keep up the changelog.

I am on my 3rd attempt to automate this changelog. The latest attempt is powered by an agent skill. This blog post chronicles my first experience with skill-driven docs automation.

Summary§

  • The minimal approach to skills-based docs automation, where the entire workflow and requirements are described in a natural-language SKILL.md file, was not a viable path for the Pigweed changelog automation. Agents did not follow the SKILL.md workflow reliably or prove that they were meeting the requirements.

  • Making the automation robust and repeatable required leaning heavily on scripts and structured data.

  • My hunch is that agent skills will enable extensive docs automation, but there won’t be a one-size-fits-all solution. For example, changelog automation will require a different strategy than docs set rewrites. Some of these strategies will require more verification (scripts, structured data, tests, etc.), others less.

Requirements§

When a changelog is just a chronological dump of all commits over a given timeframe, it’s easy to automate everything with a script. Automating the Pigweed changelog is more challenging because we want a high signal-to-noise ratio:

  • Individual commits are usually too small to be newsworthy. To identify the most important news, I need to find a collection of related commits and group them together as a “story”.

  • Stories should be organized as an inverted pyramid, with the most important stories somehow floating to the top.

  • Stories with little or no user-facing impact should be omitted entirely.

  • The changelog should be comprehensive in the sense that every commit that merged during the target month is properly analyzed and accounted for.

  • As previously mentioned, the core team needs to be able to keep the changelog going when I’m out. In other words, the end-to-end process needs to be thoroughly automated.

Minimal approach§

Agent skills are a mechanism for creating repeatable agent-driven workflows. At minimum, you create a SKILL.md file with YAML frontmatter containing name and description fields, followed by the workflow instructions. When a user’s prompt looks related to the skill’s name or description, the agent automatically runs the workflow described in SKILL.md.
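For illustration, a minimal SKILL.md for this workflow might look something like the following. The name, description, and workflow steps here are my own sketch, not Pigweed’s actual file; only the frontmatter fields (name and description) are dictated by the skill mechanism.

```markdown
---
name: changelog
description: Generate the monthly Pigweed changelog from commit history.
---

# Changelog workflow

1. Fetch the commits that merged during the target month.
2. Analyze each commit's message and diff in detail.
3. Group related commits into stories.
4. Rank the stories by user-facing impact and write the changelog.
```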

This minimal approach, where the entire workflow is defined as natural language in a SKILL.md file, was not a viable path for the Pigweed changelog automation. Even if the minimal approach leads to a high-quality document, how can I quickly verify that the agent analyzed all commits? That it didn’t accidentally filter out some important story?

In practice, I found the minimal approach to be too unreliable. The agents were always looking for ways to shortcut the SKILL.md workflow. For example, they would ignore an instruction that told them to fetch only 5 commits and look at each commit’s message and diff in detail. Instead, they would fetch all commits via git log --oneline --since=<YYYY-MM-DD> --until=<YYYY-MM-DD> and make wild guesses about each commit based only on the first line of its commit message.

Robust approach§

The solution, in my case, was to lean heavily on structured data and scripts:

  1. The SKILL.md file instructs the agent to fetch commit data from a script (named next) that provides only 5 commits at a time.

  2. The agent records its analysis of the commits into a TOML file. The agent iteratively builds up all of the potential stories in this file. In addition to each story’s changelog content, the agent also creates a score for each story representing the amount of user-facing impact.

  3. When the agent attempts to fetch another batch of commits via the next script, the script verifies that the TOML data is still valid. If it finds errors, it informs the agent of the errors and refuses to provide another batch of commits until the errors are fixed.

  4. The agent repeats this workflow until all commits are processed.

That’s the core of the automation. I’m glossing over a lot of details. See Implementation.

The agent’s job is done after that. It’s fairly easy for me to manually edit the changelog content within the TOML data. When I need to rearrange the ordering of stories, I tweak the scores. I can run the next script myself to verify that all commits are accounted for in the TOML data.
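As a rough illustration, a single story entry in the TOML file might look like this. The table name, field names, and values are my guesses for illustration, not the actual schema:

```toml
[stories.example-story]
title = "Example story title"
score = 8  # Higher user-facing impact floats the story to the top.
content = "One or more paragraphs of changelog prose for this story."
commits = ["a1b2c3d", "e4f5a6b"]
```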

During the Sphinx build, a Sphinx extension transforms the structured data into a valid document.

Results§

This workflow processes each commit in about 9 seconds, so roughly 200 commits in 30 minutes. I believe the process is reliable enough, and the output is good enough, that the Pigweed changelog automation is now mostly a solved problem. My teammates haven’t reviewed this new approach (or the generated content) yet, though, so it’s still possible that I will have to move on to a 4th godforsaken attempt.

Implementation§

The rest of this post walks through the implementation in detail.

Appendix: Previous attempts§

TODO