Automating a changelog with an agent skill

Can we automate all documentation through skills? What does skill-based docs automation look like in practice? When should I prefer skills over other automation techniques?

Now that the Pigweed changelog [1] automation is powered by a skill, I have some firsthand experience and opinions to share. On one hand, skills have enabled previously impossible levels of changelog automation for me. On the other, the implementation that satisfices my long list of requirements is rather complex and single-purpose. My biggest gripe so far is that it’s hard to guarantee that the agent is doing thorough and comprehensive research.

Requirements

The changelog automation has to satisfice all of the following content and process requirements.

Content requirements

  • Our main goal is to quickly inform Pigweed users of important or interesting updates. A highlights reel [2], not a comprehensive log.

  • Related commits should be grouped [3] into a “story”. E.g. when commits A, B, and C are all related to a new foo feature, the changelog should have a single section about foo that links out to commits A, B, and C for extra context.

  • Stories should be informative and engaging. They should properly explain the motivation behind a new feature, provide code examples demonstrating a new API, etc.

  • The changelog should be organized as an inverted pyramid, with the most important stories somehow floating to the top.

  • Stories with 0 or low user-facing impact should be omitted completely.

Notice how all of these content requirements are tightly coupled to “changelog” as a content type. E.g. the content requirements for a guide are completely different from the changelog content requirements.

Process requirements

  • I should be able to easily publish a changelog update every month with very little toil.

  • It must be feasible for my teammates to keep the changelog going when I’m on vacation, paternity leave, etc.

  • Although we don’t document every commit in the changelog, there must be strong guarantees that the agent comprehensively reviews every commit. 200 to 400 commits merge into the Pigweed codebase every month. If I’m not confident that the agent has reviewed every commit, I will have a nagging worry that the changelog automation is missing some important story.

Decision

Why did I decide to use a skill over other automation techniques?

  • I’ve already tried a few other techniques and hit the limitations of each. See Appendix: Previous attempts.

  • There’s currently a lot of leadership support to explore skill-based docs automation.

  • With the skill approach it seems realistic for teammates to keep up the changelog when I’m out. They just send a command like “update the changelog for march 2026” to an agent and the agent supposedly can take care of the rest.

  • My previous changelog automation attempts fell short because they were too rigid. The ability for the agent to complete fuzzy tasks was exactly the kind of new capability that I needed.

Implementation

Let’s talk about the Minimal approach to skills first and then I’ll share my Working approach.

Minimal approach

Most skills on GitHub implement the minimum required by the skills spec. There’s a name for the skill, a description of when the skill should trigger, and natural language instructions:

---
name: <name>
description: <description>
---

<instructions>
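
For concreteness, a minimal changelog skill in this format might look something like this. To be clear, the name, description, and instructions below are my own illustrative sketch, not the actual Pigweed skill:

```markdown
---
name: changelog
description: Update the monthly changelog. Use when asked to draft or refresh changelog entries.
---

Fetch every commit merged since the last changelog update. Group related
commits into stories, score each story by user-facing impact, omit the
low-impact stories, and draft an engaging entry for each remaining story.
```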

This minimal approach did not work for me. I could not get the agent to reliably analyze every commit closely. I tried phrasing the instructions in many ways. I gave explicit instructions on exactly which commands to run. I asked the agent for advice on how to make it reliably analyze each commit. I even tried YELLING AT THE AGENT. It quickly became clear that I could not trust the agent to look at every commit, let alone juggle all the other requirements.

Working approach

After a lot of trial-and-error, I did eventually coax the agent into reliably satisficing all the requirements via this rather complex and single-purpose workflow:

  1. The agent runs a start script that initializes the data file (data.toml) where the agent will do all its work.

  2. The agent runs a next script that dumps a batch of commit data into a temporary JSON file. 25 commits per batch.

  3. The agent processes the batch of commits. In data.toml it starts grouping the commits into stories, assigning a score for each story (an integer between 0 and 1000 representing the amount of user-facing impact), and drafting the changelog content for each story.

    When you watch the agent work in real-time, you see the data.toml file gradually evolve as new commits get processed. For example, at first the agent might decide that these commits are unrelated and therefore should be separate stories:

    [stories.category_a.foo]
    title = "New foo API"
    # Omitting a few fields here
    score = 600
    
    [stories.category_a.foo.commits."8ba2b1c"]
    summary = "Implement foo class"
    
    [stories.category_b.bar]
    title = "New bar API"
    # …
    score = 500
    
    [stories.category_b.bar.commits."00b1121"]
    summary = "Start bar method"
    

    And then some more commits get processed and it becomes clear that these 2 commits (as well as some others) are actually all related to each other:

    [stories.category_x.ai]
    title = "Something AI something give us money"
    # …
    score = 950
    
    [stories.category_x.ai.commits."8ba2b1c"]
    summary = "Implement foo class"
    
    [stories.category_x.ai.commits."00b1121"]
    summary = "Start bar method"
    
    [stories.category_x.ai.commits."189d382"]
    summary = "Cram AI into everything via the foo and bar APIs"
    
  4. The agent runs the next script again to get another batch of commit data. The next script first does a bunch of data integrity checks. When next detects a problem in data.toml (e.g. the agent hallucinated a SHA), the script refuses to yield another batch of commits until the agent fixes the problem. After all problems are resolved next yields the next batch. The process repeats until all commits are processed.

  5. After all the commit data has been processed, the agent runs an end script that transforms data.toml into reStructuredText.

  6. The agent glues the new reStructuredText file into the docs build.

Here’s the full source code of the skill and its associated scripts: //.agents/skills/changelog/
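
To give a flavor of step 5, here is a sketch of how an end script might transform data.toml into reStructuredText. The field names mirror the data.toml excerpts above; the threshold and output format are my assumptions:

```python
IMPACT_THRESHOLD = 100  # assumed cutoff: stories scoring below this are omitted

def to_rst(data: dict) -> str:
    """Render the stories in data.toml as reStructuredText, highest score first."""
    stories = [
        story
        for category in data.get("stories", {}).values()
        for story in category.values()
    ]
    # Inverted pyramid: the most important stories float to the top.
    stories.sort(key=lambda story: story["score"], reverse=True)
    lines = []
    for story in stories:
        if story["score"] < IMPACT_THRESHOLD:
            continue  # omit stories with little or no user-facing impact
        title = story["title"]
        lines += [title, "=" * len(title), ""]
        for sha, commit in story.get("commits", {}).items():
            lines.append(f"* {commit['summary']} ({sha})")
        lines.append("")
    return "\n".join(lines)
```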

Discussion

Now that we’ve toured the sausage factory, let’s revisit those questions that I started the post with.

Can we automate all documentation through skills?

The skill-based approach is the most complete solution to changelog automation that I’ve achieved yet. So that’s promising. We should definitely keep exploring whether skills can enable step-change advances in other types of docs automation.

If I had to predict the Achilles’ heel of skill-based docs automation, it would be the thoroughness problem. In the changelog automation it manifested as difficulty in guaranteeing that the agent was properly reviewing every commit. For other types of docs automation, the thoroughness problem is going to manifest in different ways. For example, suppose you use a skill to automate the process of updating docs in light of new features. How do you guarantee that the agent has updated all relevant docs? That it’s not duplicating docs?

What does skill-based docs automation look like in practice?

My hypothesis is that skill-based docs automation is going to look quite different depending on the content type that you’re trying to automate. E.g. tutorial automation may require a completely different approach than changelog automation.

When should I prefer skills over other automation techniques?

My golden rule is to always attempt a deterministic approach to automation first (e.g. Python script) and then only reach for fuzzy AI tools when I hit the limits of deterministic tools. In the case of the changelog automation I reached for the agent skill because there was no other way for me to automate the process of grouping commits into stories and stack-ranking stories against each other.

While perusing skills on GitHub I noticed that a lot of the docs-related skills handle peripheral, cross-cutting concerns. E.g. how to build the docs and debug the docs build system, or how to edit documentation to adhere to the company style guide. Perhaps skills are best-suited for situations where the skill instructions themselves can be structured as how-to guides. In other words, if your automation requires a tutorial structure (like the changelog automation does), perhaps you’re going to have struggles similar to mine.

Appendix: Previous attempts

A brief history of my previous attempts to automate the Pigweed changelog:

  1. Ye Olde Changelog Script. Grab all commits. Extract metadata from the commit messages, file diffs, paths of changed files, etc. Use this metadata to group the commits. This kind of deterministic scripting is helpful for basic preprocessing, but by itself it could never satisfice all the previously mentioned Requirements. It may still be worthwhile to do this kind of deterministic processing as a first step, before handing off the fuzzy tasks to LLMs.

  2. Wacky Fine-Tuning Experiment. Back in 2024 I was very interested in fine-tuning models as a means of creating expert writers. I did manage to create fine-tuned models that matched my writing style very closely, but that’s really only one small part of the overall changelog automation equation.

  3. Poor Man’s Agent. This one was similar to the current agent skills approach described in Working approach. The main difference was that I attempted to orchestrate everything myself. A Python script would fetch the commits, then use the Gemini API to group the commits into stories, determine the user-facing impact of each story, etc. I still like this approach because it provides the strongest guarantee of comprehensive commit review, but the story grouping and scoring were hard to get right. The scripting was also very brittle and complex.

  4. Agent skill. The current attempt as described in Working approach.
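
For what it’s worth, the kind of deterministic preprocessing from attempt 1 can be sketched like so. This is a toy illustration, not the original script; here commits are simply bucketed by the top-level directory of the files they touch:

```python
import subprocess
from collections import defaultdict

def changed_paths(sha: str) -> list:
    """List the file paths that a commit touched."""
    out = subprocess.run(
        ["git", "show", "--name-only", "--format=", sha],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def group_by_top_dir(paths_by_sha: dict) -> dict:
    """Group commit SHAs by the top-level directories their files live in."""
    groups = defaultdict(list)
    for sha, paths in paths_by_sha.items():
        for top_dir in sorted({path.split("/")[0] for path in paths}):
            groups[top_dir].append(sha)
    return dict(groups)
```

A heuristic like this is fine for basic preprocessing, but it has no real notion of a “story”, which is where the purely deterministic approach ran out of road.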