Keep Autogenerated Files Synced

This article assumes you’re familiar with Git, code reviews, and continuous integration. If you’re not, you might want to familiarise yourself with those concepts first, as they are great tools for writing and maintaining good software.

The Problem

Since you’ve made it this far, I bet this situation will sound familiar to you: You’ve been working on your feature branch for a few hours, and you’re almost ready. The UX is slick, and the tests are green. It’s time to create a pull request!

But before doing so, you make sure your branch contains the latest changes merged to master. And as the diligent developer you are, after rebasing your branch on top of the latest changes, you rerun your test suite and check your app to make sure everything still works.

As you try to start your app, you notice you need to run the latest migrations or maybe update your dependencies. No biggie. You run rails db:migrate, or yarn install, and all is well. Except for git status, which now shows changes to db/schema.rb or yarn.lock. That’s never a great feeling.

What if you could avoid having to feel that again by just adding a few lines of code to your CI pipeline? Well, you totally can.

Why the Problem Happens

Two things can cause the problem I mentioned above:

  1. An autogenerated file was committed, but the code that generates it was not. For example, someone edited package.json, ran yarn install locally, and committed the changes to yarn.lock but forgot to include the changes to package.json.
  2. The code that generates the file was committed, but the autogenerated file was not. For example, someone committed a new migration but forgot to include the updated database schema.

We’ve all made mistakes like these in the past, or missed them during a code review. And that’s ok. In my opinion, if a computer generates a file, then a computer should be in charge of making sure that file is up to date. For more on the topic, see Further Considerations below.

The Solution

Add a step to your CI pipeline that will fail the build if any autogenerated files change.

There are multiple ways of doing so, but the key step is having your CI pipeline regenerate all autogenerated files. You can then fail the pipeline either by detecting changes afterwards (with Git, for example) or by removing write permission from the autogenerated files before running the commands that update them.

Removing Write Permission

Let’s say you have a file that keeps track of your database schema called db/schema.rb. We can start by locking that file against writes:

chmod 0444 db/schema.rb

Next, we try to generate a fresh version of that file. If the schema changes, the command will try to overwrite the schema file, and the resulting permission error will fail our build.

bundle exec rails db:create db:migrate

This example uses a standard Ruby on Rails app to showcase the technique, but it applies to any language or framework.
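
The two steps above can be wrapped in a small helper. This is only a sketch: run_locked is a hypothetical name, and it assumes the file should go back to mode 0644 once the command has finished. One caveat worth knowing is that the root user bypasses file permissions, so the lock only has an effect when the CI job runs as an unprivileged user.

```shell
#!/bin/sh
# run_locked FILE COMMAND [ARGS...]
# Hypothetical helper: makes FILE read-only, runs COMMAND, then restores
# write access. Returns COMMAND's exit status, so a generator that fails
# because it cannot overwrite FILE also fails the build.
run_locked() {
  locked_file="$1"
  shift

  if [ "$(id -u)" -eq 0 ]; then
    # root ignores permission bits, so the lock would not stop a write.
    echo "warning: running as root; read-only files can still be overwritten" >&2
  fi

  chmod 0444 "$locked_file" || return 1
  "$@"
  status=$?
  chmod 0644 "$locked_file"  # assumption: the file was 0644 before
  return "$status"
}

# Example usage in a CI step (Rails, as in the text above):
#   run_locked db/schema.rb bundle exec rails db:create db:migrate
```

Restoring write access at the end keeps later pipeline steps from tripping over a read-only file.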

Checking for Changes

With this technique, we first generate a fresh version of our file:

yarn install

And then we check for changes and exit with an error code if we find any:

# git diff --quiet exits with a non-zero status if any tracked file changed.
if ! git diff --quiet; then
  echo "Oops, something changed!"
  git diff --stat
  exit 1
fi
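
One thing to keep in mind: git diff only reports changes to tracked files, so a generated file that was never committed would slip through the check above. As a sketch of a stricter variant, the hypothetical fail_if_dirty function below uses git status --porcelain, which also lists untracked files, and accepts optional paths so you can limit the check to the files you care about.

```shell
#!/bin/sh
# fail_if_dirty [PATH...]
# Hypothetical helper: returns 1 if the working tree has modified, staged,
# or untracked files (optionally limited to the given paths), and 2 if
# git itself fails, for example outside a repository.
fail_if_dirty() {
  changes="$(git status --porcelain -- "$@")" || return 2
  if [ -n "$changes" ]; then
    echo "Oops, something changed!" >&2
    echo "$changes" >&2
    return 1
  fi
  return 0
}

# Example usage in a CI step, after regenerating the files:
#   yarn install
#   fail_if_dirty yarn.lock || exit 1
```

Printing the list of changed files makes the failed build easier to diagnose than a bare error message.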

Fancy CLIs

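Some tools have this functionality built in. Bundler has a deployment mode (its deployment setting), while Yarn has a frozen mode (yarn install --frozen-lockfile in Yarn 1). In both cases, the install fails instead of silently updating the lockfile.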
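
Here is a sketch of how those modes might be used from a CI script. The function name frozen_install is hypothetical, and picking the tool by lockfile name is an assumption about your project layout. The Bundler call relies on the BUNDLE_DEPLOYMENT environment variable, and the Yarn flag is the Yarn 1 one; Yarn 2 and later use yarn install --immutable instead.

```shell
#!/bin/sh
# frozen_install [DIR]
# Hypothetical helper: installs dependencies without allowing the lockfile
# to change, choosing the tool by which lockfile exists in DIR.
frozen_install() {
  dir="${1:-.}"

  if [ -f "$dir/Gemfile.lock" ]; then
    # With the deployment setting on, bundle install fails when the
    # Gemfile and Gemfile.lock are out of sync.
    (cd "$dir" && BUNDLE_DEPLOYMENT=true bundle install) || return 1
  fi

  if [ -f "$dir/yarn.lock" ]; then
    # Yarn 1 flag: fails if yarn.lock would need to be updated.
    (cd "$dir" && yarn install --frozen-lockfile) || return 1
  fi
}

# Example usage in a CI step:
#   frozen_install . || exit 1
```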

Further Considerations

Benefits

  • Consistency across environments: The code running on your machine is more likely to match the code running on your colleague’s machine and the code running in production. That’s nice! Have you ever tried to find a bug that only exists in staging, and no one knows why? That’s not nice at all.
  • More focused code reviews: I believe code reviews are great for sharing knowledge and discussing ideas and concepts. I think they suck for checking that the right bytes were placed in the right place. Get yourself a linter, and let your CI pipeline check your autogenerated files, of course!

Drawbacks

Generating all files with each CI build means your builds will take longer to finish. Running rails db:migrate, for example, is much slower than recreating your database from the schema dump.