Human vs Machine: the Bug

https://thoughtbot.com/blog/human-vs-machine-the-bug

I’m an AI skeptic. I’ve resisted using AI in my work and haven’t installed any AI tools before. I have asked ChatGPT questions a few times, but I end up yelling at a robot more often than I end up with the answer I was looking for. I have concerns about the effect on the environment, the ethics of how LLMs are trained, and the way our brains are changing as we hand over more and more thinking tasks to software owned and controlled by billionaires.

I’m also a realist. AI is everywhere and I can’t make informed decisions about it if I refuse to even consider it. Some people think developers who don’t use AI will be left behind. So I’m going to give it a try. Developer productivity is difficult to measure and an N=1 experiment doesn’t prove anything, but I want to compare working with Claude to working without it anyway.

The problem

A bug was found in our internal project tracking tool around subscribing and unsubscribing clients from project feedback emails. This seemed like a great way to compare myself to Claude, so I assigned the bug to myself, took a deep breath, and prepared for my first coding-with-Claude session. But first, I would try to fix the bug myself.

My approach

To start, I needed to reproduce the bug. Contacts can be subscribed to or unsubscribed from email updates in two places: on the project show page and on the project edit form. On the edit form, everything seems to be working: I can subscribe all or some of the contacts on a project, and can also unsubscribe all or some of them. On the show page, I can subscribe any contacts successfully and can unsubscribe some of them, but if I try to unsubscribe all the currently subscribed contacts, nothing changes. They remain subscribed. I wrote a system spec that fails in that context and verified that the bug was reproducible.

Since the bug only appears in one of the two flows, I wanted to investigate what is actually different between the way unsubscribe works on the show page and on the edit form. First, I checked whether both forms submit to the same controller and action. They do, and there is no special handling of the updates: it's the standard @project.update(project_params) in both cases, so the difference must be in the parameters that get sent.

Next, I opened the network tab in my browser and looked at the payload for the different contexts. When updating subscriptions in the show page where at least one contact will be subscribed, project_params looks like #<ActionController::Parameters {"users_receiving_weekly_feedback_email_ids" => ["319"]} permitted: true>. When removing all subscriptions, project_params looks like #<ActionController::Parameters {} permitted: true>. The users_receiving_weekly_feedback_email_ids key is omitted completely, so the call to update doesn’t change anything.
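
The mechanics can be illustrated with a tiny plain-Ruby sketch (hypothetical code, not the app's controller): update-style assignment only touches keys that are present in the params, so the omitted key leaves the subscriptions alone, while [""] clears them.

```ruby
# Sketch of attribute assignment: only keys present in the params hash
# are touched, so an omitted key is a no-op for that attribute.
def apply_update(record, params)
  params.each { |key, value| record[key] = value } # assign only supplied keys
  record
end

project = { "users_receiving_weekly_feedback_email_ids" => ["319"] }

# Show page, unsubscribe all: the key is omitted entirely, so nothing changes.
apply_update(project, {})
puts project["users_receiving_weekly_feedback_email_ids"].inspect # => ["319"]

# Edit form, unsubscribe all: the key arrives as [""], which clears the list
# when Rails assigns the association ids.
apply_update(project, { "users_receiving_weekly_feedback_email_ids" => [""] })
puts project["users_receiving_weekly_feedback_email_ids"].inspect # => [""]
```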

When updating subscriptions in the edit form, parameters are different. Here, many project attributes can be changed. When updating subscriptions in the edit form where at least one contact will be subscribed, project_params looks like #<ActionController::Parameters {..., "users_receiving_weekly_feedback_email_ids" => ["", "318"], ...} permitted: true> (other attributes are omitted here). When removing all subscriptions, project_params looks like #<ActionController::Parameters {..., "users_receiving_weekly_feedback_email_ids" => [""]...} permitted: true>.

The difference is clear. When unsubscribing all users from the show page, no parameters are included in the request. When unsubscribing all users from the edit form, the users_receiving_weekly_feedback_email_ids key is included with an array that only contains an empty string, so when the project is updated, all users receiving feedback are removed.

I examined the implementation of each form and found the difference: the edit form uses Simple Form's form.association :users_receiving_weekly_feedback_email with the set of available users as the collection to render the users and checkboxes, and that helper includes a hidden input with an empty value. When every checkbox is unchecked, the hidden input still submits the attribute key, so all the unchecked associations are removed when the project is updated.

The show page iterates through each available user and creates a check_box_tag. Without the empty hidden option, when all the options are unchecked, the attribute key is omitted and the project is unchanged.
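
For contrast, here are rough sketches of the two approaches (hypothetical names like available_users; not the app's actual templates). Simple Form's helper emits a hidden empty input alongside the checkboxes, while the hand-rolled loop does not:

```erb
<%# Edit form: Simple Form's association helper also emits, roughly, %>
<%# <input type="hidden" name="project[users_receiving_weekly_feedback_email_ids][]" value=""> %>
<%= form.association :users_receiving_weekly_feedback_email,
      as: :check_boxes, collection: available_users %>

<%# Show page: checkboxes only, so when none are checked the browser %>
<%# submits no users_receiving_weekly_feedback_email_ids key at all %>
<% available_users.each do |user| %>
  <%= check_box_tag "project[users_receiving_weekly_feedback_email_ids][]",
        user.id, project.users_receiving_weekly_feedback_email.include?(user) %>
<% end %>
```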

The form.association approach looked cleaner, so I copied that implementation from the edit form to the show page (along with some slight tweaks to the label). Now the system spec I wrote passes and the code is a bit tidier.

Claude’s approach

Now it was Claude's turn. I checked out a fresh branch from main and started a Claude session. I gave it the bug, as reported by the user, and it started working. It correctly identified the cause of the problem and proposed adding a hidden field with an empty array to the form as the solution. The project's rules for Claude indicate that it must use TDD, so Claude attempted to write a failing test. It wrote a request spec that makes a PATCH request with the parameters project: {users_receiving_weekly_feedback_email_ids: [""]} and expects the project's users_receiving_weekly_feedback_email to be empty. It ran the test, noted that it passed, applied the fix, ran the test again, and declared the bug fixed.
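
Claude's proposed change can be sketched like this (a guess at the shape, using the param name from the payloads above; not Claude's verbatim diff):

```erb
<%# Always submitted, so even with every box unchecked the controller %>
<%# receives users_receiving_weekly_feedback_email_ids => [""] and clears the list %>
<%= hidden_field_tag "project[users_receiving_weekly_feedback_email_ids][]", "" %>
```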

A core piece of TDD is writing a test that fails, then writing the code to make it pass. Although Claude's proposed code change would fix the bug, the test it added didn't fail without the fix, so it wasn't verifying that the fix worked. I pointed this out and Claude agreed that I was right and removed both the invalid test and the fix. It wrote a new system spec, very similar to the one I wrote before, and verified that the test failed. It then re-applied the fix.

This fix works and the test verifies the right thing, but the form part of the show page is less readable than it was before. I asked Claude for a more elegant solution and it proposed using form.collection_check_boxes:

<%= form.collection_check_boxes :users_receiving_weekly_feedback_email_ids,
      project.project_owner.users, :id, :name do |b| %>
  <div class="flex items-center">
    <%= b.check_box %>
    <%= b.label do %>
      <%= b.text %>
      (<%= b.object.feedback_frequency_label %>)
       
    <% end %>
    <span class="text-xs text-gray-600 dark:text-night-white">
      <%= link_to "Edit frequency",
            edit_client_user_path(project.project_owner, b.object) %>
    </span>
  </div>
<% end %>

I noticed that the block takes one argument: b. I loathe single-letter variable names; they make it difficult to understand what is happening, and they don't save enough time to be worth the obfuscation. I asked Claude what it represented and it explained how a block works, which wasn't really what I was asking. So then I told it to use a name that actually represents what the argument holds, and it renamed it to checkbox.

Results

Claude and I both landed on solutions that included code changes to fix the bug and a test to verify the fix. I didn’t track how much time I spent fixing it on my own and how much time it took Claude to fix it, but each took less than 30 minutes.

I prefer my solution to Claude’s. There’s a lot of bias going into that opinion: I am skeptical of generative AI overall and I am pretty confident in my Rails knowledge and debugging skills. But there are a few things that I think are objectively better about my approach:

  • I followed an existing pattern in the codebase, so there's less variation across the two workflows. It's possible this could be extracted to a shared partial, but there's just enough difference in the UI presentation that it feels fine to leave them separate but similar.
  • Simple Form's association helper requires a little less code than collection_check_boxes, which makes the form easier to read.
  • association form helpers are already used widely in the project, but there is no other use of collection_check_boxes.
  • My system test verifies that a box is checked before submitting the form and that it isn’t checked after submitting. Claude’s test only checks after. I feel more confident in a test that verifies something changed rather than verifying the end state. If there were a UI or testing setup error that wasn’t correctly identifying whether a box is checked, Claude’s test might pass without the bug fix.
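
That last point can be sketched as a Capybara system spec (a hypothetical sketch, not the project's actual spec; contact and project are assumed fixtures, and the field matchers are standard Capybara):

```ruby
it "unsubscribes the last remaining contact from the show page" do
  visit project_path(project)

  expect(page).to have_checked_field(contact.name)   # verify the starting state
  uncheck contact.name
  click_button "Save"

  expect(page).to have_unchecked_field(contact.name) # verify the state changed
end
```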

I also just enjoyed solving this problem without Claude a lot more than with Claude. Working without AI felt like solving a mystery. I got to dig through the code, find the problem, and come up with a fix that aligns with the rest of the project. There were lots of fun “aha!” moments of discovery, which is what I like about doing this job.

Working with AI felt a lot more like parenting a small child. We got to the same result, but most of my contributions were corrections to the work Claude was doing. Instead of making something, I was giving feedback on something made by software. At many points it felt like it would just be faster to do it myself than to keep explaining to Claude what it was doing wrong. I’m happy to teach rather than do when another human is learning from the experience, but it doesn’t feel worth the frustration to keep correcting software that is supposed to be making my life easier.

This isn't a perfect comparison. It was a relatively small bug with a straightforward solution. I had already solved it on my own before asking Claude to solve it, so I had an advantage when giving Claude feedback. I also can't remove my personal bias from the situation: I expected Claude to do a bad job and I expected that I wouldn't enjoy the process, and as much as I tried to keep an open mind, I am human.

I tried to figure out how much time and how many tokens it took for Claude to do this work, but I wasn't able to determine that since I'm a Claude novice. It is important to consider the resources Claude used (water, energy, etc.) and the resources I used (half a cup of cold coffee and a fig bar) when evaluating Claude. Ultimately, Claude's output wasn't better, the experience was less pleasant, and neither was worth the environmental and other costs of using AI.

I plan to do similar comparisons for other development tasks in the future. Will I find a way to love working with Claude? Will Claude find a way to run on cold coffee and fig bars? Only time will tell.
