Software delivery plans that consist purely of a list of tasks assume that all learning happens before any code is written. That assumption is false, and it leads to software project failure.
Let’s imagine a plan that consists of five milestones. The organisation that wants this work knows far more about the project by the time they reach item three than they did when they wrote the plan. Tackling items three, four, and five exactly as they were specified in the plan is a choice to act based on outdated knowledge.
Experiments: choose your steps with open eyes
There is a better way to plan: we can explicitly acknowledge the vital role of learning by including experiments that we know will influence the plan. Experiments can range from something tiny, like swapping between equivalent pagination libraries, to something really big, like trying different distributed architectures.
The benefits of experimenting include:
- revealing unanticipated challenges
- moving from theoretical discussion to concrete understanding
- comparing multiple potential solutions in a short space of time
- settling distracting debates between developers
- building knowledge for when implementation begins on a larger scale
Example experiments
Below are some examples of real software-delivery experiments performed as small parts of a larger plan, taken from a variety of projects over the last ten years.
Suitability assessment by implementing Elasticsearch for one search feature
When a team found SQL-based search to be unforgiving for end users and very slow across multiple tables, Elasticsearch seemed like a solution worth evaluating.
Implementing it on one page was enough to reveal challenges about the infrastructure needed, client libraries, and the knowledge necessary for useful index and query design.
The experiment quickly demonstrated that the cost of adding new infrastructure and learning the necessary skills outweighed the urgency of solving the search problem.
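For illustration, a single-feature experiment like this can be tiny. Below is a minimal sketch, assuming the elasticsearch-model gem and a hypothetical Product model; real index and query design would need more thought, which is exactly what the experiment reveals.

```ruby
# Gemfile: gem "elasticsearch-model" (one of several possible client gems)
class Product < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks # keeps the index in sync on save/destroy
end

# Only this one search action uses Elasticsearch; every other search
# feature keeps its existing SQL implementation while the team learns.
class ProductSearchController < ApplicationController
  def index
    @products = Product.search(
      query: { multi_match: { query: params[:q], fields: %w[name description] } }
    ).records
  end
end
```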
Quick consensus on the performance impact of a pagination library change
Given company-wide adoption of Kaminari for pagination in Rails apps and a PostgreSQL performance problem related to pagination, it seemed that the pagy gem (which advertises performance benefits over competitors like Kaminari) might help.
Adding it to only the problem page was faster than trying to swap it on every page. This led to the quick discovery that it offered no performance improvements.
The lack of improvement might have been predictable: the problem was related to database query performance, while pagy’s advertised benefits are about the application process and memory usage. However, trying pagy was faster and more convincing than engaging developers in a theoretical conversation.
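As a sketch, the swap can be confined to a single action (the controller and model names here are hypothetical), which is what made the comparison so cheap:

```ruby
class OrdersController < ApplicationController
  include Pagy::Backend # Gemfile: gem "pagy"

  def index
    # Before (Kaminari, still used everywhere else in the app):
    # @orders = Order.order(:created_at).page(params[:page])

    # After (pagy, on the problem page only):
    @pagy, @orders = pagy(Order.order(:created_at))
  end
end
```

Because the rest of the app keeps Kaminari, the experiment can be reverted in minutes once the measurement is taken.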
Implementing small features to assess suitability of a new platform
When one mobile platform became unsuitable due to maintenance concerns, several alternatives were considered. The most theoretically promising option was evaluated by replicating a few small features in an app built for the new platform.
The features were sufficient to prove the benefits the team hoped for (ease of porting existing code) and explore risks like how deployment would change.
Learning happened far faster than if they had attempted to implement every feature in the new platform before thinking about deployment. If the experiment had shown the new platform to be unsuitable, they could have stopped and evaluated other options instead.
Safely building team capacity on a new architecture by starting with a small audience
A Ruby on Rails monolith became overwhelming for a team that needed to act with extreme caution to avoid database outages. A distributed architecture with integration via a message broker was considered. The team moved the clients used by logistics users to the new architecture, while other audiences, such as customers and operations, continued to be served by the original architecture.
Lessons from the experiment included how to host the message broker, how to monitor and debug it, how it should behave when application processes restarted, and how to integrate it with Rails. It was possible to wait until that single audience was well served by the new API before considering any further adoption of the message broker.
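The broker is not named here, but the integration style can be sketched; the example below assumes RabbitMQ via the bunny gem and a made-up logistics event.

```ruby
require "bunny" # Gemfile: gem "bunny"

connection = Bunny.new(ENV.fetch("RABBITMQ_URL"))
connection.start

channel  = connection.create_channel
exchange = channel.topic("logistics", durable: true)

# Only events for the logistics audience flow through the broker;
# customers and operations are still served by the original monolith.
exchange.publish(
  { shipment_id: 123, status: "dispatched" }.to_json,
  routing_key: "shipment.dispatched"
)

connection.close
```

Even a sketch this small raises the operational questions the team had to answer: where the broker runs, how it is monitored, and what happens to in-flight messages when processes restart.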
Accelerating lessons about a high-risk feature with a “test” button
In a web app, a requirement to integrate with a Bluetooth printer was identified as an early risk. The feature that needed to print was far down the roadmap, but the developer built a “test print” button as one of the first actions in the project. This validated the high-risk part of the feature (“print anything via Bluetooth”) as early as possible. If the developer had waited to build the entire feature, the lessons from the printing experiment would only have been gained after completing low-risk items like assembling the correct data, designing the UX for initiating printing, and print formatting.
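A sketch of the idea, with a hypothetical BluetoothPrinter client standing in for whatever transport the team actually used:

```ruby
# config/routes.rb:
#   resource :test_print, only: :create

class TestPrintsController < ApplicationController
  def create
    # A hardcoded payload is enough: no real feature data, formatting,
    # or UX is needed to learn whether “print anything via Bluetooth” works.
    BluetoothPrinter.print("Test print at #{Time.current}") # hypothetical client
    redirect_back fallback_location: root_path, notice: "Sent test print"
  end
end
```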
We can’t afford not to experiment
Sometimes teams believe that a tight schedule leaves no time to experiment. This is self-defeating. The reality is that risks will materialise, and with experiments a team can face them quickly, with time to adjust. Without experiments, the team faces the same risks, but only when it is too late to do anything about them.
Skills for successful experiments
This article’s focus is largely on demonstrating the benefits of changing attitudes to planning - plans should be constantly updated based on what we learn during implementation - but there are some skills necessary to make this approach successful.
It’s useful to understand common patterns for experimentation that the tech industry follows, including:
- Vertical (or full-stack) slices. Planning work as a series of vertical slices essentially turns the entire product into consecutive experiments, with the learning from one slice applied to the next. One possible lesson is that some later slices are unnecessary: observing users interacting with earlier slices can justify cancelling them, saving time.
- Spikes are about investigating feasibility before committing to an approach. They are typically short efforts associated with throwaway code.
- Pilot projects test a solution at a smaller scale or with a smaller audience (e.g. one team, or one partner organisation) in order to surface lessons earlier than waiting for the full solution would.
- Prototypes are a representation of a solution in a form that helps decision-makers to learn about their proposal by observing potential use. They are “more real” than a description of the solution, but take less effort than a full implementation.
It’s also important to be able to evaluate the success of experiments. Some businesses may have business intelligence teams and tools that can be enlisted for this, but they should not become a crutch or a bottleneck. Developers outside those teams (or where no such team exists) can also learn how to evaluate the impact of their own work¹.
Conclusion
Experiments help us to evaluate risks fast, choose between competing solutions, build consensus, and spread knowledge. They can be as small as a few hours, or take months to demonstrate how to work effectively with a new message broker. Building learning into a plan via experiments is far better than learning the same lessons at a time when it is too late to make different choices.
¹ The linked article includes practical advice for Ruby on Rails developers, but the mindset it encourages is universal.