Lightweight data for business decisions AKA hacky answers for Rails developers

Fritz Meissner

In a startup, the developer is closest to both code and data. There’s no DBA, no business intelligence team, no data warehouse. There might not be any analytics software. Fortunately, we can do pretty well with developers providing “hacky answers” to improve business decisions.

Here are some business questions a developer could plausibly address:

  • are there dips in sales at certain times of month?
  • how many customers each month make two or more purchases?
  • is there an improvement in sales since we released a new feature?
  • what impact did an outage have on our business?
  • is it worth building a version 2 of a feature, or does version 1 achieve all our goals?
  • does v1 usage actually show that no one wants this feature and we should build something completely different?
  • which feature takes the longest to use and is therefore the best candidate for improved UI or automation?

Read on for some tips on how to write code to gather this kind of data.

Investigative code is different

Code that is “good enough to learn something about customers” is different from other code you might write in that it:

  1. has no need for computational efficiency
  2. can run for minutes or even hours (as long as it doesn’t interfere with running production processes)
  3. is not intended for long-term use
  4. does not need to be used by more than one person
  5. does not need to handle edge cases
  6. should be written quickly and changed as fast as thinking and typing allows

All of this informs our choice of tool. We’re looking for something we’re very familiar with that gives quick feedback for fast iteration. For most Rails developers that’s the Rails console.

Generating data with the Rails console

Let’s look at an example written for the Rails console:

puts (1..24).map { Date.today - _1.months }.map { [_1.beginning_of_month, _1.end_of_month] }.map { |start, fin| [start.strftime("%Y-%m"), Order.where(created_at: start..fin).count].join(",") }

This code is intentionally written in one line. A single line can be re-run in the Rails console by just pressing the up arrow. You can edit it before you hit ENTER to run it again. If it were multiple lines, the up arrow becomes much more complex to use.

Sometimes it’s not easy to write everything in a chain of methods like this. If you need to run multiple lines, you can separate them by ; instead of starting a new line.

Let’s understand the example with the sort of formatting that you would use in your editor instead of the console:

puts (1..24)
  .map { Date.today - _1.months }
  .map { [_1.beginning_of_month, _1.end_of_month] }
  .map { |start, fin|
    [
      start.strftime("%Y-%m"), Order.where(created_at: start..fin).count
    ].join(",")
  }

For each number in 1..24 we construct a date representing that many months before the current date. Then we find the beginning and end of the month. Then we use those to output a YYYYmm representation of the start date, followed by the result of a query between the start and end date (Order.where...count).

The query itself could be any number you want to know. How many stores (Store.count?) opened each month, or number of users added, for example.

Here’s how the output looks:

=>
2024-10,9293
2024-09,9765
...
2022-11,5467

Some variations on this you might want:

  • a different time frame, e.g. every hour over 24 hours, or every day over 30 days
  • multiple variables, by adding more queries to the result array (I suggest keeping the same timeframe over all queries)
  • constrain a query by other values with where clauses, e.g. only orders created at a certain store
  • values for a specific time period only, e.g. if there was a change at 10am yesterday, how did the orders at 11am compare with 10am? If 11am always has higher traffic, that’s not a good comparison. What about 11am on the same day in the previous week?

You’ll get more value here the better you can work with dates, loops, and ActiveRecord queries. It’ll help to know Ruby’s Enumerable methods such as .map, .select, .inject, and so on which can all be chained without hiding meaning. Constructs that map to a single line, for instance the ternary operator or endless method declarations may also be helpful.

You’ll get more value here the better you can work with dates, and ActiveRecord queries. It’ll help to know Ruby’s Enumerable methods such as .map, .select, inject, and so on which can be chained without hiding meaning.

A word on plain SQL

Someone who is good at SQL may prefer to get this data in a single query from a database client instead of generating it with Ruby and ActiveRecord. It will probably run faster, but this is only a good option for people who can think in SQL better than they could if they involved Ruby. Fast iteration to a useful answer is the most important goal.

The performance caveat: be aware of production server impact

If your Rails console is running on the same hardware or cloud instance as your production server, start small and keep an eye on overall system performance. You don’t want the code you run to overwhelm the hardware that is suppose to be available for real customers.

If you do see some impact, there is probably some optimisation you can do to your query. Avoid the temptation to optimise as much as you do for production code.

Visualising the data with spreadsheets

It’s easy to compare two numbers with each other on the Rails console, but what about the case where we have pages of numbers? You can increase your understanding by moving your results into a spreadsheet.

Copy CSV to a spreadsheet

The output from the example above can be copy/pasted into a spreadsheet as Comma Separated values. The UI for pasting CSV will differ between spreadsheets, but all are able to split the text output into separate rows, each containing a cell for the date and a cell for the count.

Note that if you’re exporting data to another process (via a copy/paste or writing a file), for legal and privacy reasons it’s best to keep personal identifiers out of your results.

For legal and privacy reasons it’s best to keep personal identifiers out of your results

Useful spreadsheet features

Once your data is in a spreadsheet, you can use filters to see results matching certain patterns only. For instance to compare the values for November 2023 with previous years, or everything after a certain point in time:

Filtering a header row in a spreadsheet

You can also generate a chart. Here you sacrifice some detail on the actual numbers in order to understand a group of numbers in relation to each other. Now, it’s possible to see sudden spikes up or down, or maybe a sustained period where the values are different, or a repeating pattern.

A chart showing a period of higher spikes in 2023, but no similar spikes in 2024.

In the above chart it’s possible to see higher spikes in 2023 than we saw in 2024. If higher is better, we might want to ask what changed that we no longer have those high points. If higher is worse, this could be evidence that we did something right in 2024.

What you’ll want to do to go further here is to grow your understanding of spreadsheets. Here’s an entertaining talk aimed at developers on how to use Excel which includes the above points in more depth and many others.

Growing companies and long-term use

In the beginning, you’ll use the Rails console to generate data, investigate a problem, take action quickly, and then throw the query away. You were never trying to build something sustainable.

Eventually someone will need that data as a regular part of their job. The rules will change:

  • the expectation is now that there is very little change to the code and format of results
  • many people will need to run it
  • someone else might need to understand and change it
  • the history of the query becomes important

This should feel a lot like production application code, which gives a clue about where it should live: in version control, with your application code.

At first you might execute it as a rake task, maybe wrapped in a cron job to run the same query regularly. If the people who need the data are not technical, you might transition to a screen in the app, a link to download a CSV file, or even data written to and from a cloud spreadsheet API.

Don’t skip to this step too fast, and once you get there don’t apply it to every question! Some questions will be asked over and over (move those into the app), but others will remain in hacky territory (stick with the Rails console). Get the data and see if it is usable for lots of people before you turn it into something more stable.

When the Rails console is no longer acceptable

If your startup is really lucky, you may one day find that you’re in a position where the kind of data under discussion does not fit into a Rails app. Alternatively security experts may tell you that the Rails console is a risk and they’re taking it away.

In the absence of the Rails console (or database access), you’ll be left with software for gathering real-time metrics, or “real” business intelligence software. Hacky answers will be harder to come by, but the trade-off is acceptable to the business. As a developer, you still stand a good chance of learning enough of the new data tools to answer questions quickly. The power of spreadsheets will still be helpful.

Never forget hacky answers

A hacky attitude will always be relevant. Enterprise tools may arrive, but there’ll be parts of the organisation where they aren’t appropriate. Get in there with your rails console or scripts and CSV! Where you have no option but enterprise, there’s usually some way of writing custom queries and scripts. Keep the hacky answer spirit alive: there will always be new questions to explore that need quick iteration.