---
title: 'Handling external API errors: A transactional approach'
teaser: 'Clarifying a few possible ways to implement transactional workflows when
  working with external APIs.

  '
tags: error handling,api,distributed systems,fault tolerance,postgresql
author: Thiago Araújo Silva
published_on: 2024-01-23
---

Error handling and fault tolerance are often neglected aspects of
development. How much does it cost to fix errors due to a poorly
implemented error handling strategy or a complete lack thereof? How
many API integrations are poorly put together, disregarding what
can go wrong? How much data do we have to fix due to catastrophic
events of cascading errors that could have been prevented with
well-thought-out code?

Let's get down to the basics. This post is about building integrations
with any system, third-party or not, over the network. In a previous
post, we discussed the [resumable error handling strategy] and in
[what situations it can be helpful]. Now, let's discuss the
transactional strategy.

## When to choose a transactional strategy?

Let's start with some recommendations to make the distinction between
strategies clear. Choose a transactional error handling strategy when:

- The workflow is composed of steps that need to be committed
  together - it's all or nothing;
- There is tight coupling between the steps;
- You can't bear temporary inconsistency;
- The external service allows undoing or rolling back side effects;
- The workflow has just a few steps and API requests (this is common
  but not a hard requirement).

## The example

In our example, we're logging a list of order line items to the
[Modern Treasury API], where we have a "ledger account" for a "buyer".
Logging a line item creates a ledger transaction object from the
buyer's ledger account to the vendor's ledger account.

<aside class="warn">
  <strong>Important</strong>
  <p>
    "Ledger transaction" is a local object within Modern
    Treasury that represents a transaction between two ledger
    accounts. It has no relation whatsoever with the
    transactional approach described in this article.
  </p>
</aside>

Let's imagine the following code:

```rb
def log_order
  total_amount = order_line_items.sum(&:amount)

  if has_enough_funds?(total_amount) # Issues a synchronous HTTP request
    order_line_items.each do |line_item|
      result = log_line_item(line_item) # Issues a synchronous HTTP request
      line_item.update!( # Local database update
        external_transaction_id: result.transaction_id
      )
    end
 
    :ok
  else
    :not_enough_funds
  end
end
```

This code is a rough draft of what we need to do, with no regard for
an error-handling strategy. It's not unusual to have multiple types of
API requests in a transactional workflow, but for simplicity's sake
we're using a single API request inside a loop.

<aside class="info">
  <strong>Note</strong>
  <p>
    Due to obvious limitations, the code snippets will
    get more and more dense as we apply fixes and add
    new features. We will discuss means of abstraction
    in a future article, so bear with me!
  </p>
</aside>

## Transactional or resumable?

When designing client API code, the first question to ask is "should
it be transactional or resumable?" The concept being modeled usually
answers it. Can we _partially_ log an order with one or more line
items? The answer is _no_. We can't afford temporary inconsistency: at
any point in time, a customer looking at their order must see all of
its line items or none of them. It's all or nothing. In resumable
workflows, however, temporary inconsistency is bearable, and eventual
consistency is reached through multiple retries in the worst-case
scenario.

## ACID database transaction

For our code to be transactional, we must wrap our database commands
in an [ACID transaction], since the code sends `UPDATE` statements
through the underlying database connection. We want to guarantee all
database commands are rolled back if something goes wrong.

```rb
def log_order
  # If the Ruby code inside the block raises an exception,
  # Rails issues a `ROLLBACK` statement to the database
  ApplicationRecord.transaction do
    # Code goes here 
  end
end
```

<aside class="info">
  <p>
    Our example has a single `UPDATE` statement, but there could be
    several. Even with a single update, a transaction is still
    useful.
  </p>
</aside>

Database transactions are not a solution to all data consistency
problems, but they are the proper solution to our "all or nothing" use
case.

Are we done yet? No! ACID transactions only cover local database
commands. We must still roll back the _external_ state created by our
HTTP requests.

## External transactions

The [Sagas pattern] describes how to roll back external transactions.
Most Google results for "sagas pattern" will mention event-based
microservices communicating through message brokers, which is not the
case here. We're referring to any code that interacts with external
APIs.

The core concept, however, still applies: if the transaction
orchestrator (our code) detects an error condition, compensating HTTP
requests must be emitted to undo the changes made by the preceding
HTTP requests. Let's apply this improvement to our code:

```rb
def log_order
  ApplicationRecord.transaction do
    total_amount = order_line_items.sum(&:amount)

    if has_enough_funds?(total_amount)
      begin
        order_line_items.each do |line_item|
          result = log_line_item(line_item) # Issues a synchronous HTTP request
          line_item.update!(external_transaction_id: result.transaction_id)
        end
      rescue => e
        order_line_items.each do |line_item|
          if line_item.external_transaction_id.present?
            rollback_line_item(line_item.external_transaction_id)
          end
        end

        raise # Re-raise so the database transaction is rolled back as well
      end
  
      :ok
    else
      :not_enough_funds
    end
  end
end
```

We've introduced a method call, `rollback_line_item`, to roll back the
line items logged so far when we encounter an error condition. For
simplicity's sake, the loop that logs the line items assumes every
failure surfaces as an exception, so there's no specific error
condition to check other than rescuing exceptions. That's a
significant first step, but we must also be mindful of API semantics,
which is our next topic.
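Before moving on, the compensation idea itself can be isolated from Rails and HTTP. The following is a minimal sketch in plain Ruby, not the code from this article: `Step` and `run_with_compensation` are hypothetical names, and an array stands in for the external service. Each step carries its own undo action, and completed steps are undone in reverse order when a later step fails.

```rb
# Hypothetical helper: runs each step in order and, if one raises,
# calls the compensation of every step that already succeeded,
# in reverse order, before re-raising the original error.
Step = Struct.new(:name, :action, :compensate)

def run_with_compensation(steps)
  completed = []

  steps.each do |step|
    step.action.call
    completed << step
  end

  :ok
rescue StandardError
  completed.reverse_each { |step| step.compensate.call }
  raise
end

# In-memory stand-in for the external service
ledger = []

steps = [
  Step.new("item-1", -> { ledger << "item-1" }, -> { ledger.delete("item-1") }),
  Step.new("item-2", -> { raise "network error" }, -> { ledger.delete("item-2") }),
]

begin
  run_with_compensation(steps)
rescue RuntimeError => e
  puts "rolled back after: #{e.message}"
end

puts ledger.inspect # []
```

Only the steps that completed are compensated, which mirrors the `external_transaction_id.present?` check in the orchestrator above.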

### Designing external rollbacks

What should the implementation of `rollback_line_item` look like? That
depends on our API features, which should be carefully assessed. In
Modern Treasury, we can't delete a ledger transaction, but we can
_archive_ it, which seems like an excellent way to revert our
operation and make its implementation more robust.

To roll back a ledger transaction, we must ensure it's created in a
`pending` state because `posted` ones are immutable and can't be
rolled back. Also, we need to add a commit step that will move
`pending` ledger transactions to `posted`. Let's change our
orchestrator code, renaming `log_line_item` to
`log_pending_line_item` and adding a commit step:

```rb
def log_order
  ApplicationRecord.transaction do
    unless all_logged?(order_line_items)
      total_amount = order_line_items.sum(&:amount)

      if has_enough_funds?(total_amount) # Issues a synchronous HTTP request
        begin
          order_line_items.each do |line_item|
            result = log_pending_line_item(line_item) # Issues a synchronous HTTP request
            line_item.update!(external_transaction_id: result.transaction_id)
          end
        rescue => e
          # ...
        end
      else
        return :not_enough_funds
      end
    end
  end

  # Commit step here
  order_line_items.each do |line_item|
    commit_line_item(line_item) # Issues a synchronous HTTP request
  end
  
  :ok
end
```

The commit step should run after all ledger transactions are logged,
outside the ACID transaction. We can't commit ledger transactions as
they are logged because the earlier ones could no longer be rolled
back if a later one results in an error.
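To make the ordering constraint concrete, here is a toy in-memory model of the state rules. It is a sketch under assumptions: `FakeLedger` and its methods are hypothetical and do not mirror the Modern Treasury client; only the `pending`, `posted`, and archived states come from the text above. The model also treats archived transactions as final, which is an assumption.

```rb
# Toy in-memory model of the state rules described above. The class and
# method names are hypothetical.
class FakeLedger
  class ImmutableTransactionError < StandardError; end

  def initialize
    @statuses = {}
    @next_id = 0
  end

  def create_pending
    id = "lt_#{@next_id += 1}"
    @statuses[id] = :pending
    id
  end

  def post(id)
    transition(id, :posted)
  end

  def archive(id)
    transition(id, :archived)
  end

  def status(id)
    @statuses.fetch(id)
  end

  private

  # Only pending transactions may change state
  def transition(id, new_status)
    unless @statuses.fetch(id) == :pending
      raise ImmutableTransactionError, "#{id} is #{@statuses[id]}"
    end

    @statuses[id] = new_status
  end
end

ledger = FakeLedger.new
first = ledger.create_pending
second = ledger.create_pending

ledger.archive(first) # A pending transaction can still be rolled back
ledger.post(second)   # Committing makes it immutable

begin
  ledger.archive(second)
rescue FakeLedger::ImmutableTransactionError => e
  puts "cannot roll back: #{e.message}"
end
```

Once a transaction is posted, the rollback path is closed, which is why posting has to wait until every line item has been logged.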

Also, we added an `unless all_logged?(order_line_items)` check for
idempotency's sake to avoid double logging.
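The article doesn't show `all_logged?`, so the following is only one possible definition, assuming an order counts as logged once every line item holds the ID returned by the external API. `LineItem` is a hypothetical `Struct` standing in for the ActiveRecord model.

```rb
# Hypothetical stand-in for the ActiveRecord model
LineItem = Struct.new(:amount, :external_transaction_id)

# Assumed definition: an order counts as logged once every line item
# carries the ID returned by the external API.
def all_logged?(line_items)
  line_items.all? { |line_item| !line_item.external_transaction_id.nil? }
end

logged = [LineItem.new(500, "lt_1"), LineItem.new(250, "lt_2")]
partial = [LineItem.new(500, "lt_1"), LineItem.new(250, nil)]

puts all_logged?(logged)  # true
puts all_logged?(partial) # false
```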

<aside class="info">
  <p>
    If you can replace a bunch of API requests with a single bulk request,
    by all means do it! A single request is generally safer and more atomic
    than multiple requests, and it also simplifies client code. At the time
    of writing, Modern Treasury (our example API) does not have a ledger
    transaction bulk API.
  </p>
</aside>

Requirements will vary from API to API, so the main takeaway is to
look up your API docs and carefully plan your implementation with
error handling and fault tolerance in mind.

## Handling concurrency

There's a critical path in our code subject to race conditions. Note
the following `if` condition:

```rb
  if has_enough_funds?(total_amount) # Issues a synchronous HTTP request
    # ...
  end
```

Let's assume our buyer has a $10 balance. What if a web request that
spends the full $10 is issued twice, and both make it to the `if`
condition simultaneously? Yes, both would resolve to truthy and run
the same code. The user would be spending what they don't have -- $20
instead of $10 -- which would result in a negative balance of -$10.

The first question to ask is "does my API have concurrency handling
features?" In the case of Modern Treasury, the answer is yes. When
logging a ledger transaction, we can submit balance check parameters
that lock on what the balance should be after the operation. The API
simulates the operation and returns an error code if the resulting
balance differs from the one provided; otherwise, it goes ahead and
performs the operation. We can send the following parameters along
with our JSON payload:

```json
{
  "available_balance": {
    "eq": WHAT_THE_BALANCE_SHOULD_BE_AFTER_THE_OPERATION
  }
}
```
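When several line items are logged one after another, each request needs its own expected balance. A minimal sketch of that arithmetic, assuming amounts are integers in cents: `balance_locks` is a hypothetical helper, and where the fragment goes inside the real request body should be checked against the API docs.

```rb
require "json"

# Hypothetical helper: given the balance we believe the buyer has right now
# and the amounts about to be debited (in cents), returns one balance-lock
# fragment per line item. Each fragment states what the available balance
# should be after that particular debit.
def balance_locks(current_balance, amounts)
  running_balance = current_balance

  amounts.map do |amount|
    running_balance -= amount
    { "available_balance" => { "eq" => running_balance } }
  end
end

locks = balance_locks(1_000, [300, 700])
puts JSON.generate(locks)
# [{"available_balance":{"eq":700}},{"available_balance":{"eq":0}}]
```

If a concurrent request changes the balance in between, the expected value no longer matches and the API rejects the operation instead of overdrawing the account.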

This feature renders our `has_enough_funds?` check useless because now
the balance would be checked implicitly when logging each ledger
transaction. If we raise an exception when the balance check fails,
our code already knows how to roll back. Therefore, our code can be
simplified, and we can also detect the specific exception to return
the appropriate error condition:

```rb
rescue NotEnoughFundsError
  return :not_enough_funds
end
```

If the API doesn't provide concurrency features, a possible solution
is to use [row-level locking] to throttle concurrency:

```rb
order_line_items.sort.first.with_lock do
  # All code goes here
end
```

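`order_line_items.sort.first.with_lock` would replace
`ApplicationRecord.transaction`. It opens a database transaction as
well, but also acquires a lock on that record's row (see
[row-level locks]), so concurrent calls for the same order run one at
a time. Sorting makes every caller lock the same row.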
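The effect of serializing the check and the write can be shown without a database. The sketch below is an in-process analogy only: a `Mutex` plays the role of the row lock, and `Wallet` is a hypothetical class that is not part of the article's code.

```rb
# In-process analogy only: a Mutex plays the role the row lock plays in
# the database. All names here are hypothetical.
class Wallet
  attr_reader :balance

  def initialize(balance)
    @balance = balance
    @lock = Mutex.new
  end

  # The check and the debit happen while holding the lock, so a second
  # caller only sees the balance after the first one is done.
  def spend(amount)
    @lock.synchronize do
      return :not_enough_funds if @balance < amount

      @balance -= amount
      :ok
    end
  end
end

wallet = Wallet.new(10)
results = 2.times.map { Thread.new { wallet.spend(10) } }.map(&:value)

puts results.sort.inspect # [:not_enough_funds, :ok]
puts wallet.balance       # 0
```

Whichever request gets the lock first spends the $10; the other one sees the updated balance and is refused.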

## Making commit and rollback fault-tolerant

The API portion of our code now has commit and rollback steps, but
they are unreliable. Be mindful that any code can fail, especially
when making network calls. If either commit or rollback fails, our
data would be inconsistent, and rerunning the code wouldn't correct
it. Designing commits and rollbacks as units that can be independently
retried solves our problem, so let's offload both steps to background
jobs.

```rb
def log_order
  failed_ledger_transaction_ids = []
  error = nil

  ApplicationRecord.transaction do
    unless all_logged?(order_line_items)
      # ...

      begin
        order_line_items.each do |line_item|
          response = log_pending_line_item(line_item) # Issues a synchronous HTTP request
          line_item.update!(external_transaction_id: response.transaction_id)
        end
      rescue => e
        # Keep the exception around; `e` is not visible outside this block
        error = e

        order_line_items.each do |line_item|
          if line_item.external_transaction_id.present?
            failed_ledger_transaction_ids << line_item.external_transaction_id
          end
        end

        # Roll back the local updates without raising past the transaction
        raise ActiveRecord::Rollback
      end
    end
  end

  if error
    # Rollback step here
    failed_ledger_transaction_ids.each do |external_ledger_transaction_id|
      # Enqueues a background job that issues the HTTP request
      rollback_line_item_async(external_ledger_transaction_id)
    end

    return :not_enough_funds if error.is_a?(NotEnoughFundsError)

    raise error
  end

  # Commit step here
  order_line_items.each do |line_item|
    # Enqueues a background job that issues the HTTP request
    commit_line_item_async(line_item)
  end

  :ok
end
```

That makes our code more reliable, given that most failures in both
steps are likely to be transient and the operation would succeed after
a few retries. We're, of course, assuming the background job solution
has retries baked in.
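As an illustration of what "retries baked in" buys us, here is a minimal sketch of a retry loop around an idempotent rollback. Everything in it is hypothetical (`TransientError`, `with_retries`, the in-memory `archived` list); a real job backend would also wait between attempts, usually with backoff.

```rb
# Hypothetical error class standing in for timeouts, 5xx responses, etc.
class TransientError < StandardError; end

# Minimal retry loop, assumed to resemble what a job backend does for us.
def with_retries(max_attempts:)
  attempts = 0

  begin
    attempts += 1
    yield attempts
  rescue TransientError
    retry if attempts < max_attempts
    raise
  end
end

archived = []

# Fails twice, then succeeds. Archiving the same ID again is a no-op, so
# rerunning the job is safe.
with_retries(max_attempts: 5) do |attempt|
  raise TransientError, "timeout" if attempt < 3

  archived << "lt_1" unless archived.include?("lt_1")
end

puts archived.inspect # ["lt_1"]
```

The retry only helps because the operation is idempotent: running the rollback (or the commit) a second time must leave the external state unchanged.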

## Takeaways

Working with external APIs takes a lot of work. The more critical our
workflow is, the more important it is to have a solid error
handling/fault tolerance strategy.

- Pay close attention to the concept being modeled to decide whether
  to go with a transactional or [resumable error handling strategy];
- The transactional strategy requires API calls to be synchronous
  because we need to decide whether to commit or roll back everything;
- How we design and implement external rollbacks will always depend on
  particular API features and semantics;
- Always have at least a rollback step for external API interactions.
  The commit step may sometimes be implicit and will depend on API
  semantics;
- External API commit and rollback steps can generally be offloaded to
  background jobs for increased fault tolerance;
- Idempotency is important for critical workflows;
- Properly handling and limiting concurrency is also important;
- See if your API provides features to handle concurrency; otherwise,
  try local solutions such as [row-level locks] or [advisory locks].

[resumable error handling strategy]: https://thoughtbot.com/blog/handling-errors-when-working-with-external-apis
[what situations it can be helpful]: https://thoughtbot.com/blog/handling-errors-when-working-with-external-apis#when-to-choose-a-resumable-strategy
[Sagas pattern]: https://blog.bernd-ruecker.com/saga-how-to-implement-complex-business-transactions-without-two-phase-commit-e00aa41a1b1b
[Modern Treasury API]: https://docs.moderntreasury.com/platform/reference/ledger-transaction-object
[ACID transaction]: https://en.wikipedia.org/wiki/ACID
[row-level locks]: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS
[row-level locking]: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS
[advisory locks]: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
