---
title: Hoptoad, the Cloud, and the Pond Ahead
teaser: How we're making Hoptoad scale by using MongoDB.
tags: web,airbrake,performance,rails
author: Harold Giménez
published_on: 2011-03-09
---

Over the last couple of months, we've seen considerable growth in Hoptoad
accounts and traffic. Thank you all! This growth has also brought new traffic
patterns and challenges. During this time we've mostly been keeping up with it
and making sure we provide as reliable a service as possible. There have been
some bumps along the way. Here is what happened, what we've done about it, and
what is yet to come.

![Hoptoad](http://images.thoughtbot.com/ui/hoptoad-logo.png)

### The error process queue

For over a year, Hoptoad stored exception details as gzipped <abbr
title="Extensible Markup Language">XML</abbr> files on Amazon S3. When an error
was POSTed to our <abbr title="Application Programming Interface">API</abbr>
endpoint, we validated it, grouped it with similar errors, and stored it on the
app server's file system. Every five minutes, a cron job uploaded all of these
<abbr title="Extensible Markup Language">XML</abbr> files to S3, and the details
only became viewable in the UI once they had made it there. This is why, more
often than we would have liked, you would see the dreaded message "Details for
this error are still being processed". This served us well for some time, but we
knew it was time to rethink this architecture.
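To make the old flow concrete, here is a minimal Python sketch of it (an illustration, not Hoptoad's actual code). Plain dictionaries stand in for the app server's disk and for the S3 bucket, and the names `receive_notice`, `upload_spooled_notices`, and `show_notice` are hypothetical. It shows why details stayed invisible until the periodic upload ran:

```python
import gzip
import xml.etree.ElementTree as ET

PROCESSING_MESSAGE = "Details for this error are still being processed"

# Hypothetical in-memory stand-ins (not real storage):
local_spool = {}  # app server's disk: notice_id -> gzipped XML bytes
s3_bucket = {}    # S3 bucket:         notice_id -> gzipped XML bytes


def receive_notice(notice_id, error_class, message):
    """Serialize a notice to XML, gzip it, and spool it on local disk."""
    root = ET.Element("notice")
    ET.SubElement(root, "error-class").text = error_class
    ET.SubElement(root, "message").text = message
    local_spool[notice_id] = gzip.compress(ET.tostring(root))


def upload_spooled_notices():
    """Stand-in for the five-minute cron job: move spooled files to S3."""
    uploaded = 0
    for notice_id in list(local_spool):
        s3_bucket[notice_id] = local_spool.pop(notice_id)
        uploaded += 1
    return uploaded


def show_notice(notice_id):
    """The UI can only show details once the notice has reached S3."""
    blob = s3_bucket.get(notice_id)
    if blob is None:
        return PROCESSING_MESSAGE
    root = ET.fromstring(gzip.decompress(blob))
    return f"{root.findtext('error-class')}: {root.findtext('message')}"
```

Calling `show_notice` right after `receive_notice` returns the processing message; only after `upload_spooled_notices` runs do the details appear. Anything still sitting in `local_spool` when an instance dies is simply gone.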

There were many problems with this approach. The most obvious was that the
"still being processed" message was becoming more and more common, which
degraded the experience of viewing errors for our users (us included). The first
thing we did to improve that experience was simple and did not require sweeping
architectural changes: instead of trying to display the most recent notice for
an error group, we showed you the last _processed_ notice for that group. So
instead of seeing the processing message, you would see actionable data for the
exception and could get back to work fixing bugs.
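The fallback amounts to a small selection rule. Here is a minimal Python sketch, assuming each notice in a group carries a receive timestamp and a hypothetical `processed` flag that becomes true once its details are in S3 (the `Notice` type and `notice_to_display` function are illustrative names, not Hoptoad's code):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Notice:
    notice_id: str
    received_at: datetime
    processed: bool  # hypothetical flag: True once details have reached S3


def notice_to_display(group: List[Notice]) -> Optional[Notice]:
    """Return the newest notice in an error group whose details are viewable.

    Unprocessed notices are skipped, so the UI falls back to the most recent
    processed one. Returns None if nothing in the group has been processed.
    """
    processed = [n for n in group if n.processed]
    return max(processed, key=lambda n: n.received_at, default=None)
```

The trade-off is that you may see a slightly older occurrence of the error, but you see real data instead of a placeholder.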

Even though this helped, and the number of support requests decreased greatly,
we always knew it was a temporary solution and that we could do better. We
needed a way to store error details within the life cycle of the request, so
that they were available for viewing immediately afterwards. Uploading to S3 had
become too slow for our needs.

Furthermore, this was not the only problem with this architecture. The larger
problem was that, because of our high traffic, we started running into two kinds
of issues: disk space filling up before our workers were able to push notice
details to S3, or, even worse, an application instance failing completely. In
those rare cases, another application instance would be provisioned
automatically, but any unprocessed <abbr title="Extensible Markup
Language">XML</abbr> on the failed instance's filesystem was lost.

### Enter MongoDB

To display exception details quickly, we decided to use MongoDB, doing away
with the temporary file system and S3 storage altogether. When an exception hits
our API, we do the same processing we've always done, but store the result in a
MongoDB collection instead. The three main advantages to you are:

* _Error details are always available_, immediately after we receive them. You
  can click the error <abbr title="Uniform Resource Locator">URL</abbr> in the
  notification emails and start seeing details for the error with no delay.
* A more robust storage approach, where _app instance failures will never cause
  details to be completely lost_. With careful planning, disk space is not an
  issue either.
* _Better response times_: A nice by-product of this change is a roughly 30%
  improvement in the application's response time, for both storing and reading
  the data.
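The new write path can be sketched as follows, again in Python as an illustration rather than Hoptoad's actual code. `record_notice` and `notice_details` are hypothetical names, the document fields are assumptions, and `InMemoryCollection` is a stand-in that mimics only the `insert_one` and `find_one` methods of a pymongo collection (and only lookups by `_id`) so the sketch runs without a MongoDB server:

```python
import uuid


class InMemoryCollection:
    """Hypothetical stand-in for a MongoDB collection.

    Mirrors only the names of pymongo's Collection.insert_one and
    Collection.find_one, and only supports queries of the form {"_id": ...}.
    """

    def __init__(self):
        self._docs = {}

    def insert_one(self, document):
        self._docs[document["_id"]] = dict(document)

    def find_one(self, query):
        doc = self._docs.get(query.get("_id"))
        return dict(doc) if doc is not None else None


def record_notice(collection, group_id, error_class, message, backtrace):
    """Store notice details during the API request itself.

    No local spool file and no batch upload: once this returns, the details
    can be read back by the UI. group_id is assumed to come from the existing
    validation and grouping step.
    """
    notice_id = uuid.uuid4().hex
    collection.insert_one({
        "_id": notice_id,
        "group_id": group_id,
        "error_class": error_class,
        "message": message,
        "backtrace": backtrace,
    })
    return notice_id


def notice_details(collection, notice_id):
    """Read notice details for the UI; None only if the id is unknown."""
    return collection.find_one({"_id": notice_id})
```

Against a real server you would pass a pymongo collection instead, for example `MongoClient()["hoptoad"]["notices"]` (the database and collection names are made up). The key difference from the old design is that the insert happens during the API request, so there is no window in which details exist only on one machine's disk.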

### A hybrid future

We can't stop here. We have encountered numerous problems with our current
environment, and we are working to improve our infrastructure. This has been our
primary focus for the last couple of months.

We plan to migrate our application to a more traditional hosting environment.
While we will continue to use virtualization for application servers and other
utilities, our databases will run on bare metal. We are confident that this will
increase our overall performance even more and provide a predictable path for
growth. Among other things, this solves:

* The bad neighbor problem, where other instances in the cloud steal precious
  CPU cycles. For high-traffic applications like Hoptoad, the cost of this
  problem is very real. In our planned setup, our app servers will run under our
  own hypervisor, so it is impossible for other applications to steal our CPU.
* I/O contention: while most apps run just fine in the cloud, its I/O capacity
  is underprovisioned for an application like Hoptoad. By designing an
  infrastructure with faster disks that can support our needs, we will gain
  superior I/O bandwidth.

### Looking forward to a brighter pond

We have been forced to focus our efforts on performance improvements and
architectural changes that can support the growth we've seen. We are very sorry
for the bumps along the way. We are also tired of feeling apologetic. Enough is
enough. We have made changes to improve your experience as a customer, and we
will continue to do so. Please bear with us until we've migrated our
infrastructure; we'll keep you updated on the timeline for the hosting move. We
look forward to being able to stop worrying about performance and start focusing
on improving the service, with better features that make more use of the data
and help you handle your app's bugs efficiently.

FYI: _Hoptoad/Airbrake was sold to Rackspace and is now called [Airbrake Bug Tracker](http://airbrake.io)_.
