---
title: Splitting an open source project in two with Git submodules
teaser: How and why we split the doorkeeper project into different rubygems.
tags: open source,good code
author: Tute Costa
published_on: 2015-06-02
---

[doorkeeper] is an OAuth provider rubygem for Ruby applications. It historically
solved at least two problems:

1. Handling the data and logic of an OAuth server (set up the resource and
   authorization server as defined in [the spec]).
2. Knowing how to persist its data in different ORMs and databases: SQL
   databases through ActiveRecord, and MongoDB through MongoMapper and Mongoid.

[doorkeeper]: https://github.com/doorkeeper-gem/doorkeeper
[the spec]: https://tools.ietf.org/html/rfc6749#section-1.1

There were a few issues with having both responsibilities in the same codebase:

* For the past two years, I've been the only consistent maintainer, and I've
  been using only ActiveRecord. I can't guarantee the features I add or bugs I
  fix work well with MongoDB.
* Configuring the test matrix was time-consuming. Not all versions of Mongoid
  run well with all versions of Ruby and Rails.
* Setting up dependencies and running the test suite locally was complex.
* At its peak, our test suite took 40 minutes to run in Travis CI. The feedback
  loop felt too slow for us.
* Adding features that required model changes was harder than needed: we needed
  to make sure the changes to the gem would work in every single ORM version
  (and across Ruby and Rails versions).
* Users of other ORMs would try to extend doorkeeper with their own, following
  the current architecture, which meant adding yet another ORM to the
  repository.

Extracting ORM specifics into their own repositories had been on our roadmap for
a long time. But we couldn't find a way to test both projects that guaranteed
they would always integrate well with each other and keep test coverage and
reliability at least as healthy as before.

## Sweeping cruft under the rug doesn't solve all issues

Splitting the core doorkeeper functionality and its ORM adapters might solve
most of the previous caveats, but it's not free. A set of libraries is harder to
work on, to run integration tests against, and to release than a single one.

Our primary issue was testing:

* Relational databases differ from each other, and any relational database works
  very differently from NoSQL databases. Unit tests that only spec out the
  interface between doorkeeper and data stores are not reliable enough for us.
* Test coverage and integration tests are already good in the original test
  suite, and we don’t want to lose that.
* Copying specs from the main project into the ORM repository would result in
  verbatim duplicates that get out of sync as soon as there's a commit changing
  any project's specs, effectively forking doorkeeper's test suite.
* Including doorkeeper as a gem dependency didn't work because it doesn't allow
  us to run its tests as part of the extension's suite.

The best we could come up with during these discussions was to organize ORMs in
subdirectories in doorkeeper's repository. It was an acceptable compromise: we
wouldn't split doorkeeper, but the boundaries between shared model code and ORM
specifics were explicit, and doorkeeper was reasonably decoupled from the ORM of
choice. The project was open for extension, able to accept new ORMs without
changes to existing files. I didn't take advantage of this, though, and rejected
new ORMs for the reasons detailed above.
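
As a rough illustration of what "open for extension" can look like, here is a
minimal sketch of an adapter registry. Every name in it (`AdapterRegistry`,
`ActiveRecordAdapter`, `MongoidAdapter`, `storage_name`) is hypothetical and
chosen for the example; it is not doorkeeper's actual code.

```ruby
# Hypothetical sketch: none of these names come from doorkeeper itself.
module AdapterRegistry
  @adapters = {}

  class << self
    # Each ORM file registers itself under a name when it is loaded.
    def register(name, adapter)
      @adapters[name.to_sym] = adapter
    end

    # The core looks adapters up by name and never references a
    # concrete ORM directly.
    def fetch(name)
      @adapters.fetch(name.to_sym) do
        raise ArgumentError, "unknown ORM adapter: #{name}"
      end
    end
  end
end

# In a real project each adapter would live in its own file
# (for example orm/active_record.rb and orm/mongoid.rb; assumed paths).
module ActiveRecordAdapter
  def self.storage_name
    "SQL via ActiveRecord"
  end
end

module MongoidAdapter
  def self.storage_name
    "MongoDB via Mongoid"
  end
end

AdapterRegistry.register(:active_record, ActiveRecordAdapter)
AdapterRegistry.register(:mongoid, MongoidAdapter)

puts AdapterRegistry.fetch(:mongoid).storage_name
```

With this shape, supporting another ORM means adding one new file that calls
`register`, without touching the files that are already there.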

We still needed to give developers a way to extend doorkeeper with the ORM they
want.

## The best of both worlds: `git submodule`

We knew we wanted a `doorkeeper-mongodb` project, but we didn't know how to test
it. [`git submodule`] was the tool we needed.

[`git submodule`]: http://www.git-scm.com/book/en/v2/Git-Tools-Submodules

As described in the [git-submodule man page], submodules allow other
repositories to be embedded within a subdirectory of the current repository,
always pointed at a particular commit. Submodules are meant for different
projects you would like to make part of your source tree while the histories of
the two projects stay independent.

[git-submodule man page]: https://www.kernel.org/pub/software/scm/git/docs/git-submodule.html

A submodule is recorded in the main repository as a special entry at the
submodule's path that refers to a particular commit SHA in the inner
repository. A record in the `.gitmodules` file at the root of the source tree
assigns a logical name to the submodule and describes the default URL the
submodule shall be cloned from. `doorkeeper-mongodb`'s `.gitmodules` contains:

```gitmodules
[submodule "doorkeeper"]
    path = doorkeeper
    url = https://github.com/doorkeeper-gem/doorkeeper.git
```
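
To make the structure of that file concrete, here is a minimal Ruby sketch that
reads it into a hash. `parse_gitmodules` is a hypothetical helper written for
this post, and it only understands the simple layout shown above, not the full
git config syntax. In practice, `git config --file .gitmodules --list` reads
the same data.

```ruby
# Minimal sketch: handles only the simple `key = value` layout shown above,
# not the full git config syntax (quoting, comments, continuation lines).
def parse_gitmodules(text)
  submodules = {}
  current = nil

  text.each_line do |line|
    line = line.strip
    if (match = line.match(/\A\[submodule "(.+)"\]\z/))
      # Start a new section keyed by the submodule's logical name.
      current = submodules[match[1]] = {}
    elsif current && (match = line.match(/\A(\w+)\s*=\s*(.*)\z/))
      # Store each setting under the section we are currently in.
      current[match[1]] = match[2]
    end
  end

  submodules
end

gitmodules = <<~CONFIG
  [submodule "doorkeeper"]
      path = doorkeeper
      url = https://github.com/doorkeeper-gem/doorkeeper.git
CONFIG

p parse_gitmodules(gitmodules)
```

The logical name (`doorkeeper`) becomes the key, and `path` and `url` are the
two settings git needs to know where to place the submodule and where to clone
it from.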

We can initialize and update submodules with the `git submodule init` and `git
submodule update` commands:

    doorkeeper-mongodb master % git submodule init && git submodule update
    Submodule path 'doorkeeper': checked out
    'b62dcad046564a0e535e6ac17226fc33778a2cde'

It checks out the reference the submodule was committed with: in this case,
the latest commit to doorkeeper’s master branch. We can check out another
reference. Step-by-step details follow:

Go into the submodule's directory:

    doorkeeper-mongodb master % cd doorkeeper

We are now in the `doorkeeper` repository, where we can check out another
reference in that project:

    doorkeeper HEAD % git checkout 2.2-stable
    Previous HEAD position was b62dcad... Release version 3.0.0.rc1
    Switched to branch '2.2-stable'
    Your branch is up-to-date with 'origin/2.2-stable'.
    doorkeeper 2.2-stable %

We go back to `doorkeeper-mongodb` and check the difference with the latest
commit:

    doorkeeper 2.2-stable % cd ..
    doorkeeper-mongodb master % git diff
    diff --git a/doorkeeper b/doorkeeper
    index b62dcad..9c8ba77 160000
    --- a/doorkeeper
    +++ b/doorkeeper
    @@ -1 +1 @@
    -Subproject commit b62dcad046564a0e535e6ac17226fc33778a2cde
    +Subproject commit 9c8ba7705a0af17b76990f4fbd83f5fbe5c3f9bf

If we were to commit in `doorkeeper-mongodb` now, the only change recorded
would be that SHA reference, not all the changes that happened between `master`
and `2.2-stable`. The next time we update the submodule, it will be at that
revision.

To run doorkeeper's specs as part of the extension's suite, a new
`load_doorkeeper` task runs before the `spec` task. We make that happen with
these additions to the `Rakefile`:

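```ruby
task :load_doorkeeper do
  # Check out the doorkeeper commit pinned by this repository.
  `git submodule init`
  `git submodule update`
  # Copy upstream specs; -n skips files that already exist locally.
  `cp -r -n doorkeeper/spec .`
  # Run the combined suite.
  `bundle exec rspec`
end

# Make `rake spec` run :load_doorkeeper first.
task spec: :load_doorkeeper
```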

After initializing the submodule, the task copies doorkeeper's specs into the
extension's root path. The copy happens with the `-n` flag, which prevents `cp`
from overwriting files that already exist, so the extension can override
individual files. The User model from the dummy test app, for example, needs to
stay configured with MongoDB rather than upstream's ActiveRecord.
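
The no-overwrite copy is the part that makes overrides possible, so here is a
small Ruby sketch of the same idea. `copy_without_overwrite` is a hypothetical
helper, not something from the doorkeeper-mongodb codebase, and the file names
in the demo are made up for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Copies every file under `source` into `target`, skipping any file that
# already exists there. Returns the relative paths that were copied.
def copy_without_overwrite(source, target)
  copied = []
  Dir.glob("**/*", File::FNM_DOTMATCH, base: source).sort.each do |relative|
    from = File.join(source, relative)
    to = File.join(target, relative)
    # Directories are created on demand; existing files are left alone.
    next if File.directory?(from) || File.exist?(to)

    FileUtils.mkdir_p(File.dirname(to))
    FileUtils.cp(from, to)
    copied << relative
  end
  copied
end

# Demo in a throwaway directory: the local override should survive.
Dir.mktmpdir do |dir|
  upstream = File.join(dir, "doorkeeper", "spec")
  local = File.join(dir, "spec")
  FileUtils.mkdir_p(File.join(upstream, "dummy"))
  FileUtils.mkdir_p(File.join(local, "dummy"))

  File.write(File.join(upstream, "dummy", "user.rb"), "# ActiveRecord user")
  File.write(File.join(upstream, "token_spec.rb"), "# shared spec")
  File.write(File.join(local, "dummy", "user.rb"), "# MongoDB user")

  p copy_without_overwrite(upstream, local)
  puts File.read(File.join(local, "dummy", "user.rb"))
end
```

Files that exist only upstream are pulled in, while anything the extension
already defines keeps its local version.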

## See the code

The two pull requests for this project split are:

* [doorkeeper-gem/doorkeeper#648](https://github.com/doorkeeper-gem/doorkeeper/pull/648)
* [doorkeeper-gem/doorkeeper-mongodb#2](https://github.com/doorkeeper-gem/doorkeeper-mongodb/pull/2)

Both have several hundred lines of deletions: ORM specifics from the former, and
the preexisting `spec/` from the latter.

## What's next

The new doorkeeper (version `3.0.0.rc1` as of today) works the same way for
ActiveRecord and MongoDB projects, with slightly different code loading
behavior for MongoDB users. If you would like to upgrade and use ActiveRecord,
just bump the major version! If you are a MongoDB user, append `-mongodb` to the
`doorkeeper` gem name in your Gemfile, like:

```diff
diff --git a/Gemfile b/Gemfile
index b23e48a..84a4dac 100644
--- a/Gemfile
+++ b/Gemfile
@@ -12,7 +12,7 @@ gem "bourbon", "~> 3.2.1"
 gem "clearance", "~> 1.8.0"
 gem "coffee-rails"
 gem "paperclip", "~> 4.2.1"
-gem "doorkeeper", "2.0.0"
+gem "doorkeeper-mongodb", "~> 3.0.0.rc1"
 gem "dynamic_form", "~> 1.1.4"
 gem "flutie"
 gem "font-awesome-rails"
```
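
If you are unsure which releases the `~> 3.0.0.rc1` constraint will accept,
RubyGems can tell you. The sketch below uses `Gem::Requirement` and
`Gem::Version` from RubyGems; the candidate version numbers other than
`3.0.0.rc1` are illustrative and may not correspond to actual releases.

```ruby
require "rubygems"

# The same constraint string used in the Gemfile above.
requirement = Gem::Requirement.new("~> 3.0.0.rc1")

# Illustrative candidates; not necessarily real doorkeeper-mongodb releases.
%w[2.2.1 3.0.0.rc1 3.0.0 3.0.5 3.1.0].each do |candidate|
  allowed = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{allowed ? 'allowed' : 'rejected'}"
end
```

The practical upshot is that this constraint lets Bundler move from the release
candidate to a stable `3.0.x` release without another Gemfile change.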

Please let us know if you run into any issues, so we can release a stable `3.0.0`
version. See the [NEWS] file for other changes you might need to make to run on
the latest version. It should be a seamless upgrade for most users.

[NEWS]: https://github.com/doorkeeper-gem/doorkeeper/blob/master/NEWS.md

doorkeeper is now (really) open to extension: to the default ActiveRecord
choice, we add the preexisting MongoDB ORM code as a plugin, which in turn sets
an example for how to add new non-OAuth features to doorkeeper. Looking
forward to seeing and helping with new doorkeeper extensions!
